Synchronizing processes
Just as synchronizing the actions of threads is important, so is synchronizing the actions of processes when we use multiprocessing. Since multiple processes may access the same shared resource, their access to that resource needs to be serialized. To help achieve this, the multiprocessing module provides locks here too.
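To make the need for serialization concrete, here is a minimal sketch (not part of the original example) in which several processes increment one shared counter. The update `counter.value += 1` is a read-modify-write, so without a lock two processes can read the same old value and one increment is lost. The function names `add_many` and `run`, and the choice of 4 processes with 1000 increments each, are illustrative assumptions.

```python
from multiprocessing import Process, Lock, Value

def add_many(counter, lock, times):
    for _ in range(times):
        # Holding the lock makes the read-modify-write below happen as one
        # step with respect to every other process using the same lock.
        with lock:
            counter.value += 1

def run(num_procs=4, times=1000):
    lock = Lock()
    # lock=False gives a raw shared integer with no built-in lock, so the
    # explicit Lock above is the only thing serializing access to it.
    counter = Value('i', 0, lock=False)
    procs = [Process(target=add_many, args=(counter, lock, times))
             for _ in range(num_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value

if __name__ == '__main__':
    print(run())  # 4000
```

Removing the `with lock:` line typically makes the final count come out below 4000 on a multi-core machine, which is exactly the lost-update problem the lock prevents.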
The following example shows how to use locks from the multiprocessing module to synchronize multiple processes that each fetch the HTML of a URL and append it to a common local file:
# url_loader_locks.py
from multiprocessing import Process, Lock
import urllib.request

def load_url(url, lock):
    # Download outside the lock so that the fetches still run concurrently.
    with urllib.request.urlopen(url) as url_handle:
        url_data = url_handle.read()
    # read() returns bytes, not str, so we open the file in binary append
    # mode ('ab') and write the bytes unchanged.
    # Acquire the lock so that only one process appends at a time; the with
    # statement releases it again even if the write raises an exception.
    with lock:
        with open("combinedhtml.txt", 'ab') as outfile:
            outfile.write(url_data)

if __name__ == '__main__':
    urls = ['http://www.w3c.org', 'http://www.google.com',
            'http://www.microsoft.com', 'http://www.wikipedia.org']
    lock = Lock()
    process_pool = []
    for url in urls:
        url_loader = Process(target=load_url, args=(url, lock))
        process_pool.append(url_loader)
    for loader in process_pool:
        loader.start()
    for loader in process_pool:
        loader.join()
    print("Exiting…")
The key addition in this example is the Lock class from the multiprocessing library. This Lock class is analogous to the Lock class found in the threading library, and it synchronizes the actions of multiple processes through acquiring and releasing a lock. Before a process accesses a shared resource, it first acquires the lock. Any other process that then tries to acquire the same lock blocks until the process currently holding it releases it, so accesses to the resource are serialized across the processes.
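The blocking behaviour described above can be observed directly. The following is a minimal sketch, not taken from the original text: the parent process holds the lock while a child tries to acquire it with a timeout, and the child's attempt fails; after the parent releases the lock, a second child acquires it successfully. The helper names `try_acquire` and `demo` and the 0.2-second timeout are illustrative assumptions.

```python
from multiprocessing import Process, Lock, Queue

def try_acquire(lock, results):
    # acquire() blocks while another process holds the lock; with a timeout
    # it gives up after that many seconds and returns False.
    got_it = lock.acquire(timeout=0.2)
    results.put(got_it)
    if got_it:
        lock.release()

def demo():
    lock = Lock()
    results = Queue()

    lock.acquire()  # the parent now holds the lock
    p = Process(target=try_acquire, args=(lock, results))
    p.start()
    first = results.get()  # child timed out while the parent held the lock
    p.join()

    lock.release()  # free the lock for other processes
    p = Process(target=try_acquire, args=(lock, results))
    p.start()
    second = results.get()  # this time the child acquires the lock
    p.join()
    return first, second

if __name__ == '__main__':
    print(demo())  # (False, True)
```

Reading each result from the queue before calling `join()` avoids waiting on a child that still has queued data to flush, and because the parent keeps the lock until the first result arrives, the outcome does not depend on timing.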
With this, we now have a fair idea of how to use the Python multiprocessing library to exploit a multiprocessor system and speed up our applications through concurrency.