Concurrency with multithreading
In most modern processor systems, multithreading is commonplace. With CPUs shipping with more than one core, and technologies such as hyper-threading allowing a single core to run multiple threads concurrently, application developers do not want to waste any chance to exploit the advantages these technologies provide.
Python supports multithreading through its threading module, which allows developers to exploit thread-level concurrency in their applications.
The following example showcases how a simple program can be built using the threading module in Python:
# simple_multithreading.py
import threading

class SimpleThread(threading.Thread):
    def __init__(self, exec_target, exec_args):
        # Hand the callable and its arguments over to the Thread base class
        threading.Thread.__init__(self, target=exec_target, args=exec_args)

def count_printer(counter):
    # Count down from counter to 1, printing each value
    for i in range(counter, 0, -1):
        print(i)

count_thread1 = SimpleThread(exec_target=count_printer, exec_args=(5,))
count_thread2 = SimpleThread(exec_target=count_printer, exec_args=(3,))
print("Starting thread 1")
count_thread1.start()
print("Starting thread 2")
count_thread2.start()
count_thread1.join()
count_thread2.join()
print("Exiting")
This example showcases how Python's object-oriented programming style and the threading module can be combined into a simple multithreaded program that prints numbers from more than one thread.
To achieve this, we first imported the Python 3 threading module into our program. Once the module was imported, we defined a new class, named SimpleThread, which inherits from the threading.Thread class.
To create a simple thread-based class, the minimum we have to do is to define an __init__ method for the child class.
In the __init__ method of our SimpleThread class, we ask for two parameters: exec_target, which defines the callable method that our class should run inside a thread; and exec_args, which is a tuple of arguments that will be passed to the target method.
Inside the __init__ method, all we do is call the __init__ method of the Thread class, passing exec_target as the value of its target parameter, which expects a callable, and exec_args as the value of its args parameter, which expects the arguments to pass to that callable.
Next, we define the count_printer function, which we will use as the target when creating objects of our SimpleThread class.
Now, we create two objects of the SimpleThread class, one for each of the two threads that we want to run. Once these objects have been created, we call the start() method inherited from the Thread class to start the threads.
The join() method blocks until the thread has completed its execution and exited, effectively causing our program to wait until both threads have finished.
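As an aside, the same behaviour could also be achieved without defining a subclass at all, because threading.Thread itself accepts a callable target and its arguments. The following is a minimal sketch of that variant (the file name is only illustrative):
# direct_thread.py
import threading

def count_printer(counter):
    for i in range(counter, 0, -1):
        print(i)

# Thread objects can be built directly from a callable and its arguments,
# so no subclass is strictly required for a case this simple
count_thread1 = threading.Thread(target=count_printer, args=(5,))
count_thread2 = threading.Thread(target=count_printer, args=(3,))
count_thread1.start()
count_thread2.start()
count_thread1.join()
count_thread2.join()
The subclass-based approach becomes more useful once a thread needs to carry its own state or override run(), as the next example will show.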
Now, let's see what happens when we run our program:
python simple_multithreading.py
Starting thread 1
5
4
3
Starting thread 2
2
3
1
2
1
Exiting
As we can see, it is quite simple to implement a program with multithreading in Python. But simplicity and concurrency are two roads that never intersect.
The previous example was a very basic use of threading, where each thread only runs a loop and prints numbers onscreen. But most real-life applications use threads to do much more than print a natural number series. These programs may perform long blocking I/O operations, such as reading a file and then processing it, or waiting for a database to send back the information for a request.
Let's take a look at the following program, which, as a hypothetical use case, takes multiple JSON files as input, converts them to YAML format, and writes the results to a single YAML file on disk:
# json_to_yaml.py
import threading
import json
import yaml

class JSONConverter(threading.Thread):
    def __init__(self, json_file, yaml_file):
        threading.Thread.__init__(self)
        self.json_file = json_file
        self.yaml_file = yaml_file

    def run(self):
        # Read and parse the source JSON file
        print("Starting read for {}".format(self.json_file))
        self.json_reader = open(self.json_file, 'r')
        self.json = json.load(self.json_reader)
        self.json_reader.close()
        print("Read completed for {}".format(self.json_file))
        # Append the parsed contents to the shared YAML file
        print("Writing {} to YAML".format(self.json_file))
        self.yaml_writer = open(self.yaml_file, 'a+')
        yaml.dump(self.json, self.yaml_writer)
        self.yaml_writer.close()
        print("Conversion completed for {}".format(self.json_file))

files = ['file1.json', 'file2.json', 'file3.json']
conversion_threads = []
for file in files:
    converter = JSONConverter(file, 'converted.yaml')
    conversion_threads.append(converter)
    converter.start()

for cthread in conversion_threads:
    cthread.join()

print("Exiting")
Let's just see what happens when we run this program:
python json_to_yaml.py
Starting read for file1.json
Starting read for file2.json
Starting read for file3.json
Read completed for file1.json
Writing file1.json to YAML
Read completed for file2.json
Read completed for file3.json
Writing file2.json to YAML
Writing file3.json to YAML
Conversion completed for file1.json
Conversion completed for file3.json
Conversion completed for file2.json
Exiting
As you can see, we cannot predict the order in which the program reads and writes the files. Now imagine that the Python interpreter had decided to switch control away from the thread that was in the middle of writing the contents of file1.json and over to the thread writing the contents of file2.json. In that scenario, we would be left with a corrupt converted.yaml output file whose contents from the different JSON files are intermingled. This is just one example of how multithreading can wreak havoc if proper care is not taken when implementing the program.
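To make the hazard more concrete, here is a minimal, hypothetical sketch, separate from the converter program, in which two threads append to the same file with no coordination between them:
# interleaved_writes.py (illustrative only)
import threading

def writer(label, path):
    # Each thread appends a large number of lines to the same shared file
    with open(path, 'a') as handle:
        for _ in range(1000):
            handle.write(label * 40 + "\n")

threads = [threading.Thread(target=writer, args=(label, 'shared.txt'))
           for label in ('A', 'B')]
for wthread in threads:
    wthread.start()
for wthread in threads:
    wthread.join()
Depending on how the interpreter schedules the two threads and when their file buffers are flushed, the lines written by 'A' and 'B' may end up interleaved in shared.txt in an unpredictable order, which is exactly the kind of intermingling that could corrupt converted.yaml.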
Now, the question is, how can we avoid such scenarios?