
Unlocking Parallel Computing in Python with Multiprocessing: A Practical Guide


Understanding the need for parallel computing in Python

Python’s GIL (Global Interpreter Lock) ensures that only one thread executes Python bytecode at a time within a single process. As a result, multithreading often yields no speedup for CPU-bound code and can even degrade performance. To utilize multiple cores and achieve true parallelism in Python, we therefore use multiprocessing. Let’s compare the runtime of a task performed sequentially with the same task performed in parallel.

Here is a simple example of calculating the square of the numbers from 1 to 10,000. First we will do it sequentially, then in parallel using multiprocessing, and finally compare the time taken in each case.

import time
from multiprocessing import Pool


def square(n):
    return n * n

if __name__ == "__main__":
    # creating a list of numbers from 1 to 10,000
    numbers = list(range(1, 10001))

    # Step1: Sequential processing
    start_time = time.time()
    results = list(map(square, numbers))
    end_time = time.time()
    print(f"Sequential processing time: {end_time - start_time} seconds.")

    # Step2: Parallel processing using multiprocessing
    start_time = time.time()
    with Pool() as p:
        results = p.map(square, numbers)
    end_time = time.time()
    print(f"Parallel processing time: {end_time - start_time} seconds.")

In this program, the time module is used to record the time before and after the mapping operation. The map function applies the square function to every item of the numbers list. In sequential processing, this mapping is performed by the main program; in parallel processing, it is shared among the worker processes created by the multiprocessing Pool. One caveat: for a function as cheap as squaring a number, the overhead of spawning processes and shuttling data between them can outweigh the benefit, so the parallel version may actually run slower here. The speedup from multiprocessing shows up on machines with multiple cores when each unit of work is genuinely CPU-intensive.
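You can check how many cores are available on your machine with the standard `multiprocessing.cpu_count()` function:

import multiprocessing

print(multiprocessing.cpu_count())  # number of logical CPU cores visible to the OS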

Python’s Multiprocessing library

With an ever-growing need for performance, Python’s `multiprocessing` module is a powerful tool that enables you to spawn processes, facilitating concurrent execution of code. The `multiprocessing` module comes bundled with Python, so there is no need to install it separately. It bypasses the Global Interpreter Lock by using subprocesses instead of threads, allowing you to achieve true parallelism in your Python programs. Below is an example of how you can import and use this module.

import multiprocessing


def task(num):
    print(f'Processing {num}')

if __name__ == "__main__":
    for i in range(5):  # Number of tasks you want to run in parallel
        process = multiprocessing.Process(target=task, args=(i,))
        process.start()

In the above Python example, we start by importing the `multiprocessing` module. We then define a task to execute, which in this case is a simple printing function. If you plan to use similar code, you can replace the print statement with any task you want to run in parallel.

We then use a for loop to create a separate process for each task using the `multiprocessing.Process` class, specifying the target function and its arguments. Finally, we start each process using the `start` method. (In a real program you would usually also call `join` on each process to wait for it to finish.)

This is a very basic demonstration of Python’s multiprocessing module. It is a powerful library that not only lets you create and manage processes but also supports process synchronization, sharing state between processes, and much more. However, remember that effective use of multiprocessing requires a careful, well-thought-out approach, as it can lead to issues such as increased memory usage if not managed well.

Getting Started with Multiprocessing in Python

Checking that the Multiprocessing library is available

Before we begin exploring the capabilities of Python’s Multiprocessing library, one point deserves clarification: the multiprocessing module is part of Python’s standard library, so there is nothing to install. In particular, avoid running `pip install multiprocessing`; the package of that name on PyPI is an obsolete Python 2 backport that can conflict with the built-in module.

You can confirm the module is available from a Unix-like command line terminal:

python -c "import multiprocessing"

If the command exits without an error, you are all set. (Use ‘python3’ instead of ‘python’ if you have both Python 2 and 3 installed on your system.)

With that verified, we have the necessary setup in place and can dive into practical implementations of multiprocessing.

Importing the required modules

In this section, we will see how to import the necessary libraries to leverage the multiprocessing features in Python. The primary library we are going to use is Python’s built-in ‘multiprocessing’ library. We’ll also be using ‘os’ to interact with the operating system and ‘time’ to create delays for demonstration purposes.

import os
import multiprocessing
import time

Python’s ‘multiprocessing’ library provides facilities for creating processes and for synchronization and communication between them, which makes it the key tool for parallel computing in Python. The ‘os’ library provides functions for interacting with the operating system and lets us fetch the current process id, among other things.

The ‘time’ library lets us make the program wait for a certain duration using the ‘sleep’ function. This is especially useful when we want to simulate delays that would naturally occur in more intensive computations.
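To see how these three modules work together, here is a minimal sketch; the `worker` function and its one-second sleep are just stand-ins for real work:

import os
import multiprocessing
import time


def worker():
    # report which process is running this task and its OS-level PID
    name = multiprocessing.current_process().name
    print(f"{name} is running with PID {os.getpid()}")
    time.sleep(1)  # stand-in for a longer computation

if __name__ == "__main__":
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()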

By importing these libraries, we can leverage multiprocessing techniques throughout the rest of our application, enabling us to efficiently utilize available resources and improve performance where necessary.

Key Components of Multiprocessing

Overview of key Multiprocessing components

In Python’s multiprocessing module, Processes are the most basic components: each Process object represents an individual operating-system process. A Pool object manages a pool of worker processes and is ideal for parallel execution of a function across multiple input values. Queues let different processes communicate and safely pass data to one another. Shared memory (exposed through Value and Array) is another crucial component that allows processes to share state directly.

Creating a new process

Python’s ‘multiprocessing’ module provides the means to create new processes. First we import the required class, then we define the function we want to run in parallel, and finally we manage the processes. Remember, each multiprocessing process gets its own Python interpreter and a distinct memory space.

from multiprocessing import Process


def print_func(continent='Asia'):
    print('The name of continent is : ', continent)

if __name__ == "__main__":
    # names of continents
    names = ['America', 'Europe', 'Africa']

    # list to hold our process objects
    procs = []

    # creating a process for each name
    for name in names:
        # create a new process with target function and arguments
        proc = Process(target=print_func, args=(name,))
        # add the process to our process list
        procs.append(proc)
        proc.start()

    # wait for all processes to complete
    for proc in procs:
        proc.join()

In the code above, first, we defined a function `print_func()` that prints out the names of continents. In the `if __name__ == "__main__":` section, we ensured the code gets executed only when the script is run directly (not during import).

Next, we created an empty list `procs` to hold our process objects, then we looped through the list of continent names and created a new process for each. `Process()` takes two arguments – the target function and the function arguments in a tuple. We start each process using `proc.start()` and finally, we wait for all processes to complete with `proc.join()`.

This demonstrates creating and executing new processes with the `multiprocessing.Process` class in Python. Each process may be scheduled on a different CPU core, allowing tasks to run in parallel and utilizing your computational resources effectively.
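To make the earlier point about distinct memory spaces concrete, here is a minimal sketch (the `counter` variable is purely illustrative): a global variable mutated in the child process remains unchanged in the parent.

from multiprocessing import Process

counter = 0  # lives in each process's own memory space

def increment():
    global counter
    counter += 1  # modifies the child's copy only
    print('In child:', counter)   # prints 1

if __name__ == "__main__":
    p = Process(target=increment)
    p.start()
    p.join()
    print('In parent:', counter)  # still 0: memory is not shared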

Python multiprocessing pools

To demonstrate the practical application of the ‘Pool’ class from the multiprocessing module, we will be using a simple example. In this example, we will be creating a pool of worker processes that will calculate the square of a number.

In Python, creating a pool of worker processes is as simple as instantiating an object of the ‘Pool’ class. This pool object then allows tasks to be offloaded to the worker processes in the pool.

Here is a simple Python script that demonstrates this concept:

from multiprocessing import Pool

def square(n):
    return n*n

if __name__ == "__main__":
    # create a pool of 4 worker processes
    with Pool(processes=4) as pool:
        # map the square function to the numbers 1 through 10 
        result = pool.map(square, range(1, 11))

    print(f"Results: {result}")

In this script, we first import the `Pool` class from the `multiprocessing` module. We then define a simple function `square` that calculates the square of a number.

In the main section of the script, we create a pool of 4 worker processes using `with Pool(processes=4) as pool:`. The advantage of using `with` is that it makes sure the pool is properly closed, no matter what happens in the program.

We then use the `map` function of the Pool object to offload the task of squaring the numbers 1 through 10 to the worker processes in the pool. It applies the `square` function to every number in `range(1, 11)`, distributing the calls among the workers and returning the results in order.

Finally, we print the result. The output should be a list showing the squares of numbers from 1 through 10.

This example illustrates the power and simplicity of using a pool of worker processes for parallel processing in Python. By offloading tasks to a pool of worker processes, we can make optimal use of multiple CPU cores and improve the performance of our Python programs.

Working with queues and shared memory

Python’s Multiprocessing library also lets us create queues and shared memory between processes, enabling efficient exchange and storage of data. Let’s explore this with an example:

from multiprocessing import Process, Queue, Value, Array

def f(n, a, q):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))
    q = Queue()

    p = Process(target=f, args=(num, arr, q))
    p.start()
    p.join()

    print(num.value)
    print(arr[:])
    print(q.get())    # prints "[42, None, 'hello']"

This code starts a new process that modifies the number and array stored in shared memory and puts a list on the queue. A `multiprocessing.Value` is used for sharing a single scalar value, while a `multiprocessing.Array` is used for sharing a fixed-size buffer of data (similar to an array). Queues are thread- and process-safe data structures that allow you to safely pass data between processes.

After the child process finishes, the main process prints the modified shared-memory values and retrieves the message from the queue, demonstrating an efficient way of sharing data between processes.

Advanced Concepts in Multiprocessing

Understanding process synchronization in Multiprocessing

The Multiprocessing library spawns processes, each of which has its own Python interpreter and memory space. This is an advantage over threading, where multiple threads share the same memory space. However, when multiple processes operate on shared resources, specific mechanisms need to be in place to manage concurrent access. This is where Locks and Semaphores come into play.

A Lock is one of the simplest synchronization primitives available in Python, or in any programming language. It is in one of two states, locked or unlocked, and it has two basic methods: `acquire()` and `release()`.

A Semaphore is a more general version of a lock that can be used to limit concurrent access to a shared resource. It manages an internal counter which is decremented by each `acquire()` call and incremented by each `release()` call; when the counter reaches zero, further `acquire()` calls block until a release occurs.

The below Python code demonstrates the use of Locks and Semaphores in a multiprocessing environment.

import multiprocessing


def printer(item, lock):
    """
    Prints out the item that was passed in
    """
    lock.acquire()
    try:
        print(item)
    finally:
        lock.release()

if __name__ == '__main__':
    lock = multiprocessing.Lock()
    items = ['tango', 'foxtrot', 10]
    for item in items:
        p = multiprocessing.Process(target=printer, args=(item, lock))
        p.start()

In the above code snippet, the function `printer()` prints an item. The Lock ensures that only one process writes to the console at a time, so output from different processes does not interleave.
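Incidentally, multiprocessing locks also support the context-manager protocol, so the `acquire()`/`release()` pair in `printer()` can be written more idiomatically as:

def printer(item, lock):
    # `with lock:` acquires on entry and releases on exit, even if print raises
    with lock:
        print(item)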

However, in scenarios where we need to limit access to one or more resources, we can use Semaphores. Here is how you can do it in Python:

import multiprocessing
import time


def worker(s, i):
    s.acquire()
    print(multiprocessing.current_process().name + " acquired")
    time.sleep(i)
    print(multiprocessing.current_process().name + " released\n")
    s.release()

if __name__ == "__main__":
    s = multiprocessing.Semaphore(2)
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(s, 2))
        p.start()

Here the Semaphore is initialized with a count of 2, meaning that at any point at most two worker processes can hold it. To observe this, we added a delay in the worker process and print a message whenever a worker acquires or releases the semaphore.

Using Locks and Semaphores, you can manage concurrency and avoid potential race conditions or deadlocks in your multiprocessing Python application.

Handling process communication

Interprocess communication is fundamental to Python’s multiprocessing framework. Communication between different processes can be achieved using pipes and queues. Here is how to implement it with a queue:

from multiprocessing import Process, Queue

def worker(q):
    message = q.get()
    print(f"Received {message} from main process")

def main():
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    q.put("Hello Worker Process!")
    p.join()

if __name__ == "__main__":
    main()

This example starts by importing the modules we need from multiprocessing: Process and Queue. The worker function, which runs in the new process, accepts a Queue object as input. The main function starts the worker process and then puts a message into the queue, which the worker receives and prints. Note that `q.get()` blocks until a message is available, so this works even though the child starts before the message is sent.

In summary, pipes and queues provide valuable mechanisms to perform interprocess communication. Carefully examine your needs to determine which mechanism best serves your multiprocessing requirements. Remember the two golden rules, though: queues are best for multiple producers and consumers, and pipes work well for one-to-one communication scenarios. Keep tinkering and happy coding!
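Since the example above covers only queues, here is a minimal sketch of the one-to-one case using a Pipe; the message text is arbitrary:

from multiprocessing import Process, Pipe

def worker(conn):
    # send a message through our end of the pipe, then close it
    conn.send("Hello from the worker process!")
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()  # two connected endpoints
    p = Process(target=worker, args=(child_conn,))
    p.start()
    print(parent_conn.recv())  # blocks until the worker sends
    p.join()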

Exception handling and timeouts

In this part, we will see how to handle exceptions and implement timeouts with Python’s multiprocessing library. This prevents hung processes from stalling your program and keeps worker errors from terminating it unexpectedly.

Here is how you can accomplish that:

from multiprocessing import Pool

def f(x):
    if x == 5:
        raise Exception('An error has occurred!')
    return x*x

def worker_main(x):
    try:
        result = f(x)
    except Exception as e:
        print('Caught exception: ', e)
        result = None
    return result

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        for i in range(10):
            try:
                result = pool.apply_async(worker_main, args=(i,))
                print(result.get(timeout=1))
            except Exception as e:
                print('Caught exception: ', e)
        pool.close()
        pool.join()

The above code spawns a pool of worker processes and applies the function `f(x)` to each item in the input range. If an exception occurs in the worker function (as it does when `x` equals 5), it is caught and handled gracefully inside `worker_main` rather than crashing the program. Calling `get()` with `timeout=1` raises a `multiprocessing.TimeoutError` if the result isn’t available after 1 second, which the surrounding `try` block catches.

In conclusion, handling exceptions in multiprocessing allows for smooth execution and better error management in parallel computation programs. It is also crucial to implement timeouts when waiting for the function to return a result to prevent the entire application from freezing due to a hung process.

Performance Considerations and Potential Pitfalls

Analyzing performance impact

To analyze the performance of a multiprocessing Python script, you would typically time how long your algorithm takes to run. We’ll do this by comparing a single-process run with a run that splits the same workload across several processes, using an example function that simulates a long-running task.

import time
import multiprocessing


def task(n):
    # CPU-bound work: sum of squares over a range of numbers
    sum([i**2 for i in range(n)])

if __name__ == "__main__":
    N = 10000000

    # Single process: one call performs all N iterations
    start_time = time.time()
    task(N)
    single_time = time.time() - start_time

    # Multiple processes: split the same number of iterations across 4 workers
    # (each handles a quarter of the range, enough for a timing comparison)
    start_time = time.time()
    procs = [multiprocessing.Process(target=task, args=(N // 4,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    multi_time = time.time() - start_time

    print("Single process time: ", single_time)
    print("Multiple processes time: ", multi_time)

In this script, we define a simple task function that calculates the sum of squares for a range of numbers, simulating a long-running, CPU-bound job. We first run the full workload in a single process, then split the same number of iterations across four worker processes running in parallel.

With the `time.time()` function, we record the start time before each run and calculate the elapsed time after it completes.

Finally, we output the time taken for each scenario, in seconds. On a machine with multiple cores, the parallel run should finish noticeably faster, and comparing the two times shows the performance increase provided by multiprocessing.

Keep in mind, because of the overhead associated with multiprocessing (i.e., setting up the different processes), multiprocessing may not always speed up smaller tasks, and in some cases may even slow them down. However, for larger tasks or more complex algorithms, the performance increase can be significant.
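One common way to amortize that overhead when using a Pool is the `chunksize` argument to `map`, which batches many small tasks into each message sent to a worker. A brief sketch (the batch size of 1000 is arbitrary):

from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":
    with Pool() as pool:
        # each worker receives work in batches of 1000 items,
        # cutting down on inter-process communication per task
        results = pool.map(square, range(1000000), chunksize=1000)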

Conclusion

Throughout this practical guide, we have explored the power of parallel computing in Python using the Multiprocessing library. We’ve learned how to create and manage processes, pools, queues, and shared memory, as well as implement more advanced techniques such as process synchronization and inter-process communication. Furthermore, we’ve delved into important considerations such as performance analysis, exception handling, and potential pitfalls. As the computational world evolves, parallel computing and multiprocessing will continue to be critical in maximizing resource utilization. Hence, mastering these skills will be instrumental in creating efficient Python programs capable of handling complex, resource-intensive tasks. So, keep experimenting, keep refining, and harness the full potential of Python’s multiprocessing capabilities.
