Multiprocessing for Parallelism in Python


If you’re like me and have been working on some intense data analysis projects lately, you know how frustrating it can be to wait for your code to finish running. That’s where multiprocessing comes in: a tool that lets us split our work into smaller tasks and run them simultaneously across multiple CPU cores, making our programs faster and more efficient.

But let’s not get too carried away with the technical jargon just yet. Instead, let’s dive right into some examples of how we can use multiprocessing to make our code sing like a choir of unicorns!

First things first: the `multiprocessing` module ships with Python’s standard library, so there’s nothing to install. Let’s start with a basic example that spins up five worker processes:

# Importing the necessary library for multiprocessing
import multiprocessing

# Defining a function to be executed by each process
def process_func():
    # Printing a message to show that the function is being executed
    print("Executing process function")

# Checking if the current script is the main script
if __name__ == "__main__":
    # Creating a list to store the processes
    processes = []

    # Looping 5 times to create 5 processes
    for i in range(5):
        # Creating a process and passing the function to be executed
        p = multiprocessing.Process(target=process_func)
        # Starting the process
        p.start()
        # Appending the process to the list
        processes.append(p)

    # Looping through the list of processes
    for p in processes:
        # Joining the processes to ensure they finish before moving on
        p.join()

# Output:
# Executing process function
# Executing process function
# Executing process function
# Executing process function
# Executing process function

A quick walkthrough: the script imports `multiprocessing`, defines `process_func` as the target for each worker, and guards process creation behind `if __name__ == "__main__":`. That guard is required on platforms that spawn new interpreters rather than fork (such as Windows, and macOS by default), where each child re-imports the script and would otherwise spawn processes recursively. The first loop creates and starts five processes; the second loop joins them so the main script waits for all of them to finish, which is why "Executing process function" is printed five times.

Now for something more useful: let’s calculate the sum of all numbers between 1 and 100 by splitting the range across four processes. This is going to be our base case for demonstrating how multiprocessing works:

# Import necessary libraries
import time # Import the time library to measure execution time
from multiprocessing import Pool # Import the Pool class from the multiprocessing library to enable parallel processing

# Define a function to calculate the sum of numbers between a given range
def add_numbers(start, end):
    total = 0 # Initialize a variable to store the sum
    for num in range(start, end+1): # Loop through the range of numbers
        total += num # Add each number to the total
    return total # Return the final sum

if __name__ == '__main__':
    start = time.time() # Start measuring execution time
    
    # Define the number of processes we want to run simultaneously (in this case, 4)
    pool_size = 4
    
    # Create a Pool object with our desired size and distribute the work using `starmap`
    with Pool(processes=pool_size) as p: # Create a Pool object with 4 processes
        # `starmap` unpacks each (start, end) tuple into the two arguments add_numbers expects
        results = p.starmap(add_numbers, [(1, 25), (26, 50), (51, 75), (76, 100)])
    
    # Collect the results and print them out
    for result in results: # Loop through the results
        print("Result:", result) # Print each result
        
    end = time.time() # Stop measuring execution time
    total_time = end - start # Calculate the total execution time
    print(f"Total execution time: {total_time} seconds") # Print the total execution time in seconds

In this example, we’re using a `Pool` of four worker processes and handing each one a quarter of the range: 1–25, 26–50, 51–75, and 76–100. By splitting the work into smaller independent tasks, we let all four processes run at once instead of computing the whole sum serially.

But wait there’s more! Let’s say you have a function that takes in an array as input and performs some complex calculations on it. Instead of running the entire function for each element in the array, we can use `Pool` to split up the work into smaller tasks and run them simultaneously:



# Import necessary libraries
import time # Import the time library to measure execution time
from multiprocessing import Pool # Import the Pool class from the multiprocessing library to enable parallel processing

# Define a function that takes in an array as input and calculates the sum of squares of all elements in the array
def calculate_sum(arr):
    total = 0 # Initialize a variable to store the sum
    for num in arr: # Loop through each element in the array
        total += num ** 2 # Square the element and add it to the total
    return total # Return the final sum

if __name__ == '__main__':
    start = time.time() # Record the start time of the script
    
    # Define the number of processes we want to run simultaneously (in this case, 4)
    pool_size = 4
    
    # Create a Pool object with our desired size and map our function over it using the `map` method
    arr = [1, 2, 3, 4, 5] * 100000 # Create a large array with 500,000 elements
    with Pool(processes=pool_size) as p: # Create a Pool object with 4 processes
        # Use the `imap` method to split the array into smaller chunks and run the function on each chunk simultaneously
        results = list(p.imap(calculate_sum, [arr[i:i+1000] for i in range(0, len(arr), 1000)]))
    
    # Combine the per-chunk sums into one grand total
    print("Sum of squares:", sum(results))
        
    end = time.time() # Record the end time of the script
    total_time = end - start # Calculate the total execution time
    print(f"Total execution time: {total_time} seconds") # Print the total execution time in seconds

In this example, we’re using `Pool` to split our 500,000-element array into chunks of 1,000 elements each (500 chunks in total) and compute the sum of squares for each chunk in parallel, then combine the per-chunk sums at the end.

With multiprocessing at your disposal, you can split CPU-bound work into smaller tasks and run them across multiple cores, often cutting execution time dramatically. Just remember to guard process creation with `if __name__ == "__main__":`, and measure before and after: for small workloads, the overhead of spawning processes and shipping data between them can outweigh the gains.

SICORPS