Python multithreading solution

Here, we create a simple stochastic (Monte Carlo) estimate of pi and then parallelize it using multiprocessing (and multithreading, for comparison).

import random

def sample(n):
    """Sample n random points in the unit square.  Return (n, n_inside_circle).

    This is our basic function.  By design, it returns everything it
    needs to compute the final answer: both n (even though it is an input
    argument) and n_inside_circle.  To compute the final answer, all we
    have to do is sum up the n values and the n_inside_circle values and
    do our computation."""
    n_inside_circle = 0
    for _ in range(n):
        x = random.random()
        y = random.random()
        if x**2 + y**2 < 1.0:
            n_inside_circle += 1
    return n, n_inside_circle

%%timeit
# Do it just for timing
n, n_inside_circle = sample(10**6)

598 ms ± 29.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
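%%timeit is an IPython magic, so it only works in Jupyter or IPython. In a plain Python script, a roughly equivalent measurement can be sketched with the standard-library timeit module. This is an assumption about how you might time things outside a notebook, not part of the original solution; the smaller sample count (10**5) and the 3 repeats are chosen only to keep the run short.

```python
import random
import timeit


def sample(n):
    """Sample n random points in the unit square; return (n, n_inside_circle)."""
    n_inside_circle = 0
    for _ in range(n):
        x = random.random()
        y = random.random()
        if x**2 + y**2 < 1.0:
            n_inside_circle += 1
    return n, n_inside_circle


# Call sample(10**5) once per measurement (number=1, like "1 loop each"
# in the %%timeit output) and repeat the measurement 3 times.
times = timeit.repeat(lambda: sample(10**5), number=1, repeat=3)
print(f"best of {len(times)}: {min(times) * 1000:.1f} ms")
```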

# Do the actual calculation (the previous result doesn't get saved)
n, n_inside_circle = sample(10**6)

This is the “calculate answer” phase.

pi = 4.0 * (n_inside_circle / n)
pi

3.144548
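Why multiply the fraction by 4? The points are uniform in the unit square, and those with x**2 + y**2 < 1 lie in a quarter of the unit circle, so the expected fraction inside is the ratio of the two areas. Treating each point as an independent trial with success probability p = pi/4 also gives the standard error of the estimate:

```latex
\frac{\text{area of quarter circle}}{\text{area of unit square}}
  = \frac{\pi \cdot 1^2 / 4}{1^2} = \frac{\pi}{4}
\quad\Longrightarrow\quad
\pi \approx 4 \cdot \frac{n_\text{inside circle}}{n},
\qquad
\sigma_\pi = 4\sqrt{\frac{p\,(1-p)}{n}}, \quad p = \frac{\pi}{4}
```

For n = 10**6 this standard error is about 0.0016, which is the scale of the differences between the estimates in this notebook. It shrinks only like 1/sqrt(n), so more accuracy needs many more samples.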

Do it in parallel with multiprocessing

This divides the calculation into 10 tasks of 10**5 samples each (10**6 in total, as above) and runs sample on each of them in a pool of worker processes. Then it recombines the results.
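Here the split is simply [10**5] * 10. If the total number of samples did not divide evenly into the number of tasks, a small helper could spread the remainder over the chunks. The name split_work is hypothetical and not part of the original solution:

```python
def split_work(total, n_tasks):
    """Hypothetical helper: split `total` samples into `n_tasks` chunk sizes.

    The first `remainder` chunks get one extra sample, so the sizes differ
    by at most one and always sum to `total`.
    """
    base, remainder = divmod(total, n_tasks)
    return [base + 1 if i < remainder else base for i in range(n_tasks)]


print(split_work(10**6, 10) == [10**5] * 10)  # True
print(split_work(10, 3))                      # [4, 3, 3]
```

The returned list could be passed directly as the second argument of pool.map(sample, ...).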

import multiprocessing.pool

# The default pool makes one process per CPU
pool = multiprocessing.pool.Pool()

%%timeit
# Do it once to time it
results = pool.map(sample, [10**5] * 10)

320 ms ± 38.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Do it again to get the results, since the results of the above
# cell aren't accessible because of the %%timeit magic.
results = pool.map(sample, [10**5] * 10)

pool.close()

n_sum = sum(x[0] for x in results)
n_inside_circle_sum = sum(x[1] for x in results)
pi = 4.0 * (n_inside_circle_sum / n_sum)
pi

3.140768
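In a notebook the pool can stay open between cells. In a standalone script, a common pattern (a sketch, not the original solution) is to create the pool in a with block and to put the top-level code under an if __name__ == "__main__": guard, which the multiprocessing documentation requires when workers are started with the "spawn" method (the default on Windows and macOS). The function name estimate_pi_parallel is hypothetical.

```python
import multiprocessing
import random


def sample(n):
    """Sample n random points in the unit square; return (n, n_inside_circle)."""
    n_inside_circle = 0
    for _ in range(n):
        x = random.random()
        y = random.random()
        if x**2 + y**2 < 1.0:
            n_inside_circle += 1
    return n, n_inside_circle


def estimate_pi_parallel(n_per_task, n_tasks):
    """Hypothetical wrapper: run sample() in a process pool and combine results."""
    # One worker process per CPU by default; the pool is shut down when
    # the with block exits (map has already returned all results by then).
    with multiprocessing.Pool() as pool:
        results = pool.map(sample, [n_per_task] * n_tasks)
    n_sum = sum(n for n, _ in results)
    n_inside_circle_sum = sum(inside for _, inside in results)
    return 4.0 * (n_inside_circle_sum / n_sum)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block when
    # they import the main module (needed with the "spawn" start method).
    print(estimate_pi_parallel(10**5, 10))
```

Save it as a .py file and run it with python, since "spawn" workers need to import sample from the main module by its file.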

Do it in “parallel” with threads

For comparison. This should not be any faster, because in standard CPython the global interpreter lock (GIL) prevents multiple threads in the same process from executing Python code at the same time.
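The GIL only serializes the execution of Python code. Threads still help when the work spends its time waiting, because blocking calls such as time.sleep (and most I/O) release the GIL. As a contrast to the CPU-bound sample, here is a sketch with a purely waiting task; the function name wait_a_bit and the durations are illustrative assumptions:

```python
import multiprocessing.pool
import time


def wait_a_bit(seconds):
    """Illustrative I/O-like task: sleeping releases the GIL."""
    time.sleep(seconds)
    return seconds


# Four threads, four tasks of 0.2 s each.
with multiprocessing.pool.ThreadPool(4) as threadpool:
    start = time.perf_counter()
    threadpool.map(wait_a_bit, [0.2] * 4)
    elapsed = time.perf_counter() - start

# The sleeps overlap, so this is close to 0.2 s rather than 0.8 s.
print(f"elapsed: {elapsed:.2f} s")
```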

threadpool = multiprocessing.pool.ThreadPool()

%%timeit -o
# Do it once to time it
threadpool.map(sample, [10**5] * 10)

635 ms ± 28.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

<TimeitResult : 635 ms ± 28.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>

# Do it again to get the results, since the results of the above
# cell aren't accessible because of the %%timeit magic.
results = threadpool.map(sample, [10**5] * 10)

threadpool.close()

n_sum = sum(x[0] for x in results)
n_inside_circle_sum = sum(x[1] for x in results)
pi = 4.0 * (n_inside_circle_sum / n_sum)
pi

3.142388

Future ideas

You could make a separate calculate function that takes a list of results and returns pi. It could then be used regardless of whether the sampling was done with multiprocessing or not.
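A minimal sketch of such a function, assuming the list holds (n, n_inside_circle) tuples as returned by sample. The name calculate comes from the text; the body is illustrative:

```python
def calculate(results):
    """Combine a list of (n, n_inside_circle) tuples into an estimate of pi."""
    n_sum = sum(n for n, _ in results)
    n_inside_circle_sum = sum(inside for _, inside in results)
    return 4.0 * (n_inside_circle_sum / n_sum)


# Works the same for one serial result or for a list from pool.map.
print(calculate([(10, 8), (10, 7)]))  # 3.0
```

With it, the serial case becomes calculate([sample(10**6)]) and the parallel case calculate(pool.map(sample, [10**5] * 10)).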

Notice the similarity to split-apply-combine, or to map-reduce, which is a specialization of split-apply-combine.
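In map-reduce terms, sample is the map step and adding up the tuples is the reduce step. A sketch with the built-in map and functools.reduce (serial here; pool.map could replace map unchanged); combine is a hypothetical helper name:

```python
import functools
import random


def sample(n):
    """Sample n random points in the unit square; return (n, n_inside_circle)."""
    n_inside_circle = 0
    for _ in range(n):
        x = random.random()
        y = random.random()
        if x**2 + y**2 < 1.0:
            n_inside_circle += 1
    return n, n_inside_circle


def combine(a, b):
    """Hypothetical reduce step: add two (n, n_inside_circle) tuples element-wise."""
    return a[0] + b[0], a[1] + b[1]


# Split: ten chunks of 10**4.  Apply (map): sample each chunk.  Combine (reduce): add up.
chunks = [10**4] * 10
n_total, n_inside_total = functools.reduce(combine, map(sample, chunks))
pi_estimate = 4.0 * (n_inside_total / n_total)
print(pi_estimate)
```

Because combine is associative, the partial results can be added in any grouping, which is what lets the reduce step itself be split up in larger map-reduce systems.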