Python multithreading solution

Here we write a simple stochastic (Monte Carlo) calculation of pi, and then parallelize it using multiprocessing (and multithreading, for comparison).

import random
def sample(n):
    """Make n trials of points in the square.  Return (n, number_in_circle)
    
    This is our basic function.  By design, it returns everything it
    needs to compute the final answer: both n (even though it is an input
    argument) and n_inside_circle.  To compute our final answer, all we
    have to do is sum up the n:s and the n_inside_circle:s and do our
    computation"""
    n_inside_circle = 0
    for i in range(n):
        x = random.random()
        y = random.random()
        if x**2 + y**2 < 1.0:
            n_inside_circle += 1
    return n, n_inside_circle
%%timeit
# Do it just for timing
n, n_inside_circle = sample(10**6)
598 ms ± 29.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# Do the actual calculation (the previous result doesn't get saved)
n, n_inside_circle = sample(10**6)

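This is the “calculate answer” phase. The points are uniform in the unit square, so the fraction that lands inside the quarter circle of radius 1 approximates pi/4; multiplying by 4 gives the estimate.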

pi = 4.0 * (n_inside_circle / n)
pi
3.144548

Do it in parallel with multiprocessing

This divides the calculation into 10 tasks and runs sample on each of them. Then it re-combines the results.

import multiprocessing.pool
pool = multiprocessing.pool.Pool()
# The default pool makes one process per CPU
%%timeit
# Do it once to time it
results = pool.map(sample, [10**5] * 10)
320 ms ± 38.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# Do it again to get the results, since the results of the above
# cell aren't accessible because of the %%timeit magic.
results = pool.map(sample, [10**5] * 10)
pool.close()
n_sum = sum(x[0] for x in results)
n_inside_circle_sum = sum(x[1] for x in results)
pi = 4.0 * (n_inside_circle_sum / n_sum)
pi
3.140768
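
The list [10**5] * 10 above hard-codes an even split. As a minimal sketch, assuming you want to divide an arbitrary total number of trials among an arbitrary number of tasks, a small helper could build that list. The name split_work is hypothetical and not part of the original solution.

```python
def split_work(total, n_tasks):
    """Return n_tasks chunk sizes that add up to exactly total.

    Hypothetical helper: any remainder is spread over the first chunks,
    so chunk sizes differ by at most one.
    """
    base, remainder = divmod(total, n_tasks)
    return [base + 1 if i < remainder else base for i in range(n_tasks)]


# Same split as in the cell above: ten tasks of 10**5 trials each.
chunks = split_work(10**6, 10)
print(chunks)

# An uneven total still sums to the requested number of trials.
print(split_work(10, 3))
```

The resulting list can be passed to pool.map(sample, chunks) in place of [10**5] * 10.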

Do it in “parallel” with threads

For comparison. This should not be any faster: in standard CPython, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time within a process, so the calls to sample cannot run simultaneously.

threadpool = multiprocessing.pool.ThreadPool()
%%timeit -o
# Do it once to time it
threadpool.map(sample, [10**5] * 10)
635 ms ± 28.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
<TimeitResult : 635 ms ± 28.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>
# Do it again to get the results, since the results of the above
# cell aren't accessible because of the %%timeit magic.
results = threadpool.map(sample, [10**5] * 10)
threadpool.close()
n_sum = sum(x[0] for x in results)
n_inside_circle_sum = sum(x[1] for x in results)
pi = 4.0 * (n_inside_circle_sum / n_sum)
pi
3.142388

Future ideas

You could make a separate calculate function that takes a list of results and returns pi. It can be used whether or not the results were computed with multiprocessing.
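
One possible sketch of such a function follows. It assumes each result is an (n, n_inside_circle) tuple, as returned by sample; the name calculate and the example numbers are illustrative only.

```python
def calculate(results):
    """Combine a list of (n, n_inside_circle) tuples into an estimate of pi."""
    n_sum = sum(n for n, _ in results)
    n_inside_circle_sum = sum(inside for _, inside in results)
    return 4.0 * n_inside_circle_sum / n_sum


# A single serial result just needs to be wrapped in a list ...
print(calculate([(1000, 785)]))

# ... while a list of partial results (the shape pool.map returns) works as-is.
print(calculate([(500, 390), (500, 395)]))
```

With this in place, the serial version becomes calculate([sample(10**6)]) and the parallel versions become calculate(pool.map(sample, [10**5] * 10)).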

Notice the similarity to split-apply-combine, and to map-reduce, which is a specialization of split-apply-combine.
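
To make the correspondence concrete, here is a serial sketch of the same calculation written as explicit split, map, and reduce steps. It uses the built-in map and functools.reduce rather than a pool; the combine function is a hypothetical name, and the chunk sizes are smaller than above only to keep the example quick.

```python
import random
from functools import reduce


def sample(n):
    """Make n trials of points in the square.  Return (n, number_in_circle)."""
    n_inside_circle = 0
    for _ in range(n):
        x = random.random()
        y = random.random()
        if x**2 + y**2 < 1.0:
            n_inside_circle += 1
    return n, n_inside_circle


def combine(a, b):
    """Reduce step: add two (n, n_inside_circle) tuples element-wise."""
    return a[0] + b[0], a[1] + b[1]


random.seed(0)                                # reproducible run
chunks = [10**4] * 10                         # split
partials = map(sample, chunks)                # apply / map
n_sum, n_inside = reduce(combine, partials)   # combine / reduce
pi_estimate = 4.0 * n_inside / n_sum
print(n_sum, pi_estimate)
```

Replacing the built-in map with pool.map is the only change needed to run the map step in parallel; the split and reduce steps stay the same.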