Python multithreading solution

Here we build a simple stochastic (Monte Carlo) estimate of pi, then parallelize it with multiprocessing (and with multithreading, for comparison).

import random
def sample(n):
    """Make n trials of points in the square.  Return (n, number_in_circle)
    
    This is our basic function.  By design, it returns everything it
    needs to compute the final answer: both n (even though it is an input
    argument) and n_inside_circle.  To compute our final answer, all we
    have to do is sum up the n:s and the n_inside_circle:s and do our
    computation"""
    n_inside_circle = 0
    for i in range(n):
        x = random.random()
        y = random.random()
        if x**2 + y**2 < 1.0:
            n_inside_circle += 1
    return n, n_inside_circle
%%timeit
# Do it just for timing
n, n_inside_circle = sample(10**6)
314 ms ± 1.37 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# Do the actual calculation (the previous result doesn't get saved)
n, n_inside_circle = sample(10**6)

This is the “calculate answer” phase.

pi = 4.0 * (n_inside_circle / n)
pi
3.140036
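
The factor of 4 comes from comparing areas. The points are uniform in the unit square, and the test x**2 + y**2 < 1.0 selects the quarter of the unit disc that lies inside it. As a short derivation (the symbol n_inside is shorthand for n_inside_circle):

```latex
\[
P(x^2 + y^2 < 1)
  = \frac{\text{area of quarter disc}}{\text{area of unit square}}
  = \frac{\pi/4}{1}
\quad\Longrightarrow\quad
\pi \approx 4 \cdot \frac{n_{\text{inside}}}{n}
\]
```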

Do it in parallel with multiprocessing

This divides the calculation into 10 tasks and runs sample on each of them. Then it re-combines the results.
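
The task list `[10**5] * 10` used below works because 10**6 divides evenly by 10. As a minimal sketch of the "divide" step for totals that do not divide evenly, a helper like the following could be used. The name `split_work` is hypothetical and not part of the original notebook.

```python
def split_work(total, n_tasks):
    """Split `total` trials into `n_tasks` chunk sizes that sum to `total`.

    Hypothetical helper for illustration only.
    """
    base, extra = divmod(total, n_tasks)
    # The first `extra` tasks each take one additional trial.
    return [base + 1] * extra + [base] * (n_tasks - extra)


tasks = split_work(10**6, 10)
print(tasks == [10**5] * 10)
print(split_work(10, 3))
```

The returned list can be passed to `pool.map(sample, ...)` in place of `[10**5] * 10`.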

import multiprocessing.pool
pool = multiprocessing.pool.Pool()
# The default pool makes one process per CPU
%%timeit
# Do it once to time it
results = pool.map(sample, [10**5] * 10)
152 ms ± 437 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Do it again to get the results, since the results of the above
# cell aren't accessible because of the %%timeit magic.
results = pool.map(sample, [10**5] * 10)
pool.close()
n_sum = sum(x[0] for x in results)
n_inside_circle_sum = sum(x[1] for x in results)
pi = 4.0 * (n_inside_circle_sum / n_sum)
pi
3.139504

Do it in “parallel” with threads

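For comparison, run the same tasks in a thread pool. This should not be much faster than the serial version, because in standard CPython the global interpreter lock (GIL) lets only one thread execute Python code at a time within a process.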

threadpool = multiprocessing.pool.ThreadPool()
%%timeit -o
# Do it once to time it
threadpool.map(sample, [10**5] * 10)
288 ms ± 1.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
<TimeitResult : 288 ms ± 1.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>
# Do it again to get the results, since the results of the above
# cell aren't accessible because of the %%timeit magic.
results = threadpool.map(sample, [10**5] * 10)
threadpool.close()
n_sum = sum(x[0] for x in results)
n_inside_circle_sum = sum(x[1] for x in results)
pi = 4.0 * (n_inside_circle_sum / n_sum)
pi
3.142448

Future ideas

You could make a separate calculate function that takes a list of results and returns pi. It can be used whether or not the results come from multiprocessing.
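
One possible sketch of such a function, assuming each result is an (n, n_inside_circle) pair as returned by sample. The name `calculate` and its signature are assumptions for illustration:

```python
def calculate(results):
    """Combine a list of (n, n_inside_circle) pairs into an estimate of pi."""
    n_sum = sum(n for n, _ in results)
    n_inside_circle_sum = sum(inside for _, inside in results)
    return 4.0 * (n_inside_circle_sum / n_sum)


# A single serial result can be wrapped in a list ...
print(calculate([(1000, 785)]))
# ... and the list returned by pool.map or threadpool.map can be passed directly.
print(calculate([(100, 78), (100, 79)]))
```

The pairs above are made-up inputs to show the call shape, not measured results.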

Notice the similarity to split-apply-combine, and to map-reduce, which is a specialization of split-apply-combine.
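
To make the map-reduce reading concrete, here is a minimal serial sketch. The built-in `map` stands in for `pool.map`, and `functools.reduce` does the combining. The helper `combine` and the variable `pi_estimate` are hypothetical names, and the smaller task size (10**4) is chosen only to keep the example quick.

```python
import functools
import random


def sample(n):
    """Same idea as sample() above: return (n, number_in_circle)."""
    n_inside_circle = 0
    for _ in range(n):
        x = random.random()
        y = random.random()
        if x**2 + y**2 < 1.0:
            n_inside_circle += 1
    return n, n_inside_circle


def combine(a, b):
    """Reduce step: add two (n, n_inside_circle) pairs elementwise."""
    return a[0] + b[0], a[1] + b[1]


# Map step: apply sample to every task.
results = map(sample, [10**4] * 10)
# Reduce step: fold all partial results into a single pair.
n_sum, n_inside_circle_sum = functools.reduce(combine, results, (0, 0))
pi_estimate = 4.0 * (n_inside_circle_sum / n_sum)
print(n_sum, pi_estimate)
```

Because `combine` is associative and each task is independent, the map step can be swapped for a parallel `pool.map` without changing the reduce step.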