Python multithreading solution
Here we build a simple stochastic (Monte Carlo) calculation of pi, then parallelize it with multiprocessing, and with multithreading for comparison.
import random
def sample(n):
    """Make n trials of points in the square. Return (n, number_in_circle).

    This is our basic function. By design, it returns everything it
    needs to compute the final answer: both n (even though it is an input
    argument) and n_inside_circle. To compute our final answer, all we
    have to do is sum up the n:s and the n_inside_circle:s and do our
    computation."""
    n_inside_circle = 0
    for i in range(n):
        x = random.random()
        y = random.random()
        if x**2 + y**2 < 1.0:
            n_inside_circle += 1
    return n, n_inside_circle
%%timeit
# Do it just for timing
n, n_inside_circle = sample(10**6)
598 ms ± 29.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# Do the actual calculation (the previous result doesn't get saved)
n, n_inside_circle = sample(10**6)
This is the “calculate answer” phase.
pi = 4.0 * (n_inside_circle / n)
pi
3.144548
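The factor of 4 comes from comparing areas. As a short derivation, assuming x and y are uniform on [0, 1) and writing N for the number of trials and N_in for the number that land inside the circle (symbols introduced here for illustration only):

```latex
P\left(x^2 + y^2 < 1\right)
  = \frac{\text{area of quarter disc}}{\text{area of unit square}}
  = \frac{\pi/4}{1}
\quad\Longrightarrow\quad
\pi \approx 4\,\frac{N_\text{in}}{N},
\qquad
\sigma_\pi \approx 4\sqrt{\frac{p\,(1-p)}{N}}, \quad p = \frac{\pi}{4}
```

With N = 10**6 the standard error works out to roughly 0.0016, so estimates that differ from pi in the third decimal place are expected.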
Do it in parallel with multiprocessing
This divides the calculation into 10 tasks of 10**5 trials each and runs sample on each of them in a worker process. Then it recombines the results.
import multiprocessing.pool
pool = multiprocessing.pool.Pool()
# The default pool makes one process per CPU
%%timeit
# Do it once to time it
results = pool.map(sample, [10**5] * 10)
320 ms ± 38.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# Do it again to get the results: %%timeit does not keep the
# variables assigned in the timed cell above.
results = pool.map(sample, [10**5] * 10)
pool.close()
n_sum = sum(x[0] for x in results)
n_inside_circle_sum = sum(x[1] for x in results)
pi = 4.0 * (n_inside_circle_sum / n_sum)
pi
3.140768
Do it in “parallel” with threads
For comparison. This should not be any faster: in standard CPython, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so the sample calls cannot run simultaneously within one process.
threadpool = multiprocessing.pool.ThreadPool()
%%timeit -o
# Do it once to time it
threadpool.map(sample, [10**5] * 10)
635 ms ± 28.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
<TimeitResult : 635 ms ± 28.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>
# Do it again to get the results: %%timeit does not keep the
# variables assigned in the timed cell above.
results = threadpool.map(sample, [10**5] * 10)
threadpool.close()
n_sum = sum(x[0] for x in results)
n_inside_circle_sum = sum(x[1] for x in results)
pi = 4.0 * (n_inside_circle_sum / n_sum)
pi
3.142388
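The timings above rely on the %%timeit magic, which is only available in IPython and Jupyter. As a rough sketch of the same comparison in a plain script, one could time the serial and threaded runs with time.perf_counter. The helper timed_map and the variable tasks are hypothetical names introduced here, and the task size is reduced to keep the example quick, so the numbers will not match those above.

```python
import random
import time
from multiprocessing.pool import ThreadPool


def sample(n):
    """Same function as above: return (n, number of points inside the circle)."""
    n_inside_circle = 0
    for _ in range(n):
        x = random.random()
        y = random.random()
        if x**2 + y**2 < 1.0:
            n_inside_circle += 1
    return n, n_inside_circle


def timed_map(map_func, tasks):
    """Hypothetical helper: run map_func(sample, tasks) and time it."""
    start = time.perf_counter()
    # list() forces evaluation, since the built-in map is lazy
    results = list(map_func(sample, tasks))
    elapsed = time.perf_counter() - start
    return results, elapsed


tasks = [10**4] * 10  # assumption: smaller than in the text, for a quick run

# Serial baseline using the built-in map
serial_results, serial_time = timed_map(map, tasks)

# Same work spread over a thread pool
with ThreadPool() as threadpool:
    thread_results, thread_time = timed_map(threadpool.map, tasks)

print(f"serial:  {serial_time:.3f} s")
print(f"threads: {thread_time:.3f} s")
```

A process pool's map method could be passed to timed_map in the same way. In a script, that is usually done under an `if __name__ == "__main__":` guard, because on platforms that start workers by spawning, each worker re-imports the main module.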
Future ideas
You could write a separate calculate function that takes a list of results and returns pi. It can be used whether or not the work was done with multiprocessing.
Notice the similarity to split-apply-combine, and to map-reduce, which is a specialization of split-apply-combine.
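A minimal sketch of such a calculate function is shown below. The name comes from the idea above, but the body is only one possible implementation. In map-reduce terms, sample is the "map" step and calculate is the "reduce" step. The counts in the usage lines are made up for illustration.

```python
def calculate(results):
    """Combine an iterable of (n, n_inside_circle) pairs into an estimate of pi."""
    n_sum = 0
    n_inside_circle_sum = 0
    # Single pass, so this also works if results is a generator
    for n, n_inside_circle in results:
        n_sum += n
        n_inside_circle_sum += n_inside_circle
    return 4.0 * n_inside_circle_sum / n_sum


# A single serial result just needs to be wrapped in a list
print(calculate([(4, 3)]))          # made-up counts: 4 * 3 / 4 = 3.0

# Several results, e.g. as returned by pool.map(sample, ...)
print(calculate([(4, 3), (4, 4)]))  # made-up counts: 4 * 7 / 8 = 3.5
```

The same function accepts the output of sample, pool.map, or threadpool.map, so the "calculate answer" cells in each section could share it.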