Here, we will create a simple stochastic calculation of pi, and then parallelize it using multiprocessing (and using multithreading for comparison).

```import random
```
```def sample(n):
    """Make n trials of points in the square.  Return (n, number_in_circle)

    This is our basic function.  By design, it returns everything it
    needs to compute the final answer: both n (even though it is an input
    argument) and n_inside_circle.  To compute our final answer, all we
    have to do is sum up the n:s and the n_inside_circle:s and do our
    computation.
    """
    n_inside_circle = 0
    for i in range(n):
        # Draw a point uniformly from the unit square [0, 1) x [0, 1)
        x = random.random()
        y = random.random()
        # Count the point if it falls inside the quarter circle of radius 1
        if x**2 + y**2 < 1.0:
            n_inside_circle += 1
    return n, n_inside_circle
```
```%%timeit
# Do it just for timing
n, n_inside_circle = sample(10**6)
```
```314 ms ± 1.37 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
```# Do the actual calculation (the previous result doesn't get saved)
n, n_inside_circle = sample(10**6)
```

This is the “calculate answer” phase.

```pi = 4.0 * (n_inside_circle / n)
pi
```
```3.140036
```
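The factor of 4 comes from geometry: the points are uniform in the unit square, and the part of the unit circle inside that square is a quarter disc of area π/4. A short sketch of the reasoning follows. The error estimate is the usual binomial standard error and is not part of the original notebook.

```latex
P\left(x^2 + y^2 < 1\right) = \frac{\pi}{4}
\quad\Longrightarrow\quad
\pi \approx 4 \, \frac{n_{\text{inside\_circle}}}{n},
\qquad
\sigma_{\pi} = \sqrt{\frac{\pi \, (4 - \pi)}{n}} \approx \frac{1.64}{\sqrt{n}}
```

For n = 10**6 this gives an expected error of roughly 0.0016, which is consistent with the result above being off from pi by about that much.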

## Do it in parallel with multiprocessing

This divides the calculation into 10 tasks and runs `sample` on each of them. Then it re-combines the results.

```import multiprocessing.pool
pool = multiprocessing.pool.Pool()
# The default pool makes one process per CPU
```
```%%timeit
# Do it once to time it
results = pool.map(sample, [10**5] * 10)
```
```152 ms ± 437 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
```# Do it again to get the results, since the results of the above
# cell aren't accessible because of the %%timeit magic.
results = pool.map(sample, [10**5] * 10)
```
```pool.close()
```
```n_sum = sum(x[0] for x in results)
n_inside_circle_sum = sum(x[1] for x in results)
pi = 4.0 * (n_inside_circle_sum / n_sum)
pi
```
```3.139504
```
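Outside a notebook, the same steps could be collected into a standalone script. The following is a minimal sketch, not the notebook's own code: `split_work` and `parallel_pi` are hypothetical helper names introduced here, and the `with` block and the `if __name__ == "__main__":` guard are additions that are generally recommended when a script starts worker processes.

```python
import multiprocessing
import random


def sample(n):
    """Make n trials of points in the square.  Return (n, number_in_circle)."""
    n_inside_circle = 0
    for _ in range(n):
        x = random.random()
        y = random.random()
        if x**2 + y**2 < 1.0:
            n_inside_circle += 1
    return n, n_inside_circle


def split_work(n_total, n_tasks):
    """Hypothetical helper: split n_total trials into n_tasks near-equal chunks."""
    base, extra = divmod(n_total, n_tasks)
    return [base + 1 if i < extra else base for i in range(n_tasks)]


def parallel_pi(n_total, n_tasks, processes=None):
    """Estimate pi with a process pool (processes=None means one per CPU)."""
    # The context manager shuts the pool down once map() has returned.
    with multiprocessing.Pool(processes) as pool:
        results = pool.map(sample, split_work(n_total, n_tasks))
    n_sum = sum(r[0] for r in results)
    n_inside_circle_sum = sum(r[1] for r in results)
    return 4.0 * (n_inside_circle_sum / n_sum)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this part on import.
    print(parallel_pi(10**6, 10))
```

With `n_total=10**6` and `n_tasks=10`, `split_work` produces the same `[10**5] * 10` task list used in the cells above.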

## Do it in “parallel” with threads

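We repeat the calculation with a thread pool for comparison. This should not be much faster than the serial version, because in standard CPython the global interpreter lock (GIL) allows only one thread at a time to execute Python code within a process.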

```threadpool = multiprocessing.pool.ThreadPool()
```
```%%timeit -o
# Do it once to time it
results = threadpool.map(sample, [10**5] * 10)
```
```288 ms ± 1.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
```<TimeitResult : 288 ms ± 1.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>
```
```# Do it again to get the results, since the results of the above
# cell aren't accessible because of the %%timeit magic.
results = threadpool.map(sample, [10**5] * 10)
```
```threadpool.close()
```
```n_sum = sum(x[0] for x in results)
n_inside_circle_sum = sum(x[1] for x in results)
pi = 4.0 * (n_inside_circle_sum / n_sum)
pi
```
```3.142448
```
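Threads are still useful when the work spends its time waiting rather than computing, because waiting releases the GIL. As a contrast to the CPU-bound timing above, here is a small sketch that uses `time.sleep` as a stand-in for an I/O-bound task. The `wait` and `timed_map` functions are made up for this illustration.

```python
import time
from multiprocessing.pool import ThreadPool


def wait(seconds):
    """Stand-in for an I/O-bound task: sleeping releases the GIL."""
    time.sleep(seconds)
    return seconds


def timed_map(n_threads, n_tasks, seconds):
    """Run n_tasks sleeps on n_threads threads; return elapsed wall time."""
    start = time.perf_counter()
    with ThreadPool(n_threads) as threadpool:
        threadpool.map(wait, [seconds] * n_tasks)
    return time.perf_counter() - start


serial = timed_map(1, 4, 0.1)    # the four sleeps run one after another
threaded = timed_map(4, 4, 0.1)  # the four sleeps can overlap
print(f"1 thread: {serial:.2f} s, 4 threads: {threaded:.2f} s")
```

The four-thread run should finish in roughly the time of a single sleep, whereas the `sample` function above gains little from threads because it is pure Python computation.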

## Future ideas

You could make a separate `calculate` function that takes a list of results and returns pi. It can be used whether or not the sampling is done with multiprocessing.
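One possible shape for such a function is sketched below. It assumes the results are a list of `(n, n_inside_circle)` tuples, as returned by `sample` and by `pool.map(sample, ...)`. The numbers in the usage lines are made-up inputs, not measured results.

```python
def calculate(results):
    """Combine a list of (n, n_inside_circle) tuples into an estimate of pi."""
    n_sum = sum(n for n, _ in results)
    n_inside_circle_sum = sum(n_inside for _, n_inside in results)
    return 4.0 * (n_inside_circle_sum / n_sum)


# A single serial run is just a one-element list of results.
print(calculate([(1000, 785)]))
# Several partial results, as returned by pool.map, combine the same way.
print(calculate([(500, 390), (500, 395)]))
```

Wrapping the serial result in a list, as in `calculate([sample(10**6)])`, lets the serial, multiprocessing, and threaded versions share the same final step.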

Notice the similarity to split-apply-combine, and to map-reduce, which is a specialization of split-apply-combine.
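To make the correspondence concrete, here is a serial sketch of the same calculation written with the built-in `map` and `functools.reduce`. The `combine` function is a hypothetical name for the reduce step, and the chunk sizes are chosen only to keep the example quick.

```python
import random
from functools import reduce


def sample(n):
    """Make n trials of points in the square.  Return (n, number_in_circle)."""
    n_inside_circle = 0
    for _ in range(n):
        x = random.random()
        y = random.random()
        if x**2 + y**2 < 1.0:
            n_inside_circle += 1
    return n, n_inside_circle


def combine(a, b):
    """Reduce step: add two (n, n_inside_circle) tuples element-wise."""
    return a[0] + b[0], a[1] + b[1]


chunks = [10**4] * 10                   # split
partial_results = map(sample, chunks)   # apply (the "map" step)
n_sum, n_inside_circle_sum = reduce(combine, partial_results)  # combine (the "reduce" step)
pi_estimate = 4.0 * (n_inside_circle_sum / n_sum)
print(pi_estimate)
```

Because `combine` is associative, the partial results can be merged in any order, which is what allows the apply step to be handed to a process pool without changing the final step.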