Beyond what we discuss in this course, what is available?
How do you decide what to build on for your work?
Know that other packages are available, without necessarily knowing how to use them.
In this part, we’ll talk about the broader Python and SciPy ecosystem. It’s all lecture and discussion, with no hands-on exercises. Our goal isn’t to teach you all of these packages, or even for you to remember them. Instead, you should see how things broadly fit together and learn that you can always search for something that serves your purposes.
Python is the base of it all. The reference implementation (CPython) is written in C, and thus has great C interfaces. This enables two things:
Extending Python by writing your own modules in C.
It’s actually common to first have (or write) an analysis package in C or C++, then add the Python interface. That way the same library can be used from other languages, too.
Embedding Python, where you have another primary application that uses Python under the hood.
These features aren’t exactly unique to Python, but Python does support them very well.
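As a small illustration of how close Python sits to C, the sketch below calls a C function directly from Python using the standard-library `ctypes` module. It assumes you are running CPython: `ctypes.pythonapi` exposes the interpreter's own C API, and `Py_GetVersion` is one of its functions. Real extension modules are usually built with tools such as Cython instead; this only shows that the boundary is easy to cross.

```python
import ctypes
import sys

# ctypes.pythonapi gives access to the C API of the running interpreter.
# Assumption: this is CPython; other implementations may not provide it.
get_version = ctypes.pythonapi.Py_GetVersion
get_version.restype = ctypes.c_char_p  # the C function returns a char*
get_version.argtypes = []              # and it takes no arguments

# Call the C function and convert the returned C string to a Python str.
c_version = get_version().decode("utf-8")

print("From the C API:", c_version)
print("From Python:   ", sys.version)
```

Declaring `restype` and `argtypes` matters: without them, `ctypes` assumes the function returns a C `int`, and the result would be meaningless here.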
Data analysis and other important core packages
Interactive computing and human interface
pytest - automated testing interface
Sphinx - documentation generator (also used for this lesson…)
Spyder - interactive Python development environment.
Binder - load any git repository in Jupyter automatically, good for reproducible research
Interfacing with other languages
Speeding up code and parallelism
mpi4py - Message Passing Interface (MPI) in Python for parallelizing jobs.
cython - easily make C extensions for Python, also interface to C libraries
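numba - just-in-time compilation of functions for speed-up
PyPy - an alternative Python implementation, itself written in a restricted subset of Python (RPython), with a just-in-time compiler so that it can optimize more internally.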
Dask - distributed array data structure for distributed computation
Joblib - easy embarrassingly parallel computing
IPyParallel - easy parallel task engine
numexpr - Fast evaluation of array expressions by automatically compiling the arithmetic.
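To make "embarrassingly parallel" concrete, here is a minimal sketch of the pattern that tools like Joblib and IPyParallel make convenient: the same function applied to many independent inputs. It uses only the standard library (`concurrent.futures`) rather than any of the packages above, and `simulate` is a hypothetical stand-in for your own independent task. Note that threads, as used here, won't speed up CPU-bound pure-Python code; the point is the shape of the problem, not the speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(seed):
    """Hypothetical independent task: a tiny deterministic 'simulation'."""
    value = seed
    for _ in range(1000):
        value = (value * 1103515245 + 12345) % 2**31
    return value


seeds = list(range(8))

# Serial version: one task after another.
serial_results = [simulate(s) for s in seeds]

# Parallel version: tasks don't depend on each other, so they can be
# handed to a pool of workers. map() returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_results = list(pool.map(simulate, seeds))

print(parallel_results == serial_results)
```

Because no task needs the result of another, the serial loop and the pooled version give the same answers. That independence is what makes a problem embarrassingly parallel.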
If you need machine learning, you probably already know what you need, so this list is short and may not be relevant to you.
Every small project you do contributes a little bit to the Python and SciPy ecosystem. This course has sort of started you on that path, and a CodeRefinery workshop will make sure you have the tools to produce high-quality, reusable code.
How do you know if you should use something?
Do you trust a random package you find online, especially for scientific results that have to be correct? Still, you can’t build everything yourself, so you have to decide where to start.
Are there releases? Have they been going on for a while?
Are releases easy to install, and do they handle dependencies well?
Is there good documentation that explains not just how to use it, but also how it works?
Is there automated testing? What’s your evaluation of the risk of undetectable scientific errors?
Is there a community, or is it one person? Is it backed by some organization? Does it have a permanent home?
Is it on a public hosting site (GitLab, GitHub, Bitbucket, etc.) where a community could form?
Do others post issues and make contributions? Are these issues dealt with in a timely manner? Can you search past bug reports?
Is the software citeable?
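Some of these questions can be partly answered from the metadata a package ships with. The sketch below uses the standard-library `importlib.metadata` module to look up the version, license, home page, and declared dependencies of an installed package. The helper name `summarize` is our own, and `pip` is only an example of a package that is often (but not always) installed. Metadata fields are optional, so any of them may be missing.

```python
from importlib import metadata


def summarize(package_name):
    """Return basic metadata for an installed package, or None if absent."""
    try:
        dist = metadata.distribution(package_name)
    except metadata.PackageNotFoundError:
        return None
    meta = dist.metadata
    return {
        "name": meta.get("Name"),
        "version": dist.version,
        "license": meta.get("License"),     # optional field, may be None
        "homepage": meta.get("Home-page"),  # optional field, may be None
        "requires": dist.requires or [],    # declared dependencies
    }


# Example: prints a dict if pip is installed, otherwise None.
print(summarize("pip"))
```

This only tells you what the authors declared. Whether there is testing, a community, or timely responses to issues still requires looking at the project itself.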