Library ecosystem

Questions

  • Beyond what we discuss in this course, what is available?

  • How do you decide what to build on for your work?

Objectives

  • Know of some other available packages, but don’t necessarily know how to use them.

In this part, we’ll talk about the broader Python and SciPy ecosystem. It’s all lecture and discussion, no hands-on. The goal isn’t to teach you all of these packages, or even for you to remember them. Instead, you should see how things broadly fit together and learn that you can always search for something that serves your purposes.

Python

The base of it all. Python is written in C, and thus has great C interfaces. This contributes to two things:

  • Extending Python by writing your own modules in C.

    • It’s actually common to first have (or write) an analysis package in C or C++, then make the Python interface. The same C or C++ core can then be used from other languages, too.

  • Embedding Python, where you have another primary application that uses Python under the hood.

These features aren’t exactly unique to Python, but Python does support them very well.

Core numerics

  • numpy - arrays and array math.

  • scipy - software for math, science, and engineering.

Plotting

  • matplotlib - base plotting package, somewhat low level but almost everything builds on it.

  • seaborn - higher level plotting interface; statistical graphics.

  • mayavi - 3D plotting

  • PIL - image manipulation. The original PIL is no longer maintained; its fork “Pillow” is the maintained, largely drop-in replacement.

Data analysis and other important core packages

Interactive computing and human interface

  • Interactive computing

    • IPython - nicer interactive interpreter

    • Jupyter (notebook, lab, hub, …) - web-based interface to IPython and other languages

  • Testing

    • pytest - automated testing interface

  • Documentation

    • Sphinx - documentation generator (also used for this lesson…)

  • Development environments

    • Spyder - interactive Python development environment.

  • Binder - load any git repository in Jupyter automatically, good for reproducible research
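
To make the testing entry above a bit more concrete, here is a minimal sketch of what a pytest-style test file can look like. The function `mean` and the file name `test_stats.py` are hypothetical and made up for this illustration. pytest collects functions whose names start with `test_` and reports any failing `assert`.

```python
# test_stats.py (hypothetical file name, for illustration only)

def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if len(values) == 0:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


def test_mean_of_known_values():
    # Plain assert statements are all pytest needs.
    assert mean([1, 2, 3, 4]) == 2.5


def test_mean_rejects_empty_input():
    # Check the error path without importing pytest itself.
    try:
        mean([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")
```

With pytest installed, running `pytest test_stats.py` in the same directory should collect and run both test functions.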

Interfacing with other languages

  • cffi and ctypes - interface to C and compatible libraries

  • cython - easily make C extensions for Python, also interface to C libraries

  • f2py - interface to Fortran code

  • swig - connect to a variety of programming languages.

  • Boost.python - Another Python/C++ interface
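
As a small illustration of the first item in this list, the sketch below uses `ctypes` from the standard library to call the C function `strlen` directly from Python. It assumes a POSIX-like system (Linux or macOS) where the C standard library can be loaded this way; on other platforms the loading step would differ.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library (assumes a POSIX-like system).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

n = libc.strlen(b"ecosystem")
print(n)  # 9
```

Declaring `argtypes` and `restype` is optional for simple cases, but it lets `ctypes` check and convert arguments instead of guessing.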

Speeding up code and parallelism

  • PyMPI - Message Passing Interface (MPI) in Python for parallelizing jobs.

  • cython - easily make C extensions for Python, also interface to C libraries

  • numba - just in time compiling of functions for speed-up

  • PyPy - alternative Python implementation, written in a restricted subset of Python, with a just-in-time compiler that can optimize more at run time.

  • Dask - distributed array data structure for distributed computation

  • Joblib - easy embarrassingly parallel computing

  • IPyParallel - easy parallel task engine

  • numexpr - Fast evaluation of array expressions by automatically compiling the arithmetic.
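
The “embarrassingly parallel” pattern mentioned above means running many independent tasks that never need to talk to each other. The sketch below shows that pattern with the standard library’s thread pool, used here only as a stand-in so that it runs without extra installs; it is not Joblib’s API. The task function `simulate` is hypothetical. Note that threads usually do not speed up CPU-bound pure-Python code, so for real speed-ups you would typically reach for process-based tools such as the ones listed above.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(seed):
    """Hypothetical independent task: a small deterministic computation."""
    total = 0
    for i in range(1000):
        total = (total + seed * i) % 997
    return total


seeds = [1, 2, 3, 4, 5]

# Each task depends only on its own input, so tasks can run in any order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_results = list(pool.map(simulate, seeds))

# The serial version gives the same answers, one task at a time.
serial_results = [simulate(s) for s in seeds]
print(parallel_results == serial_results)  # True
```

Because `pool.map` returns results in input order, the parallel and serial lists can be compared directly.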

Machine learning

If you need some machine learning, you probably already know what you need and this list is short and irrelevant.

Your stuff

Every small project you do contributes a little bit to the Python and SciPy ecosystem. This course has sort of started you on that path, and a CodeRefinery workshop will make sure you have the tools to produce high-quality, reusable code.

How do you know if you should use something?

Do you trust a random package you find online, especially for your scientific results, which have to be correct? Still, you can’t build everything yourself, so you have to decide where to start.

  • Are there releases? Have they been going on for a while?

  • Are releases easy to install, and do they handle dependencies well?

  • Is there good documentation that tells you not just how to use it, but also how it works?

  • Is there automated testing? What’s your evaluation of the risk of undetectable scientific errors?

  • Is there a community, or is it one person? Is it backed by some organization? Does it have a permanent home?

  • Is it on a public hosting site (GitLab, GitHub, Bitbucket, etc.) where a community could form?

  • Do others post issues and make contributions? Are these issues dealt with in a timely manner? Can you search past bug reports?

  • Is the software citeable?
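
Some of these questions (releases, dependencies) can be partly checked from the metadata that an installed package ships with. The sketch below uses the standard library’s `importlib.metadata` for this. The helper name `describe` is made up for this example, and `pip` is only an assumed example of a package that may or may not be installed in your environment.

```python
from importlib import metadata


def describe(package_name):
    """Return basic release metadata for an installed package, or None."""
    try:
        dist = metadata.distribution(package_name)
    except metadata.PackageNotFoundError:
        return None
    return {
        "name": dist.metadata.get("Name"),
        "version": dist.version,
        "summary": dist.metadata.get("Summary"),
        # Declared dependencies; None if the package lists none.
        "requires": dist.requires,
    }


# "pip" is just an example; the result is None if it is not installed.
print(describe("pip"))
```

This only shows what a package declares about itself. Questions about community, testing, and issue handling still need a look at the project’s home page and repository.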

See also

Keypoints

  • Almost everything you need can already be found, except your incremental work.

  • When do you build on that other work, and when do you create things yourself?