Python for Scientific Computing

Attending the course 5-7 November, 2024?

See the course page here and watch at https://twitch.tv/coderefinery. Whether you are or aren’t, the course material is below. Videos will appear in this playlist (Last year’s videos: playlist).

Python is a modern, object-oriented programming language, which has become popular in several areas of software development. This course discusses how Python can be utilized in scientific computing. The course starts by introducing some of the main Python tools for computing: Jupyter for interactive analysis, NumPy and SciPy for numerical analysis, Matplotlib for visualization, and so on. In addition, it talks about how python is used: related scientific libraries, reproducibility, and the broader ecosystem of science in Python, because your work is more than the raw code you write.

This course (like any course) can’t teach you Python… it can show your some examples, let you see how experts do things, and prepare you to learn yourself as you need to.

Prerequisites

These are not prerequisites:

  • Any external libraries, e.g. numpy

  • Knowing how to make scripts or use Jupyter

Videos and archived Q&A

Videos and material from past instances:

(prereq)

Introduction to Python

30 min

Jupyter

60 min

NumPy or Advanced NumPy

60 min

Pandas

30 min

Xarray

60 min

Plotting with Matplotlib

60 min

Plotting with Vega-Altair

30 min

Working with Data

60 min

Scripts

40 min

Profiling

20 min

Productivity tools

30 min

Web APIs with Python

15 min

SciPy

30 min

Library ecosystem

45 min

Parallel programming

45 min

Dependency management

30 min

Binder

60 min

Packaging

Who is the course for?

The course is targeted towards these learner personas:

  • A is a early career PhD researcher who has been using Python a bit, but is not sure what they know or don’t know. They want to be able to do their research more efficiently and make sure that they are using the right tools. A may know that numpy exists, etc. and could theoretically read some about it themselves, but aren’t sure if they are going in the right direction.

  • A2 can use numpy and pandas, but have learned little bits here and there and hasn’t had a comprehensive introduction. They want to ensure they are using best practices. (Baseline of high-level packages)

  • B is a mid-to-late undergraduate student who has used Python in some classes. They have possibly learned the syntax and enough to use it in courses, but in a course-like manner where they are expected to create everything themselves: they want to know how to reuse tools that already exist.

Motivation

Why Python

Python has become popular, largely due to good reasons. It’s very easy to get started, there’s lots of educational material, a huge amount of libraries for doing everything imaginable. Particularly in the scientific computing space, there is the Numpy, Scipy, and matplotlib libraries which form the basis of almost everything. Numpy and Scipy are excellent examples of using Python as a glue language, meaning to glue together battle-tested and well performing code and present them with an easy to use interface. Also machine learning and deep learning frameworks have embraced python as the glue language of choice. And finally, Python is open source, meaning that anybody can download and install it on their computer, without having to bother with acquiring a license or such. This makes it easier to distribute your code e.g. to collaborators in different universities.

Why not Python for Scientific Computing

While Python is extremely popular in scientific computing today, there are certainly things better left to other tools.

  • Implementing performance-critical kernels. Python is a very slow language, which often doesn’t matter if you can offload the heavy lifting to fast compiled code, e.g. by using Numpy array operations. But if what you’re trying to do isn’t vectorizable then you’re out of luck. An alternative to Python, albeit much less mature and with a smaller ecosystem, but which provides very fast generated code, is Julia.

  • Creating libraries that can be called from other languages. In this case you’ll often want to create a library with a C interface, which can then be called from most languages. Suitable languages for this sort of task, depending on what you are doing, could be Rust, C, C++, or Fortran.

  • You really like static typing, or functional programming approaches. Haskell might be what you’re looking for.

Python 2 vs Python 3

Python 3.0 came out in September 2008 and was just slightly different enough that most code had to be changed, which meant that many projects ignored it for many years. It was about 3-5 years until the differences were reduced enough (and better transition plans came out, so that it was reasonable to use a single code for both versions) that it become more and more adopted in the scientific community. Python 2 finally became unsupported in 2020, and by now Python 3 is the defacto standard.

At this point, all new projects should use Python 3, and existing actively developed projects should be upgraded to use it. Still, you might find some old unmaintained tools that are only compatible with Python 2.

Credits

This course was originally designed by Janne Blomqvist.

In 2020 it was completely redesigned by a team of the following:

  • Authors: Radovan Bast, Richard Darst, Anne Fouilloux, Thor Wikfeldt, …

  • Editor:

  • Testers and advisors: Enrico Glerean

We follow The Carpentries Code of Conduct: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html

See also