Tech stuff
This list is for people newly hired as RSEs at Aalto. It is not a list of what someone must know to apply, nor of what they should know before starting. It only provides a map: an RSE will incrementally learn the things here (and probably things not on the list, since it only covers what we already know). In the future, we expect this list to be copied and reused in other contexts.
Someone might take ~6 months to slowly learn things on this list as they need them. Not everything will be needed.
Linux and shell
Shell scripting and the OS interface
Advanced Bash-Scripting Guide: https://tldp.org/LDP/abs/html/
Containers
Docker: https://docs.docker.com/get-started/overview/ (though we don’t use Docker on the cluster, it’s good to know anyway)
Dockerfile reference: https://docs.docker.com/engine/reference/builder/
Apptainer (formerly called “Singularity”):
This is what is actually used on the cluster
Key things: making your own container, converting a Docker image to Apptainer, and running containers on the cluster
singularity_wrapper and how it works with Lmod: https://scicomp.aalto.fi/triton/usage/singularity/ (load a singularity module and you can see where the script is)
Lmod:
(hint: personal modulefiles: mkdir ~/modulefiles ; module use ~/modulefiles)
Mainly basic use + writing modulefiles
Conda
especially resolving GPU code related issues
Software development tools
CodeRefinery lessons (https://coderefinery.org)
git-intro: https://coderefinery.github.io/git-intro/
and git-collaborative: https://coderefinery.github.io/git-collaborative/
Reproducible Research: https://coderefinery.github.io/reproducible-research/
Documentation: https://coderefinery.github.io/documentation/
(Jupyter: https://coderefinery.github.io/jupyter/)
Automated Testing: https://coderefinery.github.io/testing/
Modular type-along or Modular code development presentation
Social coding: https://coderefinery.github.io/social-coding/
There are a few other interesting CodeRefinery lessons: https://coderefinery.org/lessons/
HPC
The Triton tutorials are what we expect our users to know, and reading through these is enough (it will be familiar): https://scicomp.aalto.fi/triton/#tutorials
In general, also browse (but don’t read in detail) the rest of the Triton documentation: https://scicomp.aalto.fi/triton/
Programming
Python
Python virtual environments and Conda environments: https://scicomp.aalto.fi/scicomp/python/ , https://scicomp.aalto.fi/triton/apps/python/
Be able to create a virtual environment
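As a minimal sketch of what creating a virtual environment involves, the standard-library venv module can do it from Python. The helper name create_env and the directory name demo-env are hypothetical, and with_pip=False is chosen here only to keep the example fast.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path, with_pip: bool = False) -> Path:
    """Create a virtual environment in env_dir and return the path of its config file."""
    # venv.create builds the directory layout and writes pyvenv.cfg.
    # with_pip=True would additionally bootstrap pip into the environment.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir / "pyvenv.cfg"


if __name__ == "__main__":
    # Use a temporary directory so the example leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env(Path(tmp) / "demo-env")
        print("environment created:", cfg.is_file())
```

On the command line the usual equivalent is python -m venv demo-env, followed by source demo-env/bin/activate on Linux.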
Python module/package structure
Python packaging
https://packaging.python.org/en/latest/tutorials/packaging-projects/
setup.py vs pyproject.toml (newer)
Python command line interfaces (argparse), installing interfaces via packages, …
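A command line interface built with argparse might look like the sketch below. The program name scale, its arguments, and the function names are all hypothetical. Having main accept an argv list is one common pattern, assumed here because it makes the interface easy to test and to expose as an installed command.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Define the command line options in one place."""
    parser = argparse.ArgumentParser(
        prog="scale", description="Multiply numbers by a factor."
    )
    parser.add_argument("values", nargs="+", type=float, help="numbers to scale")
    parser.add_argument(
        "--factor", type=float, default=2.0, help="multiplier (default: 2.0)"
    )
    return parser


def main(argv=None) -> int:
    """Entry point; argv=None makes argparse read sys.argv[1:]."""
    args = build_parser().parse_args(argv)
    for value in args.values:
        print(value * args.factor)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

To install such an interface via a package, pyproject.toml can map a command name to this function in its [project.scripts] table, for example scale = "module_name.cli:main" (module path hypothetical).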
Other steps for a good project
Good project structure (module-name/module_name/)
Command line interface
Modular and maintainable code
Installable: setup.py vs pyproject.toml
Linter (if worth it)
Test coverage (if worth it)
Good documentation (README, code-level docs, Sphinx + RTD/gh-pages)
Automated tests to the degree useful for the project. At least minimal.
GitHub Actions
PyPI release
conda-forge release
GitHub Action for releasing to PyPI/conda-forge
Examples: *
Data processing
webdataset
Small file management in various ways
Exercise: I/O benchmarking
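One possible starting point for the exercise is sketched below: it compares reading many small files one by one against reading the same data from a single tar archive (WebDataset builds on the same idea of tar shards). All function names, file counts, and sizes are hypothetical. On a local disk with a warm cache the difference may be small, so the timings are only meaningful when run on the filesystem you actually want to measure.

```python
import tarfile
import tempfile
import time
from pathlib import Path


def write_small_files(directory: Path, count: int, size: int) -> None:
    """Create `count` files of `size` bytes each."""
    directory.mkdir(parents=True, exist_ok=True)
    payload = b"x" * size
    for i in range(count):
        (directory / f"sample_{i:05d}.bin").write_bytes(payload)


def pack(directory: Path, archive: Path) -> None:
    """Pack every file in `directory` into one uncompressed tar archive."""
    with tarfile.open(archive, "w") as tar:
        for path in sorted(directory.iterdir()):
            tar.add(path, arcname=path.name)


def read_files(directory: Path) -> int:
    """Read each file separately; returns the total number of bytes read."""
    return sum(len(path.read_bytes()) for path in directory.iterdir())


def read_archive(archive: Path) -> int:
    """Read all members sequentially from the archive; returns total bytes."""
    total = 0
    with tarfile.open(archive, "r") as tar:
        for member in tar:
            if member.isfile():
                total += len(tar.extractfile(member).read())
    return total


def timed(func, *args):
    """Return (result, elapsed seconds) for one call of func."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp) / "small"
        archive = Path(tmp) / "packed.tar"  # kept outside data_dir on purpose
        write_small_files(data_dir, count=1000, size=4096)
        pack(data_dir, archive)
        n_files, t_files = timed(read_files, data_dir)
        n_tar, t_tar = timed(read_archive, archive)
        print(f"separate files: {n_files} bytes in {t_files:.4f} s")
        print(f"tar archive:    {n_tar} bytes in {t_tar:.4f} s")
```

To make this a real benchmark, point the temporary directory at the storage under test (tempfile.TemporaryDirectory accepts a dir argument), repeat each measurement several times, and vary the file count and size.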
Data management
FAIR data
Open Science
Aalto Data Agents webinars
Web stuff
intro, debugging
Django?