Software installation

Conda installation

This course will use Conda as the base installation platform as it can provide both Python and R installations for multiple operating systems.

We recommend using Anaconda, but Miniconda will work as well.


If you install Anaconda through your university’s software center, you might not be able to create environments! This is the case at Aalto University, at least on some operating systems. It is safest to install Anaconda yourself, as your own user.

Installing course environment

All of the required software is included in the course’s environment.yml:

name: dataanalysis
channels:
  - conda-forge
  - defaults
dependencies:
  - colorama
  - git
  - gitdb
  - gitpython
  - h5py
  - jupyter
  - jupyterlab
  - jupyterlab-git
  - matplotlib
  - numpy
  - nodejs
  - pandas
  - pip
  - pyarrow==1.0.1
  - pytables
  - pyyaml
  - r-arrow==1.0.1
  - r-cairo
  - r-caret
  - r-e1071
  - r-furrr
  - r-dbi
  - r-hdf5r
  - r-irkernel
  - r-pryr
  - r-randomforest
  - r-recommended
  - r-rsqlite
  - r-svglite
  - r-tidyverse
  - r-vctrs
  - scipy
  - scikit-learn
  - seaborn
  - six
  - smmap
  - sphinx
  - sphinx_rtd_theme
  - sqlalchemy
  - xlrd
  - pip:
    - memory_profiler
    - sphinx-tabs
    - sphinx_rtd_theme_ext_color_contrast
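In a conda environment file, conda packages are listed as plain strings under `dependencies`. The plain `pip` entry installs pip itself through conda. Packages that pip should install are nested under a `pip:` entry. The sketch below illustrates that structure with two hypothetical helpers, `split_dependencies` and `pinned_versions`. It assumes the file has already been parsed into a Python dictionary. The helpers separate the two kinds of entries and pick out exact version pins such as `pyarrow==1.0.1`. `example_env` is an abbreviated excerpt, not the full file.

```python
from typing import Any, Dict, List, Tuple

# Abbreviated, hypothetical excerpt of environment.yml after parsing (not the full file)
example_env: Dict[str, Any] = {
    "name": "dataanalysis",
    "channels": ["conda-forge", "defaults"],
    "dependencies": [
        "numpy",
        "pyarrow==1.0.1",
        "r-arrow==1.0.1",
        "pip",
        {"pip": ["memory_profiler", "sphinx-tabs"]},
    ],
}


def split_dependencies(env: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Split an environment spec into conda package specs and pip package specs."""
    conda_pkgs: List[str] = []
    pip_pkgs: List[str] = []
    for dep in env.get("dependencies", []):
        if isinstance(dep, dict):
            # The nested `pip:` entry holds packages that pip installs
            pip_pkgs.extend(dep.get("pip", []))
        else:
            # Plain strings are conda package specs (including `pip` itself)
            conda_pkgs.append(dep)
    return conda_pkgs, pip_pkgs


def pinned_versions(specs: List[str]) -> Dict[str, str]:
    """Return {package: version} for specs pinned with '=='."""
    pins: Dict[str, str] = {}
    for spec in specs:
        if "==" in spec:
            name, version = spec.split("==", 1)
            pins[name] = version
    return pins


conda_pkgs, pip_pkgs = split_dependencies(example_env)
```

To run the helpers on the real file, `yaml.safe_load(open("environment.yml"))` should produce a dictionary of the same shape. PyYAML is listed in the environment as `pyyaml`.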

To install the environment, follow the instructions below for your operating system. To learn more about environments in general, see the conda environment docs. They cover more than you need to know, and completing the steps below is enough.

Linux and Mac OSX (terminal)


See the Linux / MacOS video version at (the Windows version from the Anaconda Navigator might work, too).

In a terminal where the Anaconda installation is activated, clone the course repository and move into it:

git clone
cd data-analysis-workflows-course

After this, in the course repository, run:

conda env create -f environment.yml

If you wish to change the environment name from the default (dataanalysis), use:

conda env create -n env_name -f environment.yml

Then activate the environment (if you didn’t call it dataanalysis, replace it with the name you used):

conda activate dataanalysis
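Once the environment is active, you can check that the main Python packages can be found. The following is a minimal sketch and not part of the course material. The module list in `COURSE_MODULES` is an assumption picked from environment.yml, and `missing_modules` is a hypothetical helper. Some import names differ from the conda package names: scikit-learn is imported as `sklearn`, pytables as `tables`, pyyaml as `yaml` and gitpython as `git`. Note that `importlib.util.find_spec` only checks that a module can be located; it does not actually import it.

```python
import importlib.util
from typing import Iterable, List

# Hypothetical selection of import names for packages in environment.yml.
# Import names differ from some conda names:
# scikit-learn -> sklearn, pytables -> tables, pyyaml -> yaml, gitpython -> git
COURSE_MODULES = [
    "numpy", "pandas", "scipy", "sklearn", "matplotlib", "seaborn",
    "h5py", "tables", "pyarrow", "yaml", "git", "sqlalchemy",
]


def missing_modules(names: Iterable[str] = COURSE_MODULES) -> List[str]:
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

To use it, run `python` in the activated terminal, paste the code and call `missing_modules()`. An empty list means every listed module was found.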

Now you can launch a JupyterLab instance for running the exercises:

jupyter lab

Windows (Anaconda Navigator)


See the Windows / Anaconda Navigator video version at

Download the environment file environment.yml to a folder of your choice.

Start Anaconda Navigator. From the navigator, go to Environments.

At the bottom, click Import. Set Name to dataanalysis, and for Specification File choose the downloaded environment file.

The environment creation process can take a long time, as the environment is quite big.

After installation, go to the Home tab of Anaconda Navigator. In the “Applications on” drop-down, switch from the base (root) environment to dataanalysis. Now you can launch a JupyterLab instance by clicking on JupyterLab.

If JupyterLab says something like “X needs to be added to the build”, click Build and continue. If the build gives an error later on, you can ignore it.

Now press the + button at the top left and, under Console, choose Python 3.

In the console at the bottom, type the following:

import git


This will download the course repository to the folder data-analysis-workflows-course. You can now go to that folder and start testing your installation.
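The console step above uses GitPython, imported as the `git` module. It could also be written as a reusable function that skips cloning when the target folder already exists, as in the sketch below. The function name `clone_course_repo` and its `url` parameter are hypothetical. You pass the course repository URL yourself, since it is not reproduced here. `git.Repo.clone_from` is GitPython's clone call. The folder is created relative to the console's current working directory.

```python
from pathlib import Path


def clone_course_repo(url: str, dest: str = "data-analysis-workflows-course") -> Path:
    """Clone the repository at `url` into `dest`, unless `dest` already exists."""
    target = Path(dest)
    if target.exists():
        # Folder already present (e.g. cloned earlier): don't clone again
        return target
    # GitPython, installed in the course environment as `gitpython`;
    # imported here so the existence check works even without it
    import git

    git.Repo.clone_from(url, str(target))
    return target
```

Calling `clone_course_repo(url)` a second time returns the existing folder instead of failing because the destination is not empty.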

Testing your installation

You must activate the environment each time you use it, either in every new terminal session or by selecting it in Anaconda Navigator. The installation steps above also activate it.
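If you are unsure whether the right environment is active, a small check from Python can help. This sketch rests on two assumptions. The first is that `conda activate` has set the `CONDA_DEFAULT_ENV` variable to the environment name. The second is that named environments live in a folder called `envs` inside the conda installation, which is the usual layout. Both function names are hypothetical.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def active_conda_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Name of the environment activated in the shell, or None if unset."""
    # `conda activate` exports CONDA_DEFAULT_ENV
    return environ.get("CONDA_DEFAULT_ENV") or None


def interpreter_env(prefix: str = sys.prefix) -> Optional[str]:
    """Name of the conda environment the running Python belongs to, if any."""
    # Named conda environments usually live in <conda root>/envs/<name>
    p = Path(prefix)
    return p.name if p.parent.name == "envs" else None
```

In a notebook, `interpreter_env()` reports the environment of the running kernel, which is what matters for the exercises; it should return `dataanalysis`, or the name you chose. `active_conda_env()` reflects the shell JupyterLab was started from and may be unset if JupyterLab was not started from an activated terminal.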

This workshop requires that you are familiar with Jupyter notebooks and know how to run them. In the git repository that you downloaded during installation, you will find a notebook called download_datasets.ipynb. Open it and run it. This downloads multiple datasets into the subfolder data/. You can try loading some of these datasets to make sure the download went through.

Next, open the notebook for the first exercise, which you will find at X_exercises/ch1-X-ex1.ipynb (replace X with python or r). Make sure you can run the whole notebook without errors. In case of installation issues, join us in the pre-workshop meeting (you have received the details via email).
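You can list what ended up in `data/` to check the download before opening individual datasets. The sketch below assumes only that the notebook writes ordinary files somewhere under `data/`. The exact file names and formats are not listed here, so it does not check for specific ones. `summarize_data_dir` is a hypothetical helper. It flags zero-byte files because they often indicate an interrupted download.

```python
from pathlib import Path
from typing import Dict


def summarize_data_dir(data_dir: str = "data") -> Dict[str, int]:
    """Return {relative path: size in bytes} for every file under data_dir."""
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} does not exist; run download_datasets.ipynb first")
    sizes = {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    if not sizes:
        raise RuntimeError(f"No files found in {root}; the download may have failed")
    # Zero-byte files usually mean a download was interrupted
    empty = [name for name, size in sizes.items() if size == 0]
    if empty:
        raise RuntimeError(f"Empty files (incomplete download?): {empty}")
    return sizes
```

Run it from the root of the course repository, where `data/` is created.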