This course will use Conda as the base installation platform as it can provide both Python and R installations for multiple operating systems.
We recommend using Anaconda, but Miniconda will work as well.
If you install Anaconda through your university’s software center, you might not be able to create environments! At least this is the case with Aalto University on some operating systems. It is safest to install it yourself, as your user.
Installing course environment
All of the required software is included in the course’s environment.yml:
name: dataanalysis
channels:
  - conda-forge
  - defaults
dependencies:
  - colorama
  - git
  - gitdb
  - gitpython
  - h5py
  - jupyter
  - jupyterlab
  - jupyterlab-git
  - matplotlib
  - numpy
  - nodejs
  - pandas
  - pip
  - pyarrow==1.0.1
  - pytables
  - pyyaml
  - r-arrow==1.0.1
  - r-cairo
  - r-caret
  - r-e1071
  - r-furrr
  - r-dbi
  - r-hdf5r
  - r-irkernel
  - r-pryr
  - r-randomforest
  - r-recommended
  - r-rsqlite
  - r-svglite
  - r-tidyverse
  - r-vctrs
  - scipy
  - scikit-learn
  - seaborn
  - six
  - smmap
  - sphinx
  - sphinx_rtd_theme
  - sqlalchemy
  - xlrd
  - pip:
    - memory_profiler
    - sphinx-tabs
    - sphinx_rtd_theme_ext_color_contrast
    - https://github.com/coderefinery/sphinx-lesson/archive/master.zip
To install the environment, use the following instructions based on your operating system. To learn more about environments in general, see the conda environment docs. They cover more than you need to know: if you can do the steps below, that is enough.
Linux and Mac OSX (terminal)
See the Linux / MacOS video version at https://youtu.be/mkkJnkouZ2o (the Windows version from the Anaconda Navigator might work, too).
In a terminal where the Anaconda installation is activated, clone the course repository and move into it with:
git clone https://github.com/AaltoSciComp/data-analysis-workflows-course.git
cd data-analysis-workflows-course
After this, in the course repository, run:
conda env create -f environment.yml
If you wish to change the environment name from the default (dataanalysis), use:
conda env create -n env_name -f environment.yml
Then activate the environment (if you didn’t call it dataanalysis, replace it with the name you used):
conda activate dataanalysis
Now you can launch a JupyterLab instance for running the exercises, for example with:
jupyter lab
Testing your installation
You must activate the Anaconda environment each time you use it. The steps above also activate it.
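If you are unsure whether the environment is active, one way to check from inside Python (for example in a notebook cell) is to read the CONDA_DEFAULT_ENV variable that conda activate sets. This is only a minimal sketch: the helper name active_conda_env is hypothetical, and the expected name dataanalysis assumes you kept the default from environment.yml.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if none is active."""
    environ = os.environ if environ is None else environ
    # `conda activate` exports CONDA_DEFAULT_ENV with the environment's name.
    return environ.get("CONDA_DEFAULT_ENV") or None


if __name__ == "__main__":
    expected = "dataanalysis"  # assumed: the default name from environment.yml
    active = active_conda_env()
    if active == expected:
        print(f"OK: '{expected}' is active (Python at {sys.prefix})")
    else:
        print(f"Active environment is {active!r}; run: conda activate {expected}")
```

If you created the environment under a different name, change expected accordingly.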
This workshop requires that you are familiar with Jupyter notebooks and how to run
them. In the git repository that you have downloaded for the installation, you will
find a notebook called download_datasets.ipynb. Open it and run it. This will
download multiple datasets into the subfolder data/. You can try loading some of
these datasets to make sure the download went through. Next, open the notebook for
the first exercise, you will find it under
r). Make sure you are able to fully run the
notebook. In case of installation issues, join us in the pre-workshop meeting
(you have received details via email).
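Before the meeting, a quick sanity check of the Python side of the installation can help narrow down a problem. The sketch below is not part of the course material: the helper names missing_modules and downloaded_files are hypothetical, the module list is a hand-picked subset of environment.yml, and it assumes that you run it from the root of the course repository and that download_datasets.ipynb places files directly inside data/.

```python
import importlib.util
from pathlib import Path

# Import names for a few packages listed in environment.yml.
# Note that scikit-learn is imported as `sklearn` and pytables as `tables`.
COURSE_MODULES = ["numpy", "pandas", "scipy", "matplotlib", "seaborn",
                  "h5py", "pyarrow", "sklearn", "tables", "sqlalchemy"]


def missing_modules(names):
    """Return the module names that cannot be found in this Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def downloaded_files(data_dir="data"):
    """Return sorted names of the files directly inside data_dir (empty if absent)."""
    path = Path(data_dir)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.iterdir() if p.is_file())


if __name__ == "__main__":
    missing = missing_modules(COURSE_MODULES)
    print("Missing modules:", missing or "none")
    files = downloaded_files()
    print(f"{len(files)} file(s) found in data/")
```

A non-empty list of missing modules usually means the course environment is not the one running the code. An empty data/ folder suggests the download notebook has not been run yet or did not finish.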