Binder

Questions

  • Why may sharing code alone not be sufficient?

  • How to share a computational environment?

  • What is Binder?

  • How to binderize my Python repository?

  • How to publish my Python repository?

Objectives

  • Learn about reproducible computational environments.

  • Learn to create and share custom computing environments with Binder.

  • Learn to get a DOI from Zenodo for a repository.

Why is it sometimes not enough to share your code?

../_images/python_unmasked.jpg

Exercise 1

Binder-1: Discuss better strategies than only code sharing (10 min)

Lea is a PhD student in computational biology and, after 2 years of intensive work, she is finally ready to publish her first paper. The code she used to analyze her data is available on GitHub, but her supervisor, an advocate of open science, told her that sharing code is not sufficient.

Why is it possibly not enough to share “just” your code? What problems can you anticipate 2-5 years from now?

We form small groups (4-5 people) and discuss within each group. If the workshop is online, each group will join a breakout room. If joining a group is not possible or practical, we use the shared document to discuss this collaboratively.

Each group writes a summary (bullet points) of the discussion in the shared workshop document (the link will be provided by your instructors).

Sharing a computing environment with Binder

Binder allows you to create custom computing environments that can be shared and used by many remote users. It uses repo2docker to build a container image (Docker image) of a project from the configuration files included in the repository.

repo2docker is a standalone package that you can install locally on your laptop, but an online Binder service is also freely available. The online service is what we will use in this tutorial.
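For readers who want to try the local route, here is a minimal sketch of how a local repo2docker call could be assembled from Python. It assumes the tool was installed with pip (which provides the jupyter-repo2docker command) and that Docker is available. The repository URL and the helper name repo2docker_command are hypothetical and used only for illustration. The sketch prints the command rather than running it.

```python
import shlex
import shutil


def repo2docker_command(repo, ref=None, run=True):
    """Build the argument list for a local repo2docker invocation."""
    cmd = ["jupyter-repo2docker"]
    if ref is not None:
        # Build a specific branch, tag, or commit instead of the default.
        cmd += ["--ref", ref]
    if not run:
        # Only build the image; do not start a container afterwards.
        cmd.append("--no-run")
    cmd.append(repo)
    return cmd


# Hypothetical repository URL, used only for illustration.
cmd = repo2docker_command("https://github.com/user/notebook-repo", run=False)
print(shlex.join(cmd))

# Report whether the tool is on PATH; actually running it also needs Docker.
if shutil.which(cmd[0]) is None:
    print("repo2docker not found; install it with: pip install jupyter-repo2docker")
```

Passing the resulting list to subprocess.run would start the build. That step is left out here because it needs a running Docker daemon.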

The main objective of this exercise is to learn to fork a repository and add a requirements file to share the computational environment with Binder.
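A requirements file lists the packages a project needs, ideally with pinned versions. As a rough sketch (not the only way to do this), the pins can be generated from whatever is installed in the environment where the notebook currently works, using the standard library module importlib.metadata. The helper name pin_requirements is hypothetical, and the package names are the two that the notebook in the exercise below imports.

```python
from importlib.metadata import PackageNotFoundError, version


def pin_requirements(packages):
    """Return 'name==version' lines for the packages installed locally."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Leave uninstalled packages unpinned rather than guessing a version.
            lines.append(name)
    return lines


requirements = pin_requirements(["pandas", "matplotlib"])
print("\n".join(requirements))

# To write the file, uncomment:
# with open("requirements.txt", "w") as fh:
#     fh.write("\n".join(requirements) + "\n")
```

The printed versions depend on your own installation, so they will generally differ from the ones used in the demo.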

https://opendreamkit.org/public/images/use-cases/reproducible_logbook.png

Credit: Juliette Taka, Logilab and the OpenDreamKit project (2017)

Binder exercise/demo

In an earlier episode (Data visualization with Matplotlib) we created this notebook:

import pandas as pd
import matplotlib.pyplot as plt

# read the Gapminder dataset directly from the web
url = "https://raw.githubusercontent.com/plotly/datasets/master/gapminder_with_codes.csv"
data = pd.read_csv(url)

# keep only the rows for the year 2007
data_2007 = data[data["year"] == 2007]

fig, ax = plt.subplots()

# one semi-transparent point per country
ax.scatter(x=data_2007["gdpPercap"], y=data_2007["lifeExp"], alpha=0.5)

# GDP spans several orders of magnitude, so use a logarithmic x-axis
ax.set_xscale("log")

ax.set_xlabel("GDP (USD) per capita")
ax.set_ylabel("life expectancy (years)")

We will first share it “statically” via GitHub, and then interactively using Binder.

Binder-2: Exercise/demo: Make your notebooks reproducible by anyone (15 min)

The instructor demonstrates this. This exercise (and all following ones) requires Git/GitHub knowledge and a GitHub account, which were not prerequisites for this course. It is therefore a demo (and might even be too fast for you to type along). Watch the video if you are reading this later. The steps are:

  • Create a GitHub repository.

  • Upload the notebook file.

  • Look at the statically rendered version of the notebook on GitHub.

  • Create a requirements.txt file which contains:

    pandas==1.2.3
    matplotlib==3.4.2
    
  • Also commit and push this file to your notebook repository.

  • Visit https://mybinder.org and copy and paste the code under “Copy the text below …” into your README.md:

    ../_images/binder.jpg

  • Check that your notebook repository now has a “launch binder” badge in its README.md file on GitHub.

  • Try clicking the badge and see how your repository is launched on Binder (this can take a minute or two). Your notebooks can now be explored and executed in the cloud.

  • Enjoy being fully reproducible!
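The badge text copied from mybinder.org in the steps above is ordinary Markdown that links to a launch URL. As an illustration of what that text looks like, the following sketch assembles it for a GitHub repository. The user name, repository name, and notebook file name are hypothetical placeholders, and the helper binder_badge is not part of any Binder library. The URL layout and the labpath parameter follow what the mybinder.org form generated at the time of writing and may change.

```python
from urllib.parse import quote


def binder_badge(user, repo, ref="HEAD", notebook=None):
    """Return Markdown for a 'launch binder' badge pointing at a GitHub repository."""
    # The ref (branch, tag, or commit) is URL-encoded, so "feature/x" becomes "feature%2Fx".
    url = f"https://mybinder.org/v2/gh/{user}/{repo}/{quote(ref, safe='')}"
    if notebook is not None:
        # Optionally open a specific notebook when the session starts.
        url += f"?labpath={quote(notebook, safe='')}"
    return f"[![Binder](https://mybinder.org/badge_logo.svg)]({url})"


# Hypothetical user, repository, and notebook names.
print(binder_badge("user", "notebook-repo", notebook="gapminder.ipynb"))
```

Pasting the printed line into README.md should render the same kind of badge as the one obtained from the mybinder.org form.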

How can I get a DOI from Zenodo?

Zenodo is a general-purpose open-access repository built and operated by CERN and OpenAIRE. It allows researchers to archive the data they share and to obtain a Digital Object Identifier (DOI) for it.