Productivity tools

Questions

Do you have preferences on the visual aspects of the code and how it should look?
Do you use any tools that help you create better looking code faster?

Objectives

Learn tools that can help you be more productive.
Learn how to follow standards that other people have created and how to pick your own favorite.

Spotting code problems with linters

Python as a programming language has a syntax that specifies the rules that the code must follow. If the code is not written with valid syntax, you will get an error.

# Valid syntax, prints 1
a = 1
print(a)

# Invalid syntax, raises SyntaxError
True = 1

Spotting syntax errors by eye can be time-consuming, so programmers have created linters. Linters are tools that analyse code without running it and flag syntax errors and other likely problems.
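To get a feel for the most basic check a linter performs, here is a minimal sketch that uses Python’s built-in ast module to test whether a piece of code parses, without running it. The helper name check_syntax is ours and purely illustrative; real linters do considerably more than this.

```python
import ast


def check_syntax(source):
    """Return None if the source parses, otherwise a short error description."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


valid = "a = 1\nprint(a)\n"
invalid = "True = 1\n"

print(check_syntax(valid))    # None: the code parses
print(check_syntax(invalid))  # a message pointing at line 1
```

The exact wording of the error message depends on the Python version, so the sketch only relies on the line number.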

Some popular linters include pylint, flake8 and ruff.

In the following example, let’s use pylint to check this script (lint_example.py; to download it to JupyterLab, use File → Open from URL → paste the URL, and it will download and open in a window):

import numpy
import matplotlib.pyplot as plt

x = np.linspace(0, np.pi, 100))
y = np.sin(x)

plt.plot(x, y)

plt.show()

To run pylint in JupyterLab, open a terminal with File → New → Terminal. Make sure you are in the right directory, then run pylint:

$ pylint lint_example.py
************* Module lint_example
lint_example.py:4:31: E0001: Parsing failed: 'unmatched ')' (<unknown>, line 4)' (syntax-error)

From here we can see that pylint reports an unmatched bracket on line 4. We also get the message code E0001 (syntax-error). We can find a description of the message in Pylint’s messages list, and the page for the specific error shows an example that triggers it.

After fixing the problem with the bracket and running pylint again we get more errors:

$ pylint lint_example.py
************* Module lint_example
lint_example.py:1:0: C0114: Missing module docstring (missing-module-docstring)
lint_example.py:4:4: E0602: Undefined variable 'np' (undefined-variable)
lint_example.py:4:19: E0602: Undefined variable 'np' (undefined-variable)
lint_example.py:5:4: E0602: Undefined variable 'np' (undefined-variable)
lint_example.py:1:0: W0611: Unused import numpy (unused-import)

------------------------------------------------------------------
Your code has been rated at 0.00/10 (previous run: 0.00/10, +0.00)

Here we see the following messages:

On line 1 we’re missing a module docstring. This goes against a coding convention and thus we get a CXXXX message code. This is not critical, so let’s not focus on this for now.
On lines 4 and 5 we have an undefined variable np. This would raise an error if we executed the code and thus we get an EXXXX message code.
On line 1 we have an unused import of the numpy module. This won’t cause an error, but Pylint flags it as unnecessary and gives a warning with a WXXXX message code.

At the end Pylint will give a rating for the code. In this case the errors will give an overall rating of 0.00/10 as the code won’t execute correctly.
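To see where the rating comes from, here is a sketch that counts the messages from the output above by category and applies what we understand to be Pylint’s default evaluation expression, 10 - ((5 * error + warning + refactor + convention) / statement) * 10, clipped at zero. The statement count of 6 is our assumption for this script, and the function names are illustrative, not part of Pylint.

```python
import re
from collections import Counter

PYLINT_OUTPUT = """\
lint_example.py:1:0: C0114: Missing module docstring (missing-module-docstring)
lint_example.py:4:4: E0602: Undefined variable 'np' (undefined-variable)
lint_example.py:4:19: E0602: Undefined variable 'np' (undefined-variable)
lint_example.py:5:4: E0602: Undefined variable 'np' (undefined-variable)
lint_example.py:1:0: W0611: Unused import numpy (unused-import)
"""

# The first letter of a message code tells its category
CATEGORIES = {
    "C": "convention",
    "R": "refactor",
    "W": "warning",
    "E": "error",
    "F": "fatal",
}

MESSAGE = re.compile(r"^[^:]+:\d+:\d+: ([CRWEF])\d{4}: ")


def count_categories(output):
    """Count pylint messages per category in a block of pylint output."""
    counts = Counter()
    for line in output.splitlines():
        match = MESSAGE.match(line)
        if match:
            counts[CATEGORIES[match.group(1)]] += 1
    return counts


def rating(counts, statements):
    """Apply the assumed default evaluation expression; errors weigh 5x."""
    if counts["fatal"]:
        return 0.0
    penalty = (
        5 * counts["error"]
        + counts["warning"]
        + counts["refactor"]
        + counts["convention"]
    )
    return max(0.0, 10.0 - penalty / statements * 10)


counts = count_categories(PYLINT_OUTPUT)
print(counts["error"], counts["warning"], counts["convention"])  # 3 1 1
print(f"{rating(counts, statements=6):.2f}")                     # 0.00
print(f"{rating(Counter(convention=1), statements=6):.2f}")      # 8.33
```

With three errors the penalty is large enough that the rating bottoms out at zero, while a single convention message on six statements costs 10/6 ≈ 1.67 points.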

From these messages we can deduce that the main problem is that the import statement does not use import numpy as np and thus np is undefined.
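Both the undefined-variable and the unused-import messages come from comparing the names a script binds with the names it uses. The following toy check sketches that idea with the ast module on the bracket-fixed script. It is not how Pylint is implemented: the function name toy_name_check is hypothetical, and the sketch ignores scopes and the order of statements.

```python
import ast
import builtins

SOURCE = """\
import numpy
import matplotlib.pyplot as plt

x = np.linspace(0, np.pi, 100)
y = np.sin(x)

plt.plot(x, y)

plt.show()
"""


def toy_name_check(source):
    """Return (undefined names with line numbers, unused imported names)."""
    tree = ast.parse(source)
    imported = set()
    assigned = set()
    used = []  # (name, line number) for every name that is read

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds "a"; "import a.b as c" binds "c"
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.id)
            else:
                used.append((node.id, node.lineno))

    known = imported | assigned | set(dir(builtins))
    undefined = sorted({(name, line) for name, line in used if name not in known})
    used_names = {name for name, _ in used}
    unused = sorted(imported - used_names)
    return undefined, unused


undefined, unused = toy_name_check(SOURCE)
print(undefined)  # [('np', 4), ('np', 5)]
print(unused)     # ['numpy']
```

The script binds the name numpy but reads the name np, which is exactly the mismatch that import numpy as np resolves.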

After changing the import statement, the code works correctly and running pylint lint_example.py only reports the missing docstring. Notice also that the changes have increased the rating, and Pylint shows the improvement since the last run.

$ pylint lint_example.py
************* Module lint_example
lint_example.py:1:0: C0114: Missing module docstring (missing-module-docstring)

------------------------------------------------------------------
Your code has been rated at 8.33/10 (previous run: 0.00/10, +8.33)

Exercise 1

Using Pylint

The following code uses scikit-learn to fit a simple linear model to randomly generated data with some error. You can download it here (see above for how to easily download and run in JupyterLab).

It has four mistakes in it. One of these cannot be found by Pylint.

Fix the following code with Pylint and try to determine why Pylint did not find the last mistake.

"""
pylint exercise 1
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model


def f(x):
    """
    Example function:

    f(x) = x/2 + 2
    """"
    return 0.5*x + 2


# Create example data
x_data = np.linspace(0, 10, 100)
err = 2 * np.random.random(x_data.shape[0])
y_data = f(x_data) + err

# Put data into dataframe
df = pd.DataFrame({'x': x_data, 'y': y_data})

# Create linear model and fit data
reg = linear_model.LinearRegression(fit_intercept=True)

reg.fit(df[['x'], df[['y']])

slope = reg.coef_[0][0]
intercept = reg.intercept_[0]

df['pred'] = reg.predict(df[['x']])

fig, ax = plt.subplots()

ax.scater(df[['x']], df[['y']], alpha=0.5)
ax.plot(df[['x']], df[['pred']]
        color='black', linestyle='--',
        label=f'Prediction with slope {slope:.2f} and intercept {intercept:.2f}')
ax.set_ylabel('y')
ax.set_xlabel('x')
ax.legend()

plt.show()

Solution

Solution is available here.

Errors were as follows:

Line 15 has an extra "-character, which results in a syntax-error.
Line 30 is missing a ]-bracket, which results in a syntax-error.
Line 40 is missing a comma at the end, which results in a syntax-error.
On line 39 the method name scatter is misspelled. Pylint does not notice this because it does not run the code, so the ax object is never created and Pylint cannot tell which methods it has.
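The difference between parsing code and running it can be sketched as follows. The Axes class here is a hypothetical stand-in for a plotting object, not matplotlib’s class. Whether a particular linter flags such a call depends on how much it can infer about the object’s type, so treat this only as an illustration of the general distinction.

```python
import ast


class Axes:
    """Hypothetical stand-in for a plotting object (not matplotlib's Axes)."""

    def scatter(self, x, y):
        return list(zip(x, y))


SNIPPET = "ax.scater([1, 2], [3, 4])"  # misspelled on purpose

# Parsing succeeds: a misspelled method name is still valid syntax
ast.parse(SNIPPET)

# Only when the code runs does Python look up the attribute and fail
try:
    eval(SNIPPET, {"ax": Axes()})
    outcome = "no error"
except AttributeError as exc:
    outcome = f"AttributeError: {exc}"

print(outcome)
```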

"""
pylint exercise 1
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model


def f(x):
    """
    Example function:

    f(x) = x/2 + 2
    """
    return 0.5*x + 2


# Create example data
x_data = np.linspace(0, 10, 100)
err = 2 * np.random.random(x_data.shape[0])
y_data = f(x_data) + err

# Put data into dataframe
df = pd.DataFrame({'x': x_data, 'y': y_data})

# Create linear model and fit data
reg = linear_model.LinearRegression(fit_intercept=True)

reg.fit(df[['x']], df[['y']])

slope = reg.coef_[0][0]
intercept = reg.intercept_[0]

df['pred'] = reg.predict(df[['x']])

fig, ax = plt.subplots()

ax.scatter(df[['x']], df[['y']], alpha=0.5)
ax.plot(df[['x']], df[['pred']],
        color='black', linestyle='--',
        label=f'Prediction with slope {slope:.2f} and intercept {intercept:.2f}')
ax.set_ylabel('y')
ax.set_xlabel('x')
ax.legend()

plt.show()

Enforcing consistent code style

Python is a very flexible language which makes it possible to use all kinds of coding styles.

For example, one could use the following naming styles for variables:

# Different variable styles
myvariable = 1   # Lowercase
myVariable = 1   # Camel case
MyVariable = 1   # Pascal case
my_variable = 1  # Snake case

Everyone has their own preference as to which style to use, and everybody is free to use their preferred style. However, to improve the legibility of code there are official style guides for code (PEP 8) and for docstrings (PEP 257).
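PEP 8 recommends snake case for function and variable names. As a rough sketch of what a naming check looks for, the following uses a regular expression to test for snake case and to convert the other styles shown above. The helper names are ours, and the conversion is deliberately simple; it is not what any particular tool does.

```python
import re

SNAKE_CASE = re.compile(r"^[a-z_][a-z0-9_]*$")


def is_snake_case(name):
    """Return True if the name is lowercase with optional underscores."""
    return bool(SNAKE_CASE.match(name))


def to_snake_case(name):
    """Insert '_' before a capital that follows a lowercase letter or digit."""
    with_underscores = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return with_underscores.lower()


for name in ["myvariable", "myVariable", "MyVariable", "my_variable"]:
    print(f"{name:12s} {is_snake_case(name)!s:6s} {to_snake_case(name)}")
```

Note that the sketch cannot split myvariable into two words; a tool can only tell that the name is lowercase, not where the word boundary should be.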

There are many code checkers that give you suggestions on how to modify your code (such as flake8) or do the modifications automatically (such as black and yapf).

Let’s use black and flake8 (with the pep8-naming extension) to modify code_style_example.py:

import  numpy  as np

def  PI_estimate(n):
    """This function calculates an estimate of pi with dart thrower algorithm.
    """

    pi_Numbers =  np.random.random(size = 2*n)
    x = pi_Numbers[ :n ]
    y = pi_Numbers[ n: ]

    return 4*np.sum((x * x + y*y ) < 1)/n


for number  in range(1,8):

    n = 10** number

    print(f'Estimate for PI with {n:8d} dart throws: {PI_estimate( n )}')

Running flake8 to check for style problems we get the following output:

$ flake8 code_style_example.py
code_style_example.py:1:7: E271 multiple spaces after keyword
code_style_example.py:1:14: E272 multiple spaces before keyword
code_style_example.py:3:1: E302 expected 2 blank lines, found 1
code_style_example.py:3:4: E271 multiple spaces after keyword
code_style_example.py:3:6: N802 function name 'PI_estimate' should be lowercase
code_style_example.py:7:6: N806 variable 'pi_Numbers' in function should be lowercase
code_style_example.py:7:17: E222 multiple spaces after operator
code_style_example.py:7:40: E251 unexpected spaces around keyword / parameter equals
code_style_example.py:7:42: E251 unexpected spaces around keyword / parameter equals
code_style_example.py:8:20: E201 whitespace after '['
code_style_example.py:8:23: E202 whitespace before ']'
code_style_example.py:9:20: E201 whitespace after '['
code_style_example.py:9:23: E202 whitespace before ']'
code_style_example.py:11:33: E202 whitespace before ')'
code_style_example.py:14:11: E272 multiple spaces before keyword
code_style_example.py:14:23: E231 missing whitespace after ','
code_style_example.py:16:11: E225 missing whitespace around operator
code_style_example.py:18:67: E201 whitespace after '('
code_style_example.py:18:69: E202 whitespace before ')'

There are plenty of errors and warnings. We could fix these manually, but instead let’s use black to format the code. Black is an “uncompromising Python code formatter” from the Python Software Foundation, and it automatically modifies your code to match its own coding style.

It should fix most of the errors automatically without changing the functionality.
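One way to convince yourself that a change is formatting-only is to compare the syntax trees of the code before and after. The sketch below does this with the ast module on two lines from the example; the helper name same_ast is ours. As far as we know, Black performs a comparable equivalence check by default, but this is only an illustration of the idea, not Black’s implementation.

```python
import ast

before = "x = pi_Numbers[ :n ]\nresult = 4*np.sum((x * x + y*y ) < 1)/n\n"
after = "x = pi_Numbers[:n]\nresult = 4 * np.sum((x * x + y * y) < 1) / n\n"
renamed = "x = pi_numbers[:n]\nresult = 4 * np.sum((x * x + y * y) < 1) / n\n"


def same_ast(first, second):
    """Return True if both sources parse to the same syntax tree."""
    return ast.dump(ast.parse(first)) == ast.dump(ast.parse(second))


print(same_ast(before, after))    # True: only whitespace differs
print(same_ast(before, renamed))  # False: renaming changes the tree
```

This is also why a formatter leaves names such as PI_estimate alone: renaming changes the program, not just its layout.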

After running black code_style_example.py the code looks like this:

import numpy as np


def PI_estimate(n):
    """This function calculates an estimate of pi with dart thrower algorithm."""

    pi_Numbers = np.random.random(size=2 * n)
    x = pi_Numbers[:n]
    y = pi_Numbers[n:]

    return 4 * np.sum((x * x + y * y) < 1) / n


for number in range(1, 8):
    n = 10**number

    print(f"Estimate for PI with {n:8d} dart throws: {PI_estimate( n )}")

Much cleaner. To check the naming style and any remaining problems, we can run flake8 code_style_example.py again:

$ flake8 code_style_example.py
code_style_example.py:4:6: N802 function name 'PI_estimate' should be lowercase
code_style_example.py:5:80: E501 line too long (81 > 79 characters)
code_style_example.py:7:6: N806 variable 'pi_Numbers' in function should be lowercase
code_style_example.py:17:67: E201 whitespace after '('
code_style_example.py:17:69: E202 whitespace before ')'
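The E501 message is simply a count of characters per line against flake8’s default limit of 79. A small sketch (the function name is ours) shows why the one-line docstring is flagged and why splitting it over three lines resolves the message:

```python
MAX_LINE_LENGTH = 79  # the limit reported in the E501 message above


def too_long_lines(source, limit=MAX_LINE_LENGTH):
    """Return (line number, length) for every line longer than the limit."""
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]


one_line = '    """This function calculates an estimate of pi with dart thrower algorithm."""'
wrapped = (
    '    """\n'
    "    This function calculates an estimate of pi with dart thrower algorithm.\n"
    '    """'
)

print(too_long_lines(one_line))  # [(1, 81)]
print(too_long_lines(wrapped))   # []
```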

Fixing these problems we get the final piece of code:

import numpy as np


def pi_estimate(n):
    """
    This function calculates an estimate of pi with dart thrower algorithm.
    """

    pi_numbers = np.random.random(size=2 * n)
    x = pi_numbers[:n]
    y = pi_numbers[n:]

    return 4 * np.sum((x * x + y * y) < 1) / n


for number in range(1, 8):
    n = 10**number

    print(f"Estimate for PI with {n:8d} dart throws: {pi_estimate(n)}")

Compared to the original, the fixed code is much more legible.

Problems with styles and writing your own kind of code

The style black uses is a bit different from PEP 8, and one can definitely argue that it does not handle mathematical expressions in the optimal way.

However, one can turn formatting off for math-heavy sections with # fmt: off and # fmt: on comments. Alternatively, you can use a formatter such as yapf, which supports formatting based on arithmetic precedence:

$ yapf --style='{based_on_style: pep8, arithmetic_precedence_indication=true}' --diff code_style_example.py
--- code_style_example.py       (original)
+++ code_style_example.py       (reformatted)
@@ -10,7 +10,7 @@
     x = pi_numbers[:n]
     y = pi_numbers[n:]

-    return 4 * np.sum((x * x + y * y) < 1) / n
+    return 4 * np.sum((x*x + y*y) < 1) / n


 for number in range(1, 8):

From this diff we see that yapf would remove the spaces around the multiplications, indicating that they bind more tightly than the addition.
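For the other option, the # fmt: off and # fmt: on comments, a minimal sketch could look like this. The function is our own example, written without NumPy so that it runs anywhere; black should leave the lines between the two comments untouched.

```python
def inside_unit_circle(x, y):
    """Return True if the point (x, y) lies inside the unit circle."""
    # fmt: off
    distance_squared = x*x + y*y
    # fmt: on
    return distance_squared < 1


print(inside_unit_circle(0.5, 0.5))  # True
print(inside_unit_circle(1.0, 1.0))  # False
```

Keeping the protected region as small as possible means the rest of the file still benefits from automatic formatting.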

Formatters can be configured to varying degrees, usually through a configuration file in your repository.

If the formatter makes a change that you do not like, you can usually disable it by changing the formatter’s configuration.

Exercise 2

Using black to format code

Format this code with black:

import numpy as np
import matplotlib.pyplot  as plt

def dice_toss(n,m):

    """Throw n dice m times and the total value together."""
    dice_rolls    = np.random.randint(1,6,size=(m, n))

    roll_averages = np.sum(dice_rolls,axis = -1)

    return roll_averages
fig,ax = plt.subplots( )

n = int( input('Number of dices to toss:\n'))

bins = np.arange(1, 6 * n+1)

m = 1000

ax.hist(dice_toss(n,m), bins = bins)

ax.set_title(f'Histogram of {n} dice tosses')

ax.set_xlabel('Total value' )

ax.set_ylabel('Number of instances')

plt.show()

Solution

Running black exercise2.py will produce this piece of code.

import numpy as np
import matplotlib.pyplot as plt


def dice_toss(n, m):
    """Throw n dice m times and the total value together."""
    dice_rolls = np.random.randint(1, 6, size=(m, n))

    roll_averages = np.sum(dice_rolls, axis=-1)

    return roll_averages


fig, ax = plt.subplots()

n = int(input("Number of dices to toss:\n"))

bins = np.arange(1, 6 * n + 1)

m = 1000

ax.hist(dice_toss(n, m), bins=bins)

ax.set_title(f"Histogram of {n} dice tosses")

ax.set_xlabel("Total value")

ax.set_ylabel("Number of instances")

plt.show()

Integrating productivity tools with git

If you’re using version control, you can run tools such as pylint, flake8, black and ruff automatically with tools like pre-commit.

Pre-commit is a tool that makes it easy to run various code checkers automatically whenever you make a new commit to the repository.

For more information see their website.
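Under the hood, git runs an executable script at .git/hooks/pre-commit before each commit and aborts the commit if the script exits with a non-zero status. The following is a hand-written sketch of such a hook in Python, not the configuration that the pre-commit tool itself uses. The list of checks is our assumption, and it presumes that black and flake8 are installed.

```python
import subprocess
import sys

# Hypothetical list of checks; adjust to the tools you actually use
CHECKS = [
    ["black", "--check", "."],
    ["flake8", "."],
]


def run_checks(checks):
    """Run each command and return those that failed or could not be started."""
    failed = []
    for command in checks:
        try:
            result = subprocess.run(command, check=False)
        except FileNotFoundError:
            # The tool is not installed or not on PATH
            failed.append(command)
            continue
        if result.returncode != 0:
            failed.append(command)
    return failed


def main():
    """Exit status for the hook: 0 lets the commit through, 1 aborts it."""
    failed = run_checks(CHECKS)
    for command in failed:
        print("check failed:", " ".join(command))
    return 1 if failed else 0
```

To use it as a hook, the script would end with sys.exit(main()), be saved as .git/hooks/pre-commit and be made executable. The pre-commit tool takes care of this wiring, and of installing the checkers, for you.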

Other nice tools

isort - Sorts import statements for you
jupyterlab_code_formatter - Adds formatting functionality to jupyterlab.

Keypoints

Using linters and formatters can help you write cleaner code.
You should adapt your own code and documentation style based on standards that other people use.
Using pre-commit with your git repository can make many of the checks automatic.