Python Development with Jupyter, Gitlab CI and Tox

November 2020 ยท 4 minute read

In this post I want to give a brief overview how to structure a Python package, load it in Jupyter notebooks, setup continuous integration with Gitlab and tox-conda that I like to use for data science projects.

Most data scientists coming from R will probably ask themselves:

What is the way to structure a Python package?

and:

How can I get the load_all() behaviour from RStudio replicated in Jupyter to get access to my package in my notebooks?

While in R there is only one (admittedly sometimes limiting) way to structure a package, you have got quite a lot of options in the Python world.

Since data science workloads rely heavily on the conda packaging ecosystem (especially on Windows machines), I will focus on how to make everything play nicely with conda.

Python Package

We begin setup by creating a new conda environment and specifying the interpreter we want to use and activating it:

conda create --name <environment_name> python=3.8 -y
conda activate <environment_name>

Now we need to setup our package structure. In R there is a function called package.skeleton() (or alternatively RStudio’s GUI) that create a basic package with all necessary files. For Python there are different packages that provide similar functionality (such as cookie-cutter), but I will quickly outline a minimal setup here:

src/
    package_name/
        __init__.py
        python_file1.py
        ...
        python_fileN.py
tests/
notebooks/
data/
outputs/
setup.py
gitlab-ci.yml
tox.ini
config.yml
CHANGELOG.md
README.md
.gitignore
.env

Let’s take a look at three important files:

setup.py

A simple setup.py file sufficient for small to medium sized projects can look like this:

from setuptools import setup
from setuptools import find_packages

setup(name='<pkg-name>',
      version='0.10',
      description='<package-description-here>',
      author='<your-name>',
      author_email='<your-email>',
      license='<your-license>',
      packages=find_packages("src"),
      package_dir={"": "src"},
      zip_safe=False,
      install_requires=[
          '<some-pkg>,
          ## Note: the space between the @ are acutally necessary!
          '<some-gitlab-package>  @  git+ssh://git@<package-url-with-slashes>.git@master',
      ],
      extras_require={
          'dev': [
              'pytest',
              'mypy',
              'pylint',
              'coverage',
              'python-dotenv',
              'tox-conda',
              'ipykernel',
              'matplotlib',
              'plotnine',
              'seaborn',
          ]
      }
)

For development purposes, you can install your package using:

pip install -e .[dev]

gitlab-ci.yml

At work, IT only provides us with a basic Gitlab bash runner, so we need to be careful not to accidentially change our testing environment permanently.

As a work around, we use tox-conda which as the name suggests makes tox play nicely with conda environments. All test are run using tox in conda environments.

A very simple gitlab-ci.yml can look like this:

stages:
# sample stages might include:
- test
- deploy

variables:
    # declare any variables here
    WORKSPACE: "../{CI_PROJECT_NAME}

before_script:
    # here you can run any commands before the pipelines start
    - echo $CONDA_PREFIX

after_script:
    # Do something after the pipeline is finished
    # Attention: if you do clean-up here and delete your junit output, Gitlab cannot upload the file 

test job:
    stage: test
    script:
    - tox
    artifacts:
        when: always
        reports: 
            junit: report.xml
        paths:
            - report.xml
        expire_in: 1 week

deploy job:
  stage: deploy
  script: <deploy>

last job:
  stage: .post
  script: <this will always be the last run job, so you can do for example clean-up here>

# Find more info here: https://docs.gitlab.com/ee/ci/yaml/

The next file we are going to cover ist tox.ini that helps us with running tests. Tox is a very convenient tool, as it takes care of automatically creating test environments (in our case conda environment, because we use tox-conda) and running any tests etc we want. Let’s take a look:

tox.ini

[tox]
envlist =
    {py37, py38}

[testenv]
passenv = *
deps=
    pytest-sugar
    python-dotenv
commands=
    pytest --junitxml=report.xml

[testenv:black]
deps=
    black
commands=
    black --check .

Note, the passenv part is necessary if you want to pass environment variables from Gitlab to tox. You can test multiple Python versions by passing them in envlist.

Don’t forget to exclude your .env file from version control by putting it in .gitignore in case you have any api tokens or similar in your environment file.

You can specify environment variables in Gitlab to be used by your pipelines where you can mask them in log outputs.

Jupyter Integration

Since I already included the ipykernel package in my setup.py file, we do not need to install it anymore.

python -m ipykernel install --user --name myenv --display-name "Python (myenv)"

Last, but not least, I include the following code in my juypter notebook to get the mimic RStudio’s load_all(), and load some environment variables:

# Load autoreload module 
# autoreload 1: reload modules imported with %aimport everytime before execution
# autoreload 2: reload all modules
%load_ext autoreload
%autoreload 2
%matplotlib inline
%load_ext dotenv

dotenv -o "../.env"

That’s it, you should have working Python project structure ready to use :)

You can download a zip file containing a sample project from here.