Containers 6: Packaging the case study

During these tutorials we have been working on a case study about the multiresistant bacterium MRSA. Here we will build and run a Docker container that contains all the work we’ve done so far:

We’ve set up a GitHub repository for version control and for hosting our project.
We’ve defined a Conda environment that specifies the packages we’re depending on in the project.
We’ve constructed a Snakemake workflow that performs the data analysis and keeps track of files and parameters.
We’ve written an R Markdown document that takes the results from the Snakemake workflow and summarizes them in a report.

The workshop-reproducible-research/tutorials/containers directory contains the final versions of all the files we’ve generated in the other tutorials: environment.yml, Snakefile, config.yml, code/header.tex, and code/supplementary_material.Rmd. The only difference compared to the other tutorials is that we have also included the rendering of the Supplementary Material HTML file in the Snakemake workflow, as the rule make_supplementary. Running all of these steps takes a while (around 20 minutes), and longer if you’re on a slow internet connection.

Now take a look at Dockerfile. Everything should look quite familiar to you, since it’s basically the same steps as in the image we constructed in the previous section, although some sections have been moved around. The main difference is that we add the project files needed for executing the workflow (mentioned in the previous paragraph) and install the Conda packages listed in environment.yml. If you look at the CMD instruction you can see that the container will run the whole Snakemake workflow by default.

Now run docker build as before and tag the image as my_docker_project:

docker build -t my_docker_project -f Dockerfile .

Go get a coffee while the image builds (or you could use docker pull nbisweden/workshop-reproducible-research which will download the same image).

Check that the image is listed with docker image ls. Now all that remains is to run the whole thing with docker run. We just want to get the results, so bind mount a host directory, say mrsa_results in your current directory, to /course/results/ in the container.
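One way to write that command is sketched below. It assumes the image is tagged my_docker_project as above and that the workflow writes its output to /course/results inside the container. The --rm flag is an optional addition that removes the stopped container once the workflow has finished.

```shell
# Host directory that will receive the results (created if it does not exist).
results_dir="$(pwd)/mrsa_results"
mkdir -p "$results_dir"

# Bind mount the host directory at /course/results and run the image's
# default CMD, i.e. the whole Snakemake workflow.
docker run --rm -v "$results_dir":/course/results my_docker_project
```

When the container exits, the files the workflow wrote to /course/results should be available in mrsa_results on the host.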

Well done! You now have an image that allows anyone to reproduce your analysis workflow exactly (provided you first docker push it to Docker Hub, that is).
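Publishing the image could look roughly like this. The account name your_username is a placeholder for your own Docker Hub account, and the sketch assumes you have already authenticated with docker login.

```shell
# Placeholder account name; replace with your own Docker Hub username.
dockerhub_user="your_username"
image="$dockerhub_user/my_docker_project"

# Give the local image a name that includes the account, then upload it.
docker tag my_docker_project "$image"
docker push "$image"
```

Others can then fetch the image with docker pull followed by the same name.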

Tip
If you’ve done the Jupyter tutorial, you know that Jupyter Notebook runs as a web server. This makes it very well suited for running in a Docker container, since we can just publish the port Jupyter Notebook uses and map it to a port on our own machine. You can then work with the notebooks in your browser just as you’ve done before, while the server is actually running in the container. This means you could package your data, scripts and environment in a Docker image that also runs a Jupyter Notebook server. If you make this image available, say on Docker Hub, other researchers could then download it and interact with your data and code via the fancy interactive Jupyter notebooks that you have prepared for them. We haven’t made any fancy notebooks for you, but we have set up a Jupyter Notebook server. Try it out if you want to (replace the image name with your version if you’ve built it yourself):
docker run -it -p 8888:8888 nbisweden/workshop-reproducible-research \
    jupyter notebook --ip=0.0.0.0 --allow-root