Jupyter 9: Extra material
Here are some useful resources if you want to read more about Jupyter in general:
- The Jupyter project site Links to an external site. contains a lot of information and inspiration.
- The Jupyter Notebook documentation Links to an external site..
- A guide Links to an external site. to using widgets for creating interactive notebooks.
Running jupyter notebooks on a cluster
- Login to Uppmax, making sure to use a specific login node,
e.g.
rackham1
:
ssh <your-user-name>@rackham1.uppmax.uu.se
- Create/activate a conda environment containing
jupyter
then run:
jupyter notebook
When the Jupyter server starts up you should see something resembling:
[I 11:00:00.000 NotebookApp] Serving notebooks from local directory: <path-to-your-local-dir>
[I 11:00:00.000 NotebookApp] Jupyter Notebook 6.4.6 is running at:
[I 11:00:00.000 NotebookApp] http://localhost:8889/?token=357d65100058efa40a0641fce7005addcff339876c5e8000
[I 11:00:00.000 NotebookApp] or http://127.0.0.1:8889/?token=357d65100058efa40a0641fce7005addcff339876c5e8000
[I 11:00:00.000 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
Now a Jupyter notebook server is running on the Uppmax end. The line that says:
[I 11:00:00.000 NotebookApp] http://localhost:8889/?token=357d65100058efa40a0641fce7005addcff339876c5e8000
contains information on the port used on the server side (8889 in
this case) and the token required to use the server
(357d65100058efa40a0641fce7005addcff339876c5e8000
).
Next step is to use this information to login to the server from your local computer.
On your local computer
In a terminal, run the following command to start port forwarding of port 8080 on your local computer to the remote port on the Uppmax side. Replace with the port given when you started the server on Uppmax. Also replace with your user name on Uppmax.
ssh -N -L localhost:8080:localhost:<remote-port> <your-user-name>@rackham1.uppmax.uu.se
As long as this process is running the port forwarding is running. To
disable it simply interrupt it with CTRL + C
.
Connect to the jupyter server by opening localhost:8080
in your browser. When prompted, paste the token you got when starting
the server on Uppmax.
You are now (hopefully) accessing the jupyter server that’s running on Uppmax, via your local browser.
Integrating notebooks with Snakemake workflows
In the case study section
of this tutorial we created a Jupyter notebook that used output from a
Snakemake workflow and produced some summary results and plots. Wouldn’t
it be nice if this was actually part of the workflow itself? To generate
a HTML version of the notebook we can use what we learned in the section
about converting notebooks.
The command to execute the notebook and save it in HTML format in a file
results/supplementary.html
would be:
jupyter nbconvert --to HTML --output-dir results --output supplementary.html --execute supplementary_material.ipynb
This command could be used in a rule, e.g.
make_supplementary
, the input of which would be
results/tables/counts.tsv
,
intermediate/multiqc_general_stats.txt
, and
results/rulegraph.png
. See if you can work out how to
implement such a rule at the end of the Snakefile
found in
the jupyter/
directory. You can find an example in the code
chunk below:
rule make_supplementary:input:
= "results/tables/counts.tsv",
counts = "results/tables/counts.tsv.summary",
summary = "intermediate/multiqc_general_stats.txt",
multiqc_file = "results/rulegraph.png"
rulegraph
output:"results/supplementary.html"
params:= lambda wildcards, output: os.path.basename(output[0]),
base dir = lambda wildcards, output: os.path.dirname(output[0])
shell:"""
jupyter nbconvert --to HTML --output-dir {params.dir} --output {params.base} \
--execute supplementary_material.ipynb
"""
Note
The Conda enivronment for the jupyter tutorial does not contain packages required to run the full snakemake workflow. So if you wish to test jupyter integration fully you should update the conda environment by runningconda install snakemake-minimal fastqc sra-tools multiqc bowtie2 tbb samtools htseq bedtools wget graphviz
More integrations
Snakemake actually supports the execution of notebooks via the
notebook:
rules directive. See more about Jupyter
integration in the snakemake
docs
Links to an external site.. In the notebook:
directive of such a rule you
specify the path to a jupyter notebook (relative to the Snakefile) which
is then executed when the rule is run.
So how is this useful?
In the notebook itself this gives you access to a
snakemake
object containing information about
input and output files for the rule
via snakemake.input
and snakemake.output
.
Similarly, you can access rule wildcards with
snakemake.wildcards
, params with
snakemake.params
, and config settings with
snakemake.config
.
When snakemake runs the rule with the notebook:
directive jupyter-nbconvert
is used to execute the
notebook. No HTML output is generated here but it is possible to store a
version of the notebook in its final processed form by adding the
following to the rule:
log:="<path>/<to>/<processed>/<notebook.ipynb>" notebook
Because you won’t get the notebook in full HTML glory, this type of integration is better suited if you want to use a notebook to generate figures and store these in local files (e.g. pdf/svg/png formats).
We’ll use the supplementary_material.ipynb
notebook as
an example! Let’s say that instead of exporting the entire notebook to
HTML we want a rule that outputs pdf versions of the barplot and heatmap
figures we created.
Let’s start by setting up the rule. For simplicity we’ll use the same
input as when we edited the notebook in the first place. The output will
be results/barplot.pdf
and
results/heatmap.pdf
. Let’s also output a finalized version
of the notebook using the log: notebook=
directive:
rule make_supplementary_plots:input:
= "results/tables/counts.tsv",
counts = "results/tables/counts.tsv.summary",
summary = "intermediate/multiqc_general_stats.txt",
multiqc = "results/rulegraph.png"
rulegraph
output:= "results/barplot.pdf",
barplot = "results/heatmap.pdf"
heatmap
log:= "results/supplementary.ipynb" notebook
The notebook will now have access to
snakemake.input.counts
,
snakemake.output.barplot
and
snakemake.output.heatmap
when executed from within the
workflow. Let’s go ahead and edit the notebook! In the cell where we
defined notebook parameters edit the code so that it looks like
this:
=snakemake.input.counts
counts_file=snakemake.input.summary
summary_file=snakemake.input.multiqc
multiqc_file=snakemake.input.rulegraph
rulegraph_file
=snakemake.params.SRR_IDs
SRR_IDs=snakemake.params.GSM_IDs
GSM_IDs=snakemake.params.GEO_ID GEO_ID
Notice that we set the SRR_IDs
, GSM_IDs
and
GEO_ID
variables using variables in
snakemake.params
? However, we haven’t defined these in our
rule yet so let’s go ahead and do that now. Add the params
section so that the make_supplementary_plots
in the
Snakefile looks like this:
rule make_supplementary_plots:input:
= "results/tables/counts.tsv",
counts = "intermediate/multiqc_general_stats.txt",
multiqc = "results/rulegraph.png"
rulegraph
output:= "results/barplot.pdf",
barplot = "results/heatmap.pdf"
heatmap
log:= "results/supplementary.ipynb"
notebook
params:= ["SRR935090","SRR935091","SRR935092"],
SRR_IDs = ["GSM1186459", "GSM1186460", "GSM1186461"],
GSM_IDs = "GSE48896"
GEO_ID "supplementary_material.ipynb" notebook:
Tip
One way to further generalize this rule could be to define the SRR_IDs, GSM_IDs and GEO_ID parameters in a config file instead, in which case they would be directly accessible from within the notebook usingsnakemake.config['SRR_IDs']
etc.
Now the rule contains everything needed, but we still need to edit the notebook to save the plots to the output files. First, edit the cell that generates the barplot so that it looks like this:
# Create a stacked barplot
= summary_plot_data.T.plot(kind="bar", stacked=True, color=colors)
ax # Move legend and set legend title
=(1,1), title="Category");
ax.legend(bbox_to_anchor=300, bbox_inches="tight") ## <-- Add this line! plt.savefig(snakemake.output.barplot, dpi
Finally, edit the cell that generates the heatmap so that it looks like this:
= counts.loc[:, SRR_IDs]
count_data = count_data.loc[(count_data.std(axis=1).div(count_data.mean(axis=1))>1.2)&(count_data.max(axis=1)>5)]
heatmap_data = heatmap_data.rename(columns = name_dict['title'])
heatmap_data with sns.plotting_context("notebook", font_scale=0.7):
= sns.clustermap(data=np.log10(heatmap_data+1), cmap="RdYlBu_r",
ax ="complete", yticklabels=True, linewidth=.5,
method=(.7, .85, .05, .1), figsize=(3,9))
cbar_pos=270)
plt.setp(ax.ax_heatmap.get_xticklabels(), rotation=300, bbox_inches="tight") ## <-- Add this line! plt.savefig(snakemake.output.heatmap, dpi
Now you can run the following to generate the plots:
snakemake -j 1 make_supplementary_plots
Presentations with Jupyter
As if all the above wasn’t enough you can also create presentations/slideshows with Jupyter! Simply use conda to install the RISE Links to an external site. extension to your jupyter environment:
conda install -c conda-forge rise
then open up a notebook of your choice. In the menu click View -> Cell Toolbar -> Slideshow. Now every cell will have a drop-down in the upper right corner allowing you to set the cell type:
- Slide: a regular slide
- Sub-Slide: a regular slide that will be displayed below the previous
- Fragment: these cells split up slides so that content (fragments) are added only when you press Space
- Skip: these cells will not appear in the presentation
- Notes: these cells act as notes, shown in the speaker view but not in the main view
The presentation can be run directly from the notebook by clicking
the ‘Enter/Exit RISE Slideshow’ button (looks like a bar chart) in the
toolbar, or by using the keyboard shortcut Alt-r
. Running
it directly from a notebook means you can also edit and run cells during
your presentation. The downside is that the presentation is not as
portable because it may rely on certain software packages that other
people are not comfortable with installing.
You can also export the notebook to an HTML-file with
jupyter nbconvert --execute --to SLIDES <your-notebook.ipynb>
.
The resulting file, with the slideshow functionality included, can be
opened in any browser. However, in this format you cannot run/edit
cells.