Welcome to reportsrender’s documentation!
Generate reproducible reports from Rmarkdown or jupyter notebooks
Reportsrender lets you create reproducible, consistent-looking HTML reports from both jupyter notebooks and Rmarkdown files. It uses papermill and Rmarkdown to execute notebooks and Pandoc to convert them to HTML.
Features:
- two execution engines: papermill and Rmarkdown.
- support for any format supported by jupytext.
- self-contained HTML that can be shared easily.
- hiding inputs and/or outputs of cells.
- parametrized reports.
See the documentation for more details!
Getting started
Render an Rmarkdown document to HTML using the Rmarkdown engine:
reportsrender --engine=rmd my_notebook.Rmd report.html
Execute a parametrized jupyter notebook with papermill
reportsrender --engine=papermill jupyter_notebook.ipynb report.html --params="data_file=table.tsv"
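When reports are rendered from a script, the same command lines can be assembled programmatically. The sketch below only builds the argument list from the options shown on this page (`--engine`, `--params`). The helper name `build_command` is hypothetical and not part of reportsrender.

```python
def build_command(notebook, out_file, engine="auto", params=None):
    """Assemble a reportsrender command line as a list of arguments.

    Hypothetical helper for illustration, not part of reportsrender.
    """
    cmd = ["reportsrender", f"--engine={engine}", notebook, out_file]
    if params:
        # --params takes space-separated key=value pairs in a single argument.
        joined = " ".join(f"{key}={value}" for key, value in params.items())
        cmd.append(f"--params={joined}")
    return cmd


command = build_command(
    "jupyter_notebook.ipynb",
    "report.html",
    engine="papermill",
    params={"data_file": "table.tsv"},
)
print(command)
```

With reportsrender installed, the resulting list could be passed to `subprocess.run(command, check=True)`. Passing a list rather than a single string avoids shell quoting issues around the `--params` value.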
Usage from command line
reportsrender
Execute and render a jupyter/Rmarkdown notebook.
The `index` subcommand generates an index html
or markdown file that links to html documents.
Usage:
reportsrender <notebook> <out_file> [--cpus=<cpus>] [--params=<params>] [--engine=<engine>]
reportsrender index [--index=<index_file>] [--title=<title>] [--] <html_files>...
reportsrender --help
Arguments and options:
<notebook> Input notebook to be executed. Can be any format supported by jupytext.
<out_file> Output HTML file.
-h --help Show this screen.
--cpus=<cpus> Number of CPUs to use for Numba/Numpy/OpenBLAS/MKL [default: 1]
--params=<params> space-separated list of key-value pairs that will be passed
to papermill/Rmarkdown.
E.g. "input_file=dir/foo.txt output_file=dir2/bar.html"
--engine=<engine> Engine to execute the notebook. [default: auto]
Arguments and options of the `index` subcommand:
<html_files> List of HTML files that will be included in the index. The tool
will generate relative links from the index file to these files.
--index=<index_file> Path to the index file that will be generated. Will be
overwritten if exists. Will auto-detect markdown (.md) and
HTML (.html) format based on the extension. [default: index.html]
--title=<title> Headline of the index. [default: Index]
Possible engines are:
auto Use `rmd` engine for `*.Rmd` files, papermill otherwise.
rmd Use `rmarkdown` to execute the notebook. Supports R and
python (through reticulate)
papermill Use `papermill` to execute the notebook. Works for every
kernel available in the jupyter installation.
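The `auto` rule described above can be sketched in a few lines. This is an illustration of the documented behaviour, not reportsrender's actual code. The function name `select_engine` is hypothetical, and the sketch assumes the `.Rmd` suffix is matched case-sensitively.

```python
from pathlib import Path

VALID_ENGINES = ("auto", "rmd", "papermill")


def select_engine(notebook, engine="auto"):
    """Resolve the engine name for a notebook path (hypothetical helper)."""
    if engine not in VALID_ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    if engine != "auto":
        # An explicit choice always wins over the file extension.
        return engine
    # Assumption: only the exact suffix ".Rmd" selects the rmd engine.
    return "rmd" if Path(notebook).suffix == ".Rmd" else "papermill"


for path in ("my_notebook.Rmd", "jupyter_notebook.ipynb", "script.py"):
    print(path, "->", select_engine(path))
```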
Installation
Conda (recommended):
As reportsrender depends on both R and Python packages, I recommend installing it through conda. The following command installs reportsrender and all its dependencies into the current conda environment:
conda install -c conda-forge grst::reportsrender
If you prefer not to use conda, you can follow the approach below:
Manual installation:
Get dependencies:
Python
For the Rmarkdown render engine, additionally (there is no need to install these if you are not going to use the Rmarkdown rendering engine):
R and the following packages:
rmarkdown
reticulate
then,
Install from github:
pip install flit
flit installfrom github:grst/reportsrender
Features
Execution engines
Reportsrender comes with two execution engines:
Rmarkdown. This engine makes use of the Rmarkdown package implemented in R. Essentially, it calls Rscript -e "rmarkdown::render()". It supports Rmarkdown notebooks (Rmd format) and Python notebooks through reticulate.
Papermill. This engine combines papermill and nbconvert to parametrize and execute notebooks. It supports any programming language for which a jupyter kernel is installed.
Supported notebook formats
Reportsrender uses jupytext to convert between input formats. Here is the full list of supported formats.
So no matter if you want to run an Rmd file with papermill, an ipynb with Rmarkdown or a Hydrogen percent script, reportsrender has got you covered.
Hiding cell inputs/outputs
You can hide inputs and/or outputs of individual cells:
Papermill engine:
Within a jupyter notebook:
edit cell metadata
add one of the following tags: hide_input, hide_output, remove_cell
{
  "tags": [
    "remove_cell"
  ]
}
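Tags can also be set without opening the notebook editor, because an .ipynb file is plain JSON. The sketch below works on a parsed notebook dictionary using only the standard library. The helper name `add_cell_tag` and the tiny in-memory notebook are illustrative assumptions, and only the tag names listed above are taken from this page.

```python
import json


def add_cell_tag(notebook, cell_index, tag):
    """Add a tag to one cell of a parsed .ipynb dictionary (hypothetical helper)."""
    metadata = notebook["cells"][cell_index].setdefault("metadata", {})
    tags = metadata.setdefault("tags", [])
    if tag not in tags:  # avoid duplicate tags when called repeatedly
        tags.append(tag)
    return notebook


# Minimal stand-in for the JSON content of an .ipynb file.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "metadata": {},
            "source": "import pandas as pd",
            "outputs": [],
            "execution_count": None,
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

add_cell_tag(notebook, 0, "hide_input")
print(json.dumps(notebook["cells"][0]["metadata"], indent=2))
```

For a real file, the dictionary would come from `json.load()` on the notebook and be written back with `json.dump()`.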
Rmarkdown engine:
All native input control options (e.g. results='hide', include=FALSE, echo=FALSE) are supported. See the Rmarkdown documentation for more details.
Jupytext automatically converts the tags to Rmarkdown options for all supported formats.
Parametrized notebooks
Papermill engine:
See the Papermill documentation
Example:
Add the tag parameters to the metadata of a cell in a jupyter notebook.
Declare default parameters in that cell:
input_file = '/path/to/default_file.csv'
Use the variable as any other:
import pandas as pd
pd.read_csv(input_file)
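On the command line, such defaults are replaced by the values given in `--params`. The sketch below illustrates how a string in the documented `key=value` format maps onto the defaults. It is not reportsrender's own parser: the name `parse_params` is hypothetical, values are assumed to contain no whitespace, and every value is kept as a string.

```python
def parse_params(params):
    """Parse 'key=value key2=value2' into a dict of strings (illustrative only)."""
    parsed = {}
    for pair in params.split():
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        parsed[key] = value
    return parsed


# Defaults as declared in the cell tagged `parameters`.
defaults = {"input_file": "/path/to/default_file.csv"}

# Values as they would be passed via --params.
overrides = parse_params("input_file=dir/foo.txt output_file=dir2/bar.html")

# Later entries win, so command-line values replace the defaults.
effective = {**defaults, **overrides}
print(effective)
```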
Rmarkdown engine:
See the documentation.
Example:
Declare the parameter in the yaml frontmatter.
You can set default parameters that will be used when the notebook is executed interactively in RStudio. They will be overridden when running through reportsrender.
---
title: My Document
output: html_document
params:
  input_file: '/path/to/default_file.csv'
---
Access the parameters from the code:
read_csv(params$input_file)
Be compatible with both engines:
Yes, it’s possible! You can execute the same notebook with both engines. Adding parameters is a bit more cumbersome, though.
Example (Python notebook stored as .Rmd file using jupytext):
---
title: My Document
output: html_document
params:
  input_file: '/path/to/default_file.csv'
---
```{python tags=c("parameters")}
try:
    # Try to get the parameter from Rmarkdown using reticulate.
    input_file = r.params["input_file"]
except Exception:
    # `r` is not available when running with papermill.
    # Re-declare the default parameters instead.
    input_file = "/path/to/default_file.csv"
```
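With several parameters, repeating the try/except for each one gets verbose. One possible way to factor the pattern out is sketched below. The helper `get_param` and the stand-in object `fake_r` are hypothetical, and the sketch assumes that reticulate's `r` object exposes the parameters as `r.params[...]`, as in the chunk above.

```python
from types import SimpleNamespace


def get_param(name, default, r_obj=None):
    """Return r_obj.params[name] if available, otherwise the default.

    `r_obj` stands for reticulate's `r` object. It is None when the
    notebook is executed with papermill. Hypothetical helper.
    """
    try:
        return r_obj.params[name]
    except (AttributeError, KeyError, TypeError):
        return default


# Without an `r` object (papermill case) the default is returned.
print(get_param("input_file", "/path/to/default_file.csv"))

# Stand-in for reticulate's `r` object, for illustration only.
fake_r = SimpleNamespace(params={"input_file": "data/table.csv"})
print(get_param("input_file", "/path/to/default_file.csv", r_obj=fake_r))
```

Inside a notebook, the call could look like `get_param("input_file", "/path/to/default_file.csv", r_obj=globals().get("r"))`, which passes `None` whenever `r` is undefined.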
Sharing reports
Reportsrender creates self-contained HTML files that can be easily shared, e.g. via email.
I do, however, recommend using github pages to upload and share your reports. A central website serves as a single source of truth and eliminates the problem of different versions of your reports being emailed around.
You can make use of reportsrender index to automatically generate an index page listing multiple reports:
Say, you generated several reports and already put them into your github-pages directory:
gh-pages
├── 01_preprocess_data.html
├── 02_analyze_data.html
└── 03_visualize_data.html
Then you can generate an index file listing and linking to your reports by running
reportsrender index --index gh-pages/index.md gh-pages/*.html
For more details see Usage from command line and reportsrender.build_index()
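To make the idea of relative links concrete, here is a minimal sketch that produces a markdown index for the files above. It is not the implementation of `reportsrender.build_index()`, and the real output format may differ. The function name `build_markdown_index` and the use of the file name as link text are assumptions.

```python
import os


def build_markdown_index(html_files, index_file, title="Index"):
    """Return markdown text linking to html_files relative to index_file."""
    index_dir = os.path.dirname(os.path.abspath(index_file))
    lines = [f"# {title}", ""]
    for html_file in html_files:
        # Links are relative to the directory that contains the index file.
        rel = os.path.relpath(os.path.abspath(html_file), start=index_dir)
        rel = rel.replace(os.sep, "/")  # URLs always use forward slashes
        name = os.path.splitext(os.path.basename(html_file))[0]
        lines.append(f"- [{name}]({rel})")
    return "\n".join(lines) + "\n"


reports = [
    "gh-pages/01_preprocess_data.html",
    "gh-pages/02_analyze_data.html",
    "gh-pages/03_visualize_data.html",
]
index_text = build_markdown_index(reports, "gh-pages/index.md")
print(index_text)
```

Because the links are relative, the whole directory can be moved or published under a different base URL without regenerating the index.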
Password protection
Not all analyses can be shared publicly. Unfortunately, github-pages does not support password protection.
There is a workaround, though:
As github-pages doesn’t list directories, you can simply create a long, cryptic subdirectory, e.g. t8rry6poj7ua6eujqpb57 and put your reports within. Only people with whom you share the exact link will be able to access the site.
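A directory name of this kind is best generated rather than typed by hand. The sketch below uses Python's `secrets` module to draw a random lowercase alphanumeric name. The function name `random_directory_name` and the default length of 21 characters (the length of the example above) are illustrative choices.

```python
import secrets
import string


def random_directory_name(length=21):
    """Return a random lowercase alphanumeric name of the given length."""
    alphabet = string.ascii_lowercase + string.digits
    # secrets.choice draws from a cryptographically strong random source.
    return "".join(secrets.choice(alphabet) for _ in range(length))


name = random_directory_name()
print(name)
```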
Combine notebooks into a pipeline
Reportsrender is built with pipelines in mind. You can easily combine individual analysis steps into a fully reproducible pipeline using workflow engines such as Nextflow or Snakemake.
A full example of what such a pipeline might look like is available in a dedicated GitHub repository: universal_analysis_pipeline. It’s based on Nextflow, but could easily be adapted to other pipelining engines.
Usage as Python library
Reportsrender provides a public API that can be used to execute and convert notebooks to HTML:
- Execute and render notebooks as HTML reports.
- Wrapper function to render an Rmarkdown document with the R rmarkdown package and convert it to HTML using pandoc and a custom template.
- Wrapper function to render a jupytext/jupyter notebook with papermill and pandoc.
- Convert to HTML using pandoc.
- Create an index file referencing all specified html files.