
Advanced R Markdown: from templates to Pandoc

This article picks up where the basic R Markdown guide leaves off. You already know what a chunk is, how YAML metadata works, and how to knit an .Rmd into HTML or PDF. The next questions are about scale and control: how do you render dozens of reports from one template, cache long computations safely, customise the output beyond theme: cosmo, and reach into the Pandoc layer to reshape the document itself? That is the territory this guide covers.

Parameterised Reports

The params: block in your YAML header turns an R Markdown file into a template. Each parameter gets a label, a default value, and an input widget that controls how RStudio’s “Knit with Parameters” dialog presents it.

---
title: "Country Report: `r params$country`"
output:
  html_document:
    toc: true
    toc_float: true
    theme: cosmo
    df_print: paged
params:
  country:
    label: "Country:"
    value: "Norway"
    input: select
    choices: [Norway, Sweden, Denmark, Finland]
  year:
    label: "Year:"
    value: 2024
    input: slider
    min: 2010
    max: 2024
    step: 1
---

Inside the document, every parameter is available as params$<name>. Inline code (`r params$country`) and chunk code read from the same list, so headers, captions, and dataset filters all reflect the chosen values.

The real payoff comes when you call rmarkdown::render() from a script, where you can loop over parameters and render many variants in a single run without ever opening the file. The function signature you need most often looks like this:

render(
  input,
  output_format = NULL,
  output_file = NULL,
  output_dir = NULL,
  params = NULL,
  ...
)

params accepts a named list matching the YAML block. The loop below renders a separate HTML file per country, writing each one into out/ with a date-stamped filename and a quiet log so the console output stays clean when many reports run back-to-back:

library(rmarkdown)

for (country in c("Norway", "Sweden", "Denmark")) {
  rmarkdown::render(
    input        = "report.Rmd",
    output_file  = paste0("report-", country, "-", Sys.Date(), ".html"),
    output_dir   = "out/",
    params       = list(country = country, year = 2024),
    quiet        = TRUE
  )
}

Two things trip people up. First, the input: widget only matters for the RStudio dialog. Programmatic render(..., params = list(...)) skips widget validation entirely, so coerce types yourself. Second, a YAML value: 2024 becomes a numeric; value: "2024" becomes a string. Mismatches break downstream code silently.
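Because programmatic rendering bypasses the dialog, a small guard in the calling script can catch type mismatches before they reach the template. The sketch below assumes the two parameters from the YAML header above; coerce_params() is a hypothetical helper written for this example, not part of rmarkdown.

```r
# Hypothetical helper (not part of rmarkdown): normalise and check a params
# list before handing it to rmarkdown::render(). The bounds mirror the
# slider range declared in the YAML header above.
coerce_params <- function(p) {
  p$country <- as.character(p$country)
  p$year    <- as.integer(p$year)
  stopifnot(
    length(p$country) == 1, nzchar(p$country),
    length(p$year) == 1, !is.na(p$year),
    p$year >= 2010L, p$year <= 2024L
  )
  p
}

# A year that arrived as a string (for example from a CSV or a CLI argument)
checked <- coerce_params(list(country = "Norway", year = "2024"))
str(checked)

# The checked list can then be passed on, for example:
# rmarkdown::render("report.Rmd", params = checked, quiet = TRUE)
```

Failing early in the script is usually easier to debug than a type error that surfaces halfway through a long render.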

Controlling knitr globally

The first chunk of almost every real R Markdown document is a setup block that sets defaults for the rest of the file. The chunk header is ```{r setup, include=FALSE} and the body looks like this:

# ```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo       = FALSE,
  message    = FALSE,
  warning    = FALSE,
  fig.width  = 7,
  fig.height = 4,
  fig.align  = "center",
  dpi        = 300,
  out.width  = "85%"
)
# ```

opts_chunk$set() applies to every chunk in the document. Per-chunk options still win, so a single chunk can override the global default when needed. For knitr-package-level options such as working directory, progress bars, and upload functions, use opts_knit$set():

knitr::opts_knit$set(
  root.dir = "project/",
  progress = FALSE,
  verbose  = FALSE
)
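The precedence between opts_chunk$set() defaults and per-chunk options can be checked without a file by knitting a few lines of text. This is a minimal sketch, assuming knitr is installed; the two toy chunks are made up for illustration.

```r
library(knitr)

# Remember the current defaults so the demo leaves no side effects
old_opts <- opts_chunk$get()
opts_chunk$set(echo = FALSE)

# Two chunks: the first inherits echo = FALSE, the second overrides it
src <- c(
  "```{r}", "1 + 1", "```",
  "",
  "```{r, echo=TRUE}", "2 + 2", "```"
)
out <- knit(text = src, quiet = TRUE)
cat(out)

# Put the previous defaults back
opts_chunk$restore(old_opts)
```

The rendered text should show the source of the second chunk only, while both results appear.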

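Custom hooks are where this gets interesting. knitr calls a chunk hook twice, once before and once after the chunk runs, passing a before flag, the chunk options, and the evaluation environment; any string the hook returns is written into the output. The following wraps a chunk in a Pandoc fenced div with the class callout-note, ready for callout styling in your CSS or template: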

# ```{r hooks, include=FALSE}
knitr::knit_hooks$set(
  box = function(before, options, envir) {
    if (before) {
      "\n::: {.callout-note}\n"
    } else {
      "\n:::\n"
    }
  }
)
# ```

Then any chunk with box = TRUE gets wrapped automatically. You can build callout boxes, syntax-highlighted callouts, themed code blocks, or anything else your Pandoc template supports.
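To see what the hook contributes, it can help to call the function by hand. The sketch below copies the hook body from above into a named function (box_hook is a name chosen for this example) and imitates the two calls knitr would make around a chunk.

```r
# The same hook body as above, bound to a name so it can be called directly
box_hook <- function(before, options, envir) {
  if (before) "\n::: {.callout-note}\n" else "\n:::\n"
}

# knitr would make these two calls around a chunk that sets box = TRUE
opening <- box_hook(before = TRUE,  options = list(box = TRUE), envir = globalenv())
closing <- box_hook(before = FALSE, options = list(box = TRUE), envir = globalenv())

# The chunk's own output ends up between the two returned strings
cat(opening, "chunk output goes here", closing, sep = "")
```

The blank lines around the div markers matter: Pandoc only recognises a fenced div when the opening and closing lines stand on their own.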

Caching done right

cache = TRUE stores the chunk’s result and skips re-evaluation when the source code has not changed. The two options that keep caching honest are dependson and autodep:

# ```{r raw, cache=TRUE}
raw_data <- read.csv("large.csv")  # slow
# ```

# ```{r clean, cache=TRUE, dependson="raw"}
clean_data <- clean(raw_data)  # depends on raw
# ```

dependson declares that the current chunk depends on another chunk by label. The cache is invalidated if either chunk’s source changes. autodep = TRUE is a heuristic that scans for object names. It is convenient but conservative: it over-invalidates more often than necessary.

The honest caveat is that cache invalidation is genuinely hard. A cached chunk re-uses old data even if a global object outside the chunk changes. dependson only handles chunk-to-chunk dependencies. If you have a heavy pipeline, the targets package gives you stronger guarantees than chunk caching.
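One way to narrow that gap for file inputs is to put a fingerprint of the file into the chunk options, since a change in chunk options also invalidates the cache. The sketch below shows the idea with tools::md5sum() on a temporary file whose contents are made up for illustration; in a document, the same expression could be supplied through the cache.extra chunk option.

```r
# Write a small stand-in for the slow-to-read input file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,10"), csv_path)
key_before <- unname(tools::md5sum(csv_path))

# Simulate the file changing on disk between two knits
writeLines(c("id,value", "1,10", "2,20"), csv_path)
key_after <- unname(tools::md5sum(csv_path))

# Different fingerprints mean the chunk options differ, so the cache is rebuilt
key_before != key_after

# In the document, the chunk header would carry the fingerprint, e.g.
# ```{r raw, cache=TRUE, cache.extra=tools::md5sum("large.csv")}
```

This covers a changed input file only; objects modified elsewhere in the session still need dependson or a pipeline tool.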

Child documents and dynamic content

The child = chunk option statically embeds another .Rmd at knit time. For dynamic content, use knitr::knit_child() inside a loop. The child file looks like a normal Rmd but reads its variables from the envir argument that the caller passes in, so a single template can produce many variants.

child-region.Rmd:

## Region: `r region`

The data for `r region` is summarised below.

The main Rmd loops over the regions, renders the child for each, and concatenates the rendered fragments back into the parent document. With results = 'asis', the stitched output flows in as if it had been written there by hand, which is what makes per-region or per-customer reports practical from one template:

# ```{r region-loop, echo=FALSE, results='asis'}
regions <- c("North", "South", "East", "West")
res <- character()
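for (r in regions) {
  # knit_child() expects an environment, so build one per region
  child_env <- new.env()
  child_env$region <- r
  res <- c(res, knitr::knit_child("child-region.Rmd", envir = child_env, quiet = TRUE))
}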
cat(res, sep = "\n")
# ```

knit_child() returns the rendered child as a string. The envir argument scopes the variables visible to the child; pass quiet = TRUE to suppress progress output. This pattern is the right tool for per-region or per-customer reports built from a single template. Package vignettes use a closely related approach; see the vignette-writing tutorial for the package-author version of this idea.

For templated inline text, knitr::knit_expand() substitutes {{var}} placeholders with values you pass as named arguments. It is lighter than knit_child() when you do not need a whole mini-document.
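A minimal sketch of knit_expand() on a one-line template follows. It assumes knitr is installed; the template text and the values are illustrative.

```r
library(knitr)

# Template text with two placeholders; the values are illustrative
template <- "The data for {{region}} covers the period {{period}}."

# Named arguments supply the values for the placeholders
sentence <- knit_expand(text = template, region = "North", period = "2010 to 2024")
cat(sentence, "\n")
```

Inside a chunk with results = 'asis', the expanded string can be written straight into the document with cat().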

Customising output

The two workhorses are rmarkdown::html_document() and rmarkdown::pdf_document(). Useful options to know:

| Option | Output | Purpose |
|---|---|---|
| theme | HTML | Bootstrap/bslib theme name ("cosmo", "flatly", "cerulean") |
| toc, toc_float | HTML | Table of contents, sticky on scroll |
| code_folding | HTML | "hide", "show", or "none" |
| df_print | HTML | "paged", "kable", "tibble" |
| includes | Both | Inject raw HTML/LaTeX in in_header, before_body, after_body |
| css | HTML | Path to a custom stylesheet |
| pandoc_args | Both | Pass through to Pandoc (--lua-filter=, --variable=, etc.) |
| latex_engine | PDF | "pdflatex", "xelatex", "lualatex", "tectonic" |
| keep_tex | PDF | Keep the intermediate .tex for debugging |

The includes option is the right tool for adding a corporate header, a custom footer, or a preamble. The pandoc_args option is the door into the Pandoc layer.

Pandoc Lua filters

A Lua filter is a small Lua script that walks the Pandoc AST and modifies elements in place. R Markdown passes filters through pandoc_args, and you can list several filters in order if you need a chain of small transformations:

---
output:
  html_document:
    pandoc_args:
      - "--lua-filter=raise-header.lua"
      - "--toc-depth=2"
---

The Lua file itself is a tiny program against the AST. This filter visits every Header element, raises an error if the header is already at level 1 (it cannot be raised any further), and otherwise decrements the level by 1. The result is that a ## Section becomes # Section in the output:

function Header(el)
  if (el.level <= 1) then
    error("I don't know how to raise the level of h1")
  end
  el.level = el.level - 1
  return el
end

The R Markdown Cookbook has a full chapter on Lua filters. The practical takeaway is that any transformation you would otherwise do with regex on the output, such as fixing heading levels, cross-referencing, or code-block styling, can be done upstream in the AST where it is safer and more accurate.

Multi-format reports

One .Rmd can declare multiple output formats. The output: field accepts a list, and each format gets its own configuration block. Pandoc runs once per format and writes the result next to the source file by default. If you want the output written somewhere else, pass output_dir to render():

---
output:
  html_document:
    toc: true
    theme: cosmo
  pdf_document:
    latex_engine: xelatex
    keep_tex: true
  word_document:
    reference_docx: template.docx
---

Render every declared format with a single call by passing the "all" shortcut to output_format. rmarkdown works through the formats declared in the YAML, applies each format’s options, and writes each output with a matching extension next to the source file:

rmarkdown::render("report.Rmd", output_format = "all")

output_format = "all" is a string shortcut for “every format declared in YAML.” If you only want some, pass a character vector of format names, or a single output format object such as html_document().

Interactive R Markdown

For documents that need to react to user input, set runtime: shiny:

---
output: html_document
runtime: shiny
---

The catch is that a runtime: shiny document needs a live R session behind it, for example rmarkdown::run() locally or a Shiny Server deployment. It will not work as a static HTML file. For dashboards with multiple panels, use the flexdashboard output format instead; it accepts the same runtime: shiny setting and provides a layout grid. htmlwidgets packages such as plotly, leaflet, and DT work in static HTML and are usually the right answer for self-contained reports.

Common gotchas

A few things to watch for:

  • Underscores in chunk labels cause trouble in some output formats and downstream packages. Stick to alphanumerics and dashes.
  • include = FALSE versus echo = FALSE: include = FALSE runs the chunk and suppresses both source and output; echo = FALSE shows output but hides source. To run a setup chunk silently, use include = FALSE.
  • results = "asis" is required when you want R-generated markdown to render as markdown rather than be wrapped in a code block.
  • PDF float placement can be controlled with fig.pos = "H" if LaTeX insists on moving your figure. The H specifier comes from the LaTeX float package, so load it in the preamble with \usepackage{float}.
  • self_contained = FALSE generates a folder of dependencies. Good for production sites with caching, bad for emailing a report.
  • Lua filters require Pandoc 2.0 or later. The output_format() API has been stable since 2018.
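For the first item in the list, a small sanitiser can help when chunk labels are generated from data rather than typed by hand. This is a sketch under that assumption; safe_label() is a hypothetical helper, not a knitr function.

```r
# Hypothetical helper: turn arbitrary text into a chunk label that uses only
# letters, digits, and dashes
safe_label <- function(x) {
  x <- gsub("[^A-Za-z0-9-]+", "-", x)  # replace runs of other characters
  gsub("^-+|-+$", "", x)               # trim leading and trailing dashes
}

labels <- safe_label(c("load_data", "fig: sales by region", "model_fit_2024"))
print(labels)
```

Two different inputs can collapse to the same label, so check for duplicates before using the result, since knitr requires chunk labels to be unique.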

Conclusion

The basic R Markdown workflow covers most needs. The advanced features kick in when the same template has to serve many users, when the document has to look exactly right, or when the report is part of a longer pipeline. Parameterised reports, hooks, caching, Lua filters, and the pandoc_args door are the tools that turn R Markdown from a notebook into a publishing system.

If you are hitting the limits of R Markdown itself, the natural next step is Quarto; it absorbs most of the patterns above and adds multi-language support. For a side-by-side comparison, see Quarto vs R Markdown. For a worked parameterised example in Quarto, the Quarto parameterised reports tutorial maps directly onto the patterns in this article.

See Also