R Markdown: Reproducible Reports

March 11, 2026 · 12 min read · Updated May 27, 2026 · beginner

R Markdown is a file format that lets you write documents containing live R code. When you render the document, R executes the code and embeds the results directly into the output. This approach is the foundation of reproducible research; you keep your analysis, visualizations, and conclusions in a single file that anyone can re-run.

This guide covers the basics of R Markdown. By the end, you will know how to create a document, add code chunks, format text, and render it to HTML or PDF.

How the format works

An R Markdown file has the extension .Rmd. It contains three parts:

YAML header: metadata at the top, between --- lines
Markdown text: regular text with formatting
Code chunks: R code blocks that get executed

When you click the Knit button in RStudio, the rmarkdown package processes the file. First, knitr runs each code chunk and captures the output in a plain Markdown file. Then, Pandoc converts that Markdown to your chosen output format. The result is a clean document with your text, code, and results together.

Your first R Markdown document

Create a new R Markdown file in RStudio through File > New File > R Markdown. For now, accept the default settings. Your file will look like this:

---
title: "My First Report"
author: "Your Name"
date: "2026-03-11"
output: html_document
---

The second YAML block strips the optional author and date fields, keeping only title and output. This minimal front matter suits template documents and automated reports where authorship metadata comes from the rendering environment rather than being hardcoded in the file. Note that R Markdown does not fill in a date when the field is omitted; if you want the render date to appear, set date to an inline R expression such as "`r Sys.Date()`". The stripped-down header also makes it obvious which fields are essential and which are decorative, which is useful when teaching newcomers who might otherwise feel obligated to fill every field they see in RStudio’s default template.

---
title: "Sample Report"
output: html_document
---

Now switch from metadata to document body. The YAML header configures how the document renders, but everything below it is what actually appears on the page. Markdown headings create the document’s structural outline: each ## becomes an H2 heading in the HTML output and a section heading in the PDF (sections are numbered only if you turn on the number_sections option). The ## Introduction heading establishes the first content section readers encounter, while ## Basic Math introduces a section where the document moves from static prose to live computation. This separation of concerns (YAML for configuration, Markdown for content structure, R for computation) is the basic architecture of every R Markdown file.

## Introduction

This is my first R Markdown document. It demonstrates how text and code work together.

## Basic Math

R can perform calculations inline. For example, 2 + 2 equals `r 2 + 2`.

The Markdown block above demonstrated two things: static section headings and inline R code embedded inside a paragraph. The backtick-quoted `r 2 + 2` computes a single value and inserts it mid-sentence; the rendered output reads “2 + 2 equals 4” with the number generated at knit time. The code chunk below takes a different approach: instead of embedding a scalar into prose, it runs a standalone R expression (summary(cars)) that produces a block of printed output. Chunks suit multi-line computations, data summaries, and plots; inline expressions suit single numbers referenced in explanatory text. The rendered document places the chunk output as a separate formatted block immediately after your explanatory prose, preserving the narrative flow while keeping the computation verifiable.

summary(cars)

When rendered, the code chunk runs and its output appears directly below it in the document. Save the file as first_report.Rmd and click Knit. You will get an HTML document with the calculation and summary table embedded.

Adding and configuring code chunks

Code chunks are the engine of R Markdown. You insert them with the keyboard shortcut Ctrl+Alt+I (Windows/Linux) or Cmd+Option+I (Mac).

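A code chunk opens with a line of three backticks followed by {r} and closes with a line of three backticks. The R code between those two lines looks like this: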

# Your R code here
mean(cars$speed)

You can customize each chunk with options placed after {r}:

Option	What it does
`echo=FALSE`	Hide the code, show only results
`include=FALSE`	Run the code but hide both code and output
`results='hide'`	Show code but hide output
`message=FALSE`	Suppress messages
`warning=FALSE`	Suppress warnings
`fig.width=6`	Set figure width in inches

With echo=FALSE in its header (that is, {r, echo=FALSE}), this chunk hides the code but shows a plot:

plot(cars$speed, cars$dist, 
     xlab = "Speed", 
     ylab = "Distance",
     main = "Speed vs Stopping Distance")

Inline code

You do not always need a full code chunk. Use inline code to insert single values into your text. Wrap R expressions in backticks with r in front:

The average speed in the cars dataset is `r mean(cars$speed)` mph.

When rendered, this becomes “The average speed in the cars dataset is 15.4 mph.” The number updates automatically if your data changes. (See the mean() reference for details on this base R function.)
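To preview what an inline expression will insert before knitting, you can evaluate it in the console. The sketch below is illustrative only: the names avg_speed and sentence are not part of the original document, and paste0() merely mimics the substitution that happens at knit time.

```r
# Compute the value that the inline expression inserts into the sentence.
avg_speed <- mean(cars$speed)

# Build the sentence by hand to preview the rendered text.
# (The substitution is done for you when the document is knit.)
sentence <- paste0(
  "The average speed in the cars dataset is ",
  format(avg_speed),
  " mph."
)
print(sentence)
```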

Creating tables

The knitr package includes kable() for creating formatted tables. Load it in a chunk:

library(knitr)
kable(head(cars, 5), caption = "First five rows of the cars dataset")

The kable() function formats data frames as styled tables in the rendered output, turning raw row-and-column data (subset here with head()) into publication-ready display tables with captions, alignment, and output-format-specific styling. Once your table is set, the next decision is which output format to target, since HTML tables render differently from PDF tables. The YAML header’s output field is what controls that target; it tells rmarkdown::render() which output format engine to invoke and what styling defaults to apply. The three built-in formats cover the majority of use cases for sharing reports.

Output formats

The YAML header controls your output format. Each format produces a different file type with its own rendering pipeline and styling conventions. Common options include:

HTML document:

output: html_document

HTML is the default and most flexible output; it renders in any browser, supports interactive widgets, and produces a self-contained file you can email or host. It is the best choice for dashboards, data journalism, and internal reports shared online. PDF, by contrast, guarantees exact page fidelity regardless of the viewer or operating system, which matters when submitting to academic journals, regulatory bodies, or print publication workflows that require fixed pagination, embedded fonts, and reproducible layout down to the millimeter.

PDF document:

output: pdf_document

PDF output requires a LaTeX installation (discussed later in Common Problems) and produces a typeset document suitable for formal publication. If your collaborators need to edit the report after it is generated (for example, to add executive commentary or merge sections into a larger document), Word output is the right choice. A .docx file lets non-R users open, annotate, and modify the content in Microsoft Word or LibreOffice, preserving the formatting while keeping the document editable in a familiar environment.

Microsoft Word:

output: word_document

Each format above used the shorthand YAML syntax: a single string value naming the output format. This works when you accept all defaults. For custom styling, the YAML header supports an expanded syntax where output becomes a nested object with sub-options specific to that format. The example below adds a table of contents, a Bootswatch theme, and syntax highlighting to an HTML document, three customizations that are impossible to express with the shorthand form. Recognizing when to graduate from the one-line output: html_document to the full nested configuration is the difference between a generic-looking report and one that matches your organization’s visual identity.

---
title: "My Report"
output:
  html_document:
    toc: true
    theme: united
    highlight: tango
---

toc: true adds a table of contents. theme changes the visual style. highlight controls code syntax coloring. The YAML header configures the output format declaratively at the file level, which works well for documents you open in RStudio and knit interactively. When you need to automate report generation outside of RStudio (for example, in a scheduled script, a CI/CD pipeline, or a Shiny app that produces downloadable reports), you call the render() function instead of clicking Knit. The YAML header and render() are two entry points to the same rendering engine: the header is the declarative path for interactive use, and render() is the imperative path for automation.

Rendering from the console

You do not need RStudio to render documents. Use the render() function:

rmarkdown::render("first_report.Rmd", 
                   output_format = "html_document")
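For automation, a small wrapper around render() can produce several formats in one call. The sketch below is one possible approach, not a prescribed workflow: render_all is a hypothetical helper name, and the default list of formats is an assumption chosen for illustration.

```r
# Hypothetical helper (not part of rmarkdown): render one file to several formats.
render_all <- function(input, formats = c("html_document", "word_document")) {
  # Fail early with a clear message if the source file is missing.
  if (!file.exists(input)) {
    stop("Input file not found: ", input)
  }
  # render() returns the path of the output file it wrote.
  vapply(
    formats,
    function(fmt) rmarkdown::render(input, output_format = fmt, quiet = TRUE),
    character(1)
  )
}

# Usage, assuming first_report.Rmd exists in the working directory:
# outputs <- render_all("first_report.Rmd")
```

Checking for the input file first means a scheduled job stops with a readable message instead of a rendering error.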

The render() function worked because the required rmarkdown package was already available in the session. In a fresh R installation or a containerized environment, however, that assumption breaks: only the base packages ship with R, and rmarkdown is not among them. The first troubleshooting step is to verify that the packages R Markdown depends on are actually installed in the current R library path.

Common problems

Packages missing: Install the required packages first:

install.packages(c("rmarkdown", "knitr"))
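Before installing anything, it can help to check which packages are actually missing. The following is a minimal sketch using base R only; the variable names required, installed, and missing_pkgs are illustrative.

```r
# Packages this guide relies on for rendering.
required <- c("rmarkdown", "knitr")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```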

The first block solves a missing R package problem with a standard install.packages() call, which fetches from CRAN; this is the same workflow as installing any other R package. PDF rendering failures are a different class of issue entirely: the problem is not an absent R package but an absent system dependency. R Markdown’s PDF pipeline delegates typesetting to a LaTeX engine, and without one installed on the operating system, output: pdf_document will fail with an opaque error message about a missing executable. The tinytex package bridges this gap by installing a minimal LaTeX distribution (~100 MB) specifically configured for R Markdown’s needs, avoiding the multi-gigabyte full TeX Live installation that used to be the only option.

PDF fails: Rendering to PDF requires a LaTeX installation. The tinytex package provides a lightweight solution:

install.packages("tinytex")
tinytex::install_tinytex()

Code does not run: Check that your R code is inside a valid chunk marked with ```{r}. The backticks must be on their own lines.

Controlling figure output

When your code produces a plot, R Markdown includes it automatically. You can control the size and resolution with chunk options, set either globally or per chunk:

# Set default dimensions and resolution for every subsequent chunk
knitr::opts_chunk$set(fig.width = 8, fig.height = 5, dpi = 150)

The first approach calls knitr::opts_chunk$set() to establish global defaults that affect every subsequent chunk in the document. This is the right strategy when you want consistent figure dimensions across an entire report; set the dimensions once and every plot conforms without repeating configuration. The second approach applies options to a single chunk through its header line. Per-chunk options override the global defaults, so you can define a document-wide standard and selectively deviate only for specific figures that need different sizing. This two-tier configuration system (global defaults with per-chunk overrides) prevents repetitive option declarations while still allowing per-plot customization where it is genuinely needed.

# Chunk header: {r, fig.width=8, fig.height=5, dpi=150}
library(ggplot2)  # provides ggplot(), aes(), geom_point(), labs()

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Car weight vs fuel efficiency")

For presentations, fig.align = "center" centers the image. (See ggplot2 extensions for advanced plotting techniques.) For HTML output, out.width = "80%" scales it relative to the page width. These options save you from editing exported images separately.

To suppress a figure entirely, set fig.show = "hide". This runs the code (so any side effects happen) but omits the image from the output.

Caching slow computations

If one chunk takes minutes to run, set cache = TRUE on it. R Markdown stores the result on disk and skips the computation on the next render if the chunk has not changed:

# {r slow-model, cache=TRUE}
model <- lm(mpg ~ ., data = mtcars)
summary(model)

The cache is identified by the chunk label together with a hash of the chunk's code and options. If you change the code or its options, the cache invalidates automatically. Be careful with chunks that depend on external data: if the data file changes but the chunk code does not, the cache will not update. In those cases, add cache.extra = file.mtime("data.csv") to tie the cache to the file’s modification time.

Clear all caches by deleting the <document>_cache/ directory that R Markdown creates next to your .Rmd file.
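The cache.extra option works because any change in its value changes the chunk's cache signature. The sketch below uses a temporary file as a stand-in for data.csv and shows a value that changes when the file's content changes. It uses tools::md5sum() as a content-based alternative to file.mtime(); that substitution is an assumption made for illustration, not something this guide prescribes.

```r
# Temporary stand-in for data.csv (hypothetical file for illustration).
path <- tempfile(fileext = ".csv")
writeLines(c("revenue", "100", "200"), path)

# A content hash gives a stable value: same content, same hash.
key_before <- unname(tools::md5sum(path))

# Simulate the data file being updated.
writeLines(c("revenue", "100", "250"), path)
key_after <- unname(tools::md5sum(path))

# The value differs, so a chunk using it in cache.extra would be re-run.
print(key_before != key_after)
```

In a chunk header, the equivalent option would read cache.extra = tools::md5sum("data.csv").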

Writing reports that update automatically

One of the practical benefits of R Markdown over Word or Google Docs is that your report regenerates from source data. Set up your workflow so that rendering the document reads from a live data source with read.csv():

# Load fresh data on each render
df <- read.csv("monthly_sales.csv")
total <- sum(df$revenue)

The R code chunk above runs read.csv() and sum() to load data and compute a total, storing the result in the total variable within the knit session’s global environment. Every chunk in the document shares this environment, so any variable created in one chunk is available to subsequent chunks and inline expressions. The markdown block below uses inline R syntax (`r `) to embed that computed value directly into a sentence. Unlike the code chunk, which produces a standalone block of output, the inline expression injects a single formatted number into the flow of prose. The format() call wraps the numeric value with thousands-separator formatting, demonstrating that inline expressions support full function calls; they are not limited to bare variable names. The rendered output reads like natural language: “Total revenue this month: 15,432” rather than “Total revenue this month: 15432”.

Total revenue this month: `r format(total, big.mark=",")`

When the CSV updates next month, re-knitting the document produces a new report with the correct numbers. No copy-pasting, no stale figures. This pattern works well for weekly summaries, model evaluation reports, and dashboards that do not need a full Shiny app.
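To try this pattern without a real data file, you can generate a small CSV first. The sketch below is self-contained and uses made-up numbers: the temporary file, the month labels, and the revenue values are all hypothetical stand-ins for monthly_sales.csv.

```r
# Write a small hypothetical CSV so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(month = c("Jan", "Feb", "Mar"), revenue = c(5000, 4432, 6000)),
  csv_path,
  row.names = FALSE
)

# The same steps the report performs on each render.
df <- read.csv(csv_path)
total <- sum(df$revenue)

# Preview the text that the inline expression would insert.
formatted_total <- format(total, big.mark = ",")
print(paste("Total revenue this month:", formatted_total))
```

Replacing the contents of the CSV and re-running the code changes the printed total, which is the same thing that happens when the report is re-knit.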

Common questions

Can I use R Markdown without RStudio? Yes. Install rmarkdown and knitr from CRAN (see Building R Packages for package management patterns), then call rmarkdown::render() from the terminal or any R session. RStudio adds convenience but is not required.

What is the difference between .Rmd and .qmd? Quarto uses .qmd files and extends R Markdown with support for Python, Julia, and Observable. The syntax is nearly identical. Most existing .Rmd files render in Quarto either unchanged or with minor front matter adjustments.

How do I share an R Markdown report? Render to HTML and share the file directly. For team sharing, publish to RPubs with one click from RStudio, or host the HTML on any static file server. PDF output is self-contained and works for formal documents.

What comes next

You have learned the basics of R Markdown. From here, explore these areas:

Parameterized reports: create documents that accept different inputs each time you render
R Markdown notebooks: interact with chunks individually in RStudio
Bookdown: build multi-page books and long-form documents
Quarto: the next generation of R Markdown with broader language support

R Markdown connects your analysis to your written conclusions. When your data updates, re-knitting the document updates everything.