Getting Started with R Markdown
R Markdown is a file format that lets you write narrative text, embed R code, and render the output as a polished document. It is the foundation of reproducible research in R, enabling you to create reports, presentations, dashboards, and even books that automatically update when your data changes.
In this tutorial, you will learn how to create your first R Markdown document, understand its structure, and render it to different output formats.
What you’ll learn
This tutorial covers the key concepts and practical techniques for working with R Markdown. By the end, you will know how to apply its core features in real data analysis workflows.
Installing the required packages
RStudio has built-in support for R Markdown, but you need the rmarkdown package installed to render documents.
install.packages("rmarkdown")
install.packages("knitr") # For nice tables with kable()
install.packages("ggplot2") # For examples
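If you re-run these installation lines often, it can be convenient to install only what is missing. The following is a minimal sketch in base R; the helper name missing_packages is hypothetical and not part of any package, and the install step is guarded so it only runs in an interactive session.

```r
# Hypothetical helper: return the subset of `pkgs` that is not yet installed.
missing_packages <- function(pkgs) {
  installed <- rownames(utils::installed.packages())
  setdiff(pkgs, installed)
}

needed <- c("rmarkdown", "knitr", "ggplot2")
to_install <- missing_packages(needed)

# Only contact CRAN when something is missing and a person is at the console.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```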
Creating your first R Markdown document
In RStudio, go to File > New File > R Markdown. A dialog will appear where you can choose a title and output format. For now, select “Document” and leave the default output format as HTML.
This creates a new file with default content that demonstrates the key features of R Markdown:
---
title: "My First R Markdown"
author: "Your Name"
date: "2026-03-10"
output: html_document
---
{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
## R Markdown
This is an R Markdown document. When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks.
{r cars}
summary(cars)
## Including Plots
You can also embed plots, for example:
{r pressure, echo=FALSE}
plot(pressure)
Understanding the file structure
An R Markdown document has three main components:
1. YAML header
The YAML header sits between the triple dashes --- at the top. It contains metadata like the title, author, date, and output format:
---
title: "My Report"
author: "Jane Doe"
date: "2026-03-10"
output: html_document
---
You can change the output format to pdf_document or word_document to render to PDF or Word. Each format has additional options you can specify.
2. Text narrative
Regular text is written in Markdown, a lightweight markup language. You can format text with bold, italics, create bullet lists, add links, and structure content with headings:
# Heading 1
## Heading 2
### Heading 3
- Bullet point
- Another point
**Bold text** and *italic text*
3. R code chunks
Code chunks are enclosed in triple backticks with {r} after the opening backticks:
# Your R code here
summary(data)
You can name each chunk to help organize your document and make debugging easier. Chunk options control how the code and output appear:
- echo=TRUE: Show the R code (default)
- echo=FALSE: Hide the code, show only output
- include=FALSE: Run the code but hide both code and output
- message=FALSE: Suppress messages from the code
- warning=FALSE: Suppress warnings
- fig.width and fig.height: Control plot dimensions
- fig.cap: Add a caption to plots
- cache=TRUE: Cache chunk results for faster rendering
Global options
You can set options that apply to all chunks using opts_chunk$set() in a setup chunk at the beginning of your document:
{r setup, include=FALSE}
knitr::opts_chunk$set(
echo = FALSE,
message = FALSE,
warning = FALSE,
fig.width = 6,
fig.height = 4
)
This is useful when you want a consistent look across your entire document.
Working with figures
R Markdown handles figures automatically. Any plot you create in a code chunk will be automatically included in the rendered output. You can customize figure placement and sizing:
{r pressure-plot, fig.width=8, fig.height=5, fig.cap="Pressure vs Temperature"}
plot(pressure)
The figure is saved as an image file and embedded in your document. With bookdown output formats such as bookdown::html_document2, you can cross-reference a figure by its chunk name, which is helpful when writing about your visualizations.
Including data files
You can read external data files directly in your R Markdown document:
{r read-data}
data <- read.csv("data.csv")
summary(data)
This makes R Markdown particularly powerful for data analysis workflows, where your report automatically reflects the current state of your data.
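A missing or misplaced file is a common way for this step to fail, so it can help to check the path before reading. A minimal sketch, assuming a hypothetical helper named read_checked_csv; a temporary file stands in for data.csv so the example runs anywhere.

```r
# Hypothetical helper: fail early with a readable message if the file is missing.
read_checked_csv <- function(path) {
  if (!file.exists(path)) {
    stop("Cannot find '", path, "' from working directory ", getwd(), call. = FALSE)
  }
  read.csv(path, stringsAsFactors = FALSE)
}

# Demonstration with a temporary CSV built from the first rows of mtcars.
demo_path <- file.path(tempdir(), "demo-data.csv")
write.csv(head(mtcars, 5), demo_path, row.names = FALSE)

demo_data <- read_checked_csv(demo_path)
summary(demo_data$mpg)
```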
Rendering your document
To render an R Markdown document:
- Click the Knit button in RStudio, or
- Use the render() function in R:
rmarkdown::render("my-document.Rmd")
R Markdown will execute each code chunk in sequence, insert the output into the document, and then convert the entire file to your chosen output format.
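The render step can be tried end to end without touching your own files. The sketch below writes a minimal document to a temporary directory and renders it only if rmarkdown and pandoc are available; the file name and chunk label are illustrative assumptions.

```r
# Build a minimal R Markdown file in a temporary directory.
fence <- strrep("`", 3)  # three backticks, assembled here to keep the example readable
rmd_path <- file.path(tempdir(), "minimal-example.Rmd")
writeLines(c(
  "---",
  "title: \"Minimal example\"",
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r cars}"),
  "summary(cars)",
  fence
), rmd_path)

# Render only if rmarkdown and pandoc are available on this machine.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  out_path <- rmarkdown::render(rmd_path, quiet = TRUE, envir = new.env())
  print(out_path)  # path to the generated HTML file
}
```

Passing envir = new.env() keeps the document from seeing objects defined in the calling code, which is closer to what the Knit button does.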
Adding a table
The kable() function from the knitr package creates nice-looking tables:
library(ggplot2)
library(knitr) # kable() lives in the knitr package
mtcars[1:5, 1:4] |>
kable(caption = "First five rows of mtcars")
Embedding inline code
You can embed R code directly in your narrative text using `r code`. This is useful for inserting calculated values:
The dataset has `r nrow(mtcars)` observations.
This renders as: The dataset has 32 observations.
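To preview what an inline expression will insert, you can evaluate the same code in the console first. A small sketch using the built-in mtcars data; the variable names are illustrative.

```r
# Values that would typically be dropped into prose with inline code.
n_obs <- nrow(mtcars)
mean_mpg <- round(mean(mtcars$mpg), 1)

# Assemble the sentence the way the rendered document would read.
sprintf("The dataset has %d observations and a mean mpg of %.1f.", n_obs, mean_mpg)
```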
Document structure
An R Markdown document has three parts. The YAML header at the top configures metadata and output options. The Markdown body contains prose text with formatting. Code chunks (fenced with ```{r}) contain R code that executes when the document renders.
---
title: "My Report"
author: "Name"
date: "`r Sys.Date()`"
output: html_document
---
knitr::opts_chunk$set(echo = FALSE, warning = FALSE) in the first chunk sets global options. Hiding code and warnings from the output is standard for reports.
Inline code
`r expression` embeds R results inline in prose: “The mean is `r mean(x)` units.” This updates automatically when the data changes. For numeric formatting, `r format(1234567, big.mark = ",")` renders as “1,234,567”. Using inline R for numbers in text prevents manual copy-paste errors.
Rendering
rmarkdown::render("report.Rmd") renders from R. The Knit button in RStudio does the same. Render to a different format: render("report.Rmd", output_format = "pdf_document"). bookdown::render_book() renders multi-file books. For automated pipelines, render() accepts parameter lists that override YAML params.
Document structure and YAML
An R Markdown document has three sections: YAML front matter between --- delimiters, markdown text, and R code chunks delimited by ```{r} and ```. The YAML controls metadata and rendering options.
Common YAML fields: title, author, date, output. The output field selects the renderer and its options. html_document: toc: true adds a table of contents. pdf_document: fig_caption: true enables figure captions. Multiple output formats can be specified; rmarkdown::render("doc.Rmd", output_format = "all") renders all.
date: "`r Sys.Date()`" (an inline R expression, wrapped in backticks and quoted in the YAML) inserts the current date at render time. Inline R expressions in the YAML are evaluated during rendering, so you can use any R code that produces a single string.
Code chunk options
Chunk options control execution and output. Specify them in the chunk header: ```{r chunk-name, echo=FALSE, fig.width=8}.
Key options:
- echo = FALSE: hide code, show output
- eval = FALSE: show code, do not run it
- include = FALSE: run code, hide both code and output
- results = "hide": run code, show code, hide printed output
- message = FALSE, warning = FALSE: suppress R messages and warnings
- cache = TRUE: cache results; re-run only if code changes
- fig.width, fig.height: plot dimensions in inches
- out.width = "80%": output width in the final document
Set global defaults with knitr::opts_chunk$set(echo = FALSE, message = FALSE) in a setup chunk at the beginning of the document.
Inline R code
`r expression` evaluates R inline and inserts the result into text. For example, The mean is `r mean(x)`. renders as “The mean is 42.3.” when mean(x) is 42.3. This keeps numbers in sync with the data: change the data, re-render, and the numbers in the text update automatically.
format(number, digits = 2, big.mark = ",") formats numbers for display. scales::dollar(value) formats as currency. Always format numbers in inline code for consistent presentation.
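One way to keep inline formatting consistent is to wrap it in a small helper defined in the setup chunk. The sketch below uses base R's formatC(); the helper name fmt_num is hypothetical.

```r
# Hypothetical helper for inline use: fixed decimals plus thousands separators.
fmt_num <- function(x, digits = 0) {
  formatC(x, format = "f", digits = digits, big.mark = ",")
}

fmt_num(1234567)                 # "1,234,567"
fmt_num(1234567.891, digits = 1) # "1,234,567.9"
format(1234567, big.mark = ",")  # "1,234,567"
```

In prose, the helper would then be called inside an inline expression, for example with fmt_num(total) where total is a value computed earlier in the document.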
Output formats
html_document renders to a standalone HTML file. Options include theme (Bootswatch themes), highlight (code syntax highlighting), toc_float: true (floating sidebar table of contents), and code_folding: "hide" (collapsible code blocks).
pdf_document uses LaTeX via pandoc. Install a LaTeX distribution first (tinytex::install_tinytex() provides a minimal installation). The latex_engine option selects the engine, for example xelatex (supports Unicode and system fonts) or pdflatex (the default, with more limited Unicode and font support).
word_document renders to a .docx file. Provide a reference_doc Word file whose styles pandoc will apply to the output. This is the path to brand-consistent Word output.
rmarkdown::render() is the function that knits the Rmd to output. RStudio’s “Knit” button calls this. In automated pipelines, call render() directly from a script or target.
Parameterized reports
params in the YAML defines parameters with defaults:
params:
  region: "North"
  year: 2024
In the document body, access with params$region and params$year. rmarkdown::render("report.Rmd", params = list(region = "South", year = 2025)) renders with custom parameter values without editing the file.
Loop over parameters to generate many reports: purrr::walk(regions, ~ rmarkdown::render("report.Rmd", params = list(region = .x), output_file = paste0("report_", .x, ".html"))).
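The one-line loop above can be unpacked into small functions, which makes the output naming easier to check. This is a sketch under assumptions: the regions vector and the helper names output_name and render_region are illustrative, and the renders only run if report.Rmd exists in the working directory.

```r
regions <- c("North", "South", "East", "West")  # assumed values for illustration

# Build a predictable output file name for each region.
output_name <- function(region) {
  paste0("report_", tolower(region), ".html")
}

# Render one report with the region parameter overridden.
render_region <- function(region, input = "report.Rmd") {
  rmarkdown::render(
    input,
    params = list(region = region),
    output_file = output_name(region),
    envir = new.env()  # keep objects from one report out of the next
  )
}

# Only attempt the renders when the report and rmarkdown are both present.
if (file.exists("report.Rmd") && requireNamespace("rmarkdown", quietly = TRUE)) {
  for (region in regions) render_region(region)
}
```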
Caching
cache = TRUE stores chunk results in a cache directory. On subsequent renders, cached chunks are skipped unless their code or dependencies change. dependson = "chunk-name" marks a chunk as dependent on another — if the upstream chunk changes, the downstream cache is invalidated.
Large data loading chunks benefit most from caching. Mark the data loading chunk with cache = TRUE and all analysis chunks with dependson = "load-data" to ensure they re-run when the data changes but not otherwise.
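Deleting the cache directory (by default a folder next to the document whose name ends in _cache) forces every chunk to recompute. Do this before a final render to ensure all results are freshly computed. knitr::clean_cache(clean = TRUE), called from a chunk inside the document, only removes cache files left behind by chunks that no longer exist.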
Working with tables
By default, R Markdown prints data frames as plain console-style text; setting df_print: kable or df_print: paged in the YAML output options changes this. For styled tables, use knitr::kable(df), which produces clean markdown tables. In HTML output, knitr::kable(df, format = "html") %>% kableExtra::kable_styling() adds Bootstrap-styled HTML.
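As a concrete case, the sketch below summarises the built-in mtcars data and passes the result to kable(). The summary itself is base R; the kable() call is guarded in case knitr is not installed, and the caption text is illustrative.

```r
# Mean miles per gallon by cylinder count, using the built-in mtcars data.
cyl_summary <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# kable() returns a table object that knitr prints as markdown in the document.
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(cyl_summary, digits = 1, caption = "Mean mpg by cylinder count"))
}
```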
gt::gt(df) produces publication-quality HTML tables with headers, column formatting, and footnotes. In HTML output, embed directly. In PDF output, gt::as_latex(gt_obj) converts.
For interactive tables in HTML output, DT::datatable(df, options = list(pageLength = 25, searchHighlight = TRUE)) adds sorting, filtering, and pagination.
Table captions: knitr::kable(df, caption = "Summary statistics") adds a caption. In cross-referenced documents (for example bookdown::html_document2), name the chunk, such as tbl-summary, and reference the table with \@ref(tab:tbl-summary).
Troubleshooting common issues
When the document fails to knit, read the error in the Render pane (called R Markdown in older RStudio versions) carefully. The error shows which chunk failed and why. Common issues: a package not installed (Error: there is no package called 'ggplot2'), a file path relative to the wrong working directory, or a chunk that depends on an object created in a previous session that no longer exists.
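Clicking Knit runs the document in a fresh R session, so objects in your global environment are not available. (Calling rmarkdown::render() from the console evaluates in the calling environment by default, so pass envir = new.env() for a cleaner test.) If an analysis works interactively but fails on knit, something in the document depends on a global variable. Add the missing setup to a chunk in the document.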
knitr::opts_knit$set(root.dir = here::here()) in a setup chunk sets the working directory to the project root, preventing file path issues when the document is not in the root directory.
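For plots that look right interactively but wrong on knit: knitr opens a new graphics device for each chunk, so graphical parameters like par(mfrow) set in one chunk do not carry over to the next unless you set knitr::opts_knit$set(global.par = TRUE). A chunk that relies on par() settings from an earlier chunk will therefore render differently. Repeat the par() call in each chunk that needs it, and use on.exit(par(old_par)) inside custom plotting functions so they do not leave changed settings behind.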
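The on.exit() pattern mentioned above can be sketched as follows. The function name plot_side_by_side is hypothetical; the point is that par() returns the previous settings, which are restored when the function exits.

```r
# Hypothetical plotting helper that changes the layout and always restores it.
plot_side_by_side <- function(x, y) {
  old_par <- par(mfrow = c(1, 2))    # par() returns the previous settings
  on.exit(par(old_par), add = TRUE)  # restore them even if plotting fails
  hist(x, main = "x")
  hist(y, main = "y")
  invisible(NULL)
}

plot_side_by_side(mtcars$mpg, mtcars$wt)
```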
Literate programming best practices
Separate data loading and analysis. A setup chunk loads data once; analysis chunks reference the loaded objects. Cache the data loading chunk: cache = TRUE, cache.extra = file.info("data.csv")$mtime re-runs the chunk only when the file changes.
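Modification times change whenever a file is rewritten, even if the content is identical. A content hash is a stricter alternative for cache.extra. A minimal sketch using tools::md5sum() on a temporary file; the file name is illustrative.

```r
data_path <- file.path(tempdir(), "cache-demo.csv")

# First version of the file and its content hash.
write.csv(head(mtcars, 3), data_path, row.names = FALSE)
hash_before <- unname(tools::md5sum(data_path))

# Rewriting the file with different content changes the hash,
# which would invalidate a chunk that uses it in cache.extra.
write.csv(head(mtcars, 4), data_path, row.names = FALSE)
hash_after <- unname(tools::md5sum(data_path))

hash_before != hash_after
```

In a chunk header this would read cache.extra = tools::md5sum("data.csv") in place of the mtime expression.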
Structure the document like a report, not like code. The narrative should explain the analysis to a reader who does not read the code. Use echo = FALSE for chunks that produce output the reader cares about but does not need to see implemented. Use inline R code for numbers that appear in prose.
Write documents to be reproducible on a clean machine. Document required package versions with sessionInfo() in a final chunk. Use renv::snapshot() to pin package versions. A document that requires undocumented manual setup steps loses its value as a reproducible report.
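If the full sessionInfo() output is too verbose for a report, a compact version table is one alternative. The sketch below uses base R only; the package list is a placeholder for whatever your report loads.

```r
# Record the R version and the versions of the packages the report relies on.
pkgs <- c("stats", "utils", "graphics")  # placeholder: list the packages your report loads
versions <- vapply(
  pkgs,
  function(p) as.character(utils::packageVersion(p)),
  character(1)
)

R.version.string
data.frame(package = names(versions), version = unname(versions))
```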
R Markdown as a reproducibility tool
The core value proposition of R Markdown is reproducibility. A document that embeds computation produces the same output from the same inputs, every time, for anyone with the same software. This is fundamentally different from a Word document with numbers typed in manually, where the provenance of each number is unknown.
Reproducibility has practical consequences for data analysts. When you receive new data and need to update a quarterly report, re-rendering the R Markdown takes seconds. When a colleague questions a number in a report, you can audit it by reading the code that produced it. When you return to an analysis six months later, the document itself explains what was done.
The discipline required to write reproducible documents — all data loaded from files, all numbers computed from data, all visualizations generated by code — also produces better analysis. Working this way forces you to be explicit about data provenance, transformation steps, and modeling assumptions. Ambiguity that is invisible in a manually-assembled report becomes a bug in a reproducible document.
The main practical challenge is that R Markdown documents run in isolated sessions — they cannot depend on objects in your global environment. This constraint is actually a feature: it ensures the document is self-contained and does not depend on interactive state that exists only on your machine. Testing reproducibility is simple: restart R and re-render.
Integrating external data sources
Most real reports pull data from external sources: databases, APIs, files on shared drives. For each source, the R Markdown document should contain the connection and query code. Credentials go in environment variables, not in the document. The data load should be cached to avoid repeated slow queries.
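Reading credentials from environment variables can be wrapped in a small helper that fails clearly when a variable is missing. This is a sketch: the helper name get_required_env and the variable name REPORT_DB_USER are assumptions for illustration, and the variable is set in the code only so the example runs.

```r
# Hypothetical helper: read a required setting from the environment.
get_required_env <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable ", name, " is not set", call. = FALSE)
  }
  value
}

# REPORT_DB_USER is an assumed variable name, set here only for the demonstration.
# In practice it would be defined outside the document, for example in .Renviron.
Sys.setenv(REPORT_DB_USER = "analyst")
db_user <- get_required_env("REPORT_DB_USER")
```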
For database sources: establish the connection in a setup chunk, perform queries in data-loading chunks with caching, close the connection in a cleanup chunk. The DBI interface works with any database, and the connection code is minimal enough to include in the document without cluttering the analysis.
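The connect, query, and disconnect cycle can be sketched with DBI. The example below compresses the three steps into one function and uses an in-memory SQLite database as a stand-in for a real server, which is an assumption for illustration; it returns NULL when DBI or RSQLite is not installed.

```r
# Run the connect / query / disconnect cycle against an in-memory SQLite database.
# Returns NULL when DBI or RSQLite is not installed.
query_mean_mpg <- function() {
  if (!requireNamespace("DBI", quietly = TRUE) ||
      !requireNamespace("RSQLite", quietly = TRUE)) {
    return(NULL)
  }
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  on.exit(DBI::dbDisconnect(con), add = TRUE)  # always close the connection

  # Load example data, then query it the way a report would query a real table.
  DBI::dbWriteTable(con, "cars", mtcars)
  DBI::dbGetQuery(con, "SELECT cyl, AVG(mpg) AS mean_mpg FROM cars GROUP BY cyl")
}

mean_mpg_by_cyl <- query_mean_mpg()
```

In a real report the connection would usually live in a setup chunk and the query in a cached data-loading chunk, as described above.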
For API sources: the httr2 request code is typically two to five lines. Store the response in a cached chunk. If the API has rate limits, caching is essential — re-rendering without cache would make repeated API calls.
For file sources: use relative paths with the here package to ensure portability. Absolute paths like “/Users/alice/Documents/data.csv” fail on every other machine. Relative paths with here() work anywhere the project is checked out.
The ideal structure: a data layer (load and cache), a preparation layer (clean and transform), and an analysis layer (compute and visualize). Separating these three concerns makes troubleshooting straightforward — data problems are in the data layer, logic errors are in the preparation layer, and incorrect calculations are in the analysis layer.
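The three layers can be sketched as three small functions, each of which would live in its own chunk. The function names are hypothetical, and the built-in mtcars data stands in for a real source.

```r
# Data layer: load raw data (the built-in mtcars stands in for a real source).
load_data <- function() {
  mtcars
}

# Preparation layer: select and clean the columns the analysis needs.
prepare_data <- function(raw) {
  prepared <- raw[, c("mpg", "cyl", "wt")]
  prepared$cyl <- factor(prepared$cyl)
  prepared
}

# Analysis layer: compute the summary reported in the document.
analyse_data <- function(prepared) {
  aggregate(mpg ~ cyl, data = prepared, FUN = mean)
}

result <- analyse_data(prepare_data(load_data()))
```

Because each layer takes the previous layer's output as its only input, a wrong number can be traced by inspecting the intermediate objects one at a time.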
Next steps
Now that you have created your first R Markdown document, you can explore:
- Different output formats: Try output: pdf_document or output: word_document
- Parameterised reports: Create reports that accept inputs using the params field in the YAML
- R Notebooks: Use output: html_notebook for an interactive notebook experience
Continue to the next tutorial in this series to learn about Getting Started with Quarto.