Getting Started with brms
The brms package provides a powerful interface for fitting Bayesian regression models in R. Built on top of Stan, a state-of-the-art platform for Bayesian inference, brms allows you to specify complex statistical models using familiar R formula syntax. This tutorial walks you through fitting your first Bayesian models with brms.
What you’ll learn
This tutorial covers the key concepts and practical techniques for working with Getting Started with brms. By the end, you will know how to apply the core functions in real data analysis workflows.
What is brms?
brms stands for Bayesian Regression Models using Stan. It translates R formula notation into Stan code, handles the compilation and sampling, and returns results in a format that integrates smoothly with tidyverse workflows.
Key advantages of using brms:
- Familiar syntax: Use the same
~formula notation as lm() and glm() - Flexible models: Handle multilevel/hierarchical models, ordinal outcomes, censored data, and more
- Automatic MCMC: Stan handles Markov Chain Monte Carlo sampling under the hood
- Tidy output: Results play nicely with dplyr, tidybayes, and ggplot2
Installation
First, install brms and its dependencies:
# Install brms
install.packages("brms")
# Install Stan if not already installed
install.packages("rstan")
# Recommended packages for visualization
install.packages("bayesplot")
install.packages("tidybayes")
Loading the packages:
library(brms)
library(bayesplot)
library(tidybayes)
library(ggplot2)
library(dplyr)
Your first Bayesian model
Let’s start with a simple linear regression. We’ll use the mtcars dataset to predict miles per gallon (mpg) from weight (wt) and number of cylinders (cyl):
# Fit a Bayesian linear regression
model1 <- brm(
mpg ~ wt + cyl,
data = mtcars,
family = gaussian(),
chains = 4,
iter = 2000,
seed = 123
)
Breaking down the arguments:
mpg ~ wt + cyl: Formula specifying mpg as the outcome, with wt and cyl as predictorsdata: The data framefamily: The probability distribution for the outcome (gaussian for continuous outcomes)chains: Number of Markov chains to run (4 is standard)iter: Total iterations per chain (including warmup)seed: For reproducibility
Examining model results
The summary() function displays the core results:
summary(model1)
Output includes:
- Population-level effects: Coefficients for your predictors
- Sigma: The residual standard deviation
- Rhat: Convergence diagnostic (should be close to 1.0)
- ESS: Effective sample size (higher is better)
For a more tidy output:
fixef(model1)
This shows the posterior estimates for each coefficient:
Estimate Est.Error Q2.5 Q97.5
Intercept 39.69 1.96 35.80 43.63
wt -5.65 0.58 -6.78 -4.52
cyl -1.51 0.78 -3.04 -0.02
Interpretation:
- Estimate: Posterior mean (the most likely value)
- Est.Error: Standard deviation of the posterior
- Q2.5, Q97.5: 95% credible interval bounds
Interpreting credible intervals
The 95% credible interval [-6.78, -4.52] for wt means there’s a 95% probability that the true coefficient falls in this range, given the data and prior. This is fundamentally different from a confidence interval—it’s a direct statement about the parameter, not about hypothetical repeated samples.
# Plot posterior distributions
mcmc_plot(model1, type = "areas", pars = "^b_")
Visualizing posterior distributions
bayesplot provides excellent visualizations:
# Posterior density plots
mcmc_areas(model1, pars = c("b_wt", "b_cyl"),
prob = 0.95) +
labs(title = "Posterior Distributions of Coefficients")
Priors in brms
brms uses weakly informative priors by default. You can specify custom priors using the prior argument:
# Specify custom priors
model2 <- brm(
mpg ~ wt + cyl,
data = mtcars,
family = gaussian(),
prior = c(
prior(normal(0, 10), class = Intercept),
prior(normal(-3, 1), class = b, coef = "wt"),
prior(normal(0, 2), class = b, coef = "cyl"),
prior(exponential(1), class = sigma)
),
chains = 4,
seed = 123
)
Prior specifications:
class = Intercept: Prior for the interceptclass = b: Prior for all slope coefficientsclass = b, coef = "wt": Prior for specific coefficientclass = sigma: Prior for residual standard deviation
Bayesian logistic regression
brms supports many families. Here’s logistic regression for a binary outcome:
# Create a binary outcome: high vs low efficiency
mtcars$high_eff <- ifelse(mtcars$mpg > median(mtcars$mpg), 1, 0)
# Fit Bayesian logistic regression
model3 <- brm(
high_eff ~ wt + disp,
data = mtcars,
family = bernoulli(link = "logit"),
chains = 4,
seed = 123
)
summary(model3)
The interpretation changes slightly—coefficients are on the log-odds scale.
Multilevel/Hierarchical models
One of brms’ strengths is handling multilevel data. Here’s a simple random-intercepts model:
# Simulate multilevel data
set.seed(123)
n_schools <- 8
n_students <- 30
school_data <- data.frame(
school = rep(1:n_schools, each = n_students),
student = 1:n_students
)
school_data$school_effect <- rnorm(n_schools, 0, 2)[school_data$school]
school_data$ability <- rnorm(n_students, 0, 1)
school_data$score <- 50 + school_data$school_effect + school_data$ability + rnorm(nrow(school_data), 0, 3)
# Fit multilevel model
model_school <- brm(
score ~ ability + (1 | school),
data = school_data,
chains = 4,
seed = 123
)
The (1 | school) syntax adds a random intercept for each school.
Posterior predictions
Generate predictions from your model:
# Get posterior predictive samples
pp <- posterior_predict(model1, newdata = data.frame(wt = c(2, 3, 4), cyl = c(4, 6, 8)))
# pp is a matrix: rows = iterations, columns = new observations
dim(pp)
# [1] 4000 3
# Summarize predictions
apply(pp, 2, mean)
apply(pp, 2, quantile, probs = c(0.025, 0.975))
Checking convergence
Always verify that your MCMC chains converged properly:
# Trace plots
mcmc_trace(model1, pars = c("b_wt", "b_cyl"))
# Rhat values (should be < 1.05)
rhat(model1)
# Effective sample sizes
neff_ratio(model1)
If Rhat > 1.1 or effective sample sizes are low, you may need more iterations or better priors.
Model specification
brms uses R’s formula syntax extended with random effects and non-linear terms. bf(y ~ x + (1|group)) adds a random intercept per group. bf(y ~ x, sigma ~ x) allows the variance to vary with x. Non-linear models: bf(y ~ a * exp(-b * x), a ~ 1, b ~ 1, nl = TRUE). The formula interface handles a wide range of model structures without writing Stan code directly.
Running the model
brm(formula, data, family = gaussian(), prior = priors, chains = 4, iter = 2000, warmup = 1000) fits the model. chains = 4 runs 4 MCMC chains in parallel; cores = 4 enables parallel execution. iter = 2000 with warmup = 1000 gives 1000 post-warmup samples per chain (4000 total). brm() compiles a Stan model on the first run, subsequent runs reuse the compiled code and are faster.
Prior specification
get_prior(formula, data) shows the default priors brms would use. Set priors with set_prior("normal(0, 1)", class = "b") for regression coefficients, set_prior("exponential(1)", class = "sigma") for the standard deviation. set_prior("lkj(1)", class = "cor") sets the prior on correlation matrices. Prior predictive checks with sample_prior = "only" verify priors generate plausible data before fitting.
Posterior analysis
summary(fit) shows estimates, standard errors, and convergence diagnostics. plot(fit) produces trace plots and density plots per parameter. mcmc_plot(fit, type = "intervals") from bayesplot visualizes credible intervals. posterior_summary(fit) returns a data frame of all parameter summaries. as_draws_df(fit) converts the posterior to a data frame for custom analysis.
Why brms
brms (Bayesian Regression Models using Stan) translates R formula syntax into Stan programs, compiles them, and runs MCMC sampling. You write brm(y ~ x + (1 | group), data = df) and brms handles the Stan code, posterior sampling, and result extraction.
Without brms, fitting a Bayesian hierarchical model requires writing Stan code directly, hundreds of lines for a moderately complex model. brms generates efficient Stan code from a compact formula specification and provides a consistent interface for diagnostics, posterior extraction, and prediction.
brms supports: linear regression, generalized linear models (binomial, Poisson, negative binomial, beta, gamma), ordinal and count regression, zero-inflated models, hierarchical (mixed) models, multivariate models, distributional regression (modeling variance, shape, and other parameters alongside the mean), and non-linear models.
A first brms model
library(brms)
fit <- brm(
y ~ x1 + x2 + (1 + x1 | group),
data = df,
family = gaussian(),
prior = c(
prior(normal(0, 10), class = b),
prior(normal(0, 10), class = Intercept),
prior(exponential(1), class = sigma)
),
chains = 4,
iter = 2000,
warmup = 1000,
cores = 4
)
chains = 4 runs 4 independent Markov chains. iter = 2000 is the number of iterations per chain. warmup = 1000 is the adaptation phase (discarded). cores = 4 runs chains in parallel. The effective number of posterior samples is (iter - warmup) * chains = 4000.
Convergence diagnostics
Before interpreting results, check that the MCMC chains converged. summary(fit) shows the Rhat statistic for each parameter. Rhat near 1.0 indicates convergence; values above 1.05 suggest chains have not converged and longer sampling or better parameterization is needed.
plot(fit) shows trace plots (chains over iterations, should look like white noise) and density plots (posterior distributions). pp_check(fit) overlays the posterior predictive distribution with observed data, the most direct way to assess model fit.
bayesplot::mcmc_trace(as.array(fit)) shows trace plots. bayesplot::mcmc_rank_overlay(as.array(fit)) shows rank plots, which are better than trace plots for diagnosing mixing. Ideal rank plots are uniform across ranks.
Extracting results
fixef(fit) returns fixed effect estimates with credible intervals. ranef(fit) returns group-level deviations. posterior_summary(fit) shows all parameters. as_draws_df(fit) returns posterior samples as a tidy data frame.
conditional_effects(fit, effects = "x1") plots the posterior predictive mean and credible interval for each predictor, marginalizing over other predictors. This is the Bayesian analog of the partial dependence plot.
predict(fit, newdata = new_df, summary = FALSE) returns a matrix of posterior predictive samples, one row per posterior sample, one column per new observation. posterior_predict(fit, newdata) is equivalent. Use these for uncertainty quantification rather than just point predictions.
Model comparison
loo(fit1) computes Leave-One-Out cross-validation using Pareto-Smoothed Importance Sampling (PSIS-LOO). loo_compare(loo(fit1), loo(fit2)) compares models. The model with lower LOO-IC is preferred; differences larger than twice the standard error are meaningful.
bayes_factor(fit1, fit2) computes the Bayes factor, the ratio of marginal likelihoods. Requires save_pars = save_pars(all = TRUE) in the brm() call. Bayes factors greater than 10 provide strong evidence for one model over another.
Summary
You’ve learned the fundamentals of fitting Bayesian regression models with brms:
| Concept | Description |
|---|---|
| Formula syntax | Familiar R notation with ~ for regression |
| Families | gaussian, bernoulli, poisson, and many more |
| Priors | Default weakly informative or custom-specified |
| Credible intervals | Direct probability statements about parameters |
| MCMC diagnostics | Check Rhat, ESS, and trace plots |
The Bayesian approach gives you richer information than frequentist methods—you get full posterior distributions rather than single point estimates, allowing direct probability statements about your parameters.
Next steps
Continue building your Bayesian skills:
- Explore more families (poisson, ordinal, zero-inflated)
- Add random effects for nested and crossed designs
- Learn about posterior predictive model checking with bayesplot
- Master prior specification for your specific domains
See also
- Introduction to Bayesian Thinking, Core concepts of Bayesian inference
- Linear Regression in R — Frequentist foundation for understanding regression concepts