rguides

Bayesian Modelling with rstanarm

Bayesian statistics offers a different way to think about data analysis. Instead of asking “what do the data tell us alone,” Bayesian methods ask “what do the data tell us given what we already believe?” The rstanarm package brings this powerful framework to R users through a familiar interface.

What makes Bayesian different

In classical (frequentist) statistics, you treat parameters as unknown fixed values and compute the probability of observing your data given those parameters. In Bayesian statistics, you treat parameters themselves as random variables with their own probability distributions.

You start with a prior distribution, your belief about a parameter before seeing the data. Then you combine this with the data through likelihood to get the posterior distribution, your updated belief after considering the evidence.

The formula is simple:

posterior ∝ likelihood × prior

This is Bayes’ theorem in action. The posterior becomes your new baseline for inference, and it serves as the prior for any future data you collect.

Getting started with rstanarm

First, install and load the package:

install.packages("rstanarm")
library(rstanarm)

rstanarm provides a interface to Stan, a powerful probabilistic programming language. The main functions mirror familiar R regression functions but add Bayesian interpretation.

Fitting your first model

Let’s work with a classic example, predicting fuel efficiency based on car weight:

# Load built-in data
data(mtcars)

# Fit a Bayesian linear regression
# Compare to: lm(mpg ~ wt, data = mtcars)
fit <- stan_glm(mpg ~ wt, data = mtcars, 
                prior = normal(0, 10), 
                prior_intercept = normal(30, 10),
                chains = 4, iter = 2000)

The syntax should look familiar, it’s almost identical to glm(). The key differences are the prior specifications and the output.

Understanding priors

Priors are where Bayesian analysis gets interesting. They encode your existing knowledge:

  • Weakly informative priors, say “I don’t know much, but it’s probably in this range”
  • Strong priors, express strong beliefs based on previous research
  • Flat priors, essentially say “I know nothing” (similar to maximum likelihood)

For the intercept, normal(30, 10) says “before seeing data, I think fuel efficiency is around 30 mpg, with standard deviation 10.” This is fairly vague, the data will dominate the final estimate.

For the coefficient, normal(0, 10) is a common default that says “I think the effect could be positive or negative, but large effects are less likely.”

Interpreting the posterior

Now let’s look at what the model gives us:

print(fit)

You’ll see point estimates (median, mean) and uncertainty intervals. The output looks different from lm(), instead of a single coefficient with a p-value, you get a distribution.

To get credible intervals (the Bayesian equivalent of confidence intervals):

posterior_interval(fit, prob = 0.9)

A 90% credible interval for the weight coefficient might be [-5.8, -3.2]. You can interpret this as: “Given our priors and the data, there’s a 90% probability that the true coefficient lies between -5.8 and -3.2.”

This is more intuitive than the frequentist interpretation, you’re making probability statements about the parameter, not about repeated sampling.

Checking model fit

rstanarm provides several diagnostics:

# Check convergence
plot(fit)

# Posterior predictive checks
pp_check(fit)

The plot() function shows trace plots, you want to see “fuzzy caterpillars” rather than systematic patterns. If chains converged properly, the plots should look like random noise.

Posterior predictive checks compare simulated data from your model to the actual data. If the real data falls within the simulated range, your model is reasonable.

Adding more complexity

You can fit more elaborate models:

# Multiple predictors
fit_full <- stan_glm(mpg ~ wt + cyl + hp, data = mtcars,
                     prior = normal(0, 5),
                     chains = 4)

# Generalized linear models
fit_poisson <- stan_glm(count ~ treatment, data = your_data,
                        family = poisson(),
                        prior = normal(0, 2))

Making predictions

Prediction works similarly to other R models:

# Point prediction
predict(fit, newdata = data.frame(wt = 3.0))

# Full posterior for predictions (includes uncertainty)
y_pred <- posterior_predict(fit, newdata = data.frame(wt = 3.0))
quantile(y_pred, c(0.5, 0.025, 0.975))

The key advantage: you get uncertainty in predictions automatically, including both parameter uncertainty and observation-level variation.

Why use Bayesian methods

Bayesian approaches shine when:

  • You have meaningful prior information to incorporate
  • You need interpretable probability statements
  • Working with hierarchical/nested data structures
  • Small sample sizes where priors add stability
  • You want to propagate uncertainty naturally through models

The computational cost is higher than maximum likelihood, Markov Chain Monte Carlo sampling takes time, but modern computers make this increasingly practical. The rstanarm package handles most of the computational complexity behind the scenes, letting you focus on model specification and interpretation rather than algorithm details.

One practical consideration: Bayesian models can take several minutes to fit, especially with many iterations or complex models. Plan accordingly when working with large datasets or hierarchical models.

Common pitfalls

Watch out for these common issues when starting with Bayesian modeling:

  • Uninformative priors that are too wide can lead to numerical instability
  • Not running enough iterations can result in biased estimates, always check convergence
  • Interpreting credible intervals incorrectly, they represent probability of the parameter, not repeated sampling
  • Forgetting that priors influence results, with small data, your priors matter a lot

Posterior predictive checks

After fitting a Bayesian model, posterior predictive checks verify that the model can generate data similar to what was observed. bayesplot::pp_check(model) overlays the observed distribution with distributions simulated from the posterior. A model that fits well produces simulated distributions that match the observed data. If the simulated data looks systematically different from the observed data, the model’s assumptions may be violated.

posterior_predict() generates draws from the posterior predictive distribution, new observations the model expects to see, given the fitted parameters. These draws are the Bayesian alternative to residual analysis: rather than asking how far observed values are from fitted values, you ask whether observed values are plausible draws from the predictive distribution.

Model comparison with loo

loo (Leave-One-Out cross-validation) is the standard Bayesian model comparison tool. loo::loo(model) computes the LOO information criterion, and loo::loo_compare(model1, model2) compares two models on this criterion. Unlike AIC and BIC (which use maximum likelihood estimates), LOO uses the full posterior distribution for each held-out point, producing a more reliable estimate of out-of-sample predictive performance.

A model with better LOO has better expected predictive accuracy on new data from the same process. Large differences in LOO (relative to the standard error of the difference) provide strong evidence for one model over another. Small differences suggest the models have similar predictive performance, and domain considerations should guide model choice.

Prior specification

Choosing priors is one of the key decisions in Bayesian modeling. rstanarm uses weakly informative default priors that regularize the fit without strongly influencing results when the data is informative. prior = normal(0, 2.5) for regression coefficients is the default, it assumes coefficients are within a few units of zero, which is appropriate for standardized predictors.

For informative priors based on previous studies or domain knowledge, specify prior = normal(mu, sigma) where mu is your prior belief about the coefficient and sigma reflects your uncertainty. Use prior_PD = TRUE to simulate from the prior predictive distribution before seeing the data — this checks whether your priors produce reasonable outcome distributions.

bayesplot::mcmc_trace() and bayesplot::mcmc_pairs() visualize the MCMC chains. Good mixing (chains moving around rapidly without getting stuck) and convergence (chains from different starting points reaching the same distribution) are necessary conditions for trustworthy posterior estimates. The Rhat statistic should be close to 1 for all parameters; values above 1.1 indicate non-convergence.

See also