Prior Selection in Bayesian Models

March 16, 2026 · 11 min read · Updated May 29, 2026 · advanced

bayesian · prior · brms · statistics · mcmc · stan

Prior selection in Bayesian models is one of the most important — and often most confusing — aspects of Bayesian statistics. Your priors encode what you know or believe before seeing the data. Getting them right is crucial for both valid inference and communicating your assumptions. This tutorial teaches you how to choose priors thoughtfully for your Bayesian models using brms in R.

Why priors matter

In Bayesian inference, the prior represents your beliefs before observing data. Combined with the likelihood (what the data tell you), this gives you the posterior—your updated belief after seeing evidence.

The key insight is that priors matter most when:

You have limited data
You’re estimating parameters that aren’t well-identified by the likelihood
You want to incorporate external information

With abundant data, the likelihood dominates and the prior has less influence. But when data are sparse, your prior shapes the posterior substantially.
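The pull between prior and likelihood can be made concrete with a small worked case. The sketch below assumes the simplest conjugate setting (a normal mean with known data standard deviation), which is an illustration rather than what brms does internally; the function name and the numbers are hypothetical.

```r
# Conjugate normal model with known data SD: the posterior mean is a
# precision-weighted average of the prior mean and the sample mean.
posterior_mean <- function(n, ybar, sigma, prior_mean, prior_sd) {
  prior_prec <- 1 / prior_sd^2   # precision contributed by the prior
  data_prec  <- n / sigma^2      # precision contributed by n observations
  (prior_prec * prior_mean + data_prec * ybar) / (prior_prec + data_prec)
}

# Same normal(0, 1) prior and same sample mean (2), different sample sizes
small_n <- posterior_mean(n = 5,   ybar = 2, sigma = 3, prior_mean = 0, prior_sd = 1)
large_n <- posterior_mean(n = 500, ybar = 2, sigma = 3, prior_mean = 0, prior_sd = 1)

round(c(small_n = small_n, large_n = large_n), 2)
# small_n is about 0.71 (pulled toward the prior mean of 0),
# large_n is about 1.96 (close to the sample mean of 2)
```

With five observations the prior drags the estimate most of the way back toward zero; with five hundred it barely registers.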

Types of priors

Uninformative priors

Uninformative priors attempt to let the data speak freely, adding minimal information. The idea is to make the prior as “flat” as possible so it doesn’t influence results. In brms, you specify priors using the prior() function with a distribution family and the parameter class:

# Common uninformative priors in brms
# Intercept: extremely wide normal, nearly flat over any realistic range
prior(normal(0, 1e6), class = Intercept)

# Coefficients: very wide normal
prior(normal(0, 1e4), class = b)

# Standard deviation: almost uniform on positive reals
prior(cauchy(0, 1e3), class = sd)

The problem with truly flat priors is that they’re often improper (don’t integrate to 1) and can lead to computational issues. More importantly, “uninformative” is a misnomer—these priors still encode assumptions about scale and support.

Weakly informative priors

Weakly informative priors are the workhorse of practical Bayesian analysis. They constrain parameters to plausible ranges without strong commitment to specific values. This helps MCMC sampler performance while allowing data to dominate when there’s enough of it. The following examples show typical weakly informative priors for common parameter types:

# Weakly informative prior for a regression coefficient
# Assuming predictor is on a reasonable scale (e.g., -3 to 3 for standardized data)
prior(normal(0, 2), class = b)

# For an intercept when you expect the outcome around 0-10
prior(normal(5, 5), class = Intercept)

# For a standard deviation
prior(exponential(1), class = sigma)

These priors say: “I don’t know the exact value, but it’s probably within a reasonable range.”

Informative priors

Informative priors incorporate specific knowledge — previous studies, domain expertise, or substantive theory. Use them when you have genuine prior information, such as a previously estimated effect size or a known biological constraint. Specify the distribution parameters to reflect the location and uncertainty of your prior knowledge:

# Previous study found effect of -3 with SD of 1
prior(normal(-3, 1), class = b, coef = "treatment")

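# Domain knowledge: conversion rate likely between 1% and 10%
# In a logistic model the Intercept is on the log-odds scale, so a beta prior
# (e.g., beta(2, 28), mean = 2/30 ≈ 0.067) does not apply to it directly.
# Translate the range instead: qlogis(0.01) ≈ -4.6 and qlogis(0.10) ≈ -2.2
prior(normal(-3.4, 0.6), class = Intercept)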
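Before committing to an informative prior, it helps to check what range it actually covers. The sketch below uses base R quantile functions to inspect beta(2, 28) on the probability scale, then translates the "1% to 10%" belief to the log-odds scale, which is the scale a logistic intercept lives on. Treating the stated range as roughly plus or minus two standard deviations is an assumption made for illustration.

```r
# Central 95% interval and mean of beta(2, 28) on the probability scale
beta_interval <- qbeta(c(0.025, 0.975), shape1 = 2, shape2 = 28)
beta_mean <- 2 / (2 + 28)

# Translate the "1% to 10%" belief to the log-odds scale
logit_bounds <- qlogis(c(0.01, 0.10))
logit_center <- mean(logit_bounds)
logit_sd <- diff(logit_bounds) / 4  # treat the range as roughly +/- 2 SD

round(beta_interval, 3)
round(c(center = logit_center, sd = logit_sd), 2)
# center is about -3.40 and sd about 0.60
```

If the interval printed for the beta prior is wider or narrower than your actual belief, adjust the shape parameters before using it.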

Prior selection workflow

Step 1: identify each parameter

For each parameter in your model, ask:

What scale is this parameter on?
What’s a reasonable range of values?
Do I have external information?

# Example model: predicting test scores
# score ~ study_hours + sleep_hours + prior_grade

# Intercept: expected score range 0-100
# study_hours: effect per hour (positive, likely 0-20 points)
# sleep_hours: effect per hour (positive, likely 0-10 points)
# prior_grade: baseline achievement

Step 2: choose distributions

Match the distribution to the parameter type:

Parameter Type	Common Prior	Rationale
Location (intercepts, slopes)	Normal	Symmetric, unbounded
Scale (SD, variance)	Exponential, Half-Normal	Must be positive
Probabilities	Beta	Bounded 0-1
Rates (count models)	Gamma	Positive, continuous
Categorical	Dirichlet	Simplex (sums to 1)

Step 3: specify hyperparameters

The hyperparameters control the prior’s shape — specifically its location (mean) and scale (standard deviation). Narrower priors express stronger beliefs and constrain the posterior more; wider priors let the data exercise more influence. The following examples illustrate the probability mass covered by different prior widths:

# Narrow prior: strong belief
normal(0, 0.5)   # 95% of mass in [-1, 1]

# Moderate prior: some uncertainty
normal(0, 2)     # 95% of mass in [-4, 4]

# Wide prior: weak belief
normal(0, 10)    # 95% of mass in [-20, 20]
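The coverage figures above are rounded. A quick way to check any candidate width is to ask R for the central 95% interval directly; the vector names below are just labels for this sketch.

```r
# Central 95% interval implied by each prior standard deviation
prior_sds <- c(narrow = 0.5, moderate = 2, wide = 10)

intervals <- t(sapply(prior_sds, function(s) {
  qnorm(c(0.025, 0.975), mean = 0, sd = s)
}))
colnames(intervals) <- c("lower", "upper")

round(intervals, 2)
# narrow: about -0.98 to 0.98; moderate: about -3.92 to 3.92;
# wide: about -19.6 to 19.6
```

The same pattern works for other families by swapping in qexp(), qt(), or qbeta().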

Practical prior specification in brms

Setting priors

Use the prior() function to specify each prior, and combine several priors with c():

library(brms)

# Full prior specification
model <- brm(
  mpg ~ wt + cyl + hp,
  data = mtcars,
  family = gaussian(),
  prior = c(
    # Intercept: roughly centered on the mean outcome (mean mpg is about 20)
    prior(normal(20, 10), class = Intercept),
    
    # Coefficients: weakly informative
    prior(normal(0, 5), class = b),
    
    # Residual SD: positive, exponential is standard
    prior(exponential(1), class = sigma)
  ),
  chains = 4,
  seed = 123
)

Viewing default priors

brms provides sensible defaults for most parameter types. Use get_prior() to inspect what priors the package would use if you don’t specify your own. The output shows each parameter’s class (Intercept, b for coefficients, sigma for residual SD) and the default distribution assigned to it:

# Get default priors for a model
get_prior(mpg ~ wt + cyl, data = mtcars, family = gaussian())

# Output looks roughly like this (values and columns vary by brms version):
#                    prior     class coef group resp dpar nlpar lb ub
#                   (flat)         b
#                   (flat)         b  cyl
#                   (flat)         b   wt
#  student_t(3, 19.2, 5.4) Intercept
#     student_t(3, 0, 5.4)     sigma                             0
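In recent brms versions, the default Intercept and sigma priors for a gaussian model appear to be Student-t distributions whose location and scale are derived from the response itself. The sketch below reproduces that calculation under the assumption that the location is the median of the outcome and the scale is its median absolute deviation with a floor of 2.5; confirm against get_prior() for your installed version, since the rule is not guaranteed to stay fixed.

```r
# Assumed rule for data-dependent defaults (verify with get_prior())
y <- mtcars$mpg

default_location <- round(median(y), 1)          # centre of the Intercept prior
default_scale    <- round(max(2.5, mad(y)), 1)   # scale, never below 2.5

c(location = default_location, scale = default_scale)
```

Because these defaults depend on the data, a model refit on a different outcome scale gets different defaults, which is one more reason to write priors out explicitly.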

Priors for different parameter classes

You can assign priors at different levels of granularity. A class-level prior applies to every parameter of that type in the model. A coefficient-specific prior targets one predictor, and you can also assign priors to interaction terms and random effects:

# Class-level: applies to all parameters of that type
prior(normal(0, 2), class = b)

# Coefficient-specific: applies to one predictor
prior(normal(-2, 1), class = b, coef = "treatment")

# Interaction terms
prior(normal(0, 2), class = b, coef = "treatment:age")

# Random effects
prior(exponential(1), class = sd)
prior(normal(0, 2), class = sd, coef = "Intercept", group = "school")

Prior sensitivity analysis

Always check how sensitive your results are to prior choices:

# Fit with different priors
model_weak <- brm(mpg ~ wt, data = mtcars, 
                  prior = prior(normal(0, 10), class = b),
                  chains = 4, seed = 123)

model_strong <- brm(mpg ~ wt, data = mtcars,
                    prior = prior(normal(0, 1), class = b),
                    chains = 4, seed = 123)

# Compare posteriors
posterior_compare <- data.frame(
  weak = fixef(model_weak)[, "Estimate"],
  strong = fixef(model_strong)[, "Estimate"]
)
print(posterior_compare)

If conclusions change dramatically with different priors, you have insufficient data to draw strong conclusions. Report this honestly.
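Before running two full MCMC fits, you can get a rough preview of the sensitivity with a normal approximation. The sketch below combines the least-squares estimate of the wt slope with each prior by precision weighting. It treats the residual SD as known and ignores the other parameters, so it is an approximation of what brms would report, not a replacement for it; approx_posterior is a hypothetical helper.

```r
# Least-squares estimate and standard error for the wt slope
fit <- lm(mpg ~ wt, data = mtcars)
est <- coef(summary(fit))["wt", "Estimate"]
se  <- coef(summary(fit))["wt", "Std. Error"]

# Normal prior x normal likelihood: precision-weighted combination
approx_posterior <- function(prior_sd, prior_mean = 0) {
  post_prec <- 1 / prior_sd^2 + 1 / se^2
  post_mean <- (prior_mean / prior_sd^2 + est / se^2) / post_prec
  c(mean = post_mean, sd = sqrt(1 / post_prec))
}

sens <- rbind(weak = approx_posterior(10), strong = approx_posterior(1))
round(sens, 2)
```

Under normal(0, 10) the approximate posterior mean stays close to the least-squares slope, while normal(0, 1) shrinks it noticeably toward zero. That gap is the kind of difference the full comparison is meant to reveal.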

Common prior choices by use case

Regression coefficients (Standardized predictors)

# When predictors are z-scored (mean=0, sd=1)
prior(normal(0, 1), class = b)      # Weakly informative
prior(normal(0, 0.5), class = b)    # Stronger shrinkage toward zero

Regression coefficients (Original scale)

# Consider the range of your predictor
# If predictor ranges 0-100 and you expect small effects
prior(normal(0, 0.1), class = b, coef = "predictor")
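One way to pick a prior width on the original scale is to start from a standardized prior and convert it. If both outcome and predictor were z-scored, a slope b_std corresponds to b_raw = b_std * sd(y) / sd(x), so the prior SD scales the same way. The helper name below is hypothetical, and mtcars stands in for your data.

```r
# Convert a prior SD on the standardized scale to the original scale
raw_prior_sd <- function(y, x, std_prior_sd = 1) {
  std_prior_sd * sd(y) / sd(x)
}

sd_wt <- raw_prior_sd(mtcars$mpg, mtcars$wt)  # wt spans only a few units
sd_hp <- raw_prior_sd(mtcars$mpg, mtcars$hp)  # hp spans hundreds of units

round(c(wt = sd_wt, hp = sd_hp), 3)
```

The predictor with the larger spread gets the much smaller prior SD, which matches the intuition that a one-unit change in it is a small change.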

Hierarchical/Multilevel models

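# Random intercepts: half-normal on the SD scale
# (brms truncates sd priors at zero, so normal(0, 1) acts as a half-normal)
prior(normal(0, 1), class = sd, coef = "Intercept", group = "school")

# Random slopes
prior(lkj(2), class = cor)           # LKJ prior for correlations
# coef = "" covers every SD in the group; name a slope to target just that one
prior(normal(0, 1), class = sd, coef = "", group = "school")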

Binary outcomes (Logistic regression)

# Coefficients are on log-odds scale
# P(outcome=1) = inv_logit(Intercept + b*X)
# A 1-unit increase in X multiplies odds by exp(b)

# Weakly informative: effect likely between -5 and 5 on log-odds
prior(normal(0, 2), class = b)
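Priors on the log-odds scale are easier to judge after pushing them through the inverse logit. The sketch below draws from two candidate priors and looks at the implied probability for the simple case where the linear predictor equals that single draw (intercept of zero, predictor value of one). The 0.01 and 0.99 cutoffs for "extreme" are arbitrary choices for this illustration.

```r
set.seed(1)
n_draws <- 10000

# Implied probabilities under a moderate and a very wide log-odds prior
p_moderate <- plogis(rnorm(n_draws, mean = 0, sd = 2))
p_wide     <- plogis(rnorm(n_draws, mean = 0, sd = 10))

# Share of prior mass on near-certain outcomes
extreme_share <- function(p) mean(p < 0.01 | p > 0.99)

shares <- c(moderate = extreme_share(p_moderate), wide = extreme_share(p_wide))
round(shares, 3)
```

The wide prior puts most of its mass on probabilities near 0 or 1, which is rarely what "uninformative" is meant to express.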

Count data

# Poisson / negative binomial: coefficients are on the log scale
prior(normal(0, 2), class = b)
# Negative binomial only: prior on the shape (overdispersion) parameter
prior(gamma(0.1, 0.1), class = shape)
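Because count-model coefficients act multiplicatively on the expected count, exponentiating the prior interval shows what it implies. This sketch does so for normal(0, 2).

```r
# Central 95% interval on the log scale, then as a multiplicative effect
log_interval <- qnorm(c(0.025, 0.975), mean = 0, sd = 2)
rate_ratio_interval <- exp(log_interval)

signif(rate_ratio_interval, 2)
# roughly 0.02 to 50: a one-unit change in the predictor could divide or
# multiply the expected count by about fifty
```

If a fifty-fold change per unit is implausible for your predictor, a narrower prior is justified.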

Troubleshooting prior issues

Symptom: chains don’t converge

# Try wider priors if parameters are hitting boundaries
prior(normal(0, 10), class = b)  # Instead of narrow prior

Symptom: all samples are near zero

# Your prior may be too narrow or centered wrong
# Check: prior normal(0, 0.01) is extremely narrow

Symptom: divergent transitions

# Often caused by extreme priors
# Try more moderate priors or increase adapt_delta
model <- brm(y ~ x, data = dat, 
             control = list(adapt_delta = 0.99))

Choosing priors

Prior selection reflects your knowledge before observing data. For regression coefficients with standardized predictors, normal(0, 1) is a weakly informative prior that shrinks toward zero without strong bias. For standard deviations, exponential(1) or half-normal(0, 1) constrain to positive values with reasonable scale. For probabilities, beta(1, 1) is uniform; beta(2, 2) is mildly regularizing toward 0.5.

Informative priors

Informative priors encode domain knowledge. If prior studies show that a drug typically has a positive effect of 2-5 units, normal(3.5, 1) is an informative prior on the effect size. A well-justified informative prior can improve estimation with small samples. Document the justification in your methods section: prior choice is a modeling decision that should be transparent.

Why prior selection matters

The prior distribution represents your knowledge about a parameter before seeing the data. Choosing a prior is not optional in Bayesian analysis; even “uninformative” priors make assumptions. A flat prior on a regression coefficient (uniform from -∞ to +∞) is not uninformative: it gives the same prior density to a coefficient of 1 and a coefficient of 1,000,000, which is unrealistic for most applications.

Prior selection affects inference more when data is limited. With large samples, the likelihood dominates and most reasonable priors produce similar posteriors (Bernstein-von Mises theorem). With small samples, the prior strongly influences the posterior. Understanding this helps calibrate how much effort to invest in prior selection.

Weakly informative priors

The brms and rstanarm packages encourage weakly informative priors: priors that rule out implausible parameter values while remaining flexible enough not to dominate the posterior when data are modest.

For regression coefficients on standardized predictors, normal(0, 2.5) is a common default. It rules out absurdly large effects while allowing realistic-to-large effects. student_t(3, 0, 2.5) (Student’s t with 3 degrees of freedom) has heavier tails, accommodating occasional large effects.
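The difference in tail weight is easy to quantify. The sketch below compares how much prior mass each distribution places on effects larger than 10 in absolute value, using the same scale of 2.5; the threshold of 10 is an arbitrary choice for illustration.

```r
scale <- 2.5
threshold <- 10

# Two-sided tail probability beyond +/- threshold
tail_normal <- 2 * pnorm(-threshold / scale)
tail_t3     <- 2 * pt(-threshold / scale, df = 3)

signif(c(normal = tail_normal, student_t3 = tail_t3), 2)
```

The Student-t prior assigns hundreds of times more mass to such large effects than the normal does, which is why it is more forgiving when a coefficient turns out to be unexpectedly big.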

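For standard deviations (which must be positive), half-normal(0, 1) and exponential(1) are common choices. A half-Cauchy such as half-Cauchy(0, 2.5) has much heavier tails; half-Cauchy scales also underlie the horseshoe prior for sparse regression. In brms you write these as normal(0, 1) or cauchy(0, 2.5) on class sd or sigma, and the lower bound at zero is applied automatically.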

Specifying priors in brms

brms::prior(normal(0, 10), class = "b") sets a normal prior on all population-level coefficients. prior(normal(0, 5), class = "b", coef = "x") sets a prior on a specific coefficient. prior(student_t(3, 0, 2.5), class = "Intercept") sets the intercept prior.

prior(exponential(1), class = "sigma") sets the prior for the residual standard deviation. For random-effect standard deviations, use prior(normal(0, 1), class = "sd"); brms truncates sd priors at zero, so this acts as a half-normal.

brms::get_prior(formula, data = df, family = gaussian()) lists all parameters that can have priors set and their default priors. Review this before fitting a model to understand what assumptions you are making implicitly.

Prior predictive checks

Before fitting a model, simulate data from the prior and check whether it produces plausible outcomes. This is called a prior predictive check and is the most direct way to evaluate whether your priors encode reasonable beliefs.

brms::brm(formula, data, prior = my_prior, sample_prior = "only") fits the model using only the prior, ignoring the data. brms::pp_check(prior_only_model, type = "hist") plots a histogram of simulated outcomes. Ask: do these simulated values look plausible? Are there values in the simulation that are clearly impossible (negative reaction times, human heights of 10 meters)?

If the prior predictive distribution includes many implausible values, tighten the prior. If it is overly concentrated around one value, loosen it.
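The same idea can be sketched by hand in base R, which makes the mechanics visible. The example below assumes a simple mpg ~ wt model with the priors normal(20, 10) on the intercept, normal(0, 5) on the slope, and exponential(1) on sigma, with wt centered so the intercept is the expected mpg at average weight. It is an illustration of the logic, not a substitute for sample_prior = "only".

```r
set.seed(42)
n_sims <- 1000
wt_c <- mtcars$wt - mean(mtcars$wt)  # centered predictor

# Draw parameters from the priors
intercept <- rnorm(n_sims, mean = 20, sd = 10)
slope     <- rnorm(n_sims, mean = 0, sd = 5)
sigma     <- rexp(n_sims, rate = 1)

# One simulated data set (row) per prior draw
y_sim <- t(sapply(seq_len(n_sims), function(i) {
  rnorm(length(wt_c), mean = intercept[i] + slope[i] * wt_c, sd = sigma[i])
}))

share_negative <- mean(y_sim < 0)  # negative mpg is impossible
range_sim <- range(y_sim)

round(share_negative, 3)
round(range_sim, 1)
```

A small but non-zero share of negative fuel economies suggests these priors are slightly too wide for this outcome; narrowing the intercept prior would be a reasonable next step.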

Informative priors from domain knowledge

When domain knowledge is available, use it. A clinical trial studying a new drug treatment can use historical data on similar drugs to set a prior on the treatment effect. A time series model for retail demand can use seasonal patterns from previous years.

rstanarm::normal(location = 0.3, scale = 0.1) encodes a belief that the treatment effect is around 0.3 with uncertainty of ±0.2 (two standard deviations). This is more informative than a generic normal(0, 2.5).

The prior_summary(fitted_model) function in brms shows the priors used in a fitted model. Document priors in analysis reports: they are analytical choices that should be transparent and reproducible.

Sensitivity analysis

After fitting the model, check how sensitive the posterior is to the choice of prior. Re-fit with alternative priors and compare the posteriors. If they differ substantially, the data do not dominate the prior and your conclusions depend on prior assumptions. Report this clearly.

bayesplot::ppc_dens_overlay() overlays draws from the posterior predictive distribution on the observed data, which checks model fit rather than prior sensitivity. To compare posteriors under different priors, extract the draws for the same parameter from each fit and plot them on a shared axis; bayesplot::mcmc_areas() shows the marginal posterior densities for one fit at a time.

Summary

Prior selection is both an art and a science:

Prior Type	When to Use	Example
Uninformative	When you have abundant data and want minimal assumptions	normal(0, 1e6)
Weakly informative	General purpose; helps computation	normal(0, 2), exponential(1)
Informative	When you have external information	Previous study results, domain expertise

Key principles:

Be explicit: Write down what your prior means
Check priors: Use prior predictive checks
Test sensitivity: See how results change with different priors
Report honestly: Tell readers what prior you used and why

Next steps

After mastering prior selection:

Learn about prior predictive simulation in depth
Explore hierarchical priors for multilevel data
Discover robust priors (t-distributions) for outlier-prone data
Read about prior elicitation from domain experts
