Prior Selection in Bayesian Models

· 7 min read · Updated March 16, 2026 · advanced
bayesian prior brms statistics mcmc stan

Prior selection is one of the most important—and often most confusing—aspects of Bayesian statistics. Your priors encode what you know (or believe) before seeing the data. Getting them right is crucial for both valid inference and communicating your assumptions. This tutorial teaches you how to choose priors thoughtfully for your Bayesian models.

Why Priors Matter

In Bayesian inference, the prior represents your beliefs before observing data. Combined with the likelihood (what the data tell you), this gives you the posterior—your updated belief after seeing evidence.

The key insight is that priors matter most when:

  • You have limited data
  • You’re estimating parameters that aren’t well-identified by the likelihood
  • You want to incorporate external information

With abundant data, the likelihood dominates and the prior has less influence. But when data are sparse, your prior shapes the posterior substantially.
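To see this concretely, here is a small base-R sketch (my illustration, separate from the brms examples below) using a conjugate Beta-Binomial model, where the posterior is available in closed form. With 10 observations the prior visibly pulls the estimate; with 1,000 it barely matters.

```r
# Beta-Binomial conjugacy: prior Beta(a, b) plus k successes in n trials
# gives posterior Beta(a + k, b + n - k), with mean (a + k) / (a + b + n).
posterior_mean <- function(a, b, k, n) (a + k) / (a + b + n)

# Skeptical prior centered at 0.2 (Beta(2, 8)); observed success rate is 0.5
small <- posterior_mean(2, 8, k = 5,   n = 10)    # sparse data
large <- posterior_mean(2, 8, k = 500, n = 1000)  # abundant data

round(small, 3)  # 0.35  -- pulled well toward the prior mean of 0.2
round(large, 3)  # 0.497 -- the likelihood dominates
```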

Types of Priors

Uninformative Priors

Uninformative priors attempt to let the data speak freely, adding minimal information. The idea is to make the prior as “flat” as possible so it doesn’t influence results.

# Common "uninformative" priors in brms
# Intercept: extremely wide normal, nearly flat over any plausible range
prior(normal(0, 1e6), class = Intercept)

# Coefficients: very wide normal
prior(normal(0, 1e4), class = b)

# Standard deviation: heavy-tailed, nearly flat on the positive reals
# (brms truncates class = "sd" priors to be positive)
prior(cauchy(0, 1e3), class = sd)

The problem with truly flat priors is that they’re often improper (don’t integrate to 1) and can lead to computational issues. More importantly, “uninformative” is a misnomer—these priors still encode assumptions about scale and support.
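One way to see the scale assumption hiding in a "flat" prior (a base-R sketch, not part of the brms examples): a normal(0, 100) prior on a logistic-regression intercept looks vague on the log-odds scale, but on the probability scale it piles almost all of its mass near 0 and 1.

```r
set.seed(1)
# Draw intercepts from a "vague" prior on the log-odds scale
log_odds <- rnorm(1e5, mean = 0, sd = 100)
p <- plogis(log_odds)  # implied prior on the probability scale

# Fraction of prior mass on near-degenerate probabilities
mean(p < 0.01 | p > 0.99)  # about 0.96 -- hardly "uninformative"
```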

Weakly Informative Priors

Weakly informative priors are the workhorse of practical Bayesian analysis. They constrain parameters to plausible ranges without strong commitment to specific values. This helps the MCMC sampler while still letting the data dominate when there's enough of it.

# Weakly informative prior for a regression coefficient
# Assuming predictor is on a reasonable scale (e.g., -3 to 3 for standardized data)
prior(normal(0, 2), class = b)

# For an intercept when you expect the outcome around 0-10
prior(normal(5, 5), class = Intercept)

# For a standard deviation
prior(exponential(1), class = sigma)

These priors say: “I don’t know the exact value, but it’s probably within a reasonable range.”

Informative Priors

Informative priors incorporate specific knowledge—previous studies, domain expertise, or substantive theory. Use them when you have genuine prior information.

# Previous study found effect of -3 with SD of 1
prior(normal(-3, 1), class = b, coef = "treatment")

# Domain knowledge: conversion rate likely between 1-10%
# (a beta prior on a probability; appropriate only when the intercept
#  is on the probability scale, e.g. with an identity link)
prior(beta(2, 28), class = Intercept)  # mean = 2/30, approximately 0.067
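Before committing to a prior like beta(2, 28), you can check what it actually implies with base R:

```r
# Prior mean and central 95% interval implied by Beta(2, 28)
2 / (2 + 28)                    # mean: about 0.067
qbeta(c(0.025, 0.975), 2, 28)   # roughly 0.9% to 19% -- a bit wider than "1-10%"
```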

Prior Selection Workflow

Step 1: Identify Each Parameter

For each parameter in your model, ask:

  • What scale is this parameter on?
  • What’s a reasonable range of values?
  • Do I have external information?

# Example model: predicting test scores
# score ~ study_hours + sleep_hours + prior_grade

# Intercept: expected score range 0-100
# study_hours: effect per hour (positive, likely 0-20 points)
# sleep_hours: effect per hour (positive, likely 0-10 points)
# prior_grade: baseline achievement

Step 2: Choose Distributions

Match the distribution to the parameter type:

Parameter Type                | Common Prior             | Rationale
------------------------------|--------------------------|----------------------
Location (intercepts, slopes) | Normal                   | Symmetric, unbounded
Scale (SD, variance)          | Exponential, Half-Normal | Must be positive
Probabilities                 | Beta                     | Bounded 0-1
Rates (count models)          | Gamma                    | Must be positive
Category probabilities        | Dirichlet                | Simplex (sums to 1)

Step 3: Specify Hyperparameters

The hyperparameters control the prior’s shape:

# Narrow prior: strong belief
normal(0, 0.5)   # 95% of mass in [-1, 1]

# Moderate prior: some uncertainty
normal(0, 2)     # 95% of mass in [-4, 4]

# Wide prior: weak belief
normal(0, 10)    # 95% of mass in [-20, 20]
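These coverage claims are easy to verify in base R, since a normal's central 95% interval is the mean plus or minus 1.96 standard deviations:

```r
# Central 95% intervals of the three priors above
qnorm(c(0.025, 0.975), mean = 0, sd = 0.5)  # about -0.98 to 0.98
qnorm(c(0.025, 0.975), mean = 0, sd = 2)    # about -3.92 to 3.92
qnorm(c(0.025, 0.975), mean = 0, sd = 10)   # about -19.6 to 19.6
```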

Practical Prior Specification in brms

Setting Priors

Use the prior function to specify priors:

library(brms)

# Full prior specification
model <- brm(
  mpg ~ wt + cyl + hp,
  data = mtcars,
  family = gaussian(),
  prior = c(
    # Intercept: reasonable centered around mean outcome
    prior(normal(20, 10), class = Intercept),
    
    # Coefficients: weakly informative
    prior(normal(0, 5), class = b),
    
    # Residual SD: positive, exponential is standard
    prior(exponential(1), class = sigma)
  ),
  chains = 4,
  seed = 123
)

Viewing Default Priors

brms provides sensible defaults. Check them with:

# Get default priors for a model
get_prior(mpg ~ wt + cyl, data = mtcars, family = gaussian())

# Output (abridged; exact values depend on your data and brms version):
#                    prior     class coef group resp dpar nlpar lb ub       source
#                   (flat)         b                                        default
#                   (flat)         b  cyl                              (vectorized)
#                   (flat)         b   wt                              (vectorized)
#  student_t(3, 19.2, 5.4) Intercept                                        default
#     student_t(3, 0, 5.4)     sigma                             0          default

Priors for Different Parameter Classes

# Class-level: applies to all parameters of that type
prior(normal(0, 2), class = b)

# Coefficient-specific: applies to one predictor
prior(normal(-2, 1), class = b, coef = "treatment")

# Interaction terms
prior(normal(0, 2), class = b, coef = "treatment:age")

# Random effects
prior(exponential(1), class = sd)
prior(normal(0, 2), class = sd, coef = "Intercept", group = "school")

Prior Predictive Checks

Before seeing data, simulate from your prior to ensure it generates plausible values:

# Simulate from priors only (no data)
prior_sample <- brm(
  mpg ~ wt + cyl,
  data = mtcars,
  family = gaussian(),
  prior = c(
    prior(normal(20, 10), class = Intercept),
    prior(normal(0, 5), class = b),
    prior(exponential(1), class = sigma)
  ),
  sample_prior = "only",  # Don't use data!
  chains = 4,
  seed = 123
)

# Check predicted values from prior
pp_check(prior_sample, type = "hist")

If your prior predictive distribution contains impossible values (negative mpg, for example), adjust your priors.
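The same idea works without fitting anything: draw parameters from the priors in base R and push them through the model. The sketch below is my illustration using the priors from the brm() call above; it simulates mpg predictions for an average car in mtcars and checks how often the prior produces impossible negative values.

```r
set.seed(123)
n_sims <- 1000

# Draw parameters from the priors used in the brm() call above
intercept <- rnorm(n_sims, 20, 10)
b_wt      <- rnorm(n_sims, 0, 5)
b_cyl     <- rnorm(n_sims, 0, 5)
sigma     <- rexp(n_sims, 1)

# Simulate one predicted mpg per prior draw, for an average car
wt  <- mean(mtcars$wt)
cyl <- mean(mtcars$cyl)
mpg_sim <- rnorm(n_sims, intercept + b_wt * wt + b_cyl * cyl, sigma)

# Fraction of impossible (negative) mpg predictions under the prior
mean(mpg_sim < 0)  # substantial here -- normal(0, 5) slopes are wide on this scale
```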

Prior Sensitivity Analysis

Always check how sensitive your results are to prior choices:

# Fit with different priors
model_weak <- brm(mpg ~ wt, data = mtcars, 
                  prior = prior(normal(0, 10), class = b),
                  chains = 4, seed = 123)

model_strong <- brm(mpg ~ wt, data = mtcars,
                    prior = prior(normal(0, 1), class = b),
                    chains = 4, seed = 123)

# Compare posteriors
posterior_compare <- data.frame(
  weak = fixef(model_weak)[, "Estimate"],
  strong = fixef(model_strong)[, "Estimate"]
)
print(posterior_compare)

If conclusions change dramatically with different priors, you have insufficient data to draw strong conclusions. Report this honestly.

Common Prior Choices by Use Case

Regression Coefficients (Standardized Predictors)

# When predictors are z-scored (mean=0, sd=1)
prior(normal(0, 1), class = b)      # Weakly informative
prior(normal(0, 0.5), class = b)    # Stronger shrinkage toward zero

Regression Coefficients (Original Scale)

# Consider the range of your predictor
# If predictor ranges 0-100 and you expect small effects
prior(normal(0, 0.1), class = b, coef = "predictor")

Hierarchical/Multilevel Models

# Random intercepts and slopes: normal on the SD scale
# (brms truncates class = "sd" priors to be positive, i.e. half-normal)
prior(normal(0, 1), class = sd, group = "school")

# Correlation between random intercepts and slopes
prior(lkj(2), class = cor)           # LKJ prior for correlation matrices

Binary Outcomes (Logistic Regression)

# Coefficients are on log-odds scale
# P(outcome=1) = inv_logit(Intercept + b*X)
# A 1-unit increase in X multiplies odds by exp(b)

# Weakly informative: effect likely between -5 and 5 on log-odds
prior(normal(0, 2), class = b)
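To see what normal(0, 2) implies on the odds-ratio scale, exponentiate its quantiles in base R (an illustration, not from the original brms code):

```r
# 95% prior interval for the coefficient on the log-odds scale
b_interval <- qnorm(c(0.025, 0.975), 0, 2)  # about -3.92 to 3.92

# Implied prior interval for the odds ratio exp(b)
exp(b_interval)  # about 0.02 to 50 -- wide, but rules out absurd effect sizes
```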

Count Data

# Poisson regression: coefficients are on the log scale
prior(normal(0, 2), class = b)

# Negative binomial: shape (inverse overdispersion) parameter
prior(gamma(0.01, 0.01), class = shape)

Troubleshooting Prior Issues

Symptom: Chains Don’t Converge

# Try wider priors if parameters are hitting boundaries
prior(normal(0, 10), class = b)  # Instead of narrow prior

Symptom: All Samples Are Near Zero

# Your prior may be too narrow or centered wrong
# Check: prior normal(0, 0.01) is extremely narrow

Symptom: Divergent Transitions

# Often caused by extreme priors
# Try more moderate priors or increase adapt_delta
model <- brm(y ~ x, data = dat, 
             control = list(adapt_delta = 0.99))

Summary

Prior selection is both an art and a science:

Prior Type         | When to Use                                 | Example
-------------------|---------------------------------------------|--------
Uninformative      | Abundant data; minimal assumptions wanted   | normal(0, 1e6)
Weakly informative | General purpose; helps computation          | normal(0, 2), exponential(1)
Informative        | Genuine external information available      | Previous study results, domain expertise

Key principles:

  • Be explicit: Write down what your prior means
  • Check priors: Use prior predictive checks
  • Test sensitivity: See how results change with different priors
  • Report honestly: Tell readers what prior you used and why

Next Steps

After mastering prior selection:

  • Learn about prior predictive simulation in depth
  • Explore hierarchical priors for multilevel data
  • Discover robust priors (t-distributions) for outlier-prone data
  • Read about prior elicitation from domain experts