
Introduction to Bayesian Thinking

Bayesian statistics offers a fundamentally different way of thinking about uncertainty. Instead of viewing probability as the long-run frequency of events, Bayesian thinking treats probability as a measure of belief that gets updated as new evidence arrives. This tutorial introduces you to the core concepts that form the foundation of Bayesian analysis.

What you’ll learn

This tutorial covers the key concepts of Bayesian thinking and the basic R techniques for applying them. By the end, you will know how to update a prior with data, summarize the resulting posterior, and use it in real data analysis workflows.

What is Bayesian thinking?

At its heart, Bayesian thinking is about learning from evidence. You start with an initial belief (called a prior), you observe some data (whose support for each possible parameter value is measured by the likelihood), and you update your belief to get a new, improved belief (the posterior).

This process mirrors how we naturally reason. If you believe it’s unlikely to rain today but you see dark clouds forming, you update your belief toward “it’s probably going to rain.” Bayesian statistics formalizes this intuitive process.

The mathematical heart of Bayesian inference is Bayes’ theorem:

$$P(\theta \mid \text{data}) = \frac{P(\text{data} \mid \theta) \times P(\theta)}{P(\text{data})}$$

In words:

  • Posterior = (Likelihood × Prior) / Evidence
  • The posterior is your updated belief after seeing the data
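
As a rough illustration of the formula, the rain example above can be worked through with two hypotheses (rain, no rain). All three input probabilities below are made-up numbers chosen for illustration, not estimates from data:

```r
# Hypothetical inputs for the rain example (assumptions, not real data)
prior_rain          <- 0.3   # P(rain) before looking outside
p_clouds_given_rain <- 0.9   # P(dark clouds | rain)
p_clouds_given_dry  <- 0.2   # P(dark clouds | no rain)

# Evidence: total probability of seeing dark clouds
evidence <- p_clouds_given_rain * prior_rain +
  p_clouds_given_dry * (1 - prior_rain)

# Bayes' theorem: posterior = likelihood * prior / evidence
posterior_rain <- p_clouds_given_rain * prior_rain / evidence
posterior_rain
# [1] 0.6585366
```

Seeing dark clouds moves the belief in rain from 30% to about 66% under these assumed numbers.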

Frequentist vs Bayesian: a simple example

Imagine you’re testing a new coin to see if it’s fair. A frequentist approach would flip the coin many times and ask: “What’s the probability of getting a result at least this extreme if the coin is fair?” A Bayesian would instead ask: “Given the results I observed, what do I now believe about the coin’s bias?”

Let’s see this in practice with R:

# Suppose we flip a coin 10 times and get 8 heads
# Is the coin fair?

# Frequentist approach: p-value calculation
# Probability of 8+ heads out of 10 if p=0.5
pbinom(7, size = 10, prob = 0.5, lower.tail = FALSE)
# [1] 0.0546875
# Not significant at alpha = 0.05

# But what do we actually believe about the coin?

The frequentist test ends in a binary decision: reject or don’t reject the null hypothesis. The Bayesian answer is more informative: a distribution of plausible values for the coin’s bias.

Setting up your first Bayesian analysis

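The examples in this tutorial use only base R and ggplot2. The packages below are the ones you’ll need once you start fitting Bayesian models: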

# Install Bayesian packages
install.packages("rstanarm")    # Stan for Bayesian regression
install.packages("bayesplot")   # Visualization
install.packages("tidybayes")   # Tidy workflow

# Load them
library(rstanarm)
library(bayesplot)
library(tidybayes)

A simple coin flipping example

Let’s implement a complete Bayesian analysis from scratch:

# Observed data: 8 heads out of 10 flips
n_heads <- 8
n_flips <- 10

# Prior: Before seeing data, let's say we think the coin is probably fair
# We represent this with a Beta distribution
# Beta(2, 2) is a reasonable prior - centered at 0.5 but with some uncertainty

prior_alpha <- 2
prior_beta <- 2

# Likelihood: Binomial - what we expect to see for different values of p
# We're computing: P(data | p), which is proportional to p^heads * (1-p)^tails

# Posterior: Beta(prior_alpha + heads, prior_beta + tails)
posterior_alpha <- prior_alpha + n_heads
posterior_beta <- prior_beta + (n_flips - n_heads)

# What does our posterior look like?
posterior_mean <- posterior_alpha / (posterior_alpha + posterior_beta)
posterior_mean
# [1] 0.7142857

# We now believe the coin has about 71% probability of landing heads

The posterior Beta(10, 4) captures our updated belief. The mean is 10/(10+4) ≈ 0.71. This is a weighted average of our prior mean (0.5) and the observed proportion (0.8), with more weight on the data because the 10 flips outnumber the 4 pseudo-observations (2 + 2) that the Beta(2, 2) prior is worth.
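
The weighted-average claim can be checked directly. This is a small sketch that repeats the values from the example above; the names prior_n, w_data, weighted, and conjugate are introduced here only for illustration:

```r
prior_alpha <- 2
prior_beta  <- 2
n_heads <- 8
n_flips <- 10

prior_n    <- prior_alpha + prior_beta   # prior pseudo-observations: 4
prior_mean <- prior_alpha / prior_n      # 0.5
data_mean  <- n_heads / n_flips          # 0.8

# Weight on the data grows with the number of flips
w_data <- n_flips / (prior_n + n_flips)  # 10 / 14

# Weighted average of prior mean and observed proportion
weighted <- (1 - w_data) * prior_mean + w_data * data_mean

# Posterior mean from the conjugate update
conjugate <- (prior_alpha + n_heads) / (prior_n + n_flips)

all.equal(weighted, conjugate)
# [1] TRUE
```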

Visualizing the posterior

Let’s see what the prior and posterior look like:

library(ggplot2)

# Create a sequence of values for p (coin bias)
p_values <- seq(0, 1, length.out = 100)

# Calculate prior and posterior densities
prior_density <- dbeta(p_values, prior_alpha, prior_beta)
posterior_density <- dbeta(p_values, posterior_alpha, posterior_beta)

# Plot both
plot_data <- data.frame(
  p = p_values,
  prior = prior_density,
  posterior = posterior_density
)

ggplot(plot_data) +
  geom_line(aes(p, prior), linetype = "dashed", color = "blue") +
  geom_line(aes(p, posterior), color = "red") +
  geom_vline(xintercept = 0.5, color = "gray", linetype = "dotted") +
  labs(
    x = "Probability of Heads (p)",
    y = "Density",
    title = "Prior (dashed) vs Posterior (solid)"
  ) +
  theme_minimal()

The posterior is more concentrated (less uncertain) than the prior because we’ve observed data. Its peak is at 0.75, the mode of Beta(10, 4), which sits slightly above the mean of 0.71 and suggests the coin is likely biased toward heads.
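
To put numbers on "peak" and "more concentrated", the standard formulas for the mode and standard deviation of a Beta distribution can be evaluated. The helper names beta_mode and beta_sd are hypothetical, defined here for illustration only:

```r
# Mode of Beta(a, b); this formula is valid when a > 1 and b > 1
beta_mode <- function(a, b) (a - 1) / (a + b - 2)

# Standard deviation of Beta(a, b)
beta_sd <- function(a, b) sqrt(a * b / ((a + b)^2 * (a + b + 1)))

beta_mode(10, 4)         # where the posterior curve peaks
# [1] 0.75

prior_sd     <- beta_sd(2, 2)    # spread of the Beta(2, 2) prior
posterior_sd <- beta_sd(10, 4)   # spread of the Beta(10, 4) posterior
posterior_sd < prior_sd
# [1] TRUE
```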

Credible intervals

One of Bayesian statistics’ strengths is the credible interval—a range of plausible values for the parameter. Unlike confidence intervals, credible intervals have a straightforward interpretation:

# 95% credible interval from the posterior
qbeta(0.025, posterior_alpha, posterior_beta)  # Lower bound
# About 0.46

qbeta(0.975, posterior_alpha, posterior_beta)  # Upper bound
# About 0.91

# Interpretation: There's a 95% probability that the true p
# lies between 0.46 and 0.91, given our data and prior.

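This is intuitively meaningful: conditional on the prior and the model, there is a 95% probability that the true coin bias falls in this range. The statement is about the parameter itself, not about the long-run behavior of a procedure under repeated sampling.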

What if you have stronger prior beliefs?

The prior represents what you believed before seeing the data. Different priors lead to different posteriors:

# Strong prior: Beta(100, 100) - very confident the coin is fair
strong_prior <- c(100, 100)
strong_posterior <- strong_prior + c(n_heads, n_flips - n_heads)

# Posterior mean = alpha / (alpha + beta)
strong_posterior[1] / sum(strong_posterior)
# [1] 0.5142857

# Weak prior: Beta(1, 1) - flat (uniform) over all values of p
weak_prior <- c(1, 1)
weak_posterior <- weak_prior + c(n_heads, n_flips - n_heads)

weak_posterior[1] / sum(weak_posterior)
# [1] 0.75

With a strong prior, the data has less influence on the posterior. With a weak (uninformative) prior, the posterior is almost entirely driven by the data.
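
One way to compare several priors side by side is to wrap the update in a small function. This is a sketch under the same 8-heads-in-10-flips data; the function name posterior_mean_for and the list of priors are illustrative choices, not part of any package:

```r
# Posterior mean of a Beta prior after observing `heads` in `flips` tosses
posterior_mean_for <- function(prior, heads, flips) {
  post <- prior + c(heads, flips - heads)
  post[1] / sum(post)
}

# Three priors of increasing strength, all centered on 0.5
priors <- list(flat = c(1, 1), mild = c(2, 2), strong = c(100, 100))

res <- sapply(priors, posterior_mean_for, heads = 8, flips = 10)
round(res, 3)
#   flat   mild strong
#  0.750  0.714  0.514
```

The stronger the prior, the closer the posterior mean stays to 0.5 despite the 80% observed proportion.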

Making decisions with the posterior

The posterior gives you everything you need for decision-making:

# What's the probability the coin is biased toward heads (p > 0.5)?
pbeta(0.5, posterior_alpha, posterior_beta, lower.tail = FALSE)
# [1] 0.9538574

# That's about a 95% chance the coin favors heads!
# Would you bet on heads?

A more realistic example: estimating a rate

Suppose you’re observing website visitors and want to estimate the conversion rate. You’ve seen 15 conversions out of 500 visitors:

# Observed data
conversions <- 15
visitors <- 500

# Prior: Beta(1, 1) would be flat. Here we use Beta(2, 20), a weakly
# informative prior whose mean is 2 / 22, about 9%
prior <- c(2, 20)

# Posterior: add conversions and non-conversions to the prior parameters
posterior <- prior + c(conversions, visitors - conversions)  # Beta(17, 505)

# Posterior mean (estimated conversion rate)
posterior[1] / sum(posterior)
# [1] 0.03256705  # About 3.3%

# 95% credible interval
c(
  qbeta(0.025, posterior[1], posterior[2]),
  qbeta(0.975, posterior[1], posterior[2])
)
# Roughly 0.019 and 0.050

The conversion rate is likely between 1.9% and 5%, which is practical information for business decisions.
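
The same posterior can answer threshold questions directly. The sketch below assumes a hypothetical business target of a 2% conversion rate; that threshold is an illustrative assumption and does not come from the example data:

```r
# Posterior from the example: Beta(2 + 15, 20 + 485) = Beta(17, 505)
post_a <- 2 + 15
post_b <- 20 + (500 - 15)

# Hypothetical target rate (an assumption made for illustration)
target <- 0.02

# Posterior probability that the true conversion rate exceeds the target
p_above <- pbeta(target, post_a, post_b, lower.tail = FALSE)
p_above
```

A value close to 1 would mean the data and prior together make it very plausible that the rate is above the target.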

Why learn Bayesian methods?

Bayesian statistics offers several advantages:

  1. Natural interpretation: Credible intervals mean exactly what you think they mean
  2. Incorporates prior knowledge: Use existing information in your analysis
  3. Handles complex models: MCMC makes many problems tractable
  4. Answers the questions you actually ask: “What’s the probability my hypothesis is true?”

Bayes’ theorem

Bayes’ theorem updates beliefs given new evidence: posterior ∝ likelihood × prior. The prior encodes knowledge before seeing data. The likelihood is the probability of the observed data given the parameter. The posterior is the updated belief about the parameter after observing data. As more data accumulates, the data overwhelms the prior and the posterior concentrates around the true value, provided the prior gives that value nonzero probability.
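
The way data overwhelms the prior can be sketched with the conjugate Beta update. The data here are hypothetical: every sample size is assumed to show exactly 70% heads, and post_mean is a helper name introduced for illustration:

```r
# Posterior mean after a Beta prior is updated with `heads` out of `flips`
post_mean <- function(prior, heads, flips) {
  (prior[1] + heads) / (sum(prior) + flips)
}

flips <- c(10, 100, 1000, 10000)
heads <- 0.7 * flips            # hypothetical data: always 70% heads

mild   <- post_mean(c(2, 2), heads, flips)       # mild prior
strong <- post_mean(c(100, 100), heads, flips)   # strong prior at 0.5

# The gap between the two posteriors shrinks as the sample grows
results <- data.frame(flips, mild, strong, gap = mild - strong)
round(results, 3)
```

With 10 flips the two priors give clearly different answers; with 10,000 flips both posterior means are close to 0.7.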

Prior, likelihood, posterior

In R, a simple Bayesian update for a coin flip: the prior on the probability of heads is Beta(2, 2) (mildly informative, centered on 0.5). After observing 7 heads in 10 flips, the likelihood is binomial with n = 10 trials and 7 successes. The posterior is Beta(2+7, 2+3) = Beta(9, 5). rbeta(10000, 9, 5) generates samples from this posterior, and quantile(samples, c(0.025, 0.975)) applied to those samples gives the 95% credible interval.
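
Written out as code, the steps in this paragraph look roughly like the following. The seed is arbitrary and only there so the sketch is reproducible:

```r
set.seed(42)  # arbitrary seed, chosen only for reproducibility

# Prior Beta(2, 2) updated with 7 heads and 3 tails gives Beta(9, 5)
post_alpha <- 2 + 7
post_beta  <- 2 + 3

# Draw samples from the posterior
samples <- rbeta(10000, post_alpha, post_beta)

mean(samples)                            # close to 9 / 14
sampled_ci <- quantile(samples, c(0.025, 0.975))
exact_ci   <- qbeta(c(0.025, 0.975), post_alpha, post_beta)

sampled_ci   # 95% credible interval estimated from the samples
exact_ci     # exact quantiles, for comparison
```

Because this posterior has a closed form, the sampled interval can be checked against qbeta; for models without a closed form, the samples are all you have.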

Bayesian vs frequentist interpretation

A frequentist 95% confidence interval means: in 95% of repeated experiments, the interval will contain the true parameter. A Bayesian 95% credible interval means: given the data and prior, there is a 95% probability the parameter lies in this interval. The Bayesian statement is more intuitive but requires specifying a prior. For large datasets with weakly informative priors, the two approaches typically give similar intervals.
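
The claim that the two intervals agree for large samples can be checked numerically. This sketch assumes hypothetical data (700 heads in 1000 flips) and a flat Beta(1, 1) prior; binom.test() returns the exact (Clopper-Pearson) confidence interval:

```r
x <- 700    # hypothetical number of heads
n <- 1000   # hypothetical number of flips

# Frequentist 95% confidence interval
conf_int <- as.numeric(binom.test(x, n)$conf.int)

# Bayesian 95% credible interval under a flat Beta(1, 1) prior
cred_int <- qbeta(c(0.025, 0.975), 1 + x, 1 + n - x)

round(rbind(confidence = conf_int, credible = cred_int), 3)
```

The endpoints should differ only in the third decimal place or so, even though the two intervals are interpreted differently.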

MCMC sampling

Most posterior distributions cannot be computed analytically. Markov Chain Monte Carlo (MCMC) approximates the posterior by drawing correlated samples whose distribution converges to the posterior. brms and rstan use the NUTS sampler (a variant of Hamiltonian Monte Carlo). After sampling, summary(fit) shows parameter estimates, standard deviations, and convergence diagnostics. As common rules of thumb, an R-hat very close to 1.0 (below about 1.01) and an effective sample size above 400 suggest the chains have converged.
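
To make the idea of MCMC concrete, here is a minimal random-walk Metropolis sampler in base R for the coin example (8 heads in 10 flips, Beta(2, 2) prior). This is a much simpler algorithm than the NUTS sampler used by Stan, shown only as a sketch; the proposal width, chain length, and warm-up length are arbitrary choices:

```r
set.seed(1)  # arbitrary seed for reproducibility

# Unnormalized log posterior: log likelihood + log prior
log_post <- function(p) {
  if (p <= 0 || p >= 1) return(-Inf)   # outside the support
  dbinom(8, 10, p, log = TRUE) + dbeta(p, 2, 2, log = TRUE)
}

n_iter  <- 20000
draws   <- numeric(n_iter)
current <- 0.5                          # starting value

for (i in seq_len(n_iter)) {
  proposal <- current + rnorm(1, sd = 0.2)   # random-walk proposal
  # Accept with probability min(1, posterior ratio)
  if (log(runif(1)) < log_post(proposal) - log_post(current)) {
    current <- proposal
  }
  draws[i] <- current
}

draws <- draws[-(1:2000)]   # discard warm-up iterations
mean(draws)                 # should be near the exact mean 10 / 14
```

Because the exact posterior here is Beta(10, 4), the sampler's output can be compared with the closed-form answer, which is a useful sanity check before trusting MCMC on models where no closed form exists.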

What makes Bayesian statistics different

Bayesian statistics treats model parameters as random variables with probability distributions, not as fixed unknown values to be estimated. Before seeing data, you specify a prior distribution expressing your beliefs about the plausible range of parameter values. After seeing data, the prior is updated using the data’s likelihood through Bayes’ theorem, producing a posterior distribution that represents your updated beliefs.

This framework produces fundamentally different outputs from frequentist statistics. Instead of a single point estimate with a confidence interval (which has a specific and often misunderstood interpretation), Bayesian analysis produces a full posterior distribution over parameter values. You can make direct probability statements: “there is a 95% probability that the parameter is between these values”, a statement that a frequentist confidence interval cannot logically support.

Prior distributions in practice

Choosing a prior is the part of Bayesian analysis that has no frequentist equivalent and requires the most judgment. A flat prior (also called uninformative or diffuse) assigns equal probability to all parameter values. A weakly informative prior rules out physically impossible values while remaining largely determined by the data. An informative prior encodes substantial prior knowledge from theory, previous studies, or expert judgment.

Weakly informative priors are generally recommended for routine analysis. They prevent the posterior from being dominated by implausible values (negative counts, effect sizes larger than anything plausible in the field) without substantially constraining what the data can tell you. Using a standard normal prior on a standardized regression coefficient, for example, encodes the expectation that most effects are modest without preventing the data from finding large effects if they exist.
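
What a standard normal prior says about a standardized coefficient can be read off with pnorm(). The variable names below are illustrative:

```r
# Prior probability that the coefficient lies within 1 or 2 units of zero
within_1 <- pnorm(1) - pnorm(-1)   # about 0.68
within_2 <- pnorm(2) - pnorm(-2)   # about 0.95

# Prior probability of a very large effect (beyond 4 in absolute value):
# small, but not zero, so the data can still support it
beyond_4 <- 2 * pnorm(-4)

round(c(within_1 = within_1, within_2 = within_2), 2)
beyond_4 > 0
# [1] TRUE
```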

The posterior as the answer

The posterior distribution is the Bayesian answer to a parameter estimation question. For simple models with conjugate priors, the posterior has a closed-form expression. For complex models, the posterior is approximated through sampling: Markov Chain Monte Carlo generates draws from the posterior distribution, and summaries of those draws (means, quantiles, standard deviations) describe the posterior.

Point summaries of the posterior (the mean, median, or mode) are analogous to frequentist point estimates. A credible interval is a region containing a specified posterior probability, and the highest density interval is the narrowest such region. Unlike a frequentist confidence interval, the credible interval is a direct probability statement: there is a specified probability that the true parameter value lies in this region, conditional on the model and the data.
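
A highest density interval can be approximated from posterior draws by finding the narrowest window that contains the desired share of the sorted draws. The function hdi_from_draws below is a hand-rolled sketch written for this tutorial, not a function from any package, and the draws come from the Beta(10, 4) coin posterior with an arbitrary seed:

```r
# Narrowest interval containing `prob` of the draws
hdi_from_draws <- function(draws, prob = 0.95) {
  sorted <- sort(draws)
  n <- length(sorted)
  k <- ceiling(prob * n)                    # draws inside the interval
  starts <- seq_len(n - k + 1)              # candidate left endpoints
  widths <- sorted[starts + k - 1] - sorted[starts]
  best <- which.min(widths)                 # narrowest window
  c(lower = sorted[best], upper = sorted[best + k - 1])
}

set.seed(123)  # arbitrary seed for reproducibility
draws <- rbeta(10000, 10, 4)

hdi_95 <- hdi_from_draws(draws)
eti_95 <- quantile(draws, c(0.025, 0.975))  # equal-tailed interval

hdi_95
eti_95
```

For a skewed posterior such as Beta(10, 4), the highest density interval is shifted toward the peak and is slightly narrower than the equal-tailed interval.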

Summary

You’ve learned the core concepts of Bayesian thinking:

Concept            | Description
Prior              | Your belief before seeing data
Likelihood         | How probable the data is for different parameter values
Posterior          | Updated belief after incorporating the data
Credible interval  | Range of plausible values for the parameter

The Bayesian workflow is straightforward: specify your prior, collect data, compute the posterior, and make decisions. In the next tutorial, you’ll learn how to use brms to fit Bayesian regression models in R.

Next steps

Continue your Bayesian journey with the next tutorials in this series:

  • Getting Started with brms — Fit your first Bayesian regression model
  • Prior Selection — Learn how to encode your prior knowledge
  • Posterior Predictive Checks — Validate your Bayesian model’s fit

You’ll soon see how Bayesian methods provide a flexible framework for answering complex statistical questions.
