Generalised Linear Models in R

· 5 min read · Updated March 13, 2026 · intermediate
r statistics glm regression modeling

Generalised Linear Models (GLMs) are the workhorse of statistical modeling when your response variable isn’t nicely normally distributed. If you’ve got binary outcomes (yes/no, success/failure), counts (number of events), or proportions, GLMs are your friend. In R, fitting a GLM takes just one line of code, but understanding what comes out the other end takes a bit more work—which is exactly what this guide covers.

What Are GLMs and When Should You Use Them?

Ordinary linear regression assumes your outcome is continuous and normally distributed. But what if you’re modeling whether a customer churns, how many clicks an ad gets, or the number of defects in a batch? Those don’t fit the normal assumption.

GLMs extend linear regression to handle non-normal response variables by combining three components:

  • Distribution family: The probability distribution of your response (binomial for binary, Poisson for counts, etc.)
  • Linear predictor: A linear combination of your predictors (like regular regression)
  • Link function: The mathematical bridge that connects the linear predictor to the expected value of the response

The key insight is that you model the expected value through a link function rather than modeling it directly. For logistic regression, you model the log-odds; for Poisson regression, you model the log of the expected count.

GLM Theory Brief

A GLM has three parts:

  1. Random component: The distribution of Y (your response) from the exponential family—binomial, Poisson, Gaussian, Gamma, etc.

  2. Systematic component: The linear predictor ( \eta = \beta_0 + \beta_1 X_1 + … + \beta_p X_p )

  3. Link function: ( g(\mu) = \eta ) — maps the expected value ( \mu ) to the linear predictor

Common link functions:

  • Identity link: ( \mu = \eta ) — just regular linear regression
  • Logit link: ( \log(\mu/(1-\mu)) = \eta ) — logistic regression
  • Log link: ( \log(\mu) = \eta ) — Poisson regression

The flexibility to choose both distribution and link function independently is what makes GLMs so powerful.
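In R, this separation is explicit: a family object bundles the distribution with its link function, and each piece can be inspected or swapped independently. A quick sketch using the built-in family constructors:

```r
# Family objects bundle the distribution and link function.
# Each family has a default link, which you can inspect or override.
fam <- binomial()       # default link is "logit"
fam$link                # "logit"
fam$linkfun(0.5)        # log(0.5 / (1 - 0.5)) = 0
fam$linkinv(0)          # inverse link maps back to the mean scale: 0.5

# Swapping the link while keeping the distribution:
binomial(link = "probit")$link   # "probit"
poisson()$link                   # "log"
```

This is exactly what you pass to `glm()` via the `family` argument, so choosing a different link is a one-word change.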

Logistic Regression for Binary Outcomes

When your outcome is binary (0/1, TRUE/FALSE, yes/no), logistic regression is the standard approach. You’re modeling the probability that ( Y = 1 ).

Here’s a working example with the classic iris dataset (converting it to binary for demonstration):

# Create binary outcome: setosa vs. not setosa
iris$is_setosa <- ifelse(iris$Species == "setosa", 1, 0)

# Fit logistic regression
logit_model <- glm(is_setosa ~ Sepal.Length + Sepal.Width,
                   data = iris,
                   family = binomial(link = "logit"))

# View model summary
summary(logit_model)

The output shows coefficients on the log-odds scale. To interpret them more intuitively, you’ll want odds ratios.
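It helps to see both scales side by side. `predict()` returns the linear predictor (log-odds) by default; `type = "response"` applies the inverse link to give a probability. A minimal sketch, refitting the model above so the snippet stands alone (`new_flower` is an illustrative name):

```r
# Refit the logistic model from above (uses the built-in iris dataset)
iris$is_setosa <- ifelse(iris$Species == "setosa", 1, 0)
logit_model <- glm(is_setosa ~ Sepal.Length + Sepal.Width,
                   data = iris, family = binomial(link = "logit"))

# type = "link" gives log-odds; type = "response" gives probabilities
new_flower <- data.frame(Sepal.Length = 5.0, Sepal.Width = 3.5)
predict(logit_model, newdata = new_flower, type = "link")      # log-odds scale
predict(logit_model, newdata = new_flower, type = "response")  # probability in [0, 1]
```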

Poisson Regression for Count Data

When you’re counting events—customer purchases, website visits, defects—Poisson regression is usually the right tool. The log link ensures predicted counts stay positive.

# Simulate count data (no true relationship; purely to demonstrate the syntax)
set.seed(42)
n <- 200
count_data <- data.frame(
  visits = rpois(n, lambda = 5),
  ad_spend = runif(n, 0, 100),
  category = sample(c("A", "B", "C"), n, replace = TRUE)
)

# Fit Poisson regression
poisson_model <- glm(visits ~ ad_spend + category,
                     data = count_data,
                     family = poisson(link = "log"))

summary(poisson_model)

Model Diagnostics

Once you’ve fit a model, you need to check if it’s any good. GLM diagnostics focus on a few key areas:

Deviance

Deviance measures how well your model fits the data. Lower is better, but you need to compare models, not just look at the absolute value.

# Null deviance vs. residual deviance
logit_model$deviance
logit_model$null.deviance
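To turn those two deviances into an actual test, compare their difference against a chi-squared distribution with degrees of freedom equal to the number of extra parameters. A sketch reusing the logistic model from earlier:

```r
# Refit the logistic model so this snippet is self-contained
iris$is_setosa <- ifelse(iris$Species == "setosa", 1, 0)
logit_model <- glm(is_setosa ~ Sepal.Length + Sepal.Width,
                   data = iris, family = binomial)

# Likelihood-ratio test: intercept-only model vs. the fitted model
dev_diff <- logit_model$null.deviance - logit_model$deviance
df_diff  <- logit_model$df.null - logit_model$df.residual
pchisq(dev_diff, df = df_diff, lower.tail = FALSE)  # small p => predictors help

# Equivalent built-in: sequential chi-squared tests per term
anova(logit_model, test = "Chisq")
```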

Residuals

Several types of residuals help you spot problems:

# Pearson residuals
residuals(logit_model, type = "pearson")

# Deviance residuals
residuals(logit_model, type = "deviance")

# Plot residuals
plot(logit_model)

Look for patterns in the residuals plot—systematic patterns indicate model misspecification.

Overdispersion

Poisson models assume variance equals the mean. When variance exceeds mean (overdispersion), your standard errors will be too small. Check this:

# Calculate dispersion statistic
dispersion <- poisson_model$deviance / poisson_model$df.residual
dispersion

A dispersion statistic much above 1 (say, above 1.5 or 2) is a red flag. You might need negative binomial regression (see the MASS package) or a quasi-likelihood model such as quasipoisson.
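The two standard remedies look like this in practice. A sketch on simulated overdispersed counts (the names `y`, `x`, and `d` are illustrative; requires the MASS package):

```r
# Simulate counts whose variance exceeds the mean, so Poisson is misspecified
set.seed(42)
n <- 200
x <- runif(n, 0, 2)
y <- MASS::rnegbin(n, mu = exp(1 + 0.5 * x), theta = 1.5)
d <- data.frame(y = y, x = x)

# Option 1: quasi-Poisson -- same coefficients, standard errors scaled up
quasi_model <- glm(y ~ x, data = d, family = quasipoisson)
summary(quasi_model)$dispersion   # estimated dispersion, well above 1 here

# Option 2: negative binomial regression, which models the extra variance
nb_model <- MASS::glm.nb(y ~ x, data = d)
summary(nb_model)
```

Quasi-Poisson keeps the point estimates and only widens the intervals; negative binomial changes the likelihood itself, which also gives you AIC-based model comparison.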

Interpreting Coefficients

The raw coefficients aren’t always intuitive. Here’s how to make sense of them:

Odds Ratios (Logistic Regression)

Exponentiate coefficients to get odds ratios:

# Calculate odds ratios
odds_ratios <- exp(coef(logit_model))
odds_ratios

# With confidence intervals
exp(confint(logit_model))

An odds ratio of 1.5 means the odds of the outcome increase by 50% for a one-unit increase in that predictor. An odds ratio of 0.7 means the odds decrease by 30%.

Incidence Rate Ratios (Poisson Regression)

Same idea—exponentiate for interpretation:

# Calculate incidence rate ratios
irr <- exp(coef(poisson_model))
irr

A practical note: IRRs close to 1 mean the predictor has little effect on the expected count. An IRR of 0.99 can be statistically significant in a large sample yet practically negligible, while an IRR of 1.5 means a 50% increase in expected counts. The practical impact matters more than statistical significance.
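One way to make that concrete is to convert each IRR into a percent change in the expected count. A sketch reusing the simulated Poisson model from earlier (where, since the data were simulated with no true relationship, the IRRs land close to 1):

```r
# Rebuild the simulated data and Poisson model so this snippet stands alone
set.seed(42)
n <- 200
count_data <- data.frame(
  visits = rpois(n, lambda = 5),
  ad_spend = runif(n, 0, 100),
  category = sample(c("A", "B", "C"), n, replace = TRUE)
)
poisson_model <- glm(visits ~ ad_spend + category,
                     data = count_data, family = poisson)

# Percent change in expected counts per one-unit increase in each predictor
irr <- exp(coef(poisson_model))
pct_change <- 100 * (irr - 1)   # e.g. IRR 1.05 => +5% expected count
round(pct_change, 2)
```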

Conclusion

GLMs open up a huge range of modeling possibilities beyond ordinary least squares. The key is matching your distribution family and link function to your data type—binomial with logit for binary outcomes, Poisson with log for counts. Once you’ve fit a model, don’t skip diagnostics: check for overdispersion, examine residuals, and verify your model assumptions hold.

The glm() function in R handles all of this with a consistent interface. Swap out the family argument and you’re done. That’s the beauty of the GLM framework—same syntax, different statistical assumptions.
