Logistic Regression in R
Logistic regression is a fundamental classification technique used when the target variable is binary (0/1, TRUE/FALSE, yes/no). Unlike linear regression, which predicts continuous values, logistic regression predicts probabilities between 0 and 1. In this tutorial, you’ll learn how to build, interpret, and evaluate logistic regression models in R.
When to Use Logistic Regression
Logistic regression is ideal when you need to:
- Predict binary outcomes (will a customer churn or not?)
- Understand the effect of predictors on odds of success
- Get probability scores for ranking observations
- Build a baseline model before trying more complex algorithms
The glm() Function in R
R’s base glm() function fits generalized linear models. For logistic regression, we set family = "binomial" (the binomial family uses the logit link by default).
Basic Logistic Regression
Let’s fit a simple logistic regression model using the built-in mtcars dataset. We’ll predict whether a car has an automatic (0) or manual (1) transmission based on miles per gallon (mpg) and number of cylinders (cyl):
# Load data and prepare
data(mtcars)
# Convert am (transmission) to factor for clarity
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))
# Fit logistic regression model
model <- glm(am ~ mpg + cyl, data = mtcars, family = "binomial")
# View model summary
summary(model)
# Coefficients (log-odds)
coef(model)
# (Intercept) mpg cyl
# 19.42588 -1.40911 -1.33045
The coefficients are in log-odds scale. To interpret them, we need to exponentiate.
Interpreting Coefficients
Odds Ratios
The exponentiated coefficients give us odds ratios:
# Calculate odds ratios
exp(coef(model))
# (Intercept) mpg cyl
# 2.968408e+08 0.245257 0.264375
# More readable: confidence intervals for odds ratios
exp(confint(model))
# Waiting for profiling to be done...
# 2.5 % 97.5 %
# (Intercept) 2.036e+01 4.329e+15
# mpg 0.1429889 0.4147277
# cyl 0.1068283 0.6536044
Interpretation:
- For every 1-unit increase in mpg, the odds of having a manual transmission multiply by 0.245 (a decrease of about 75%), holding cyl constant
- For every additional cylinder, the odds multiply by 0.264 (a decrease of about 74%), holding mpg constant
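To connect log-odds, odds ratios, and probabilities, base R’s plogis() (the inverse logit) is handy. The numbers below are round illustrations, not output from the fitted model:

```r
# plogis() is the inverse logit: it maps log-odds to a probability
plogis(0)        # log-odds of 0 means even odds, i.e. probability 0.5

# A slope of -1.41 on the log-odds scale multiplies the odds by
# exp(-1.41) ~ 0.244 for each one-unit increase in the predictor
exp(-1.41)

# Starting from even odds, a one-unit increase shifts the probability to
plogis(0 - 1.41)   # ~ 0.196
```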
Predicted Probabilities
Use predict() with type = "response" to get probabilities:
# Predicted probabilities for original data
predicted_probs <- predict(model, type = "response")
# Create data frame with actual and predicted
results <- data.frame(
actual = mtcars$am,
predicted_prob = predicted_probs,
predicted_class = ifelse(predicted_probs > 0.5, "Manual", "Automatic")
)
head(results, 10)
# actual predicted_prob predicted_class
# 1 Manual 0.9650490 Manual
# 2 Manual 0.9650490 Manual
# 3 Manual 0.9607077 Manual
# 4 Automatic 0.6656524 Manual
# 5 Automatic 0.9866218 Manual
# 6 Automatic 0.1016491 Automatic
# ...
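predict() can also score observations the model has never seen: pass a newdata frame with the same predictor columns. The 28 mpg, 4-cylinder car below is hypothetical, chosen only for illustration:

```r
# Refit so this chunk is self-contained
data(mtcars)
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))
model <- glm(am ~ mpg + cyl, data = mtcars, family = "binomial")

# Hypothetical new car: 28 mpg, 4 cylinders
new_car <- data.frame(mpg = 28, cyl = 4)

predict(model, newdata = new_car, type = "response")  # probability scale
predict(model, newdata = new_car, type = "link")      # log-odds (the default)
```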
Model Evaluation
Confusion Matrix
# Create predicted class
predicted_class <- ifelse(predicted_probs > 0.5, "Manual", "Automatic")
# Confusion matrix
table(Predicted = predicted_class, Actual = mtcars$am)
# Calculate accuracy manually
mean(predicted_class == mtcars$am)
# [1] 0.8125
Using caret for Metrics
The caret package provides comprehensive evaluation:
library(caret)
# Confusion matrix with caret
confusionMatrix(as.factor(predicted_class), mtcars$am)
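If you’d rather avoid the caret dependency, the same metrics fall out of the 2x2 table directly. A base-R sketch, treating "Manual" as the positive class:

```r
data(mtcars)
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))
model <- glm(am ~ mpg + cyl, data = mtcars, family = "binomial")
predicted_probs <- predict(model, type = "response")

# Fix the factor levels so the table always has both rows
predicted_class <- factor(ifelse(predicted_probs > 0.5, "Manual", "Automatic"),
                          levels = c("Automatic", "Manual"))

cm <- table(Predicted = predicted_class, Actual = mtcars$am)
tp <- cm["Manual", "Manual"]       # true positives
fn <- cm["Automatic", "Manual"]    # false negatives
fp <- cm["Manual", "Automatic"]    # false positives
tn <- cm["Automatic", "Automatic"] # true negatives

c(sensitivity = tp / (tp + fn),
  specificity = tn / (tn + fp),
  precision   = tp / (tp + fp))
```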
ROC Curve and AUC
Visualize model performance with an ROC curve:
library(pROC)
# Calculate ROC curve
roc_obj <- roc(mtcars$am, predicted_probs, levels = c("Automatic", "Manual"))
# Plot ROC curve
plot(roc_obj, main = "ROC Curve for Logistic Regression")
# Calculate AUC
auc(roc_obj)
# Area under the curve: 0.9375
An AUC of 0.94 indicates excellent discrimination, though keep in mind it is computed on the training data, so it is likely optimistic.
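The 0.5 cutoff is a default, not a law. pROC’s coords() can report the threshold that maximizes Youden’s J (sensitivity + specificity - 1); a sketch, refitting the model so the chunk stands alone:

```r
library(pROC)

data(mtcars)
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))
model <- glm(am ~ mpg + cyl, data = mtcars, family = "binomial")
roc_obj <- roc(mtcars$am, predict(model, type = "response"),
               levels = c("Automatic", "Manual"))

# Best threshold by Youden's J, with the sensitivity/specificity it achieves
coords(roc_obj, "best",
       ret = c("threshold", "sensitivity", "specificity"),
       best.method = "youden")
```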
Multiple Logistic Regression
In practice, you’ll include multiple predictors. Here’s a more complete example:
# Fit model with more predictors
model_full <- glm(am ~ mpg + cyl + disp + hp + wt,
data = mtcars,
family = "binomial")
# Compare models with AIC
AIC(model, model_full)
# df AIC
# model 3 9.036475
# model_full 6 13.684785
# The simpler model is better (lower AIC)
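Because model is nested inside model_full, a likelihood ratio test via anova() is another way to compare them; a sketch (refitting both so the chunk is self-contained):

```r
data(mtcars)
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))
model      <- glm(am ~ mpg + cyl, data = mtcars, family = "binomial")
model_full <- glm(am ~ mpg + cyl + disp + hp + wt,
                  data = mtcars, family = "binomial")

# Chi-squared test on the change in deviance between nested models:
# a large p-value means the extra predictors don't significantly improve fit
anova(model, model_full, test = "Chisq")
```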
Step-by-Step Example: Customer Churn Prediction
Let’s walk through a complete example with simulated customer data:
# Create sample customer churn data
set.seed(123)
n <- 200
churn_data <- data.frame(
tenure = runif(n, 0, 48),
monthly_charge = runif(n, 30, 150),
contract = sample(c("Month-to-Month", "One-Year", "Two-Year"), n, replace = TRUE),
payment_method = sample(c("Credit Card", "Bank Transfer", "Electronic Check"), n, replace = TRUE)
)
# Simulate churn (more likely for month-to-month, higher charges, lower tenure)
churn_data$churn <- ifelse(
runif(n) < 0.1 + 0.3 * (churn_data$contract == "Month-to-Month") +
0.002 * churn_data$monthly_charge - 0.01 * churn_data$tenure,
1, 0
)
# Fit model
churn_model <- glm(churn ~ tenure + monthly_charge + contract + payment_method,
data = churn_data,
family = "binomial",
na.action = na.exclude)
# Check coefficients
summary(churn_model)$coefficients
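Before trusting the churn model, evaluate it on data it wasn’t fit to. A minimal base-R train/test split (the 70/30 ratio and the reduced predictor set are arbitrary choices for illustration):

```r
# Recreate the simulated churn data so this chunk is self-contained
set.seed(123)
n <- 200
churn_data <- data.frame(
  tenure = runif(n, 0, 48),
  monthly_charge = runif(n, 30, 150),
  contract = sample(c("Month-to-Month", "One-Year", "Two-Year"), n, replace = TRUE)
)
churn_data$churn <- ifelse(
  runif(n) < 0.1 + 0.3 * (churn_data$contract == "Month-to-Month") +
    0.002 * churn_data$monthly_charge - 0.01 * churn_data$tenure,
  1, 0
)

# 70/30 train/test split
train_idx <- sample(seq_len(n), size = 0.7 * n)
train <- churn_data[train_idx, ]
test  <- churn_data[-train_idx, ]

fit <- glm(churn ~ tenure + monthly_charge + contract,
           data = train, family = "binomial")

# Accuracy on the held-out 30%
test_probs <- predict(fit, newdata = test, type = "response")
mean((test_probs > 0.5) == test$churn)
```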
Best Practices
- Check for multicollinearity using car::vif(); high VIF values indicate problematic predictors
- Assess linearity: logistic regression assumes a linear relationship between each predictor and the log-odds. Use the Box-Tidwell test or plot each predictor against the logit
- Don't overfit: AIC, cross-validation, or held-out test sets help prevent overfitting
- Check for outliers: influential observations can distort coefficients
- Consider regularization: for high-dimensional data, use glmnet with elastic net
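The multicollinearity check above can be sketched with car::vif(), assuming the car package is installed; here applied to the five-predictor mtcars model from earlier:

```r
library(car)

data(mtcars)
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))
model_full <- glm(am ~ mpg + cyl + disp + hp + wt,
                  data = mtcars, family = "binomial")

# Rule of thumb: VIF above roughly 5-10 signals problematic collinearity
vif(model_full)
```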
Summary
Logistic regression in R is straightforward with glm(family = "binomial"). Key points:
- Coefficients are in log-odds; exponentiate for odds ratios
- Use type = "response" for predicted probabilities
- Evaluate with confusion matrices, ROC curves, and AUC
- Check assumptions before trusting results
With these fundamentals, you can build classification models for real-world prediction problems.