How to fit a linear model with lm() in R
· 4 min read · Updated March 15, 2026 · beginner
r statistics regression linear-model
The lm() function is R’s workhorse for fitting linear models. Whether you’re running simple linear regression or multiple regression, lm() provides a consistent interface for modeling relationships between variables.
Basic Usage
Simple Linear Regression
Fit a model predicting one variable from another:
# Create sample data
df <- data.frame(
height = c(150, 160, 170, 180, 190, 155, 165, 175, 185, 195),
weight = c(55, 65, 75, 85, 95, 58, 68, 78, 88, 98)
)
# Fit simple linear regression
model <- lm(weight ~ height, data = df)
# View model summary
summary(model)
# Call:
# lm(formula = weight ~ height, data = df)
#
# Coefficients (p-value column omitted here; both values are far below 0.001):
#             Estimate Std. Error t value
# (Intercept) -93.9091     4.1963  -22.38
# height        0.9879     0.0242   40.75
#
# Residual standard error: 1.101 on 8 degrees of freedom
# Multiple R-squared: 0.9952, Adjusted R-squared: 0.9946
# F-statistic: 1661 on 1 and 8 DF
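Interpreting the coefficients: the height coefficient is the slope, so each 1-unit increase in height is associated with an increase in predicted weight equal to that coefficient, which is close to 1 unit for this data. The intercept is the predicted weight at a height of 0, which lies far outside the observed range and has no practical meaning here.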
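A point estimate alone says nothing about its uncertainty. As a minimal sketch, base R's confint() returns confidence intervals for each coefficient. The block refits the same toy model so it runs on its own; the names slope and ci are illustrative and not part of the original example.

```r
# Same toy data as in the simple regression example
df <- data.frame(
  height = c(150, 160, 170, 180, 190, 155, 165, 175, 185, 195),
  weight = c(55, 65, 75, 85, 95, 58, 68, 78, 88, 98)
)
model <- lm(weight ~ height, data = df)

# Point estimate: change in predicted weight per 1-unit change in height
slope <- coef(model)["height"]

# 95% confidence intervals for the intercept and the slope
ci <- confint(model, level = 0.95)

print(slope)
print(ci)
```

If the interval for height excludes 0, the data are inconsistent with a flat relationship at that confidence level.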
Making Predictions
Use the model to predict new values:
# Predict for new heights
new_data <- data.frame(height = c(172, 180, 188))
predict(model, newdata = new_data)
#        1        2        3
# 76.00606 83.90909 91.81212
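Point predictions can be paired with intervals through the interval argument of predict(). The sketch below, which refits the same toy model so it is self-contained, contrasts a confidence interval for the mean response with the wider prediction interval for a single new observation. The names conf_int and pred_int are illustrative.

```r
df <- data.frame(
  height = c(150, 160, 170, 180, 190, 155, 165, 175, 185, 195),
  weight = c(55, 65, 75, 85, 95, 58, 68, 78, 88, 98)
)
model <- lm(weight ~ height, data = df)
new_data <- data.frame(height = c(172, 180, 188))

# Interval for the average weight at each height
conf_int <- predict(model, newdata = new_data,
                    interval = "confidence", level = 0.95)

# Interval for one new individual's weight at each height (always wider)
pred_int <- predict(model, newdata = new_data,
                    interval = "prediction", level = 0.95)

# Each result is a matrix with columns fit, lwr, upr
print(conf_int)
print(pred_int)
```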
Multiple Linear Regression
Include multiple predictors:
# More complex dataset
df <- data.frame(
weight = c(55, 65, 75, 85, 95, 58, 68, 78, 88, 98),
height = c(150, 160, 170, 180, 190, 155, 165, 175, 185, 195),
age = c(25, 30, 35, 40, 45, 22, 28, 32, 38, 42)
)
# Fit with multiple predictors
model <- lm(weight ~ height + age, data = df)
summary(model)
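One way to judge whether the extra predictor earns its place is an F-test between nested models with anova(). This is a sketch under the same toy data; model_small, model_full, and cmp are hypothetical names introduced for illustration.

```r
df <- data.frame(
  weight = c(55, 65, 75, 85, 95, 58, 68, 78, 88, 98),
  height = c(150, 160, 170, 180, 190, 155, 165, 175, 185, 195),
  age = c(25, 30, 35, 40, 45, 22, 28, 32, 38, 42)
)

# The smaller model must be nested inside the larger one
model_small <- lm(weight ~ height, data = df)
model_full <- lm(weight ~ height + age, data = df)

# Row 2 reports the F-test for the terms added in model_full
cmp <- anova(model_small, model_full)
print(cmp)
```

A small p-value in the second row suggests that age explains variation in weight beyond what height already captures.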
Interaction Terms
Model interactions between variables:
# Include interaction (height * age)
model_interaction <- lm(weight ~ height * age, data = df)
summary(model_interaction)
This tests whether the effect of height on weight depends on age.
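In formula syntax, `height * age` is shorthand for both main effects plus their product term, `height + age + height:age`. The sketch below fits both spellings on the same toy data and checks that they give the same coefficients; m_star and m_long are illustrative names.

```r
df <- data.frame(
  weight = c(55, 65, 75, 85, 95, 58, 68, 78, 88, 98),
  height = c(150, 160, 170, 180, 190, 155, 165, 175, 185, 195),
  age = c(25, 30, 35, 40, 45, 22, 28, 32, 38, 42)
)

# Shorthand: main effects and interaction
m_star <- lm(weight ~ height * age, data = df)

# Explicit form of the same model
m_long <- lm(weight ~ height + age + height:age, data = df)

# The interaction coefficient is labelled "height:age"
print(names(coef(m_star)))

# TRUE when both spellings produce the same estimates
print(all.equal(coef(m_star), coef(m_long)))
```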
Model Objects
Extracting Model Information
model <- lm(weight ~ height, data = df)
# Coefficients
coef(model)
# (Intercept)      height
# -93.9090909   0.9878788
# Residuals (observed minus fitted)
residuals(model)
#          1          2          3          4          5
#  0.7272727  0.8484848  0.9696970  1.0909091  1.2121212
#          6          7          8          9         10
# -1.2121212 -1.0909091 -0.9696970 -0.8484848 -0.7272727
# Fitted values
fitted(model)
#        1        2        3        4        5        6        7        8        9       10
# 54.27273 64.15152 74.03030 83.90909 93.78788 59.21212 69.09091 78.96970 88.84848 98.72727
# R-squared
summary(model)$r.squared
# [1] 0.9952055
# Adjusted R-squared
summary(model)$adj.r.squared
# [1] 0.9946061
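The full coefficient table printed by summary() is also available as a numeric matrix, which is convenient when individual standard errors or p-values are needed in later code. A minimal sketch, refitting the simple model so the block stands alone; coef_table is an illustrative name.

```r
df <- data.frame(
  height = c(150, 160, 170, 180, 190, 155, 165, 175, 185, 195),
  weight = c(55, 65, 75, 85, 95, 58, 68, 78, 88, 98)
)
model <- lm(weight ~ height, data = df)

# Matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(model))
print(coef_table)

# Pull out single entries by row and column name
height_se <- coef_table["height", "Std. Error"]
height_p <- coef_table["height", "Pr(>|t|)"]

# Other summary quantities have their own extractor functions
print(sigma(model))        # residual standard error
print(df.residual(model))  # residual degrees of freedom
print(nobs(model))         # number of observations used in the fit
```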
Model Formula Syntax
| Formula | Meaning |
|---|---|
| y ~ x | y predicted by x |
| y ~ x + z | Multiple regression |
| y ~ x * z | Main effects + interaction |
| y ~ . | All other columns in the data as predictors |
| y ~ x - 1 | No intercept (through origin) |
| y ~ poly(x, 2) | Polynomial (quadratic) |
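A quick way to see what each formula does is to inspect the columns of the design matrix it generates, using model.matrix(). The data frame below is hypothetical and exists only to make the sketch runnable; cols is an illustrative name.

```r
# Hypothetical data: one response (y) and two predictors (x, z)
df <- data.frame(
  y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2),
  x = 1:6,
  z = c(1, 0, 1, 0, 1, 0)
)

# Column names of the design matrix for several formulas from the table
cols <- list(
  simple       = colnames(model.matrix(y ~ x, df)),
  interaction  = colnames(model.matrix(y ~ x * z, df)),
  all_columns  = colnames(model.matrix(y ~ ., df)),
  no_intercept = colnames(model.matrix(y ~ x - 1, df)),
  quadratic    = colnames(model.matrix(y ~ poly(x, 2), df))
)
print(cols)
```

Each column corresponds to one estimated coefficient, so the "(Intercept)" column disappears for `y ~ x - 1` and `poly(x, 2)` contributes two columns.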
Model Diagnostics
Visual Diagnostics
Calling plot() on a fitted model draws four diagnostic plots, arranged here in a 2-by-2 grid:
model <- lm(weight ~ height, data = df)
# Standard diagnostic plots
par(mfrow = c(2, 2))
plot(model)
- Residuals vs Fitted — Check for non-linearity
- Normal Q-Q — Check normality of residuals
- Scale-Location — Check homoscedasticity
- Residuals vs Leverage — Identify influential points
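The quantities behind these plots can also be inspected numerically. The sketch below collects standardized residuals, leverage, and Cook's distance into one table using base R extractors. The 2 * p / n leverage cutoff is a common rule of thumb, not something the plots themselves apply, and diag_tbl and high_leverage are illustrative names.

```r
df <- data.frame(
  height = c(150, 160, 170, 180, 190, 155, 165, 175, 185, 195),
  weight = c(55, 65, 75, 85, 95, 58, 68, 78, 88, 98)
)
model <- lm(weight ~ height, data = df)

diag_tbl <- data.frame(
  fitted   = fitted(model),          # x-axis of Residuals vs Fitted
  std_res  = rstandard(model),       # used by Normal Q-Q and Scale-Location
  leverage = hatvalues(model),       # x-axis of Residuals vs Leverage
  cooks    = cooks.distance(model)   # influence of each observation
)
print(round(diag_tbl, 3))

# Rule of thumb (an assumption, adjust as needed): flag leverage above 2 * p / n
p <- length(coef(model))
n <- nrow(df)
high_leverage <- which(diag_tbl$leverage > 2 * p / n)
print(high_leverage)
```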
Statistical Tests
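# Shapiro-Wilk test (null hypothesis: the residuals are normally distributed)
shapiro.test(residuals(model))
# Prints the W statistic and a p-value. A small p-value (for example,
# below 0.05) suggests the residuals are not normally distributed.
# Breusch-Pagan test (null hypothesis: the residual variance is constant)
# Requires the lmtest package: install.packages("lmtest")
library(lmtest)
bptest(model)
# Prints the BP statistic, its degrees of freedom, and a p-value. A small
# p-value suggests the residual variance changes with the predictors.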
Working with Categorical Variables
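lm() converts character and factor predictors into dummy (indicator) variables automatically. The first level in alphabetical order becomes the reference category, so below the genderM coefficient is the difference for "M" relative to "F" at the same height: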
df <- data.frame(
weight = c(55, 65, 75, 85, 95, 58, 68, 78, 88, 98),
height = c(150, 160, 170, 180, 190, 155, 165, 175, 185, 195),
gender = c("M", "M", "M", "M", "M", "F", "F", "F", "F", "F")
)
model <- lm(weight ~ height + gender, data = df)
summary(model)
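# Estimated coefficients:
# (Intercept)      height     genderM
#         -97           1           2
# In this toy data, weight is exactly height - 95 for "M" and height - 97 for "F",
# so the residuals are zero. summary() warns about an "essentially perfect fit",
# and the standard errors and p-values it prints are not meaningful.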
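The reference category can be changed with relevel() before fitting. A minimal sketch on the same toy data, making "M" the baseline so the model reports a genderF coefficient instead; model_m is an illustrative name.

```r
df <- data.frame(
  weight = c(55, 65, 75, 85, 95, 58, 68, 78, 88, 98),
  height = c(150, 160, 170, 180, 190, 155, 165, 175, 185, 195),
  gender = c("M", "M", "M", "M", "M", "F", "F", "F", "F", "F")
)

# Convert to a factor; levels default to alphabetical order: "F", "M"
df$gender <- factor(df$gender)
print(levels(df$gender))

# Make "M" the reference level
df$gender <- relevel(df$gender, ref = "M")

# The dummy variable is now genderF, measured relative to "M"
model_m <- lm(weight ~ height + gender, data = df)
print(coef(model_m))
```

The fitted values do not change when the reference level changes; only the labelling and sign of the dummy coefficient and the meaning of the intercept do.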
See Also
- stats::sd(): Standard deviation for regression diagnostics
- stats::var(): Variance calculation
- How to Run a t-Test in R: Related statistical tests