How to fit a linear model with lm() in R

· 4 min read · Updated March 15, 2026 · beginner
r statistics regression linear-model

The lm() function is R’s workhorse for fitting linear models. Whether you’re running simple linear regression or multiple regression, lm() provides a consistent interface for modeling relationships between variables.

Basic Usage

Simple Linear Regression

Fit a model predicting one variable from another:

# Create sample data
df <- data.frame(
  height = c(150, 160, 170, 180, 190, 155, 165, 175, 185, 195),
  weight = c(55, 65, 75, 85, 95, 58, 68, 78, 88, 98)
)

# Fit simple linear regression
model <- lm(weight ~ height, data = df)

# View model summary
summary(model)
# Abridged output, values rounded:
#
# Call:
# lm(formula = weight ~ height, data = df)
#
# Coefficients:
#             Estimate Std. Error t value
# (Intercept) -93.9091     4.1963  -22.38
# height        0.9879     0.0242   40.75
# (both p-values are far below 0.001)
#
# Residual standard error: 1.101 on 8 degrees of freedom
# Multiple R-squared:  0.9952,	Adjusted R-squared:  0.9946

Interpreting the coefficients: for every 1-unit increase in height, the predicted weight increases by about 0.99 units. The intercept (about -93.9) is the predicted weight at a height of 0, which lies far outside the observed data and has no practical meaning here.
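To make the slope less of a black box, here is a minimal sketch that recomputes both coefficients by hand from the same toy data and compares them with what lm() reports. The names slope_by_hand and intercept_by_hand are illustrative only; confint() is added to show the uncertainty around each estimate.

```r
# Same toy data as in the example above
df <- data.frame(
  height = c(150, 160, 170, 180, 190, 155, 165, 175, 185, 195),
  weight = c(55, 65, 75, 85, 95, 58, 68, 78, 88, 98)
)
model <- lm(weight ~ height, data = df)

# Slope by hand: covariance of x and y divided by the variance of x
slope_by_hand <- cov(df$height, df$weight) / var(df$height)

# Intercept by hand: the fitted line passes through the point of means
intercept_by_hand <- mean(df$weight) - slope_by_hand * mean(df$height)

# Should be TRUE: the hand calculation matches lm() up to floating-point error
agree <- all.equal(unname(coef(model)), c(intercept_by_hand, slope_by_hand))
print(agree)

# 95% confidence intervals for the intercept and slope
ci <- confint(model)
print(ci)
```

This shortcut only applies to simple regression with one predictor; with several predictors the coefficients come from solving the full least-squares problem.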

Making Predictions

Use the model to predict new values:

# Predict for new heights
new_data <- data.frame(height = c(172, 180, 188))
predict(model, newdata = new_data)
#        1        2        3 
# 76.00606 83.90909 91.81212 
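Point predictions say nothing about uncertainty. As a sketch, predict() can also return interval bounds through its interval argument; the object names conf_int and pred_int below are illustrative.

```r
df <- data.frame(
  height = c(150, 160, 170, 180, 190, 155, 165, 175, 185, 195),
  weight = c(55, 65, 75, 85, 95, 58, 68, 78, 88, 98)
)
model <- lm(weight ~ height, data = df)
new_data <- data.frame(height = c(172, 180, 188))

# Confidence interval: uncertainty about the mean weight at each height
conf_int <- predict(model, newdata = new_data, interval = "confidence", level = 0.95)

# Prediction interval: uncertainty about a single new observation (always wider)
pred_int <- predict(model, newdata = new_data, interval = "prediction", level = 0.95)

# Both are matrices with columns fit, lwr, upr
print(conf_int)
print(pred_int)
```

Use the prediction interval when the question is about an individual case and the confidence interval when it is about the average response.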

Multiple Linear Regression

Include multiple predictors:

# More complex dataset
df <- data.frame(
  weight = c(55, 65, 75, 85, 95, 58, 68, 78, 88, 98),
  height = c(150, 160, 170, 180, 190, 155, 165, 175, 185, 195),
  age = c(25, 30, 35, 40, 45, 22, 28, 32, 38, 42)
)

# Fit with multiple predictors
model <- lm(weight ~ height + age, data = df)
summary(model)
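One common follow-up question is whether the extra predictor improves the fit at all. A hedged sketch: fit the model with and without age and compare the two nested models with anova(), which runs an F-test on the reduction in residual sum of squares. The names model_small and model_full are illustrative.

```r
df <- data.frame(
  weight = c(55, 65, 75, 85, 95, 58, 68, 78, 88, 98),
  height = c(150, 160, 170, 180, 190, 155, 165, 175, 185, 195),
  age = c(25, 30, 35, 40, 45, 22, 28, 32, 38, 42)
)

model_small <- lm(weight ~ height, data = df)        # height only
model_full  <- lm(weight ~ height + age, data = df)  # height and age

# Row 2 of the table tests whether adding age reduces the residual
# sum of squares (RSS) by more than chance would explain
cmp <- anova(model_small, model_full)
print(cmp)
```

A small p-value in the second row suggests age adds explanatory power beyond height; with only 10 observations, treat the result as illustrative.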

Interaction Terms

Model interactions between variables:

# Include interaction (height * age)
model_interaction <- lm(weight ~ height * age, data = df)
summary(model_interaction)

This tests whether the effect of height on weight depends on age.
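In formula syntax, height * age is shorthand for both main effects plus their product term. The sketch below fits the model both ways and checks that the coefficients match; fit_star and fit_long are illustrative names.

```r
df <- data.frame(
  weight = c(55, 65, 75, 85, 95, 58, 68, 78, 88, 98),
  height = c(150, 160, 170, 180, 190, 155, 165, 175, 185, 195),
  age = c(25, 30, 35, 40, 45, 22, 28, 32, 38, 42)
)

fit_star <- lm(weight ~ height * age, data = df)
fit_long <- lm(weight ~ height + age + height:age, data = df)

# The shorthand expands to four terms
print(names(coef(fit_star)))
# "(Intercept)" "height" "age" "height:age"

# Should be TRUE: both formulas describe the same model
same_fit <- all.equal(coef(fit_star), coef(fit_long))
print(same_fit)
```

The height:age coefficient is the one that carries the interaction: it estimates how much the height slope changes per additional unit of age.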

Model Objects

Extracting Model Information

model <- lm(weight ~ height, data = df)

# Coefficients
coef(model)
# (Intercept)      height 
# -93.9090909   0.9878788 

# Residuals
residuals(model)
#          1          2          3          4          5          6          7 
#  0.7272727  0.8484848  0.9696970  1.0909091  1.2121212 -1.2121212 -1.0909091 
#          8          9         10 
# -0.9696970 -0.8484848 -0.7272727 

# Fitted values
fitted(model)
#        1        2        3        4        5        6        7        8 
# 54.27273 64.15152 74.03030 83.90909 93.78788 59.21212 69.09091 78.96970 
#        9       10 
# 88.84848 98.72727 

# R-squared
summary(model)$r.squared
# [1] 0.9952055

# Adjusted R-squared
summary(model)$adj.r.squared
# [1] 0.9946061

Model Formula Syntax

Formula           Meaning
y ~ x             y predicted by x
y ~ x + z         Multiple regression
y ~ x * z         Main effects + interaction
y ~ .             All available predictors
y ~ x - 1         No intercept (through origin)
y ~ poly(x, 2)    Polynomial (quadratic)
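One way to see what each formula means is to inspect the columns of the design matrix it generates. The sketch below uses model.matrix() on a small made-up data frame; toy and its values are hypothetical and exist only for illustration.

```r
# Hypothetical data: one response (y) and two predictors (x, z)
toy <- data.frame(
  y = c(1.2, 2.3, 2.9, 4.1, 5.2, 5.8),
  x = 1:6,
  z = c(2, 1, 4, 3, 6, 5)
)

# Helper: column names of the design matrix for a given formula
design_cols <- function(f) colnames(model.matrix(f, data = toy))

design_cols(y ~ x)           # "(Intercept)" "x"
design_cols(y ~ x + z)       # "(Intercept)" "x" "z"
design_cols(y ~ x * z)       # "(Intercept)" "x" "z" "x:z"
design_cols(y ~ .)           # "(Intercept)" "x" "z"  (every column except y)
design_cols(y ~ x - 1)       # "x"                    (intercept removed)
design_cols(y ~ poly(x, 2))  # "(Intercept)" "poly(x, 2)1" "poly(x, 2)2"
```

Note that poly(x, 2) produces orthogonal polynomial terms by default, so its coefficients are not the raw coefficients of x and x^2; poly(x, 2, raw = TRUE) gives those instead.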

Model Diagnostics

Visual Diagnostics

Four key plots:

model <- lm(weight ~ height, data = df)

# Standard diagnostic plots
par(mfrow = c(2, 2))
plot(model)

The four panels show:

  1. Residuals vs Fitted: check for non-linearity
  2. Normal Q-Q: check normality of residuals
  3. Scale-Location: check homoscedasticity
  4. Residuals vs Leverage: identify influential points
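The quantities behind these plots can also be pulled out as numbers, which helps when you want to flag specific rows. The sketch below collects them in a data frame; diag_df and the cut-offs used (leverage above twice the average, absolute standardized residual above 2) are common rules of thumb assumed here, not hard thresholds.

```r
df <- data.frame(
  height = c(150, 160, 170, 180, 190, 155, 165, 175, 185, 195),
  weight = c(55, 65, 75, 85, 95, 58, 68, 78, 88, 98)
)
model <- lm(weight ~ height, data = df)

diag_df <- data.frame(
  fitted    = fitted(model),         # x-axis of plots 1 and 3
  std_resid = rstandard(model),      # standardized residuals (plots 2, 3, 4)
  leverage  = hatvalues(model),      # x-axis of plot 4
  cooks_d   = cooks.distance(model)  # influence of each observation
)

# Rule-of-thumb flags (assumed conventions, adjust to your context)
diag_df$high_leverage <- diag_df$leverage > 2 * mean(diag_df$leverage)
diag_df$large_resid   <- abs(diag_df$std_resid) > 2

print(diag_df)
```

Leverages always sum to the number of estimated coefficients (2 here), which makes the average leverage a natural baseline for comparison.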

Statistical Tests

# Test for normality of residuals
shapiro.test(residuals(model))
# Prints the Shapiro-Wilk W statistic and a p-value.
# A small p-value (for example, below 0.05) is evidence against
# normally distributed residuals.

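# Breusch-Pagan test for heteroscedasticity
# (requires the lmtest package: install.packages("lmtest"))
library(lmtest)
bptest(model)
# Prints the BP statistic, its degrees of freedom, and a p-value.
# A small p-value suggests the residual variance changes with the predictors.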

Working with Categorical Variables

R automatically converts character and factor predictors into dummy (indicator) variables, using the first level as the reference:

df <- data.frame(
  weight = c(55, 65, 75, 85, 95, 58, 68, 78, 88, 98),
  height = c(150, 160, 170, 180, 190, 155, 165, 175, 185, 195),
  gender = c("M", "M", "M", "M", "M", "F", "F", "F", "F", "F")
)

model <- lm(weight ~ height + gender, data = df)
summary(model)
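# Coefficient estimates:
# (Intercept)      height     genderM
#         -97           1           2
#
# "F" sorts first alphabetically, so it is the reference level: genderM is the
# difference in predicted weight between M and F at the same height.
# In this toy data every weight equals height - 95 (M) or height - 97 (F), so
# the fit is exact and R may warn:
# "essentially perfect fit: summary may be unreliable"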
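To see the dummy coding explicitly and to change which group acts as the baseline, a sketch along the following lines may help. It assumes gender is converted to a factor first; gender_m_ref and model_m_ref are illustrative names.

```r
df <- data.frame(
  weight = c(55, 65, 75, 85, 95, 58, 68, 78, 88, 98),
  height = c(150, 160, 170, 180, 190, 155, 165, 175, 185, 195),
  gender = c("M", "M", "M", "M", "M", "F", "F", "F", "F", "F")
)

# Levels are sorted alphabetically, so "F" becomes the reference level
df$gender <- factor(df$gender)
print(levels(df$gender))  # "F" "M"

# The design matrix shows the 0/1 column R builds for the non-reference level
print(head(model.matrix(weight ~ height + gender, data = df), 3))

model <- lm(weight ~ height + gender, data = df)

# Make "M" the reference instead; the dummy now marks "F"
df$gender_m_ref <- relevel(df$gender, ref = "M")
model_m_ref <- lm(weight ~ height + gender_m_ref, data = df)

print(names(coef(model_m_ref)))  # "(Intercept)" "height" "gender_m_refF"
```

Changing the reference level does not change the fitted values; it only flips the sign of the group coefficient and shifts the intercept accordingly.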

See Also