How to encode a factor variable as numeric in R
Factors store categorical data with discrete levels. Sometimes you need the underlying numeric codes instead of the character labels.
The basics: as.numeric()
When you convert a factor directly to numeric, you get the internal integer codes:
# Create a factor
education <- factor(c("high school", "bachelors", "masters", "phd"))
education
# [1] high school bachelors masters phd
# Levels: bachelors high school masters phd
# Convert to numeric - this gives you the codes!
as.numeric(education)
# [1] 2 1 3 4
This works, but the codes follow the level order, which defaults to alphabetical (1 = bachelors, 2 = high school, 3 = masters, 4 = phd), not the order in which the values appear.
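To check which code belongs to which label, one option is to pair the codes with levels(). This is a minimal sketch that reuses the education factor from above; the code_map name is only illustrative.

```r
# Recreate the factor with default (sorted) levels
education <- factor(c("high school", "bachelors", "masters", "phd"))

# levels() returns the labels in code order: position 1 is code 1, and so on
levels(education)

# Illustrative lookup table pairing each code with its label
code_map <- data.frame(
  code = seq_along(levels(education)),
  label = levels(education),
  stringsAsFactors = FALSE
)
code_map
```

Printing code_map before converting makes it easy to confirm that the codes mean what you expect.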
Get numeric codes in your level order
To make the codes follow a meaningful order, set the levels explicitly when you create the factor:
# Define custom level order
education <- factor(
c("high school", "bachelors", "masters", "phd"),
levels = c("high school", "bachelors", "masters", "phd")
)
# Now the codes match your order
as.numeric(education)
# [1] 1 2 3 4
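If the factor already exists with default levels, the same idea should apply by passing it back through factor() with the levels you want. The sketch below assumes the education example; the variable names are illustrative.

```r
# Existing factor with default (sorted) levels
education_default <- factor(c("high school", "bachelors", "masters", "phd"))
as.numeric(education_default)
# [1] 2 1 3 4

# Rebuild the factor with an explicit level order; the labels stay the same
education_custom <- factor(
  education_default,
  levels = c("high school", "bachelors", "masters", "phd")
)
as.numeric(education_custom)
# [1] 1 2 3 4
```

Only the codes change here. as.character() returns the same labels for both factors.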
Extract level positions from a factor
There are two equivalent ways to get each value's position in levels(x):
# Method 1: as.numeric() returns each value's position in levels(x)
x <- factor(c("low", "medium", "high"))
as.numeric(x)
# [1] 2 3 1
# Method 2: Match against levels explicitly
x <- factor(c("low", "medium", "high"))
match(x, levels(x))
# [1] 2 3 1
Using dplyr
In the tidyverse, create a numeric column with mutate():
library(dplyr)
df <- data.frame(
id = 1:5,
category = factor(c("low", "medium", "high", "low", "high"))
)
df <- df %>%
mutate(category_num = as.numeric(category))
df
# id category category_num
# 1 1 low 2
# 2 2 medium 3
# 3 3 high 1
# 4 4 low 2
# 5 5 high 1
# Codes follow the alphabetical levels: high = 1, low = 2, medium = 3
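Because as.numeric() follows the level order, a column meant to read low = 1, medium = 2, high = 3 needs explicit levels. One way to sketch this with dplyr is shown below; the level_order vector is an assumed ranking for illustration.

```r
library(dplyr)

df <- data.frame(
  id = 1:5,
  category = c("low", "medium", "high", "low", "high")
)

# Assumed ranking for illustration: low < medium < high
level_order <- c("low", "medium", "high")

df <- df %>%
  mutate(
    # Set the level order first ...
    category = factor(category, levels = level_order),
    # ... then convert; mutate() sees the updated category column here
    category_num = as.numeric(category)
  )

df$category_num
# [1] 1 2 3 1 3
```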
Ordered factors
Ordered factors record that their levels have a ranking, but as.numeric() still returns each value's integer position in the level order:
# Create ordered factor
satisfaction <- factor(
c("low", "medium", "high", "low"),
levels = c("low", "medium", "high"),
ordered = TRUE
)
# Numeric conversion respects the order
as.numeric(satisfaction)
# [1] 1 2 3 1
# You can also use as.integer()
as.integer(satisfaction)
# [1] 1 2 3 1
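The two conversions return the same values but different storage types, which can matter when a later step checks types strictly. A short sketch that rebuilds the satisfaction factor from above:

```r
satisfaction <- factor(
  c("low", "medium", "high", "low"),
  levels = c("low", "medium", "high"),
  ordered = TRUE
)

# as.numeric() returns doubles, as.integer() returns integers
num_type <- typeof(as.numeric(satisfaction))
int_type <- typeof(as.integer(satisfaction))
num_type
# [1] "double"
int_type
# [1] "integer"

# Ordered factors can also be compared against a level label directly
at_least_medium <- satisfaction >= "medium"
at_least_medium
# [1] FALSE  TRUE  TRUE FALSE
```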
Recover original numeric values
If the factor was created from numbers, its labels hold the original values as text. Convert to character first, then to numeric:
# When factor was created from numbers
rating <- factor(c(1, 5, 3, 4, 2))
# Get the original numbers back
as.numeric(as.character(rating))
# [1] 1 5 3 4 2
This matters because as.numeric(factor(c(1, 5, 3))) gives you c(1, 3, 2) (the codes), not c(1, 5, 3) (the original values).
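The difference is easier to see when the values are not consecutive integers. The sketch below uses made-up scores and also shows the as.numeric(levels(f))[f] idiom, which the R documentation for factor describes as slightly more efficient than converting through character.

```r
# Hypothetical data: values that differ clearly from their codes
scores <- factor(c(10, 50, 30))

# Codes: positions in the sorted levels "10", "30", "50"
codes <- as.numeric(scores)
codes
# [1] 1 3 2

# Original values, via the character labels
via_character <- as.numeric(as.character(scores))
via_character
# [1] 10 50 30

# Same result: convert the levels once, then index them by the factor
via_levels <- as.numeric(levels(scores))[scores]
via_levels
# [1] 10 50 30
```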
See Also
- factor - Full reference for the factor data type
- How to convert a column to a factor - The reverse operation
- tibble - The tidyverse tibble alternative to data.frame