How to Encode a Factor Variable as Numeric in R
When you encode a factor variable as numeric in R, there’s a well-known gotcha: calling as.numeric() on a factor returns the internal integer codes, which follow the order of the levels (sorted alphabetically by default), not any numbers the factor labels happen to represent. When the codes themselves are what you need, for modeling or plotting, this is exactly what you want.
education <- factor(c("high school", "bachelors", "masters", "phd"))
as.numeric(education)
# [1] 2 1 3 4
The codes reflect the alphabetical ordering of the levels unless you specify the levels explicitly. Pass a custom levels argument to factor() to control the numbering: factor(x, levels = c("high school", "bachelors", "masters", "phd")) assigns 1 to high school, 2 to bachelors, 3 to masters, and 4 to phd.
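As a minimal sketch of the custom levels argument described above (the vector name x and the variable names are illustrative, not taken from any particular dataset):

```r
# Raw values, in the order they happen to appear in the data.
x <- c("high school", "bachelors", "masters", "phd")

# Declare the levels in the order the codes should follow.
education_levels <- c("high school", "bachelors", "masters", "phd")
education <- factor(x, levels = education_levels)

education_codes <- as.numeric(education)
education_codes
# [1] 1 2 3 4
```

The codes now follow the declared order rather than the alphabetical one, so "high school" maps to 1 instead of 2.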
A far more common mistake is converting a factor whose labels are numbers and expecting the original values back. as.numeric(factor(c(1, 5, 3))) gives c(1, 3, 2), the internal codes, not c(1, 5, 3). To recover the original numbers, go through character first:
rating <- factor(c(1, 5, 3, 4, 2))
as.numeric(as.character(rating))
# [1] 1 5 3 4 2
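The help page for factor (?factor) also describes an alternative that converts the levels once and then indexes them by the codes, which it notes is slightly more efficient. A short sketch, reusing the same example values, that compares it with the two-step conversion:

```r
rating <- factor(c(1, 5, 3, 4, 2))

# Convert each distinct level to a number once, then index by the factor's codes.
via_levels <- as.numeric(levels(rating))[rating]
via_levels
# [1] 1 5 3 4 2

# Same result as the two-step conversion through character.
via_character <- as.numeric(as.character(rating))
identical(via_levels, via_character)
# [1] TRUE
```

Both forms return the labels as numbers. The levels-based form does less work when a long vector has only a few distinct levels.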
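This two-step pattern is worth memorizing if you work with survey data or imported CSVs where a column of numbers ends up as a factor. This typically happens when the column contains a stray non-numeric entry and the file is read with stringsAsFactors = TRUE, which was the read.csv() default before R 4.0.0. Any label that is not a valid number becomes NA, with a coercion warning. The tidyverse equivalent inside mutate() works the same way: mutate(rating_num = as.numeric(as.character(rating))). For ordered factors, as.numeric() returns codes that follow the level order you defined when creating the factor.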
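The following sketch illustrates both points with made-up data: a small inline CSV (the survey data frame and its "N/A" entry are hypothetical) read with stringsAsFactors = TRUE to mimic the older default, followed by an ordered factor.

```r
# Hypothetical CSV column that becomes a factor because of a stray "N/A".
survey <- read.csv(text = "rating\n1\n5\nN/A\n3", stringsAsFactors = TRUE)
class(survey$rating)
# [1] "factor"

# Labels that are not numbers become NA; the coercion warning is suppressed here.
survey$rating_num <- suppressWarnings(as.numeric(as.character(survey$rating)))
survey$rating_num
# [1]  1  5 NA  3

# Ordered factor: the codes follow the declared level order.
size <- factor(c("medium", "small", "large"),
               levels = c("small", "medium", "large"), ordered = TRUE)
size_codes <- as.numeric(size)
size_codes
# [1] 2 1 3
```

In real code it is usually better to leave the warning visible, since it signals entries that need cleaning before analysis.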
See also
- factor — Full reference for the factor data type
- How to convert a column to a factor — The reverse operation
- tibble — The tidyverse tibble alternative to data.frame