How to Encode a Factor Variable as Numeric in R

When you encode a factor variable as numeric in R, there’s a well-known gotcha: as.numeric() applied to a factor returns the internal integer codes (assigned in alphabetical order of the levels by default), not any numbers the factor labels happen to represent. When the codes themselves are what you need, for modeling or plotting, this is exactly what you want.

education <- factor(c("high school", "bachelors", "masters", "phd"))
as.numeric(education)
# [1] 2 1 3 4

The codes reflect alphabetical ordering of levels unless you specified them explicitly. Pass a custom levels argument to factor() to control the numbering: factor(x, levels = c("high school", "bachelors", "masters", "phd")) assigns 1 to high school, 2 to bachelors, and so on.
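As a rough illustration of the levels argument, the sketch below rebuilds the education example with an explicit level order. The names edu_levels and education_custom are illustrative choices, not part of the original example.

```r
# Same values as the education example, with the level order spelled out.
edu_levels <- c("high school", "bachelors", "masters", "phd")
education_custom <- factor(
  c("high school", "bachelors", "masters", "phd"),
  levels = edu_levels
)

levels(education_custom)
# [1] "high school" "bachelors"   "masters"     "phd"

as.numeric(education_custom)
# [1] 1 2 3 4
```

Any value that does not appear in levels becomes NA, so it is worth checking the level vector for typos.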

The far more common mistake is converting a factor that started as numbers and expecting the original values back. as.numeric(factor(c(1, 5, 3))) gives c(1, 3, 2) — the internal codes — not c(1, 5, 3). To recover the original numbers, go through character first:

rating <- factor(c(2, 5, 3, 5, 4))
as.numeric(rating)                # internal codes, not the ratings
# [1] 1 4 2 4 3
as.numeric(as.character(rating))  # original values
# [1] 2 5 3 5 4
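An alternative worth knowing is as.numeric(levels(f))[f], which R's own factor documentation describes as slightly more efficient than the as.character() route. The sketch below compares the two on a hypothetical factor called scores; the variable names are made up for illustration.

```r
# Hypothetical data: numeric-looking values stored as a factor.
scores <- factor(c(10, 50, 30, 50))

# Route 1: convert every element's label to character, then to numeric.
via_character <- as.numeric(as.character(scores))

# Route 2: convert each distinct level once, then index by the factor's codes.
via_levels <- as.numeric(levels(scores))[scores]

via_levels
# [1] 10 50 30 50

identical(via_character, via_levels)
# [1] TRUE
```

The second route converts only the distinct levels, which may matter for long vectors with few unique values. For everyday use, either form should give the same result.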

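This two-step pattern is worth memorizing if you work with survey data or imported CSVs, where a column that should be numeric can end up as a factor (for example, when stringsAsFactors = TRUE, the default before R 4.0, turns a column containing a stray text entry into a factor). The tidyverse equivalent inside mutate() works the same way: mutate(rating_num = as.numeric(as.character(rating))). For ordered factors, as.numeric() returns codes that follow the level order, just as it does for unordered factors. Setting ordered = TRUE does not change the codes, so the numbering matches your intended order only if you supplied that order through levels.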
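A short sketch of the ordered-factor case, assuming a made-up satisfaction variable with three levels (the data and names are illustrative only):

```r
# Hypothetical survey responses with a meaningful order.
responses <- c("medium", "low", "high", "medium")

satisfaction <- factor(
  responses,
  levels = c("low", "medium", "high"),
  ordered = TRUE
)

as.numeric(satisfaction)
# [1] 2 1 3 2

# Without `levels`, even an ordered factor falls back to alphabetical order.
as.numeric(factor(responses, ordered = TRUE))
# [1] 3 2 1 3
```

In the second call "high" gets code 1 because it sorts first alphabetically, which is rarely the intended ranking.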
