How to Convert Columns to Factors in R
Converting columns to factors is essential when R needs to treat character data as categorical. Factors matter in three contexts: statistical models (determining dummy encoding and contrast coding), display order in plots and tables, and memory efficiency for large categorical columns with repeated values.
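As a rough illustration of the first and third points, the sketch below uses made-up vectors (the names `group`, `x_chr`, and `x_fct` are illustrative, not from any dataset). It shows how level order picks the reference category in a model matrix, and how a factor compares in size with the equivalent character vector.

```r
# Hypothetical data for illustration only
group <- factor(c("b", "a", "c", "a"), levels = c("a", "b", "c"))

# Under R's default treatment contrasts the first level ("a") is the
# reference category, so only "b" and "c" get dummy columns
colnames(model.matrix(~ group))
# [1] "(Intercept)" "groupb"      "groupc"

# A long vector with few distinct values: the factor stores integer
# codes plus one copy of each label
x_chr <- rep(c("low", "medium", "high"), times = 100000)
x_fct <- factor(x_chr)
as.numeric(object.size(x_fct)) < as.numeric(object.size(x_chr))
# TRUE on a typical 64-bit build of R
```

The size gap shrinks as the number of distinct values approaches the length of the vector, because every distinct label must still be stored as a level.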
# Convert character to factor with custom level order
df <- data.frame(
  gender = c("male", "female", "male", "female"),
  education = c("high school", "bachelors", "masters", "phd")
)
df$gender <- factor(df$gender, levels = c("female", "male"))
df$education <- factor(df$education,
                       levels = c("high school", "bachelors", "masters", "phd"))
levels(df$education)
# [1] "high school" "bachelors" "masters" "phd"
# Numeric-to-factor with custom labels (one value per row of df)
df$rating <- factor(c(1, 5, 3, 4),
                    levels = 1:5,
                    labels = c("poor", "fair", "good", "very good", "excellent"))
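The levels argument sets the level order, which determines display order and, in models with the default contrasts, the reference level; values not listed in levels become NA. Setting ordered = TRUE creates an ordered factor for ordinal data like satisfaction ratings (low < medium < high). The as.factor() shortcut takes only the vector, so it cannot set levels, labels, or ordering: for character input it uses the sorted unique values as levels, and it returns an existing factor unchanged. In tidyverse pipelines, combine dplyr::mutate() with factor() to convert columns in place.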
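A small sketch of where factor() and as.factor() agree and where they differ, using a throwaway vector `x` (the variable names here are illustrative only):

```r
x <- c("b", "a", "c", "a")

# as.factor() takes only the vector; levels are the sorted unique values
levels(as.factor(x))
# [1] "a" "b" "c"

# factor() lets you choose the order; values missing from `levels` become NA
f <- factor(x, levels = c("c", "b"))
levels(f)   # [1] "c" "b"
is.na(f)    # [1] FALSE  TRUE FALSE  TRUE

# Subsetting keeps all levels; factor() drops the unused ones,
# while as.factor() returns the factor unchanged
sub <- factor(x)[1:2]
levels(as.factor(sub))   # [1] "a" "b" "c"
levels(factor(sub))      # [1] "a" "b"
```

Checking is.na() after a conversion is a cheap way to catch values that were misspelled or left out of levels.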
# Ordered factor and dplyr approach
df$satisfaction <- factor(
  c("low", "high", "medium", "low"),  # one value per row of df
  levels = c("low", "medium", "high"),
  ordered = TRUE
)
df$satisfaction[1] < df$satisfaction[2] # TRUE
library(dplyr)
df <- df %>% mutate(
  gender = factor(gender, levels = c("female", "male")),
  satisfaction = factor(satisfaction,
                        levels = c("low", "medium", "high"), ordered = TRUE)
)
For most data manipulation tasks, character columns work just as well as factors. Convert to factor when passing data to a model, when you need a specific level order for display, or when memory usage matters for large columns with few distinct values. The forcats package provides factor-specific helpers like fct_reorder(), fct_lump(), and fct_infreq() for reordering and collapsing levels.
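The helpers named above can be sketched on a toy example. This assumes the forcats package is installed; the `city` and `price` vectors are invented for illustration. Recent forcats releases also offer fct_lump_n(), fct_lump_prop(), and fct_lump_min() as more specific variants of fct_lump().

```r
library(forcats)  # assumes forcats is installed

# Hypothetical data for illustration only
city  <- factor(c("Oslo", "Rome", "Oslo", "Lima", "Oslo", "Rome"))
price <- c(30, 10, 32, 20, 31, 12)

levels(city)                      # alphabetical: "Lima" "Oslo" "Rome"
levels(fct_infreq(city))          # by frequency: "Oslo" "Rome" "Lima"
levels(fct_reorder(city, price))  # by median price: "Rome" "Lima" "Oslo"
levels(fct_lump(city, n = 1))     # keep the top level: "Oslo" "Other"
```

Each helper returns a new factor and leaves the underlying values untouched, so they slot directly into a mutate() call or a plotting aesthetic.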
See also
- factor, Full reference for the factor data type
- data.frame, How data frames work in R
- tibble, The tidyverse tibble alternative