How to Convert Columns to Factors in R

Converting columns to factors is essential when R needs to treat character data as categorical. Factors matter in three contexts: statistical models (determining dummy encoding and contrast coding), display order in plots and tables, and memory efficiency for large categorical columns with repeated values.

# Convert character to factor with custom level order
df <- data.frame(
  gender    = c("male", "female", "male", "female"),
  education = c("high school", "bachelors", "masters", "phd")
)

df$gender <- factor(df$gender, levels = c("female", "male"))
df$education <- factor(df$education,
  levels = c("high school", "bachelors", "masters", "phd"))

levels(df$education)
# [1] "high school" "bachelors"   "masters"     "phd"

# Numeric-to-factor with custom labels
# (one value per row of df; levels that never occur, such as 2 here, are still kept)
df$rating <- factor(c(1, 5, 3, 4),
  levels = 1:5,
  labels = c("poor", "fair", "good", "very good", "excellent"))
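
To make the modelling point concrete, the sketch below rebuilds the gender vector on its own and inspects the design matrix R derives from it. It assumes R's default treatment contrasts for unordered factors; the names gender, mm, and gender_ref_male are illustrative only.

```r
# Stand-alone copy of the gender column, with "female" as the first level
gender <- factor(c("male", "female", "male", "female"),
  levels = c("female", "male"))

# A factor is stored as integer codes plus a levels attribute
typeof(gender)      # "integer"
as.integer(gender)  # 2 1 2 1

# Under the default treatment contrasts the first level is the reference,
# so the design matrix gets a single dummy column for "male"
mm <- model.matrix(~ gender)
colnames(mm)        # "(Intercept)" "gendermale"

# relevel() moves a chosen level to the front, making it the reference
gender_ref_male <- relevel(gender, ref = "male")
colnames(model.matrix(~ gender_ref_male))
# "(Intercept)" "gender_ref_malefemale"
```

Because the reference level is simply the first level, choosing the level order is also how you choose the baseline that model coefficients are compared against.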

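The levels argument sets the order of the levels, which drives display order and the reference level in models; ordered = TRUE creates an ordered factor for ordinal data such as satisfaction ratings (low < medium < high). The as.factor() shortcut takes no levels, labels, or ordered arguments: for character input it gives the same result as factor() with default arguments (levels in sorted order), but it returns an existing factor unchanged, whereas factor() drops unused levels. In tidyverse pipelines, combine dplyr::mutate() with factor() to convert columns in place.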

# Ordered factor and dplyr approach
df$satisfaction <- factor(
  c("low", "high", "medium", "low"),  # one value per row of df
  levels = c("low", "medium", "high"),
  ordered = TRUE
)
df$satisfaction[1] < df$satisfaction[2]  # TRUE, because low < high

library(dplyr)
df <- df %>% mutate(
  gender       = factor(gender, levels = c("female", "male")),
  satisfaction = factor(satisfaction,
    levels = c("low", "medium", "high"), ordered = TRUE)
)
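
A detail worth checking when choosing between as.factor() and factor() is how each handles input that is already a factor. The base R sketch below uses made-up vectors (size, kept) to show what happens to unused levels after subsetting.

```r
# An illustrative factor with three levels in a custom order
size <- factor(c("small", "large", "medium", "small"),
  levels = c("small", "medium", "large"))

# Subsetting keeps every level, even ones that no longer occur
kept <- size[size != "large"]
levels(kept)              # "small" "medium" "large"

# as.factor() returns an existing factor unchanged
levels(as.factor(kept))   # "small" "medium" "large"

# factor() and droplevels() both discard the unused level
levels(factor(kept))      # "small" "medium"
levels(droplevels(kept))  # "small" "medium"

# For character input, both functions sort the unique values into levels
levels(as.factor(c("b", "a", "b")))  # "a" "b"
levels(factor(c("b", "a", "b")))     # "a" "b"
```

Dropping unused levels matters mostly before plotting or tabulating a filtered data set, where empty categories would otherwise still appear.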

For most data manipulation tasks, character columns work just as well as factors. Convert to factor when passing data to a model, when you need a specific level order for display, or when memory usage matters for long columns with many repeated values. The forcats package provides factor-specific helpers like fct_reorder(), fct_lump(), and fct_infreq() for reordering and collapsing levels.
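
As a rough illustration of the forcats helpers named above, the following sketch applies each one to small made-up vectors (colour, group, and value are not from the data frame used earlier). It assumes the forcats package is installed.

```r
library(forcats)

# Illustrative data: blue occurs 3 times, red twice, green once
colour <- factor(c("red", "blue", "blue", "green", "blue", "red"))
levels(colour)                       # "blue" "green" "red" (alphabetical default)

# fct_infreq(): order levels by how often they occur, most frequent first
levels(fct_infreq(colour))           # "blue" "red" "green"

# fct_lump(): keep the n most frequent levels and collapse the rest into "Other"
levels(fct_lump(colour, n = 1))      # "blue" "Other"

# fct_reorder(): order levels by a summary of another variable
# (the median by default, in ascending order)
group <- factor(c("a", "a", "b", "b", "c", "c"))
value <- c(10, 12, 1, 3, 5, 7)
levels(fct_reorder(group, value))    # "b" "c" "a"
```

All three return a new factor with the same values and different levels, so they can be used inside mutate() or directly in a plotting call.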

See also

  • factor: Full reference for the factor data type
  • data.frame: How data frames work in R
  • tibble: The tidyverse tibble alternative