
dplyr::case_when

case_when(...)

case_when() is dplyr’s vectorized conditional: it evaluates a series of conditions in order and, for each element, returns the value from the first matching branch. It’s the tidy equivalent of SQL’s CASE WHEN, and it makes multi-branch logic in mutate() much cleaner than nesting ifelse() calls.

Basic syntax

Each branch uses the formula syntax condition ~ value:

library(dplyr)

df <- tibble(x = 1:70)  # 1:70 so every branch, including x %% 35, matches at least once

df |>
  mutate(
    label = case_when(
      x %% 35 == 0 ~ "fizz buzz",
      x %% 5 == 0 ~ "fizz",
      x %% 7 == 0 ~ "buzz",
      TRUE ~ as.character(x)
    )
  )

TRUE is the catch-all fallback. Without it, unmatched rows return NA.
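The fallback can also be written with the .default argument instead of a final TRUE branch. The sketch below assumes dplyr 1.1.0 or later, where .default is available, and uses a small made-up vector to check that both spellings give the same result:

```r
library(dplyr)

x <- c(5, 7, 35, 8)  # made-up input for illustration

# Fallback written as a final always-true condition
with_true <- case_when(
  x %% 35 == 0 ~ "fizz buzz",
  x %% 5 == 0 ~ "fizz",
  x %% 7 == 0 ~ "buzz",
  TRUE ~ as.character(x)
)

# Same fallback through the .default argument (assumes dplyr >= 1.1.0)
with_default <- case_when(
  x %% 35 == 0 ~ "fizz buzz",
  x %% 5 == 0 ~ "fizz",
  x %% 7 == 0 ~ "buzz",
  .default = as.character(x)
)

with_true
#> [1] "fizz"      "buzz"      "fizz buzz" "8"
identical(with_true, with_default)
#> [1] TRUE
```

Both forms behave the same here. The TRUE form also runs on older dplyr versions.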

How it works

case_when() evaluates every condition on the whole input vector, then, for each row, returns the value paired with the first condition that is TRUE. Later conditions that also hold for that row are ignored. Only the selection is lazy: the conditions and right-hand sides themselves are all computed for every row, whether or not an earlier branch already matched.

Order matters because the first match wins. Put the most specific conditions first and general ones last. The following example demonstrates correct ordering, where the most restrictive divisibility check appears before the broader one:

x <- c(5, 15, 7)  # example input
# This works: specific condition first
case_when(
  x %% 15 == 0 ~ "divisible by 15",
  x %% 5 == 0 ~ "divisible by 5",
  TRUE ~ "neither"
)

If you reverse the order and put the catch-all TRUE first, the remaining branches become unreachable because every row matches that initial catch-all condition immediately. This is a common mistake when refactoring conditionals — what looks like a harmless reordering actually short-circuits all subsequent logic. The following example is deliberately wrong to illustrate the failure mode:

# This doesn't — every row matches the first condition
case_when(
  TRUE ~ "neither",
  x %% 5 == 0 ~ "divisible by 5"  # never reached
)
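A subtler version of the same mistake is putting a broad condition ahead of a narrower one that it already covers. The sketch below uses a small made-up vector and runs both orderings side by side, so the shadowed branch shows up in the output:

```r
library(dplyr)

x <- c(5, 15, 7)  # made-up input for illustration

# Narrow check first: 15 reaches its own branch
specific_first <- case_when(
  x %% 15 == 0 ~ "divisible by 15",
  x %% 5 == 0 ~ "divisible by 5",
  TRUE ~ "neither"
)

# Broad check first: 15 is already claimed by the "divisible by 5" branch
general_first <- case_when(
  x %% 5 == 0 ~ "divisible by 5",
  x %% 15 == 0 ~ "divisible by 15",  # shadowed, never selected
  TRUE ~ "neither"
)

specific_first
#> [1] "divisible by 5"  "divisible by 15" "neither"
general_first
#> [1] "divisible by 5" "divisible by 5" "neither"
```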

Unmatched rows return NA

When no branch matches a given row, case_when() assigns NA to that position in the output vector. This is different from the default behavior in some other languages where unmatched cases produce errors or fall through to a default. The TRUE catch-all branch is the standard R idiom for setting a fallback value for rows that do not meet any of the explicit conditions:

case_when(
  x %% 5 == 0 ~ "divisible by 5",
  x %% 3 == 0 ~ "divisible by 3",
  TRUE ~ "neither"
)

Without TRUE, rows matching neither condition get NA.
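To see the NA result directly, here is the same pair of conditions without a fallback, applied to a small made-up vector:

```r
library(dplyr)

x <- c(3, 5, 7)  # made-up input; 7 matches neither condition

no_fallback <- case_when(
  x %% 5 == 0 ~ "divisible by 5",
  x %% 3 == 0 ~ "divisible by 3"
)

no_fallback
#> [1] "divisible by 3" "divisible by 5" NA
```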

NA handling

Missing values in the source data need special treatment because a comparison involving NA evaluates to NA (not TRUE or FALSE), and case_when() treats an NA condition as not matched. Rows with missing values therefore skip every condition that depends on the missing value and land in the TRUE catch-all, or return NA if there is none. To route those rows to a designated output value, check for NA explicitly with is.na() somewhere before the catch-all:

x <- c(1, NA, 3, NA, 5)

case_when(
  x %% 5 == 0 ~ "divisible by 5",
  is.na(x) ~ "missing",
  TRUE ~ "neither"
)
#> "neither", "missing", "neither", "missing", "divisible by 5"
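For comparison, a minimal sketch of what happens to a missing value when there is no is.na() branch: the condition built from x is NA for that row, so the row is picked up by the TRUE catch-all like any other unmatched row. The input vector is made up for illustration:

```r
library(dplyr)

x <- c(1, NA, 5)  # made-up input with one missing value

without_na_branch <- case_when(
  x %% 5 == 0 ~ "divisible by 5",  # NA for the missing row, so not matched
  TRUE ~ "neither"                 # literal TRUE matches every remaining row
)

without_na_branch
#> [1] "neither"        "neither"        "divisible by 5"
```

The missing row is silently labelled "neither", which is rarely what you want. That is the reason to add the is.na() branch.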

Type consistency

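All right-hand side values within a single case_when() call must have compatible types so that they can be combined into one output vector. Missing values are the usual stumbling block: plain NA is logical, and in dplyr versions before 1.1.0 using it alongside character or numeric values throws a type mismatch error. From dplyr 1.1.0 on, the right-hand sides are cast to their common type, so bare NA is accepted. The typed variants remain the explicit choice and work in every version: NA_real_ for doubles, NA_integer_ for integers, and NA_character_ for strings. The first example shows the bare NA that fails in older versions, followed by the typed approaches:

# Bare NA is logical, but the other RHS is numeric
case_when(
  x %% 5 == 0 ~ 5,
  TRUE ~ NA   # error in dplyr < 1.1.0; cast to NA_real_ in 1.1.0 and later
)

# Works in every version: use NA_real_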
case_when(
  x %% 5 == 0 ~ 5,
  TRUE ~ NA_real_
)

# For characters
case_when(
  x %% 5 == 0 ~ "divisible",
  TRUE ~ NA_character_
)
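The sketch below probes where the type rule actually bites. It assumes dplyr 1.1.0 or later, where integer and double right-hand sides are combined into a double vector. Character and numeric values have no common type, so that call fails. The error is caught with tryCatch() only so that the script keeps running:

```r
library(dplyr)

x <- c(5, 6)  # made-up input for illustration

# Integer and double RHS values combine into double (assumes dplyr >= 1.1.0)
mixed_numeric <- case_when(
  x %% 5 == 0 ~ 5L,
  TRUE ~ 0.5
)
typeof(mixed_numeric)
#> [1] "double"

# Character and numeric RHS values cannot be combined, so this call errors
incompatible <- tryCatch(
  case_when(
    x %% 5 == 0 ~ "five",
    TRUE ~ 0
  ),
  error = function(e) "error"
)
incompatible
#> [1] "error"
```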

Vectorized operation

case_when() is fully vectorized, processing every row of the input simultaneously rather than iterating row by row. Each condition and each right-hand side expression is computed across all rows at once before the matching values are selected per row. This design makes case_when() fast on large data frames, but it also means side effects or expensive computations in any branch are executed for the entire column regardless of which rows match:

df <- tibble(a = c(1, -1, 2, -2), b = c(4, 4, 4, 4))

df |>
  mutate(
    result = case_when(
      a > 0 & b > 0 ~ sqrt(a * b),
      TRUE ~ as.numeric(a)
    )
  )
#>       a     b result
#>   <dbl> <dbl>  <dbl>
#> 1     1     4   2
#> 2    -1     4  -1
#> 3     2     4   2.83
#> 4    -2     4  -2
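One way to observe the whole-column evaluation is to put a function with a visible side effect on the right-hand side. In this sketch, double_and_record() is a hypothetical helper written only for the demonstration. It notes how many elements it was handed:

```r
library(dplyr)

x <- c(1, 2, 3, 4)          # made-up input for illustration
seen_length <- NA_integer_  # filled in by the helper below

# Hypothetical helper: records the length of its input, then doubles it
double_and_record <- function(v) {
  seen_length <<- length(v)
  v * 2
}

result <- case_when(
  x > 2 ~ double_and_record(x),
  TRUE ~ x
)

result
#> [1] 1 2 6 8
seen_length  # 4, although only two rows use this branch
#> [1] 4
```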

Because all RHS expressions are evaluated for all rows, operations like sqrt() run on negative numbers too, even in rows that end up using another branch. The NaN values they produce are discarded when the result is assembled, but the warning from sqrt() still surfaces. The following example makes this visible: sqrt(-2) and sqrt(-1) are computed, although those rows fall through to the TRUE branch:

y <- c(-2, -1, 0, 1, 2)
case_when(
  y >= 0 ~ sqrt(y),   # sqrt(-2) and sqrt(-1) are evaluated too and produce NaN
  TRUE ~ y
)
#> Warning: NaNs produced
#> -2, -1, 0, 1, 1.414

Guard the input inside the expression, or filter rows first, if this matters. This consequence of eager evaluation is worth keeping in mind when your right-hand side expressions involve potentially undefined or expensive operations: case_when() is not lazy about computing values, only about selecting which one to return.
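One possible way to keep sqrt() away from negative inputs is to clamp them inside the expression. In this sketch pmax(y, 0) replaces negative values with 0 before the square root is taken. The clamped values are only ever used in rows where y >= 0, so the result is unchanged. This is one option among several, not the only fix:

```r
library(dplyr)

y <- c(-2, -1, 0, 1, 2)

# pmax(y, 0) clamps negatives to 0, so sqrt() never sees a negative number
safe <- case_when(
  y >= 0 ~ sqrt(pmax(y, 0)),
  TRUE ~ y
)

round(safe, 3)
#> [1] -2.000 -1.000  0.000  1.000  1.414
```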

Reusable helper functions

case_when() is not a tidy eval function: it evaluates its arguments as ordinary R expressions and does not rely on tidy evaluation tools such as the .data pronoun or the {{ }} embrace operator. This means you can extract a case_when() block into a plain R function and call it from mutate() without any special tidy evaluation handling. The function receives ordinary column values and returns a vector that mutate() assigns to a new column:

classify_size <- function(height, mass) {
  case_when(
    height > 200 | mass > 200 ~ "large",
    height > 150 | mass > 100 ~ "medium",
    TRUE ~ "small"
  )
}

starwars |>
  mutate(size = classify_size(height, mass)) |>
  select(name, height, mass, size)
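Because the helper only sees plain vectors, it can be reused on any data frame, whatever the columns are called, and it can be called outside mutate() as well. The sketch below repeats the function so that it runs on its own. The pets tibble and its column names are made up for illustration:

```r
library(dplyr)

classify_size <- function(height, mass) {
  case_when(
    height > 200 | mass > 200 ~ "large",
    height > 150 | mass > 100 ~ "medium",
    TRUE ~ "small"
  )
}

# Hypothetical data; column names differ from the function's argument names
pets <- tibble(
  name = c("mouse", "pony", "elephant"),
  h_cm = c(5, 140, 300),
  kg   = c(0.02, 150, 5000)
)

sized <- pets |>
  mutate(size = classify_size(h_cm, kg))

sized$size
#> [1] "small"  "medium" "large"

# The same function also works on bare values
classify_size(180, 80)
#> [1] "medium"
```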

case_when vs ifelse

ifelse() handles exactly two cases: one true branch and one false branch. case_when() handles any number of branches with arbitrary conditions:

| | case_when() | ifelse() |
|---|---|---|
| Branches | Unlimited | Two only |
| Conditions | Arbitrary expressions | Single condition |
| NAs in data | Must handle explicitly | Passes through |
| Recommended for mutate | Yes | Okay for simple cases |
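The comparison can be checked on a small made-up vector. This sketch writes the same three-way classification once with nested ifelse() calls and once with case_when(), and includes a missing value to show the difference in NA handling:

```r
library(dplyr)

x <- c(10, 3, 7, NA)  # made-up input with one missing value

# Two nested ifelse() calls are needed for three outcomes
nested <- ifelse(
  x %% 5 == 0, "divisible by 5",
  ifelse(x %% 3 == 0, "divisible by 3", "neither")
)

# One flat case_when() call expresses the same branches
flat <- case_when(
  x %% 5 == 0 ~ "divisible by 5",
  x %% 3 == 0 ~ "divisible by 3",
  TRUE ~ "neither"
)

nested  # the missing value passes through as NA
#> [1] "divisible by 5" "divisible by 3" "neither"        NA
flat    # the missing value is caught by the TRUE branch
#> [1] "divisible by 5" "divisible by 3" "neither"        "neither"
```

For the non-missing inputs the two versions agree. They differ only in what happens to the NA row.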

See also