
toupper()

toupper(x)

toupper() converts character vectors to uppercase. It is a base R function, available in all R versions without loading a package, and is useful for case normalization when comparing strings, cleaning user input, and standardizing text data for analysis or storage.

Syntax

toupper(x)

Parameters

Parameter   Type        Default           Description
x           character   none (required)   A character vector to convert

The function accepts a single required argument: the character vector you want to convert to uppercase. The conversion itself is implemented in C, so toupper() processes each element of the input vector efficiently without requiring an R-level loop. Characters that have no uppercase form (digits, punctuation marks, and symbols) pass through the function unchanged and appear in the output exactly as they were in the input.
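The short sketch below illustrates the pass-through behavior and shows what happens with non-character input, which base R coerces with as.character() before converting. The variable names are illustrative only.

```r
# Digits, punctuation, and symbols have no uppercase form and are returned as-is
mixed <- toupper("abc-123_def!")
mixed
# [1] "ABC-123_DEF!"

# Non-character input is coerced with as.character() before conversion
from_number <- toupper(42)
from_number
# [1] "42"

# A factor is converted through its labels and comes back as a character vector
from_factor <- toupper(factor(c("low", "high")))
from_factor
# [1] "LOW"  "HIGH"
```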

Examples

Basic usage

toupper("hello world")
# [1] "HELLO WORLD"

toupper("HeLLo WoRLd")
# [1] "HELLO WORLD"

The function converts every lowercase letter in the input to its uppercase equivalent, while non-alphabetic characters pass through unchanged. The conversion follows the current locale setting: in a UTF-8 locale, accented characters like é become É correctly, but in the C locale only the ASCII letters a–z are affected. This locale sensitivity means you should test with representative input when your data contains non-ASCII text, especially if the code will run across different platforms or in containerized environments where the locale may differ from your development machine.
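One way to run that kind of test is to inspect the session locale and convert a sample containing an accented letter. This is a minimal sketch; the result depends on the platform and locale, so no expected output is shown for the accented case.

```r
# Report the character-type locale that governs case conversion
ctype <- Sys.getlocale("LC_CTYPE")
ctype

# "\u00e9" is the letter é, written as an escape so the script
# does not depend on the encoding of the source file
sample_text <- "caf\u00e9"
converted <- toupper(sample_text)

# TRUE when the accented letter was uppercased to "\u00c9" (É) in this session,
# FALSE when it was left unchanged
accent_converted <- identical(converted, "CAF\u00c9")
accent_converted
```

Running this once on each target environment (development machine, server, container image) shows whether they agree before real data is processed.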

Working with vectors

fruits <- c("Apple", "BANANA", "Cherry")
toupper(fruits)
# [1] "APPLE"  "BANANA" "CHERRY"

Because toupper() is vectorized, you can pass an entire character vector and receive a result of the same length where each element has been converted independently. This behavior integrates naturally with dplyr::mutate() and base R transform() when normalizing a data frame column. The function processes elements sequentially in compiled C code, so applying it to a column with thousands of rows is fast and requires no explicit iteration over the vector elements.
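As a sketch of the data frame case with base R transform(), using a made-up inventory table:

```r
# Hypothetical example data
inventory <- data.frame(
  id    = 1:3,
  fruit = c("Apple", "BANANA", "Cherry"),
  stringsAsFactors = FALSE
)

# transform() evaluates toupper(fruit) against the whole column at once
inventory_upper <- transform(inventory, fruit = toupper(fruit))
inventory_upper$fruit
# [1] "APPLE"  "BANANA" "CHERRY"

# The result has one element per input row
length(inventory_upper$fruit) == nrow(inventory)
# [1] TRUE
```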

Practical applications

# Case-insensitive matching
names <- c("Alice", "BOB", "charlie")
target <- "ALICE"

toupper(names) == toupper(target)
# [1]  TRUE FALSE FALSE

Case-insensitive matching with toupper() is a reliable pattern when you need to compare strings that may arrive with inconsistent capitalization. By normalizing both sides to uppercase before testing equality, you avoid false negatives without relying on regex flags or hidden function options. This approach is explicit and auditable — anyone reading the code can see immediately that case is being handled, which is especially valuable in data pipelines where the source of capitalization differences may be several steps upstream from the comparison.
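The same idea extends to matching against several targets with %in%. The vectors below are illustrative; note that the original spelling is kept in the filtered result because only the comparison uses the uppercased copies.

```r
names_in <- c("Alice", "BOB", "charlie", "Dana")
targets  <- c("alice", "Charlie")

# Normalize both sides, then test membership
is_match <- toupper(names_in) %in% toupper(targets)
is_match
# [1]  TRUE FALSE  TRUE FALSE

# Subset the original vector so the source capitalization is preserved
names_in[is_match]
# [1] "Alice"   "charlie"
```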

Common patterns

Case-insensitive table lookup

# Keys are stored in uppercase so they match the normalized input
lookup_table <- c("YES" = 1, "NO" = 2, "MAYBE" = 3)

user_response <- "Yes"
lookup_table[toupper(user_response)]
# YES 
#   1

Named vector lookup with case normalization is a compact alternative to nested ifelse() chains for mapping a fixed set of known string values to corresponding codes or categories. The example normalizes the user's response to uppercase before indexing into the lookup table, so "Yes", "yes", and "YES" all resolve to the same value, provided the table keys are themselves uppercase. The same technique works with tolower() when the lookup table keys are stored in lowercase. Pick one direction and apply it uniformly to both the lookup keys and every input you index against them.
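A vectorized variant of the lookup, sketched with a hypothetical response_codes table whose keys are uppercase. Inputs that do not match any key come back as NA, which is worth checking for explicitly.

```r
# Hypothetical lookup table; keys are uppercase to match the normalized input
response_codes <- c("YES" = 1, "NO" = 2, "MAYBE" = 3)

responses <- c("Yes", "no", "MAYBE", "unsure")

# unname() drops the key names; unmatched inputs are returned as NA
codes <- unname(response_codes[toupper(responses)])
codes
# [1]  1  2  3 NA

# Flag responses that were not found in the table
responses[is.na(codes)]
# [1] "unsure"
```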

Data cleaning pipeline

raw_data <- data.frame(
  name = c("ALICE", "bob", "Charlie"),
  city = c("new york", "London", "PARIS")
)

raw_data$name <- toupper(raw_data$name)
raw_data$city <- toupper(raw_data$city)

raw_data
# Result: all uppercase
#      name     city
# 1   ALICE NEW YORK
# 2     BOB   LONDON
# 3 CHARLIE    PARIS
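When a data frame has many text columns, converting them one assignment at a time gets repetitive. The sketch below, using a hypothetical survey table, selects every character column and applies toupper() to each, leaving numeric columns alone.

```r
# Hypothetical example data with two text columns and one numeric column
survey <- data.frame(
  name   = c("ALICE", "bob", "Charlie"),
  city   = c("new york", "London", "PARIS"),
  visits = c(3, 1, 7),
  stringsAsFactors = FALSE
)

# Find the character columns, then uppercase each one
is_text <- vapply(survey, is.character, logical(1))
survey[is_text] <- lapply(survey[is_text], toupper)

survey$name
# [1] "ALICE"   "BOB"     "CHARLIE"
survey$visits
# [1] 3 1 7
```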

How toupper() behaves

toupper() follows the current locale for case conversion, which matters for non-ASCII characters. In the C locale, only ASCII letters are converted, so accented characters like é or ü are left unchanged. In a UTF-8 locale (the default on most modern systems), Unicode case mappings apply. If your data contains non-ASCII text and the code may run under different locales, check the output on a small sample first.

toupper() propagates NA values: toupper(NA) returns NA_character_. It does not warn or error on missing values. The function is vectorized and processes each element of the input vector independently.
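A brief illustration of the missing-value behavior, with an illustrative input vector:

```r
with_missing <- c("alpha", NA, "gamma")

# NA elements stay NA; the other elements are converted
upper_missing <- toupper(with_missing)
upper_missing
# [1] "ALPHA" NA      "GAMMA"

# A bare NA is logical, but the result is a character NA
is.character(toupper(NA))
# [1] TRUE
is.na(toupper(NA))
# [1] TRUE
```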

For tidyverse workflows, stringr::str_to_upper() does the same thing with explicit locale control via the locale argument. Use toupper() in base R code and str_to_upper() when you need to specify a locale explicitly.
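The sketch below shows the locale argument in use. It assumes the stringr package is installed and is guarded so it does nothing otherwise; "en" and "tr" are the language codes for English and Turkish.

```r
# Runs only if stringr is installed; toupper() itself needs no package
if (requireNamespace("stringr", quietly = TRUE)) {
  # English rules: "i" becomes "I"
  upper_en <- stringr::str_to_upper("istanbul", locale = "en")

  # Turkish rules: "i" becomes the dotted capital "\u0130" (İ)
  upper_tr <- stringr::str_to_upper("istanbul", locale = "tr")

  print(upper_en)
  print(upper_tr)
}
```

Because the locale is passed per call, the result does not depend on the locale of the R session.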

The most common real-world application is case normalization before joining or comparing datasets where the same value may appear in different capitalizations. Converting both sides to uppercase before comparing is idiomatic and avoids complex regex solutions. In database-style joins, this pattern prevents false mismatches on keys like country codes, status flags, or category labels that different data sources may have capitalized differently.

# Case normalization before joining datasets
customers <- data.frame(
  id   = c(1, 2, 3),
  code = c("us", "UK", "de"),
  stringsAsFactors = FALSE
)

orders <- data.frame(
  order_id = c(101, 102, 103),
  country  = c("US", "uk", "DE"),
  amount   = c(50, 75, 30),
  stringsAsFactors = FALSE
)

# Normalize both key columns before merging
customers$code <- toupper(customers$code)
orders$country  <- toupper(orders$country)
merge(customers, orders, by.x = "code", by.y = "country")
#   code id order_id amount
# 1   DE  3      103     30
# 2   UK  2      102     75
# 3   US  1      101     50

toupper() does not modify the original vector; it returns a new character vector. To update a column, assign the result back: df$col <- toupper(df$col). This is how base R string functions work in general: they return a new value rather than mutating the input. No special handling is needed for missing values since NA propagates through unchanged.
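A minimal demonstration of the difference between calling toupper() and assigning its result, using an illustrative vector:

```r
original <- c("draft", "final")

# The call returns a new vector; 'original' is not changed
upper_copy <- toupper(original)
original
# [1] "draft" "final"
upper_copy
# [1] "DRAFT" "FINAL"

# Assigning the result back is what updates the variable
original <- toupper(original)
original
# [1] "DRAFT" "FINAL"
```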

toupper() and tolower() are locale-sensitive: the uppercase of "i" depends on the current locale. In a Turkish locale (for example LC_ALL=tr_TR), toupper("i") can return "İ" (dotted capital I) rather than "I", depending on the platform. For a locale-independent, ASCII-only transformation, use chartr("a-z", "A-Z", x), or set the locale explicitly before calling toupper(). In most data science workflows with ASCII text, this distinction does not arise, but it matters for internationalized applications or when processing user-supplied strings from unknown locales.
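The chartr() approach can be wrapped in a small helper. ascii_upper() is a hypothetical name used here for illustration; it maps only the letters a to z and leaves every other character as it is.

```r
# Hypothetical helper: uppercase ASCII a-z only, independent of the session locale
ascii_upper <- function(x) chartr("a-z", "A-Z", x)

ascii_upper("istanbul 42")
# [1] "ISTANBUL 42"

# Letters outside a-z are not in the translation table, so the
# accented letter ("\u00e9" is é) is expected to stay lowercase
ascii_upper("caf\u00e9")
```

The trade-off is that accented letters are never converted, so this suits identifiers and codes better than free text.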

See also