toupper()
toupper() converts character vectors to uppercase. It is part of base R, available in every R version, and is useful for case normalization when comparing strings, cleaning user input, and standardizing text data for analysis or storage.
Syntax
```r
toupper(x)
```
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| x | character | none (required) | A character vector to convert |
The function takes a single required argument: the vector you want to convert to uppercase. If x is not already a character vector (for example, a factor or a logical), it is first coerced with as.character(). toupper() is a thin R wrapper around internal C code, so it processes every element of the input without an R-level loop. Characters that have no case mapping, such as digits, punctuation marks, and symbols, pass through unchanged and appear in the output exactly as they were in the input.
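A quick sketch of the pass-through and coercion behavior described above, using only base R. The inputs are illustrative; the factor case is worth knowing because older data frames often store text as factors.

```r
# Digits, punctuation, and underscores have no case mapping and pass through
toupper("abc-123_xyz!")
# [1] "ABC-123_XYZ!"

# Non-character input is coerced with as.character() before conversion
toupper(TRUE)
# [1] "TRUE"

# A factor is converted via its labels; the result is a plain character vector
toupper(factor(c("low", "high")))
# [1] "LOW"  "HIGH"
```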
Examples
Basic usage
```r
toupper("hello world")
# [1] "HELLO WORLD"
toupper("HeLLo WoRLd")
# [1] "HELLO WORLD"
```
The function converts every lowercase letter in the input to its uppercase equivalent, while non-alphabetic characters pass through unchanged. The conversion follows the current locale setting: in a UTF-8 locale, accented characters like é become É correctly, but in the C locale only the ASCII letters a–z are affected. This locale sensitivity means you should test with representative input when your data contains non-ASCII text, especially if the code will run across different platforms or in containerized environments where the locale may differ from your development machine.
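Before relying on conversions of non-ASCII text, it can help to confirm which locale the session is actually using. This is a minimal base R sketch; the sample string x is illustrative, and no output is shown for it because the result for é depends on the locale (in a UTF-8 locale it should come back as "CAFÉ").

```r
# Which character-type locale is this session using? (e.g. "en_US.UTF-8" or "C")
Sys.getlocale("LC_CTYPE")

# TRUE when the session's native encoding is UTF-8
l10n_info()[["UTF-8"]]

# ASCII letters convert in every locale; the final "\u00e9" (é) depends on the locale
x <- "caf\u00e9"
toupper(x)
```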
Working with vectors
```r
fruits <- c("Apple", "BANANA", "Cherry")
toupper(fruits)
# [1] "APPLE"  "BANANA" "CHERRY"
```
Because toupper() is vectorized, you can pass an entire character vector and get back a result of the same length in which each element has been converted independently. This makes it a natural fit inside dplyr::mutate() or base R transform() when normalizing a data frame column. The per-element work happens in compiled code, so converting a column with thousands of rows is fast and needs no explicit loop.
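A minimal sketch of normalizing one data frame column with base R transform(); the df data frame and its status column are hypothetical. The commented dplyr call is the equivalent form for tidyverse code.

```r
# Hypothetical data frame with inconsistently capitalized status values
df <- data.frame(
  id = 1:3,
  status = c("active", "Inactive", NA),
  stringsAsFactors = FALSE
)

# Base R: transform() evaluates toupper(status) inside the data frame
df <- transform(df, status = toupper(status))
df$status
# [1] "ACTIVE"   "INACTIVE" NA

# dplyr equivalent (requires dplyr): df <- dplyr::mutate(df, status = toupper(status))
```

The result keeps all three rows, and the missing status stays NA rather than becoming the string "NA".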
Practical applications
```r
# Case-insensitive matching
# (people rather than names, to avoid masking base R's names() function)
people <- c("Alice", "BOB", "charlie")
target <- "ALICE"
toupper(people) == toupper(target)
# [1]  TRUE FALSE FALSE
```
Case-insensitive matching with toupper() is a reliable pattern when you need to compare strings that may arrive with inconsistent capitalization. By normalizing both sides to uppercase before testing equality, you avoid false negatives without relying on regex flags or hidden function options. The approach is explicit and auditable: anyone reading the code can see immediately that case is being handled, which is especially valuable in data pipelines where the source of capitalization differences may be several steps upstream from the comparison.
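The same normalization carries over to R's other comparison helpers. A sketch with illustrative vectors, using %in% for membership tests and match() for positions:

```r
people <- c("Alice", "BOB", "charlie")
allowed <- c("alice", "Charlie")

# Membership test: which people appear in `allowed`, ignoring case?
toupper(people) %in% toupper(allowed)
# [1]  TRUE FALSE  TRUE

# Position lookup: where does "bob" occur in `people`, ignoring case?
match(toupper("bob"), toupper(people))
# [1] 2
```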
Common patterns
Case-insensitive table lookup
```r
# Keys are stored in uppercase, so inputs are uppercased before indexing
lookup_table <- c("YES" = 1, "NO" = 2, "MAYBE" = 3)
user_response <- "Yes"
lookup_table[toupper(user_response)]
# YES
#   1
```
Named vector lookup with case normalization is a compact alternative to nested ifelse() chains for mapping a fixed set of known string values to corresponding codes or categories. The example normalizes the user's response to uppercase before indexing into the lookup table, so "Yes", "yes", and "YES" all resolve to the same code. The same technique works with tolower() when the lookup table keys are stored in lowercase: pick one direction and apply it uniformly to both the lookup keys and every input you index against them.
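Because both toupper() and named-vector indexing are vectorized, the same lookup can map a whole vector of responses at once. This sketch (the responses vector is illustrative) also shows what happens to an answer that is not in the table: it comes back as NA, which you may want to check for.

```r
lookup_table <- c(YES = 1, NO = 2, MAYBE = 3)
responses <- c("yes", "No", "MAYBE", "nope")

# Uppercase every response, index the table, and drop the names
codes <- unname(lookup_table[toupper(responses)])
codes
# [1]  1  2  3 NA

# Flag responses that had no matching key
responses[is.na(codes)]
# [1] "nope"
```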
Data cleaning pipeline
```r
raw_data <- data.frame(
  name = c("ALICE", "bob", "Charlie"),
  city = c("new york", "London", "PARIS")
)
raw_data$name <- toupper(raw_data$name)
raw_data$city <- toupper(raw_data$city)
raw_data
# Result: all uppercase
#      name     city
# 1   ALICE NEW YORK
# 2     BOB   LONDON
# 3 CHARLIE    PARIS
```
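With more than a couple of columns, assigning each one by hand gets repetitive. A sketch that uppercases every character column at once with vapply() and lapply(); the extra id column is illustrative and shows that non-character columns are left alone.

```r
raw_data <- data.frame(
  id   = 1:3,
  name = c("ALICE", "bob", "Charlie"),
  city = c("new york", "London", "PARIS"),
  stringsAsFactors = FALSE
)

# Find the character columns, then uppercase them all in one assignment
chr_cols <- vapply(raw_data, is.character, logical(1))
raw_data[chr_cols] <- lapply(raw_data[chr_cols], toupper)
raw_data
#   id    name     city
# 1  1   ALICE NEW YORK
# 2  2     BOB   LONDON
# 3  3 CHARLIE    PARIS
```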
How toupper() behaves
toupper() follows the current locale for case conversion, which matters for non-ASCII characters. In the C locale, only ASCII letters are converted; accented characters like é or ü are left unchanged. In a UTF-8 locale (the default on most systems), Unicode case mappings apply. If your strings come from sources with different locales or encodings and you need consistent behavior, test a small representative sample first.
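One way to run that sample test is to call toupper() under a specific LC_CTYPE setting and compare the result with the current session. The helper below is a hypothetical sketch, not a base R function; it relies on Sys.setlocale() returning an empty string when the operating system rejects a locale, and on on.exit() to restore the original setting. The name "C" is portable, but names such as "tr_TR.UTF-8" vary by platform.

```r
# Hypothetical helper (not part of base R): run toupper() under a given
# LC_CTYPE locale, then restore the session's original setting
toupper_in_locale <- function(x, locale) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) if the OS rejects the request
  result <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (identical(result, "")) {
    stop("locale not available on this system: ", locale)
  }
  toupper(x)
}

sample_text <- c("hello", "caf\u00e9", "stra\u00dfe")
toupper_in_locale(sample_text, "C")  # ASCII-only case rules of the C locale
toupper(sample_text)                 # case rules of the current session locale
```

If the two results differ for your real data, the code depends on the locale it runs in, and you may want to pin the locale or switch to an approach with explicit locale control.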
toupper() propagates NA values: toupper(NA) returns NA_character_. It does not warn or error on missing values. The function is vectorized and processes each element of the input vector independently.
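A short check of the NA behavior described above; the vector x is illustrative.

```r
x <- c("apple", NA, "kiwi")
toupper(x)
# [1] "APPLE" NA      "KIWI"

# A bare logical NA is coerced first, so the result is a character NA
typeof(toupper(NA))
# [1] "character"
```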
For tidyverse workflows, stringr::str_to_upper() performs the same conversion but takes an explicit locale argument (default "en") and applies ICU case rules through the stringi package, instead of following the session locale. Use toupper() in base R code and str_to_upper() when you need to specify a locale explicitly.
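A sketch of the explicit locale argument in action, assuming stringr is installed (the block is skipped otherwise). The Turkish example shows a case where the locale changes the result.

```r
# Requires the stringr package; the block does nothing if it is not installed
if (requireNamespace("stringr", quietly = TRUE)) {
  # Default English rules: "i" becomes "I"
  print(stringr::str_to_upper("istanbul", locale = "en"))
  # [1] "ISTANBUL"

  # Turkish rules: "i" becomes the dotted capital "\u0130" (İ)
  print(stringr::str_to_upper("istanbul", locale = "tr"))
  # [1] "İSTANBUL"
}
```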
The most common real-world application is case normalization before joining or comparing datasets where the same value may appear in different capitalizations. Converting both sides to uppercase before comparing is idiomatic and avoids complex regex solutions. In database-style joins, this pattern prevents false mismatches on keys like country codes, status flags, or category labels that different data sources may have capitalized differently.
```r
# Case normalization before joining datasets
customers <- data.frame(
  id = c(1, 2, 3),
  code = c("us", "UK", "de"),
  stringsAsFactors = FALSE
)
orders <- data.frame(
  order_id = c(101, 102, 103),
  country = c("US", "uk", "DE"),
  amount = c(50, 75, 30),
  stringsAsFactors = FALSE
)

# Normalize both key columns before merging
customers$code <- toupper(customers$code)
orders$country <- toupper(orders$country)
merge(customers, orders, by.x = "code", by.y = "country")
#   code id order_id amount
# 1   DE  3      103     30
# 2   UK  2      102     75
# 3   US  1      101     50
```
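Overwriting the key columns discards the original capitalization. If you want to keep it, one option (a sketch; the key column name is hypothetical) is to add a normalized key column to each data frame and join on that instead.

```r
customers <- data.frame(
  id = c(1, 2, 3),
  code = c("us", "UK", "de"),
  stringsAsFactors = FALSE
)
orders <- data.frame(
  order_id = c(101, 102, 103),
  country = c("US", "uk", "DE"),
  amount = c(50, 75, 30),
  stringsAsFactors = FALSE
)

# Add a normalized join key; the original code/country values stay as they were
customers$key <- toupper(customers$code)
orders$key <- toupper(orders$country)

# merge() sorts the result by the join key (DE, UK, US)
joined <- merge(customers, orders, by = "key")
joined[, c("key", "code", "country", "order_id", "amount")]
```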
toupper() does not modify the original vector; it returns a new character vector. To update a column, assign the result back: df$col <- toupper(df$col). This is standard R copy-on-modify behavior: base R string functions return a new value rather than mutating their input. No special handling is needed for missing values, since NA propagates through unchanged.
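A small illustration of the copy-on-modify point; codes, upper_codes, and df are illustrative names.

```r
codes <- c("us", "uk")
upper_codes <- toupper(codes)

codes          # the original vector is untouched
# [1] "us" "uk"
upper_codes
# [1] "US" "UK"

# To replace a column, assign the new vector back to it
df <- data.frame(code = c("us", "uk"), stringsAsFactors = FALSE)
df$code <- toupper(df$code)
df$code
# [1] "US" "UK"
```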
toupper() and tolower() are locale-sensitive, so even the uppercase of "i" can depend on the current locale. In a Turkish locale (for example LC_ALL=tr_TR.UTF-8), toupper("i") may return "İ" (dotted capital I) rather than "I", depending on the platform's C library. For a locale-independent, ASCII-only transformation, use chartr("a-z", "A-Z", x), or set LC_CTYPE explicitly with Sys.setlocale() before calling toupper(). In most data science workflows with ASCII text this distinction does not arise, but it matters for internationalized applications and for user-supplied strings from unknown locales.
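The chartr() approach can be wrapped in a small helper. This is a sketch; ascii_upper() is a hypothetical name, not a base R function. chartr() maps characters one-to-one by position, so only a to z are touched and every other character, including non-ASCII letters, passes through unchanged.

```r
# Hypothetical helper: uppercase ASCII a-z only, ignoring locale case rules
ascii_upper <- function(x) chartr("a-z", "A-Z", x)

ascii_upper("istanbul")
# [1] "ISTANBUL"

# "\u00e9" (é) is outside a-z, so it is left as-is
ascii_upper("caf\u00e9")
```

Unlike toupper() in a Turkish locale, this never turns "i" into "İ", but it also never uppercases letters such as é or ü, so it only suits data known to be ASCII.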