nchar()

nchar(x, type = "chars", allowNA = FALSE, keepNA = NA)

The nchar() function returns the length of a character string. It can measure length in characters, bytes, or display width (useful for non-ASCII text). This function is essential for string validation, padding, and text processing workflows.

Syntax

nchar(x, type = "chars", allowNA = FALSE, keepNA = NA)

Parameters

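Parameter   Type        Default      Description
x           character   (required)   A character vector, or an object coercible to character
type        character   "chars"      What to measure: "chars" (characters), "bytes" (bytes), or "width" (display width)
allowNA     logical     FALSE        If TRUE, invalid multibyte strings return NA instead of raising an error
keepNA      logical     NA           Whether a missing string returns NA. FALSE returns 2 (the length of the printed "NA"). The default NA acts like TRUE for "chars" and "bytes" and like FALSE for "width"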

Examples

Basic character counting

# Simple string length
nchar("hello")
# [1] 5

# Vector of strings
nchar(c("a", "ab", "abc"))
# [1] 1 2 3

When you pass a character vector to nchar(), the function returns an integer vector where each element corresponds to the character count of the corresponding input string. The default type = "chars" counts Unicode code points, which is appropriate for most text processing tasks in R. For strings that contain only ASCII characters, the character count and byte count are identical, so you can use the default without worrying about encoding details.
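Counting code points has one visible consequence: the same glyph can have different lengths depending on how it is encoded. The short sketch below is illustrative only (the variable names are made up for this example) and compares a precomposed "é" with an "e" followed by a combining accent.

```r
# Two ways to write the same visible glyph
composed   <- "\u00e9"    # single code point: e with acute accent
decomposed <- "e\u0301"   # two code points: "e" + combining acute accent

nchar(composed)                    # 1 code point
nchar(decomposed)                  # 2 code points
nchar(composed, type = "bytes")    # 2 bytes in UTF-8
nchar(decomposed, type = "bytes")  # 3 bytes in UTF-8
```

If text arrives from mixed sources, lengths can therefore differ for strings that look identical on screen.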

Counting different types

# Characters vs bytes for Unicode
text <- "héllo"
nchar(text, type = "chars")
# [1] 5

nchar(text, type = "bytes")
# [1] 6 (UTF-8 encoding uses 2 bytes for é)

# Display width (useful for formatting)
nchar("日本語", type = "width")
# [1] 6 (each CJK character has width 2)

The type parameter becomes important when your text contains multibyte characters. The "bytes" option counts the raw bytes in the string’s encoding — for UTF-8, accented characters like é consume 2 bytes while counting as a single character. The "width" option is specifically designed for terminal display formatting: CJK characters and other wide glyphs occupy two columns in a monospaced terminal, and type = "width" accounts for this when you need to align text in console output or fixed-width reports.

Handling NA values

# Default (keepNA = NA): a missing string returns NA_integer_ for type = "chars"
nchar(c("text", NA, "more"))
# [1]  4 NA  4

# keepNA = TRUE: same result here, but the NA propagation is explicit
nchar(c("text", NA, "more"), keepNA = TRUE)
# [1]  4 NA  4

# Empty string returns 0
nchar("")
# [1] 0

The keepNA parameter controls how nchar() treats missing strings in the input vector. With the default keepNA = NA, a missing string returns NA for type = "chars" and type = "bytes", and 2 for type = "width". With keepNA = FALSE it always returns 2, because the missing value is counted as the two characters of the printed "NA", a historical behavior that can produce misleading results in data pipelines. Setting keepNA = TRUE returns NA for every type, which is almost always the correct choice when you are computing lengths on a column that may contain missing observations. Empty strings consistently return 0 regardless of the keepNA setting.
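As a quick way to see how keepNA and type interact, the sketch below runs the same character vector through several combinations. The vector x is just sample data for this illustration.

```r
x <- c("text", NA, "more")   # the NA here is a missing string (NA_character_)

# Default keepNA = NA: the missing string propagates for "chars"
nchar(x)
# [1]  4 NA  4

# ...but the default counts it as 2 for "width"
nchar(x, type = "width")
# [1] 4 2 4

# keepNA = FALSE: always counted as the 2 characters of "NA"
nchar(x, keepNA = FALSE)
# [1] 4 2 4

# keepNA = TRUE: always NA, even for "width"
nchar(x, type = "width", keepNA = TRUE)
# [1]  4 NA  4
```

Note that a bare nchar(NA) passes a logical NA rather than a missing string, so it is safer to test NA handling with NA_character_ or with a real character vector.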

Common patterns

String validation

# Filter strings of specific length
words <- c("cat", "elephant", "hi", "giraffe")
words[nchar(words) > 3]
# [1] "elephant" "giraffe"

# Check if string exceeds maximum length
truncate <- function(x, max_len) {
  ifelse(nchar(x) > max_len, paste0(substr(x, 1, max_len), "..."), x)
}
truncate("This is a long string", 10)
# [1] "This is a ..."

Filtering by string length is a common data validation pattern. The example above uses logical indexing with nchar(words) > 3 to keep only the entries longer than three characters, which is useful for removing very short noise tokens or enforcing minimum field lengths in form data. The truncate() helper wraps the same length check into a reusable function that clips strings exceeding a maximum length and appends an ellipsis, which is helpful for generating summary displays or abbreviating long labels in plots and tables.

Padding strings to uniform width

# Pad strings to equal length for display
pad_strings <- function(vec, width = 10) {
  sprintf("%-*s", width, vec)
}
pad_strings(c("a", "abc", "abcdef"))
# [1] "a         " "abc       " "abcdef    "
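The sprintf() approach above does not call nchar() itself. As an alternative sketch, the hypothetical helper pad_to_width() below (not part of base R) computes the padding from nchar(x, type = "width"), so wide glyphs would receive less padding than narrow ones.

```r
# Hypothetical helper: right-pad each string to a target display width.
pad_to_width <- function(x, width = 10) {
  # Columns still missing for each element; never negative
  gap <- pmax(width - nchar(x, type = "width"), 0)
  # strrep() is vectorised over the repeat count
  paste0(x, strrep(" ", gap))
}

padded <- pad_to_width(c("a", "abc", "abcdef"))
nchar(padded)
# [1] 10 10 10
```

Strings already wider than the target are returned unchanged rather than truncated. For non-ASCII input the computed width depends on how R measures each glyph on your platform, so check the alignment in the terminal you actually use.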

How nchar() behaves

nchar() returns an integer vector of the same length as the input. The type argument controls what is counted:

  • "chars": counts Unicode code points (the default and most useful for text processing)
  • "bytes": counts raw bytes in the encoding (may differ from the character count for multibyte UTF-8)
  • "width": counts display width, which accounts for wide characters like CJK glyphs that occupy two columns in a terminal

For ASCII text, all three give the same result. For multibyte text, "chars" and "bytes" diverge: the UTF-8 encoding of "é" takes 2 bytes but is 1 character.

Missing strings are handled through keepNA. The default, keepNA = NA, returns NA for type "chars" and "bytes" but 2 for type "width". keepNA = FALSE always returns 2 (the length of the printed "NA"), and keepNA = TRUE always returns NA. Counting a missing string as 2 was the default in older R versions, a historical quirk that still appears in older code and tutorials. In data pipelines, propagating NA is almost always what you want.

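nchar() is part of base R, so it needs no extra packages, and it is often reported to be faster than str_length() from stringr on ASCII-only data; benchmark on your own data if the difference matters. For valid UTF-8 text, both count code points and return the same lengths.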

In data validation, nchar() is commonly used to enforce length constraints on text fields — checking that phone numbers have exactly 10 digits, that postal codes fall within expected ranges, or that free-text fields are not empty (nchar(trimws(x)) > 0). Combining nchar() with which() lets you find which rows violate length constraints without loading any additional packages.
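A minimal sketch of these validation checks follows. The vectors phones and notes are made-up sample data, and the 10-digit rule is only the example constraint mentioned above.

```r
phones <- c("5551234567", "555123", NA, "55512345678")

# which() drops NA comparisons, so only definite violations are reported
bad_rows <- which(nchar(phones) != 10)
bad_rows
# [1] 2 4

notes <- c("ok", "   ", "", NA)

# Non-empty after trimming whitespace; treat missing values as empty
has_text <- !is.na(notes) & nchar(trimws(notes)) > 0
has_text
# [1]  TRUE FALSE FALSE FALSE
```

Missing values are not flagged by which() here, so count them separately with is.na() if a missing phone number should also be treated as a violation.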

nchar() counts characters, not bytes. For ASCII text these are equivalent, but multi-byte UTF-8 characters count as a single character. For example, nchar("café") returns 4, not 5, even though the é character takes two bytes in UTF-8. To count bytes instead of characters, use nchar(x, type = "bytes"). This distinction matters when interfacing with systems that impose byte-length limits (database varchar columns, fixed-width binary formats) rather than character-length limits.
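To illustrate the byte-limit case, the sketch below checks sample labels against a hypothetical 5-byte column limit. The names labels and max_bytes are assumptions made for this example, not part of any database API.

```r
labels <- c("cafe", "caf\u00e9", "na\u00efve r\u00e9sum\u00e9")
max_bytes <- 5   # hypothetical byte limit of the target column

n_chars <- nchar(labels, type = "chars")
n_bytes <- nchar(labels, type = "bytes")

n_chars
# [1]  4  4 12
n_bytes
# [1]  4  5 15

# Values that would not fit, even though some look short enough in characters
too_long <- labels[n_bytes > max_bytes]
length(too_long)
# [1] 1
```

Checking n_chars alone would miss cases where accented characters push a value over the byte limit.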

For a missing string, nchar(NA_character_) returns NA by default, while nchar(NA_character_, keepNA = FALSE) returns 2 (the character count of the string "NA"). Set keepNA = TRUE explicitly if your code depends on NA propagation and you want to make that dependency visible to the reader.

See also