ggplot2::geom_boxplot
geom_boxplot draws a box and whiskers plot in the style of Tukey. It compactly displays the distribution of a continuous variable across groups, showing five summary statistics: the median, two hinges (quartiles), and two whiskers, with outlying points plotted individually.
Signature
geom_boxplot(
  mapping = NULL,
  data = NULL,
  stat = "boxplot",
  position = "dodge2",
  ...,
  outliers = TRUE,
  outlier.colour = NULL,
  outlier.color = NULL,
  outlier.fill = NULL,
  outlier.shape = NULL,
  outlier.size = NULL,
  outlier.stroke = 0.5,
  outlier.alpha = NULL,
  whisker.colour = NULL,
  whisker.color = NULL,
  whisker.linetype = NULL,
  whisker.linewidth = NULL,
  staple.colour = NULL,
  staple.color = NULL,
  staple.linetype = NULL,
  staple.linewidth = NULL,
  median.colour = NULL,
  median.color = NULL,
  median.linetype = NULL,
  median.linewidth = NULL,
  box.colour = NULL,
  box.color = NULL,
  box.linetype = NULL,
  box.linewidth = NULL,
  notch = FALSE,
  notchwidth = 0.5,
  staplewidth = 0,
  varwidth = FALSE,
  na.rm = FALSE,
  orientation = NA,
  show.legend = NA,
  inherit.aes = TRUE
)
Returns: A Layer object.
Required and Commonly Used Aesthetics
Box plots require at least one positional aesthetic, typically x (categorical) and y (continuous):
ggplot(data, aes(x = group, y = value)) +
  geom_boxplot()
Positional aesthetics for geom_boxplot (at least one is required):
- x: categorical variable (group)
- y: continuous variable (distribution)
Optional aesthetics include fill, colour, group, and weight.
Parameters
Outlier Styling
| Parameter | Description |
|---|---|
| outliers | Whether to display (TRUE) or discard (FALSE) outliers. Discarding them lets the axis limits adapt to the boxes and whiskers only. To hide outliers while keeping the axis limits at the full data range, use outlier.shape = NA instead. |
| outlier.colour / outlier.color | Colour of outlier points. NULL inherits from data aesthetics. |
| outlier.fill | Fill colour for outlier points. |
| outlier.shape | Shape of outlier points (see ?pch for values). |
| outlier.size | Size of outlier points. |
| outlier.stroke | Stroke width of outlier points. Default is 0.5. |
| outlier.alpha | Alpha (transparency) of outlier points. |
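As a brief illustration of the outlier parameters in the table above, the following sketch restyles the outliers as enlarged, hollow red circles. The colour, shape, and size values and the name p_outliers are arbitrary choices for the example, not recommended defaults.

```r
library(ggplot2)

# Restyle outlier points: hollow circles (shape 1), red, slightly enlarged.
p_outliers <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(
    outlier.colour = "red",
    outlier.shape = 1,
    outlier.size = 3
  )

p_outliers
```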
Whisker and Box Styling
| Parameter | Description |
|---|---|
| whisker.colour / whisker.color | Colour of whisker lines. |
| whisker.linetype | Line type for whiskers. |
| whisker.linewidth | Line width for whiskers. |
| box.colour / box.color | Colour of the box. |
| box.linetype | Line type for the box outline. |
| box.linewidth | Line width for the box outline. |
| median.colour / median.color | Colour of the median line. |
| median.linetype | Line type for the median line. |
| median.linewidth | Line width for the median line. |
| staple.colour / staple.color | Colour of staple lines (ends of whiskers). Staples only appear when staplewidth is non-zero. |
| staple.linetype | Line type for staples. |
| staple.linewidth | Line width for staples. |
Plot Structure Parameters
| Parameter | Description |
|---|---|
| notch | FALSE (default) draws a standard box plot. TRUE draws a notched box plot; notches that don’t overlap suggest that the medians differ significantly. |
| notchwidth | Width of the notch relative to the box body. Defaults to 0.5 (half the box width). |
| varwidth | FALSE (default) draws all boxes at the same width. TRUE draws boxes with widths proportional to the square root of the number of observations in each group. |
| staplewidth | Width of the staples relative to the box width. The default, 0, draws no staples. Set it to a positive value to draw staple lines at the whisker ends. |
| na.rm | FALSE (default) removes missing values with a warning. TRUE removes them silently. |
| orientation | Layer orientation. NA (default) determines it automatically from the aesthetic mapping. Set it explicitly to "x" or "y" if auto-detection fails. |
| inherit.aes | TRUE (default) combines this layer’s mapping with the plot’s default mapping. FALSE uses only this layer’s mapping. |
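Since staples are off by default, a short sketch of turning them on may help. It assumes a ggplot2 version recent enough to provide the staplewidth parameter listed in the signature above. The value 0.5 and the name p_staples are illustrative.

```r
library(ggplot2)

# staplewidth is relative to the box width:
# 0.5 draws caps half as wide as the box at the end of each whisker.
p_staples <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(staplewidth = 0.5)

p_staples
```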
Basic Example
library(ggplot2)
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()
This draws one box per cylinder group. The box spans the interquartile range (IQR) between the lower and upper hinges, and the thick line inside is the median. Each whisker extends from a hinge to the most extreme data point within 1.5 * IQR of that hinge; points beyond the whiskers are plotted individually as outliers.
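To make the summary statistics concrete, the sketch below pulls the computed values out of the plot with layer_data() and recomputes them by hand for the 8-cylinder group. The manual part assumes R's default quantile definition and the 1.5 * IQR whisker rule. The names box_stats, inside, and manual are chosen for the example.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

# layer_data() returns the values computed for the layer, one row per box.
box_stats <- layer_data(p)[, c("ymin", "lower", "middle", "upper", "ymax")]

# Recompute the same quantities by hand for the 8-cylinder group.
y <- mtcars$mpg[mtcars$cyl == 8]
hinges <- quantile(y, c(0.25, 0.5, 0.75))
iqr <- hinges[[3]] - hinges[[1]]

# Points within 1.5 * IQR of the hinges; the whiskers end at the
# most extreme of these, not at the fence itself.
inside <- y[y >= hinges[[1]] - 1.5 * iqr & y <= hinges[[3]] + 1.5 * iqr]
manual <- c(min(inside), hinges[[1]], hinges[[2]], hinges[[3]], max(inside))

box_stats
manual
```

The third row of box_stats corresponds to the 8-cylinder group, because the boxes are ordered by the levels of factor(cyl).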
Horizontal Box Plot
ggplot(mtcars, aes(x = mpg, y = factor(cyl))) +
  geom_boxplot()
Set orientation = "y" explicitly if needed, though ggplot2 usually detects the orientation correctly.
Styling the Box
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(
    fill = "steelblue",
    colour = "#333333",
    box.linewidth = 1,
    median.colour = "red",
    median.linewidth = 1.5
  )
Showing Raw Data Points
Overlay jittered points to show the actual data:
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(outliers = FALSE) +
  geom_jitter(width = 0.2, alpha = 0.4)
Setting outliers = FALSE on the box plot and adding geom_jitter() on top gives a cleaner look: otherwise each outlier would be drawn twice, once by the box plot and once among the jittered points.
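The two ways of suppressing outliers described in the parameter table behave differently with respect to the axis limits. The following sketch contrasts them. The object names are illustrative, and height = 0 is an optional addition that stops geom_jitter() from also jittering the points vertically.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = factor(cyl), y = mpg))

# Outliers discarded: the y axis adapts to the boxes and whiskers only.
p_discard <- base + geom_boxplot(outliers = FALSE)

# Outliers hidden: nothing is drawn for them, but the y axis still
# spans the full data range.
p_hidden <- base + geom_boxplot(outlier.shape = NA)

# Raw points on top of the box plot; height = 0 keeps the plotted
# y values exact, jittering only along the x axis.
p_points <- p_discard +
  geom_jitter(width = 0.2, height = 0, alpha = 0.4)
```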
Notched Box Plot
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(notch = TRUE, notchwidth = 0.6)
Notches narrow toward the median. The notch extends roughly to median ± 1.58 * IQR / sqrt(n), which gives an approximate 95% confidence interval for comparing medians. If the notches of two boxes do not overlap, their medians probably differ.
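The notch formula above is easy to evaluate by hand, which helps when judging whether notches are meaningful for small groups. This is a base R sketch of that formula only; notch_limits is a hypothetical helper written for this example, not a ggplot2 function.

```r
# Approximate notch limits: median +/- 1.58 * IQR / sqrt(n).
notch_limits <- function(y) {
  y <- y[!is.na(y)]
  half_width <- 1.58 * IQR(y) / sqrt(length(y))
  c(lower = median(y) - half_width, upper = median(y) + half_width)
}

# One row per cylinder group, with columns "lower" and "upper".
notches <- t(sapply(split(mtcars$mpg, mtcars$cyl), notch_limits))
notches
```

With few observations the interval can be wider than the box itself, in which case the notch extends past the hinges and the box looks folded back on itself.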
Variable Width Boxes
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(varwidth = TRUE)
Boxes are drawn with widths proportional to sqrt(n) for each group — more observations means a wider box. Use this when group sizes vary considerably.
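To see how much the widths will differ before plotting, the group sizes and their sqrt(n) ratios can be computed directly. This is a base R sketch with illustrative names (n, rel_width). Expressing the widths relative to the largest group is an assumption made for readability, not a statement about ggplot2 internals.

```r
# Number of observations per cylinder group.
n <- lengths(split(mtcars$mpg, mtcars$cyl))

# sqrt(n) scaling, expressed relative to the largest group.
rel_width <- sqrt(n) / max(sqrt(n))

n
round(rel_width, 2)
```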
Whiskers Beyond the Default Range
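The coef parameter of stat_boxplot controls how far the whiskers extend:
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  stat_boxplot(geom = "errorbar", coef = 3) + # whisker caps, up to 3 * IQR
  geom_boxplot(coef = 3) # same coef, so boxes and caps agree
The coef argument sets the maximum whisker length as a multiple of the IQR, measured from each hinge (default 1.5). geom_boxplot() passes it through to stat_boxplot, so give both layers the same value; otherwise the box plot keeps its default 1.5 * IQR whiskers and outliers. Increase coef for fewer outliers, decrease it for more.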
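The effect of coef on the number of outliers can be checked without plotting. The sketch below applies the hinge ± coef * IQR rule in base R. count_outliers is a hypothetical helper written for this example, and it mirrors the rule as described here rather than calling ggplot2 internals.

```r
# Count points that fall outside hinge +/- coef * IQR.
count_outliers <- function(y, coef = 1.5) {
  y <- y[!is.na(y)]
  q <- quantile(y, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  sum(y < q[1] - coef * iqr | y > q[2] + coef * iqr)
}

# Outlier counts for the 8-cylinder group at increasing values of coef.
y <- mtcars$mpg[mtcars$cyl == 8]
sapply(c(1, 1.5, 3), function(k) count_outliers(y, coef = k))
```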
Weighted Box Plot
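library(dplyr)
mtcars %>%
  mutate(weight = ifelse(cyl == 6, 2, 1)) %>%
  ggplot(aes(x = factor(cyl), y = mpg, weight = weight)) +
  geom_boxplot(varwidth = TRUE)
Use the weight aesthetic to produce weighted box plots: the hinges and median are then computed as weighted quantiles, which requires the quantreg package to be installed. Because the quantiles depend on relative weights, weights that are the same for every observation in a group (as in this example) leave that group's box essentially unchanged. Box widths under varwidth = TRUE still scale with the number of observations in each group, not with the summed weights.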
Grouped Box Plots with fill
ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(gear))) +
  geom_boxplot(position = "dodge2")
Use fill to create grouped box plots. position = "dodge2" places the boxes side by side. dodge2 is the default position for geom_boxplot and handles variable widths better than position = "dodge".
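One case where the position deserves explicit control is when some combinations of the grouping variables are missing. In mtcars there are no 8-cylinder cars with 4 gears, so by default the two remaining boxes in that group are stretched to fill the space. The sketch below uses position_dodge2(preserve = "single") to keep all boxes the same width. The name p_dodge is illustrative.

```r
library(ggplot2)

# preserve = "single" keeps every box at the width of a single element,
# instead of widening boxes in groups that have fewer fill levels.
p_dodge <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(gear))) +
  geom_boxplot(position = position_dodge2(preserve = "single"))

p_dodge
```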
Quick Side-by-Side Comparison
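ggplot(mtcars, aes(x = paste(cyl, "cyl"), y = mpg)) +
  geom_boxplot(width = 0.4)
cyl is numeric, so mapping it directly to x would produce a single box, with a warning about a continuous x aesthetic. Wrapping it in paste() turns each value into a character label such as "4 cyl", which ggplot2 treats as a discrete category. The result is one labelled box per value without a separate factor() call. width = 0.4 narrows the boxes.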
See Also
- /reference/tidyverse/ggplot2_geom_histogram/ — the histogram geom, for continuous variable distributions
- /reference/tidyverse/ggplot2_geom_bar/ — bar charts for categorical counts
- /reference/tidyverse/ggplot2_aes/ — aesthetic mappings that work with all geoms