ggplot2::geom_boxplot

geom_boxplot draws a box and whiskers plot in the style of Tukey. It compactly displays the distribution of a continuous variable across groups, showing five summary statistics: the median, two hinges (quartiles), and two whiskers, with outlying points plotted individually.

Signature

geom_boxplot(
  mapping = NULL,
  data = NULL,
  stat = "boxplot",
  position = "dodge2",
  ...,
  outliers = TRUE,
  outlier.colour = NULL,
  outlier.color = NULL,
  outlier.fill = NULL,
  outlier.shape = NULL,
  outlier.size = NULL,
  outlier.stroke = 0.5,
  outlier.alpha = NULL,
  whisker.colour = NULL,
  whisker.color = NULL,
  whisker.linetype = NULL,
  whisker.linewidth = NULL,
  staple.colour = NULL,
  staple.color = NULL,
  staple.linetype = NULL,
  staple.linewidth = NULL,
  median.colour = NULL,
  median.color = NULL,
  median.linetype = NULL,
  median.linewidth = NULL,
  box.colour = NULL,
  box.color = NULL,
  box.linetype = NULL,
  box.linewidth = NULL,
  notch = FALSE,
  notchwidth = 0.5,
  staplewidth = 0,
  varwidth = FALSE,
  na.rm = FALSE,
  orientation = NA,
  show.legend = NA,
  inherit.aes = TRUE
)

Returns: A Layer object.

Required and Commonly Used Aesthetics

Box plots require at least one positional aesthetic, either x or y. The typical mapping pairs a categorical x with a continuous y:

ggplot(data, aes(x = group, y = value)) +
  geom_boxplot()

Aesthetics in the typical mapping (only one of x or y is strictly required):

  • x — categorical variable (group)
  • y — continuous variable (distribution)

Optional aesthetics include alpha, colour, fill, group, linetype, linewidth, shape, size, and weight.
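As a minimal sketch of the single-positional-aesthetic case, the plot below maps only y, which should produce one box for the whole variable. It assumes the built-in mtcars data; layer_data() is used only to inspect the statistics that the layer computed.

```r
library(ggplot2)

# Only y is mapped, so all observations fall into a single group.
p_single <- ggplot(mtcars, aes(y = mpg)) +
  geom_boxplot()

# layer_data() returns one row of computed statistics per box.
single_stats <- layer_data(p_single)
nrow(single_stats)
```

The middle column of single_stats holds the median of mpg.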

Parameters

Outlier Styling

| Parameter | Description |
| --- | --- |
| outliers | Whether to display (TRUE) or discard (FALSE) outliers. Use outlier.shape = NA to hide outliers while keeping axis limits at full data range. |
| outlier.colour / outlier.color | Colour of outlier points. NULL inherits from data aesthetics. |
| outlier.fill | Fill colour for outlier points. |
| outlier.shape | Shape of outlier points (see ?pch for values). |
| outlier.size | Size of outlier points. |
| outlier.stroke | Stroke width of outlier points. Default is 0.5. |
| outlier.alpha | Alpha (transparency) of outlier points. |

Whisker and Box Styling

| Parameter | Description |
| --- | --- |
| whisker.colour / whisker.color | Colour of whisker lines. |
| whisker.linetype | Line type for whiskers. |
| whisker.linewidth | Line width for whiskers. |
| box.colour / box.color | Colour of the box. |
| box.linetype | Line type for the box outline. |
| box.linewidth | Line width for the box outline. |
| median.colour / median.color | Colour of the median line. |
| median.linetype | Line type for the median line. |
| median.linewidth | Line width for the median line. |
| staple.colour / staple.color | Colour of staple lines (ends of whiskers). Staples only appear when staplewidth is non-zero. |
| staple.linetype | Line type for staples. |
| staple.linewidth | Line width for staples. |

Plot Structure Parameters

| Parameter | Description |
| --- | --- |
| notch | FALSE (default) draws a standard box plot. TRUE draws a notched box plot; notches that do not overlap suggest significantly different medians. |
| notchwidth | Width of the notch relative to the box body. Defaults to 0.5 (half the box width). |
| varwidth | FALSE (default). TRUE draws boxes with widths proportional to the square root of the number of observations in each group. |
| staplewidth | Width of the staples relative to the box. The default of 0 draws no staples. Set it to a positive value to draw staple lines at the whisker ends. |
| na.rm | FALSE (default) removes missing values with a warning. TRUE removes them silently. |
| orientation | Layer orientation. NA (default) determines it automatically from the aesthetic mapping. Set it explicitly to "x" or "y" if auto-detection fails. |
| inherit.aes | TRUE (default) combines this layer's mapping with the plot's default mapping. FALSE uses only this layer's mapping. |
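To illustrate the staplewidth parameter, here is a short sketch that adds staples half as wide as the boxes. It assumes a ggplot2 version whose geom_boxplot() accepts staplewidth, as in the signature above; the value 0.5 and the open-circle outlier shape are arbitrary choices for illustration.

```r
library(ggplot2)

# staplewidth > 0 draws short caps at the whisker ends;
# outlier.shape = 1 draws outliers as open circles.
p_staple <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(staplewidth = 0.5, outlier.shape = 1)

p_staple
```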

Basic Example

library(ggplot2)

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

This draws one box per cylinder group. The box spans the interquartile range (IQR) and the thick line inside is the median. Each whisker extends to the most extreme data point that lies within 1.5 * IQR of its hinge, and points beyond the whiskers are plotted individually as outliers.
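The five statistics can be checked by hand. The sketch below recomputes them for the cyl == 4 group with base R and compares them with what the layer computed. It assumes the stat uses quantile()-based hinges and the 1.5 * IQR rule described above; the variable names are illustrative only.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

# Statistics computed by the layer; row 1 is the first factor level (cyl == 4).
stat_cols <- c("ymin", "lower", "middle", "upper", "ymax")
computed <- layer_data(p)[1, stat_cols]

# Manual recomputation for the same group.
y <- mtcars$mpg[mtcars$cyl == 4]
q <- quantile(y, c(0.25, 0.5, 0.75), names = FALSE)
iqr <- q[3] - q[1]
is_outlier <- y < q[1] - 1.5 * iqr | y > q[3] + 1.5 * iqr

# Whiskers end at the most extreme points that are not outliers.
whiskers <- range(c(q, y[!is_outlier]))

manual <- c(ymin = whiskers[1], lower = q[1], middle = q[2],
            upper = q[3], ymax = whiskers[2])
```

Comparing computed with manual, for example with all.equal(), is a quick way to confirm how the whiskers were derived for a given group.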

Horizontal Box Plot

ggplot(mtcars, aes(x = mpg, y = factor(cyl))) +
  geom_boxplot()

Set orientation = "y" explicitly if needed, though ggplot2 usually detects the orientation correctly.
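One case where an explicit orientation can help is when both mapped variables are numeric. The sketch below keeps cyl numeric, groups by it explicitly, and forces horizontal boxes. The flipped_aes column inspected at the end is assumed to be present in the layer data, as it is in recent ggplot2 versions.

```r
library(ggplot2)

# Both variables are numeric here, so the grouping and the orientation
# are stated explicitly instead of being inferred.
p_horizontal <- ggplot(mtcars, aes(x = mpg, y = cyl, group = cyl)) +
  geom_boxplot(orientation = "y")

# flipped_aes is TRUE for boxes drawn horizontally.
layer_data(p_horizontal)$flipped_aes
```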

Styling the Box

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(
    fill = "steelblue",
    colour = "#333333",
    box.linewidth = 1,
    median.colour = "red",
    median.linewidth = 1.5
  )

Showing Raw Data Points

Overlay jittered points to show the actual data:

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(outliers = FALSE) +
  geom_jitter(width = 0.2, alpha = 0.4)

Setting outliers = FALSE on the box plot and adding geom_jitter() on top gives a cleaner look, because outlying observations would otherwise be drawn twice: once by the box plot and once by the jitter layer.
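The difference between discarding outliers (outliers = FALSE) and merely hiding them (outlier.shape = NA) shows up in the axis limits. The sketch below uses a hypothetical toy data frame with one extreme value and reads the trained y-scale limits through layer_scales(). It assumes a ggplot2 version that supports the outliers argument.

```r
library(ggplot2)

# Hypothetical toy data: ten ordinary values plus one extreme point.
toy <- data.frame(g = "a", y = c(1:10, 100))

p_discard <- ggplot(toy, aes(g, y)) + geom_boxplot(outliers = FALSE)
p_hide    <- ggplot(toy, aes(g, y)) + geom_boxplot(outlier.shape = NA)

# Trained y-scale limits of each plot (before axis expansion).
lim_discard <- layer_scales(p_discard)$y$get_limits()
lim_hide    <- layer_scales(p_hide)$y$get_limits()
```

With discarded outliers the limits should cover only the box and whiskers, while the hidden-outlier version should still span the full data range.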

Notched Box Plot

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(notch = TRUE, notchwidth = 0.6)

Notches narrow toward the median and extend to roughly median ± 1.58 * IQR / sqrt(n), which approximates a 95% confidence interval for comparing medians. If two boxes have non-overlapping notches, their medians are likely significantly different.
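The notch formula can be verified against the layer data. This sketch recomputes the notch limits for the cyl == 8 group with base R, assuming the layer exposes them as the notchlower and notchupper columns; the variable names are illustrative.

```r
library(ggplot2)

p_notch <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(notch = TRUE)

# Row 3 corresponds to the third factor level (cyl == 8).
computed <- layer_data(p_notch)[3, c("notchlower", "notchupper")]

# Manual notch limits: median +/- 1.58 * IQR / sqrt(n).
y <- mtcars$mpg[mtcars$cyl == 8]
half_width <- 1.58 * IQR(y) / sqrt(length(y))
manual <- c(notchlower = median(y) - half_width,
            notchupper = median(y) + half_width)
```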

Variable Width Boxes

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(varwidth = TRUE)

Boxes are drawn with widths proportional to sqrt(n) for each group — more observations means a wider box. Use this when group sizes vary considerably.
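To get a feel for how much the widths differ, the relative widths can be estimated with base R alone. The sketch assumes widths are scaled relative to the largest group, so that group has relative width 1.

```r
# Number of observations per cylinder group in mtcars.
n <- table(mtcars$cyl)

# Relative widths: sqrt(n), scaled so the largest group has width 1.
rel_width <- sqrt(n) / max(sqrt(n))
round(rel_width, 2)
```

Because of the square root, a group with half as many observations is still about 71% as wide, so the visual difference is gentler than the difference in counts.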

Whiskers Beyond the Default Range

The stat_boxplot stat controls how far whiskers extend:

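ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  stat_boxplot(geom = "errorbar", coef = 3) +  # whisker caps, up to 3 * IQR from the hinges
  geom_boxplot(coef = 3)  # coef is passed on to stat_boxplot; both layers must use the same value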

The coef argument controls how far the whiskers extend from the hinge (default is 1.5). Increase it for fewer outliers, decrease for more.
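The effect of coef on the number of outliers can be checked directly. The helper below, count_outliers(), is a hypothetical name introduced for this sketch. It builds the plot for a given coef and counts the points stored in the outliers list column of the layer data.

```r
library(ggplot2)

# Hypothetical helper: total number of outlier points across all boxes.
count_outliers <- function(coef) {
  p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
    geom_boxplot(coef = coef)
  sum(lengths(layer_data(p)$outliers))
}

# Larger coef values can only keep or reduce the outlier count.
sapply(c(0.5, 1.5, 3), count_outliers)
```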

Weighted Box Plot

library(dplyr)

mtcars %>%
  mutate(weight = ifelse(cyl == 6, 2, 1)) %>%
  ggplot(aes(x = factor(cyl), y = mpg, weight = weight)) +
  geom_boxplot(varwidth = TRUE)

Use the weight aesthetic to produce weighted box plots. ggplot2 computes weighted quantiles with the quantreg package, so it must be installed. The varwidth = TRUE option respects weights when scaling box widths.

Grouped Box Plots with fill

ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(gear))) +
  geom_boxplot(position = "dodge2")

Use fill to create grouped box plots. position = "dodge2" places the boxes side by side. dodge2 is the default position for geom_boxplot and handles variable widths better than position = "dodge".
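In mtcars not every cyl and gear combination exists, so some x positions hold fewer boxes than others. As a sketch of one way to keep box widths uniform in that situation, position_dodge2() can be given preserve = "single". Whether this looks better is a matter of taste.

```r
library(ggplot2)

# preserve = "single" keeps every box the same width, even where a
# cyl/gear combination is missing (there are no 8-cylinder, 4-gear cars).
p_grouped <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(gear))) +
  geom_boxplot(position = position_dodge2(preserve = "single"))

# One row of statistics per non-empty combination.
nrow(layer_data(p_grouped))
```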

Quick Side-by-Side Comparison

ggplot(mtcars, aes(x = paste(cyl, "cyl"), y = mpg)) +
  geom_boxplot(width = 0.4)

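A quick trick for side-by-side boxes without a separate grouping variable: paste(cyl, "cyl") turns the numeric cyl values into character labels such as "4 cyl", which ggplot2 treats as a discrete variable, so each value gets its own box. width = 0.4 narrows the boxes.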

See Also