ggplot2::geom_boxplot

geom_boxplot draws a box and whiskers plot in the style of Tukey. It compactly displays the distribution of a continuous variable across groups, showing five summary statistics: the median, two hinges (quartiles), and two whiskers, with outlying points plotted individually.

Signature

geom_boxplot(
  mapping = NULL,
  data = NULL,
  stat = "boxplot",
  position = "dodge2",
  ...,
  outliers = TRUE,
  outlier.colour = NULL,
  outlier.color = NULL,
  outlier.fill = NULL,
  outlier.shape = NULL,
  outlier.size = NULL,
  outlier.stroke = 0.5,
  outlier.alpha = NULL,
  whisker.colour = NULL,
  whisker.color = NULL,
  whisker.linetype = NULL,
  whisker.linewidth = NULL,
  staple.colour = NULL,
  staple.color = NULL,
  staple.linetype = NULL,
  staple.linewidth = NULL,
  median.colour = NULL,
  median.color = NULL,
  median.linetype = NULL,
  median.linewidth = NULL,
  box.colour = NULL,
  box.color = NULL,
  box.linetype = NULL,
  box.linewidth = NULL,
  notch = FALSE,
  notchwidth = 0.5,
  staplewidth = 0,
  varwidth = FALSE,
  na.rm = FALSE,
  orientation = NA,
  show.legend = NA,
  inherit.aes = TRUE
)

Returns: A Layer object. You add this layer to a ggplot with the + operator, and the box plot automatically inherits the aesthetic mappings set in the parent ggplot() call unless you override them with the layer’s own mapping argument.

Required and commonly used aesthetics

Box plots require at least one positional aesthetic, typically x (categorical) and y (continuous):

ggplot(data, aes(x = group, y = value)) +
  geom_boxplot()

This minimal call maps a categorical grouping variable to the x-axis and a numeric measurement to the y-axis, producing one box per group. The box boundaries mark the first and third quartiles, and the bold horizontal line inside each box marks the median. Each whisker extends from its hinge to the most extreme observation within 1.5 times the interquartile range of that hinge, and any points beyond the whiskers appear as individual dots. These positional aesthetics underpin every example below, so it helps to have them clear before customising the appearance or behaviour of the box plot through the parameters that follow.

Positional aesthetics for geom_boxplot (at least one is required; most plots map both):

  • x, categorical variable (group)
  • y, continuous variable (distribution)

Optional aesthetics include fill, colour, group, and weight.
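
As a minimal sketch of these aesthetics in use, the first plot below maps y alone, which draws a single box for the whole variable, and the second maps both positions plus the optional colour aesthetic. It uses the built-in mtcars data, as the later examples do; the object names p_single and p_colour are illustrative only.

library(ggplot2)

# Only y is mapped: one box summarising all mpg values
p_single <- ggplot(mtcars, aes(y = mpg)) +
  geom_boxplot()

# x and y mapped, with the optional colour aesthetic tied to the same grouping
p_colour <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, colour = factor(cyl))) +
  geom_boxplot()

p_single
p_colour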

Parameters

Outlier styling

Parameter                       Description
outliers                        Whether to display (TRUE) or discard (FALSE) outliers. Use outlier.shape = NA to hide outliers while keeping axis limits at full data range.
outlier.colour / outlier.color  Colour of outlier points. NULL inherits from data aesthetics.
outlier.fill                    Fill colour for outlier points.
outlier.shape                   Shape of outlier points (see ?pch for values).
outlier.size                    Size of outlier points.
outlier.stroke                  Stroke width of outlier points. Default is 0.5.
outlier.alpha                   Alpha (transparency) of outlier points.
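
The outlier parameters above could be combined roughly as follows. This is a sketch on the built-in mtcars data; the colour, shape, and size values are arbitrary choices, and the outliers argument assumes a ggplot2 version whose signature matches the one shown earlier.

library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg))

# Restyle outliers: red, hollow circles (shape 1), slightly enlarged
p_styled <- p + geom_boxplot(outlier.colour = "red", outlier.shape = 1, outlier.size = 3)

# Hide outlier points but keep the y-axis spanning the full data range
p_hidden <- p + geom_boxplot(outlier.shape = NA)

# Discard outliers entirely; axis limits then follow the box and whiskers only
p_dropped <- p + geom_boxplot(outliers = FALSE)

p_styled
p_hidden
p_dropped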

Whisker and box styling

Parameter                       Description
whisker.colour / whisker.color  Colour of whisker lines.
whisker.linetype                Line type for whiskers.
whisker.linewidth               Line width for whiskers.
box.colour / box.color          Colour of the box.
box.linetype                    Line type for the box outline.
box.linewidth                   Line width for the box outline.
median.colour / median.color    Colour of the median line.
median.linetype                 Line type for the median line.
median.linewidth                Line width for the median line.
staple.colour / staple.color    Colour of staple lines (ends of whiskers). Staples only appear when staplewidth is non-zero.
staple.linetype                 Line type for staples.
staple.linewidth                Line width for staples.
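
A short sketch of the whisker and staple parameters working together is given below. It assumes a ggplot2 version that provides the arguments listed in the signature above (older releases may ignore them with a warning), and the specific colours and line type are illustrative.

library(ggplot2)

p_staples <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(
    staplewidth = 0.5,            # staples half as wide as the box
    staple.colour = "grey30",
    whisker.linetype = "dashed",
    whisker.colour = "grey30"
  )

p_staples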

Plot structure parameters

Parameter    Description
notch        FALSE (default) draws a standard box plot; TRUE draws a notched box plot. Notches that don’t overlap suggest significantly different medians.
notchwidth   Width of the notch relative to the box body. Defaults to 0.5 (half the box width).
varwidth     FALSE (default) leaves box widths independent of group size; TRUE draws boxes with widths proportional to the square root of the number of observations in each group.
staplewidth  Width of the staples relative to the box. The default, 0, draws no staples. Set to a positive value to draw staple lines at whisker ends.
na.rm        FALSE (default) removes missing values with a warning; TRUE removes them silently.
orientation  Layer orientation. NA (default) auto-determines it from the aesthetic mapping. Set explicitly to "x" or "y" if auto-detection fails.
inherit.aes  TRUE (default) combines this layer’s mapping with the plot’s default mapping; FALSE uses only this layer’s mapping.
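
To illustrate na.rm, the sketch below builds a small hypothetical data frame (df, with made-up values) containing one missing measurement. With na.rm = TRUE the missing row is dropped without the usual warning.

library(ggplot2)

# Hypothetical data: group "a" has one missing value
df <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  value = c(1, 2, NA, 4, 5, 6)
)

p_na <- ggplot(df, aes(x = group, y = value)) +
  geom_boxplot(na.rm = TRUE)  # drop the NA row silently

p_na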

Basic example

library(ggplot2)

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

This draws one box per cylinder group. The box spans the interquartile range (IQR), the thick line inside is the median, each whisker extends to the most extreme observation within 1.5 * IQR of its hinge, and points beyond that are plotted as individual outliers. You can also swap the x and y mappings to create a horizontal layout, which often works better when category labels are long or when you want the viewer to compare groups along a shared horizontal scale.
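
To see the numbers behind the drawing, one option is layer_data(), which returns the values the boxplot stat computed for each group. The sketch below assumes the column names ymin, lower, middle, upper, and ymax for the whisker ends, hinges, and median; p and box_stats are illustrative variable names.

library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

# One row per box: lower whisker end, lower hinge, median, upper hinge, upper whisker end
box_stats <- layer_data(p)[, c("ymin", "lower", "middle", "upper", "ymax")]
box_stats

# The middle column should agree with medians computed directly from the data
tapply(mtcars$mpg, mtcars$cyl, median)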

Horizontal box plot

ggplot(mtcars, aes(x = mpg, y = factor(cyl))) +
  geom_boxplot()

Set orientation = "y" explicitly if needed, though ggplot2 usually detects the orientation correctly. Once orientation is settled, you can refine the visual appearance by adjusting fill colours, line widths, and the styling of individual box components. The next example shows how to customise the box outline, fill, and median line to make the plot match a presentation theme.

Styling the box

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(
    fill = "steelblue",
    colour = "#333333",
    box.linewidth = 1,
    median.colour = "red",
    median.linewidth = 1.5
  )

The colour and line-width parameters give you fine-grained control over every visual element of a box plot, from the box outline to the median stripe. Rather than relying solely on summary statistics, you may also want to overlay the raw observations so readers can assess the underlying sample size and spot unusual patterns that the box alone might hide.

Showing raw data points

Overlay jittered points to show the actual data:

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(outliers = FALSE) +
  geom_jitter(width = 0.2, alpha = 0.4)

Setting outliers = FALSE on the box plot and adding geom_jitter() on top avoids drawing each outlying point twice, once by the box plot and once by the jitter layer. While jittered points reveal the data distribution, you might also want to assess whether differences between group medians are statistically meaningful. A notched box plot provides a rough visual test for this by drawing confidence intervals around each median.

Notched box plot

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(notch = TRUE, notchwidth = 0.6)

Notches narrow toward the median. The notch convention is roughly median ± 1.58 * IQR / sqrt(n). If two boxes have non-overlapping notches, their medians are likely significantly different. Another structural variation worth knowing is the variable-width box plot, which encodes the number of observations in each group directly into the width of the box. This adjustment is useful when group sizes are unbalanced and you want the viewer to see that at a glance.
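
The notch formula can be checked by hand. The sketch below computes median ± 1.58 * IQR / sqrt(n) for each cylinder group and prints the notch limits reported by layer_data() for comparison; notch_limits is a hypothetical helper written for this illustration, not part of ggplot2.

library(ggplot2)

# Notch limits from the formula: median ± 1.58 * IQR / sqrt(n)
notch_limits <- function(x) {
  half_width <- 1.58 * IQR(x) / sqrt(length(x))
  c(lower = median(x) - half_width, upper = median(x) + half_width)
}

# One row per cylinder group
manual <- t(sapply(split(mtcars$mpg, mtcars$cyl), notch_limits))
manual

# Values ggplot2 computed for the same groups, for comparison
p_notch <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(notch = TRUE)
layer_data(p_notch)[, c("notchlower", "notchupper")]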

Variable width boxes

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(varwidth = TRUE)

Boxes are drawn with widths proportional to sqrt(n) for each group, so more observations means a wider box. Use this when group sizes vary considerably. Beyond the shape and width of the box itself, you can also control how far the whiskers reach using the coef argument of stat_boxplot. Adjusting this threshold changes which observations are flagged as outliers and which fall inside the whisker range.
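
As a worked illustration of the sqrt(n) rule, the base R sketch below computes relative box widths for the three cylinder groups, assuming widths are rescaled so that the largest group has width 1. The variable names n and rel_width are illustrative.

# Observations per cylinder group (11, 7, and 14 cars)
n <- c(table(mtcars$cyl))

# Widths proportional to sqrt(n), rescaled so the largest group has width 1
rel_width <- sqrt(n) / max(sqrt(n))
round(rel_width, 2)

Under this assumption the 6-cylinder box (7 cars) comes out about 0.71 times as wide as the 8-cylinder box (14 cars), since sqrt(7 / 14) is roughly 0.71.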

Whiskers beyond the default range

The stat_boxplot stat controls how far whiskers extend:

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  stat_boxplot(geom = "errorbar", coef = 3) +  # whisker end caps, computed with the same coef
  geom_boxplot(coef = 3)                       # whiskers reach up to 3 * IQR from the hinges

The coef argument sets how far the whiskers may extend from the hinge, as a multiple of the IQR (default is 1.5). Increase it for fewer outliers, decrease for more. In datasets where some observations carry greater importance than others, you can assign a weight to each row so that the quantile calculations within the box plot reflect those priorities. This is especially useful in survey data or when rows represent aggregated counts.
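
The effect of coef on outlier flagging can be sketched in base R. The helper count_outliers below is hypothetical and written for this illustration; it assumes hinges are taken as the default quartiles from quantile() and counts the points lying more than coef * IQR beyond them.

# Count points lying more than coef * IQR beyond the hinges
count_outliers <- function(x, coef = 1.5) {
  hinges <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- diff(hinges)
  sum(x < hinges[1] - coef * iqr | x > hinges[2] + coef * iqr)
}

# Outlier counts for the 8-cylinder group at three coef values
mpg8 <- mtcars$mpg[mtcars$cyl == 8]
counts <- sapply(c(0.5, 1.5, 3), function(k) count_outliers(mpg8, coef = k))
counts

Because a larger coef widens the accepted range, the counts can only stay the same or fall as coef grows.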

Weighted box plot

library(dplyr)

mtcars %>%
  mutate(weight = ifelse(cyl == 6, 2, 1)) %>%
  ggplot(aes(x = factor(cyl), y = mpg, weight = weight)) +
  geom_boxplot(varwidth = TRUE)

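Use the weight aesthetic to produce weighted box plots. With varwidth = TRUE, box widths scale with the summed weights rather than the raw row counts. Because the weights in this example are constant within each cylinder group, their main visible effect is on box width (the 6-cylinder box is sized as if it held twice as many observations); to shift the quartiles themselves, weights need to differ between rows of the same group. Weighting aside, a common need in exploratory analysis is displaying a secondary categorical breakdown alongside the primary grouping variable. Mapping a second factor to the fill aesthetic and using a dodged position creates side-by-side box plots within each primary group, letting you compare distributions across two dimensions at once.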

Grouped box plots with fill

ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(gear))) +
  geom_boxplot(position = "dodge2")

Use fill to create grouped box plots. position = "dodge2" places the boxes side by side. dodge2 is the default position for geom_boxplot and handles variable widths better than position = "dodge". For quick exploratory plots where you only need a compact side-by-side view without formal grouping variables, you can construct composite labels on the fly with paste() in the x aesthetic to achieve a similar compact layout.
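
In mtcars not every cylinder group contains every gear count (there are no 8-cylinder cars with 4 gears), so under the default dodging the remaining boxes in that group widen to fill the slot. One way to keep all boxes the same width is to pass position_dodge2(preserve = "single"), sketched below; p_dodge is an illustrative name.

library(ggplot2)

p_dodge <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(gear))) +
  geom_boxplot(position = position_dodge2(preserve = "single"))  # equal-width boxes

p_dodge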

Quick side-by-side comparison

ggplot(mtcars, aes(x = paste(cyl, "cyl"), y = mpg)) +
  geom_boxplot(width = 0.4)

A quick trick for side-by-side boxes without a grouping variable: paste() the category values together in the x aesthetic.

See also