Advanced Geoms in ggplot2
In this tutorial, you’ll learn how to create sophisticated visualizations using advanced ggplot2 geometries. Building on the fundamentals covered in our introduction to ggplot2, we’ll explore box plots, violin plots, density visualizations, and multi-layer charts that reveal patterns invisible in basic scatter plots.
Box Plots: Summarizing Distributions
Box plots (box-and-whisker plots) provide a compact way to compare distributions across categories. They show the median, quartiles, and potential outliers at a glance.
library(ggplot2)
library(dplyr)
# Use the mpg dataset for examples
glimpse(mpg)
#> Rows: 234
#> Columns: 11
#> $ manufacturer: chr "audi" "audi" "audi" ...
#> $ model : chr "a4" "a4" "a4" ...
#> $ displ : num 1.8 1.8 2 2 2.2 2.2 2.5 2.5 2.5 2.5 ...
#> $ year : int 1999 1999 1999 1999 1999 1999 ...
#> $ cyl : int 4 4 4 4 4 4 4 4 4 4 ...
#> $ trans : chr "auto(l5)" "manual(m5)" "manual(m5)" ...
#> $ drv : chr "f" "f" "f" "f" "f" ...
#> $ cty : int 18 21 20 21 22 23 24 25 26 26 ...
#> $ hwy : int 25 28 27 28 29 31 32 31 30 29 ...
#> $ fl : chr "p" "p" "p" "p" "p" ...
#> $ class : chr "compact" "compact" "compact" ...
# Basic box plot: highway mpg by number of cylinders
ggplot(mpg, aes(x = factor(cyl), y = hwy)) +
  geom_boxplot() +
  labs(
    title = "Highway MPG by Cylinder Count",
    x = "Number of Cylinders",
    y = "Highway Miles Per Gallon"
  ) +
  theme_minimal()
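The box spans the interquartile range (IQR), the middle 50% of your data, and the line inside it marks the median. Each whisker extends to the most extreme observation that lies within 1.5 × IQR of the box edge; points beyond the whiskers are drawn individually as potential outliers.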
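To make the whisker rule concrete, here is a minimal sketch that recomputes the box statistics by hand for the four-cylinder cars. It assumes the default settings of geom_boxplot() (quartiles from quantile() and a whisker coefficient of 1.5); the variable names are illustrative only.

```r
library(ggplot2)  # provides the mpg dataset

# Highway mpg for four-cylinder cars only
hwy4 <- mpg$hwy[mpg$cyl == 4]

# Quartiles: the box edges and the median line
q <- quantile(hwy4, c(0.25, 0.5, 0.75), names = FALSE)
iqr <- q[3] - q[1]

# Fences: the farthest a whisker is allowed to reach
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[3] + 1.5 * iqr

# Whiskers stop at the most extreme observations inside the fences
inside <- hwy4[hwy4 >= lower_fence & hwy4 <= upper_fence]
whiskers <- range(inside)

# Anything outside the fences is plotted as an individual point
outliers <- hwy4[hwy4 < lower_fence | hwy4 > upper_fence]

list(box = q, whiskers = whiskers, outliers = sort(outliers))
```

Note that a whisker usually ends before the fence, because it stops at an actual observation rather than at the fence value itself.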
Horizontal Box Plots and Customization
Rotate your box plot for better label readability, or add notches to compare medians:
# Horizontal box plot with notches
ggplot(mpg, aes(y = factor(cyl), x = hwy)) +
  geom_boxplot(notch = TRUE, fill = "steelblue", alpha = 0.7) +
  labs(
    title = "Highway MPG by Cylinders (Horizontal)",
    y = "Number of Cylinders",
    x = "Highway Miles Per Gallon"
  ) +
  theme_minimal()
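Each notch extends 1.58 × IQR / √n on either side of the median, which gives a roughly 95% confidence interval for comparing medians. When the notches of two boxes don’t overlap, the medians likely differ.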
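The notch limits can be reproduced with a short dplyr summary. This is a sketch of the 1.58 × IQR / √n rule that geom_boxplot() documents for its notches; the column names (notch_lower, notch_upper) are made up for this example.

```r
library(ggplot2)  # provides the mpg dataset
library(dplyr)

# Approximate notch limits for highway mpg in each cylinder group
notch_summary <- mpg %>%
  group_by(cyl) %>%
  summarise(
    n = n(),
    median_hwy = median(hwy),
    iqr_hwy = IQR(hwy),
    notch_lower = median_hwy - 1.58 * iqr_hwy / sqrt(n),
    notch_upper = median_hwy + 1.58 * iqr_hwy / sqrt(n)
  )

notch_summary
```

Because the notch width shrinks with √n, very small groups (such as the handful of five-cylinder cars in mpg) get wide notches, so treat comparisons involving them with caution.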
Violin Plots: Distribution Density
Violin plots blend the box plot with a density plot: each category is drawn as a mirrored kernel density estimate, revealing the full shape of the distribution (bimodal, skewed, or uniform) that a box plot’s five-number summary hides.
# Violin plot: highway mpg by vehicle class
ggplot(mpg, aes(x = class, y = hwy)) +
  geom_violin(fill = "coral", alpha = 0.7) +
  labs(
    title = "Highway MPG Distribution by Vehicle Class",
    x = "Vehicle Class",
    y = "Highway MPG"
  ) +
  theme_minimal() +
  coord_flip()
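The shape of a violin depends on how the density is estimated and scaled. The sketch below shows two arguments that are often worth adjusting: adjust (a multiplier on the default bandwidth) and scale (how violin areas relate to each other). The values chosen here are arbitrary starting points, not recommendations.

```r
library(ggplot2)

violin_tuned <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_violin(
    adjust = 0.5,     # half the default bandwidth: more detail, more noise
    scale = "count",  # violin area proportional to the group's size
    fill = "coral",
    alpha = 0.7
  ) +
  labs(x = "Vehicle Class", y = "Highway MPG") +
  theme_minimal() +
  coord_flip()

violin_tuned
```

With the default scale = "area", every violin has the same area regardless of how many cars are in the class, which can make a tiny group look as well supported as a large one.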
Combining Violin with Box Plots
Overlay a box plot inside the violin for the best of both worlds:
# Violin + box plot combination
ggplot(mpg, aes(x = class, y = hwy, fill = class)) +
  geom_violin(alpha = 0.5) +
  geom_boxplot(width = 0.2, fill = "white", alpha = 0.8) +
  labs(
    title = "Highway MPG: Distribution Shape + Summary",
    x = "Vehicle Class",
    y = "Highway MPG"
  ) +
  theme_minimal() +
  theme(legend.position = "none")
Density Plots: Smooth Distribution Visualization
For continuous variables, density plots show probability density without binning artifacts:
# Overlapping density plots by category
ggplot(mpg, aes(x = hwy, fill = drv)) +
  geom_density(alpha = 0.5) +
  labs(
    title = "Highway MPG Distribution by Drive Type",
    x = "Highway MPG",
    fill = "Drive Type"
  ) +
  theme_minimal()
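The y-axis of a density plot is a density, not a count: the area under each curve is 1. The sketch below checks this numerically with base R’s density() for the front-wheel-drive cars, then shows one way to rescale the curves by group size using after_stat(count). The object names are illustrative.

```r
library(ggplot2)

# Kernel density estimate for front-wheel-drive cars
hwy_front <- mpg$hwy[mpg$drv == "f"]
d <- density(hwy_front)

# Trapezoidal rule: the area under the curve should be close to 1
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
area

# Scale each curve by its group size so larger groups look larger
density_counts <- ggplot(mpg, aes(x = hwy, y = after_stat(count), fill = drv)) +
  geom_density(alpha = 0.5) +
  labs(x = "Highway MPG", y = "Density × group size", fill = "Drive Type") +
  theme_minimal()

density_counts
```

Because every default curve integrates to 1, a small group and a large group look equally prominent; the count-scaled version makes the difference in group size visible.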
2D Density Plots
Visualize the relationship between two continuous variables with contour or raster density:
# 2D density contour
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_density_2d() +
  labs(
    title = "Engine Displacement vs Highway MPG",
    x = "Engine Displacement (L)",
    y = "Highway MPG"
  ) +
  theme_minimal()
# Or use filled contours for easier reading
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_density_2d_filled() +
  theme_minimal()
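For the raster variant, one option is to call stat_density_2d() directly with contouring switched off and map the estimated density to fill. This is a sketch; the viridis colour scale is simply one reasonable choice.

```r
library(ggplot2)

density_raster <- ggplot(mpg, aes(x = displ, y = hwy)) +
  stat_density_2d(
    aes(fill = after_stat(density)),  # colour each grid cell by its estimated density
    geom = "raster",
    contour = FALSE                   # keep the full grid instead of contour lines
  ) +
  scale_fill_viridis_c() +
  labs(
    x = "Engine Displacement (L)",
    y = "Highway MPG",
    fill = "Density"
  ) +
  theme_minimal()

density_raster
```

A raster shows the whole estimated surface, while contours only mark selected density levels, so the raster tends to be easier to read when the surface has several peaks.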
Jitter Plots: Reducing Overplotting
When many points share the same position, jittering adds a small amount of random noise to each point to reduce overlap:
# Jitter plot with median line
# height = 0 jitters points horizontally only, so hwy values stay exact
ggplot(mpg, aes(x = class, y = hwy, color = class)) +
  geom_jitter(alpha = 0.6, width = 0.2, height = 0) +
  stat_summary(fun = median, geom = "line", color = "black",
               aes(group = 1), linewidth = 1) +
  theme_minimal() +
  theme(legend.position = "none")
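Jitter is random, so the plot changes slightly every time it is rendered. If you need a reproducible figure, one approach is to pass a seed through position_jitter(), as in this sketch (the seed value 42 is arbitrary):

```r
library(ggplot2)

jitter_fixed <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_point(
    # Same seed, same noise: the points land in the same place on every render
    position = position_jitter(width = 0.2, height = 0, seed = 42),
    alpha = 0.6
  ) +
  theme_minimal()

jitter_fixed
```

Setting height = 0 restricts the noise to the categorical axis, so the plotted highway values remain the observed ones.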
Scatter Plots with Smoothing
The geom_smooth() layer adds trend lines with confidence intervals:
# Scatter with a LOESS (local regression) smoother
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "loess", se = TRUE, color = "red") +
  labs(
    title = "Engine Size vs Highway MPG with Trend",
    x = "Engine Displacement (L)",
    y = "Highway MPG"
  ) +
  theme_minimal()
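Choose the method to match the pattern you expect: "lm" fits a straight line, "loess" fits a local regression (the default for fewer than 1,000 observations), and "gam" fits a generalized additive model (this requires the mgcv package).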
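Overlaying two methods on the same data is a quick way to see what each one assumes. In this sketch the constant strings inside aes() are just legend labels, and formula = y ~ x is written out to avoid the message geom_smooth() prints when it picks a formula itself.

```r
library(ggplot2)

smooth_compare <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.3) +
  # Straight-line fit
  geom_smooth(aes(color = "lm"), method = "lm", formula = y ~ x, se = FALSE) +
  # Local regression: free to bend where the data bend
  geom_smooth(aes(color = "loess"), method = "loess", formula = y ~ x, se = FALSE) +
  labs(
    x = "Engine Displacement (L)",
    y = "Highway MPG",
    color = "Method"
  ) +
  theme_minimal()

smooth_compare
```

Where the two curves diverge, the straight line is probably too rigid for the data; where they agree, the simpler linear fit is likely sufficient.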
Multi-Layer Visualizations
Combine multiple geoms for rich, informative graphics:
# Publication-ready multi-layer plot
# shape is mapped only in geom_point(); mapping it globally would also split
# the trend lines by cylinder count, leaving groups too small to fit
ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(aes(shape = factor(cyl)), size = 2, alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 1) +
  facet_wrap(~class, ncol = 3) +
  labs(
    title = "Highway MPG Analysis by Class",
    subtitle = "Engine displacement vs highway mileage",
    x = "Engine Displacement (L)",
    y = "Highway MPG",
    color = "Drive Type",
    shape = "Cylinders"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "bottom"
  )
Error Bars: Showing Uncertainty
Add error bars to bar charts or point plots to communicate variability:
# Bar chart with error bars (mean ± 1 standard error)
summary_mpg <- mpg %>%
  group_by(class) %>%
  summarise(
    mean_hwy = mean(hwy),
    sd_hwy = sd(hwy),
    n = n(),
    se = sd_hwy / sqrt(n)  # standard error of the mean
  )
ggplot(summary_mpg, aes(x = class, y = mean_hwy, fill = class)) +
  geom_col(alpha = 0.7) +
  geom_errorbar(
    aes(ymin = mean_hwy - se, ymax = mean_hwy + se),
    width = 0.3
  ) +
  labs(
    title = "Mean Highway MPG by Class with Standard Error",
    x = "Vehicle Class",
    y = "Mean Highway MPG"
  ) +
  theme_minimal() +
  theme(
    legend.position = "none",
    # hjust = 1 right-aligns the rotated labels under their bars
    axis.text.x = element_text(angle = 45, hjust = 1)
  )
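If you would rather not build the summary table yourself, stat_summary() can compute the same quantities from the raw data. The sketch below uses ggplot2’s mean_se() helper, which returns the mean plus and minus one standard error; the object name summary_plot is illustrative.

```r
library(ggplot2)

summary_plot <- ggplot(mpg, aes(x = class, y = hwy)) +
  # Bars at the group means
  stat_summary(fun = mean, geom = "bar", alpha = 0.7) +
  # Error bars spanning mean - SE to mean + SE
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.3) +
  labs(x = "Vehicle Class", y = "Mean Highway MPG") +
  theme_minimal()

summary_plot
```

Whichever route you take, say in the title or caption what the bars represent: a standard error, a standard deviation, and a 95% confidence interval all look alike on the page but mean different things.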
Key Takeaways
- Box plots reveal median, quartiles, and outliers — ideal for comparing categories
- Violin plots show distribution shape, especially useful for bimodal data
- Density plots visualize continuous distributions, with 2D versions for bivariate analysis
- Jitter plots reduce overplotting while showing individual points
- geom_smooth() adds trend lines with confidence intervals
- Combine geoms for maximum insight — violin + box, points + smooth
Master these advanced geometries, and you’ll communicate data patterns that basic charts cannot reveal.