ggplot2::geom_histogram()
geom_histogram() divides a continuous variable into bins and draws bars whose height reflects the number of observations in each bin. It’s the standard tool for visualising the distribution of a numeric variable: it shows where data clusters, where it’s sparse, and whether the distribution is skewed or symmetric.
Syntax
geom_histogram(mapping = NULL, data = NULL, stat = "bin", position = "stack", ...)
| Argument | What it does |
|---|---|
| mapping | Aesthetic mappings from aes() |
| data | Data frame for this layer |
| stat | "bin" (default); the statistical transformation that performs the binning |
| position | "stack" (default), "dodge", "fill", or "identity" |
| bins | Number of bins; overridden by binwidth |
| binwidth | Width of each bin in data units |
| center | Centre of one bin; the other bins are placed relative to it |
| boundary | Position of an edge between two bins; alternative to center |
| na.rm | If TRUE, remove missing values silently |
Basic Usage
library(ggplot2)
# Simple histogram of car fuel efficiency
ggplot(mtcars, aes(x = mpg)) +
geom_histogram()
With no bins or binwidth argument, geom_histogram() falls back to 30 bins, prints a message suggesting you pick a better value, and draws the distribution of mpg as dark grey bars. The height of each bar reflects how many cars fall within that bin of miles-per-gallon.
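One way to check the default is to read back the binned data that ggplot2 computes for the layer. The sketch below assumes layer_data(), ggplot2’s helper for extracting a layer’s computed data frame; the names p and binned are illustrative only.

```r
library(ggplot2)

# Build the plot without printing it
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram()

# layer_data() returns one row per bin, with its count and edges
binned <- layer_data(p)

nrow(binned)        # number of bins actually used
sum(binned$count)   # every car in mtcars lands in exactly one bin
```

Calling layer_data() triggers the same message about the default bin count that printing the plot does.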
Choosing the Right Number of Bins
Too few bins obscure detail; too many create noise. Use bins for an exact count or binwidth to control the range each bin covers:
# Exactly 10 bins
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(bins = 10)
# Each bin covers 5 units of mpg
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(binwidth = 5)
# Narrow bins — more detail
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(binwidth = 2)
binwidth is often more interpretable than bins: a binwidth of 5 on a temperature scale always means 5-degree bins, regardless of the data range.
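To see the difference concretely, the following sketch compares the bins produced by each argument, using layer_data() to read back what was computed. The object names by_count and by_width are hypothetical.

```r
library(ggplot2)

by_count <- layer_data(
  ggplot(mtcars, aes(x = mpg)) + geom_histogram(bins = 10)
)
by_width <- layer_data(
  ggplot(mtcars, aes(x = mpg)) + geom_histogram(binwidth = 5)
)

# bins fixes how many bars there are; their width follows from the data range
nrow(by_count)
unique(round(by_count$xmax - by_count$xmin, 3))

# binwidth fixes the width; the number of bars follows from the data range
unique(round(by_width$xmax - by_width$xmin, 3))
nrow(by_width)
```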
Filling and Colouring Bars
Control the bar appearance with fill (interior colour) and colour (border):
# Steel-blue bars with white borders
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(fill = "steelblue", colour = "white")
# Fill mapped to a variable — creates a stacked histogram
ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
geom_histogram()
Mapping fill to a categorical variable stacks the bars by group within each bin, showing how each group’s distribution contributes to the overall count.
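Stacking is only the default; the other position values listed in the syntax table change how the groups share a bin. A minimal sketch, with illustrative object names and an arbitrary binwidth of 5:

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = mpg, fill = factor(cyl)))

# Default: groups stacked on top of each other within each bin
stacked <- base + geom_histogram(binwidth = 5)

# Groups drawn side by side within each bin
dodged <- base + geom_histogram(binwidth = 5, position = "dodge")

# Each bin rescaled to a total height of 1, showing group proportions
filled <- base + geom_histogram(binwidth = 5, position = "fill")

# Stacking changes where bars are drawn, not how many cars are counted
sum(layer_data(stacked)$count)
```

Print stacked, dodged, or filled to compare the three layouts.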
Overlay Multiple Histograms
For comparing distributions, position = "identity" draws each group’s histogram from the baseline so the groups overlap; adding alpha makes the bars semi-transparent:
# Overlay histograms by cylinder count
ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
geom_histogram(position = "identity", alpha = 0.6, bins = 15)
alpha = 0.6 makes the bars semi-transparent so overlapping regions are visible. position = "identity" prevents stacking so you can see each group clearly.
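The effect of position = "identity" shows up in the computed bar positions. This sketch uses layer_data() to compare the lower edge (ymin) of each bar under the two settings; the object names are illustrative.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = mpg, fill = factor(cyl)))

stacked <- layer_data(base + geom_histogram(bins = 15))
overlaid <- layer_data(
  base + geom_histogram(position = "identity", alpha = 0.6, bins = 15)
)

# With "identity", every bar starts at zero
range(overlaid$ymin)

# With the default "stack", some bars start on top of another group's bar
any(stacked$ymin > 0)
```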
Frequency Polygon
A frequency polygon connects the bin heights with a line — useful for comparing multiple distributions without the visual weight of bars:
# Frequency polygon instead of bars
ggplot(mtcars, aes(x = mpg, colour = factor(cyl))) +
geom_freqpoly(bins = 15)
geom_freqpoly() uses the same binning as geom_histogram() but draws a line through the bin centres rather than drawing bars.
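As a rough check of the shared binning, the sketch below compares the counts computed for each geom. It assumes that the frequency polygon may add zero-count points at either end so the line returns to the axis, so only non-empty bins are compared.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = mpg))

hist_bins <- layer_data(base + geom_histogram(bins = 15))
poly_bins <- layer_data(base + geom_freqpoly(bins = 15))

# Counts of the non-empty bins, in order along the x-axis
hist_bins$count[hist_bins$count > 0]
poly_bins$count[poly_bins$count > 0]
```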
Adjusting Bin Boundaries
Control exactly where bins start and end with boundary and center:
# Bins start at multiples of 5
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(binwidth = 5, boundary = 5)
# Bins centred on round numbers
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(binwidth = 5, center = 0)
boundary fixes the position of one bin edge; all other edges fall at multiples of binwidth away from it. center does the same for the centre of a bin, so bin centres (rather than edges) land on round numbers. Specify one or the other, not both.
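The two alignments are easiest to compare by printing the computed bin edges. A small sketch, again using layer_data() and illustrative object names:

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = mpg))

edge_aligned <- layer_data(base + geom_histogram(binwidth = 5, boundary = 5))
centre_aligned <- layer_data(base + geom_histogram(binwidth = 5, center = 0))

# boundary = 5: bin edges sit on multiples of 5
edge_aligned[, c("xmin", "xmax", "count")]

# center = 0: bin midpoints sit on multiples of 5, so edges are offset by 2.5
centre_aligned[, c("xmin", "x", "xmax", "count")]
```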
geom_histogram vs geom_freqpoly vs geom_density
Three geoms for showing distribution:
# Histogram (bars)
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(bins = 15)
# Frequency polygon (line)
ggplot(mtcars, aes(x = mpg)) +
geom_freqpoly(bins = 15)
# Density estimate (smoothed area)
ggplot(mtcars, aes(x = mpg)) +
geom_density()
geom_histogram() shows actual counts. geom_freqpoly() shows the same counts as a line, which is easier to overlay for multiple groups. geom_density() draws a smoothed kernel density estimate, so its y-axis is density rather than count; it works best for continuous data with many observations.
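Because counts and densities live on different y scales, overlaying a density curve on a histogram needs the histogram rescaled first. One common approach, sketched here assuming a ggplot2 version that provides after_stat() (3.3.0 or later), maps the bar height to the computed density; the colours are arbitrary.

```r
library(ggplot2)

# Put the histogram on the density scale so both layers share a y-axis
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(aes(y = after_stat(density)), bins = 15,
                 fill = "grey80", colour = "white") +
  geom_density(colour = "steelblue")

# On the density scale, the bar areas (height times width) sum to 1
bars <- layer_data(p, 1)
sum(bars$density * (bars$xmax - bars$xmin))
```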
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| mapping | aesthetic | NULL | Aesthetic mappings |
| data | data.frame | NULL | Layer data |
| stat | string | "bin" | Statistical transformation |
| position | string | "stack" | "stack", "dodge", "fill", "identity" |
| bins | integer | NULL (30 is used when binwidth is also unset) | Number of bins; overridden by binwidth |
| binwidth | numeric | NULL | Width of bins in data units |
| center | numeric | NULL | Centre of one bin |
| boundary | numeric | NULL | Position of an edge between two bins |
| na.rm | logical | FALSE | If TRUE, drop missing values without a warning |
| show.legend | logical/NA | NA | Show in legend |
| inherit.aes | logical | TRUE | Inherit global aesthetics |
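As an illustration of na.rm, the sketch below uses the built-in airquality dataset, whose Ozone column contains missing values. The binwidth of 10 is an arbitrary choice for the example.

```r
library(ggplot2)

# Number of missing Ozone readings in the built-in dataset
n_missing <- sum(is.na(airquality$Ozone))

# na.rm = TRUE drops the missing rows without the usual warning
p <- ggplot(airquality, aes(x = Ozone)) +
  geom_histogram(binwidth = 10, na.rm = TRUE)

# Only the non-missing observations are counted
sum(layer_data(p)$count)
nrow(airquality) - n_missing
```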
See Also
- /reference/tidyverse/ggplot2_geom_bar/ — bar charts for categorical data (count or value per category)
- /reference/tidyverse/ggplot2_geom_line/ — line charts for sequential or time-series data
- /reference/tidyverse/ggplot2_geom_point/ — scatter plots for continuous vs continuous data
- /reference/tidyverse/ggplot2_aes/ — aesthetic mapping system shared by all ggplot2 geoms