ggplot2::geom_histogram()
geom_histogram() divides a continuous variable into bins and draws bars whose height reflects the number of observations in each bin. It’s the standard tool for visualising the distribution of a numeric variable, showing where data clusters, where it’s sparse, and whether the distribution is skewed or symmetric.
Syntax
geom_histogram(mapping = NULL, data = NULL, stat = "bin", position = "stack", ...)
| Argument | What it does |
|---|---|
| mapping | Aesthetic mappings from aes() |
| data | Data frame for this layer |
| stat | "bin" (default), controls how binning is applied |
| position | "stack" (default), "dodge", "fill", or "identity" |
| bins | Number of bins (overridden by binwidth if both are set) |
| binwidth | Width of each bin in data units |
| center | Position of the centre of one bin (set center or boundary, not both) |
| boundary | Position of a boundary between two bins |
| na.rm | If TRUE, remove missing values silently |
Basic usage
library(ggplot2)
# Simple histogram of car fuel efficiency
ggplot(mtcars, aes(x = mpg)) +
geom_histogram()
geom_histogram() uses 30 bins by default and prints a message suggesting you pick a better value. It draws the distribution of mpg as dark grey bars, and the height of each bar is the number of cars whose miles-per-gallon falls within that bin. The default works as a starting point, but the shape of the histogram, and the story it tells, can change substantially depending on how you partition the data into bins. Choosing binning parameters carefully is often the difference between a clear visual summary and a misleading one.
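To see what the default binning actually produced, one option is to pull the computed bin table out of the plot with layer_data(). This is a minimal sketch for inspection only; the names p and bin_table are illustrative.

```r
library(ggplot2)

# Build the default histogram; ggplot2 prints its message about
# bins = 30 when the plot is built (here, inside layer_data())
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram()

# layer_data() returns what the stat computed for the first layer:
# one row per bin, with its limits (xmin, xmax) and its count
bin_table <- layer_data(p)
head(bin_table[, c("xmin", "xmax", "count")])

# Every observation lands in exactly one bin
sum(bin_table$count) == nrow(mtcars)
```

Inspecting xmin and xmax this way is a quick check on whether the default bin edges fall anywhere meaningful for your data.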
Choosing the right number of bins
Too few bins obscure detail; too many create noise. Use bins for an exact count or binwidth to control the range each bin covers:
# Exactly 10 bins
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(bins = 10)
# Each bin covers 5 units of mpg
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(binwidth = 5)
# Narrow bins — more detail
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(binwidth = 2)
binwidth is often more interpretable than bins: binwidth = 5 on a temperature scale means the same thing regardless of the data range. Once you settle on a binning strategy, you will likely want to adjust the visual styling of the bars themselves: the fill colour, the border colour, and whether the bars are stacked or grouped by a second categorical variable.
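The point about binwidth being independent of the data range can be checked directly. The sketch below uses a hypothetical helper, bin_widths(), which is not part of ggplot2, to report the bin width that each setting produces on the full data and on a subset with a narrower range.

```r
library(ggplot2)

# Hypothetical helper: build a histogram of mpg and return the
# distinct bin widths (rounded to absorb floating-point noise)
bin_widths <- function(data, ...) {
  p <- ggplot(data, aes(x = mpg)) +
    geom_histogram(...)
  b <- layer_data(p)
  unique(round(b$xmax - b$xmin, 6))
}

# Subset with a narrower mpg range than the full data
four_cyl <- subset(mtcars, cyl == 4)

# A fixed binwidth gives the same width on both data sets
bin_widths(mtcars, binwidth = 5)
bin_widths(four_cyl, binwidth = 5)

# A fixed number of bins gives a width that depends on the range
bin_widths(mtcars, bins = 10)
bin_widths(four_cyl, bins = 10)
```

The two binwidth = 5 calls should both report 5, while the two bins = 10 calls should report different widths because each data set spans a different range.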
Filling and colouring bars
Control the bar appearance with fill (interior colour) and colour (border):
# Steel-blue bars with a white border
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(fill = "steelblue", colour = "white")
# Fill mapped to a variable — creates a stacked histogram
ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
geom_histogram()
Mapping fill to a categorical variable stacks the bars by group within each bin, showing how each group’s distribution contributes to the overall count. Stacked histograms can be informative, but when groups overlap heavily it becomes hard to compare the shapes of their individual distributions. For that scenario you can overlay the histograms directly with transparency, letting each group’s pattern remain visible while preserving the sense of overlap.
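To check how stacking splits each bin between groups, you can inspect the computed layer data. This is a sketch under the assumption that the only grouping variable is factor(cyl), so the group column follows the factor levels 4, 6, 8; the names stacked and group_totals are illustrative.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
  geom_histogram(bins = 15)

stacked <- layer_data(p)

# Total count per fill group, in factor-level order (4, 6, 8 cylinders)
group_totals <- tapply(stacked$count, stacked$group, sum)
group_totals

# Compare with a direct tabulation of the raw data
table(mtcars$cyl)

# With position = "stack", ymin and ymax give each segment's place in the stack
head(stacked[, c("group", "xmin", "xmax", "count", "ymin", "ymax")])
```

Each group's counts add up to its number of cars, and the ymin and ymax columns show where each group's segment sits within the bar.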
Overlay multiple histograms
To compare distributions, use position = "identity" to draw the histograms directly on top of each other, and set alpha so the overlapping bars stay visible:
# Overlay histograms by cylinder count
ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
geom_histogram(position = "identity", alpha = 0.6, bins = 15)
alpha = 0.6 makes the bars semi-transparent so overlapping regions are visible. position = "identity" prevents stacking so you can see each group clearly. When you are comparing three or more distributions, the bars themselves can start to feel visually heavy and obscure the very patterns you want to highlight. A frequency polygon removes the filled bars entirely and connects the bin counts with lines, giving you a lighter alternative that scales better across many groups.
Frequency polygon
A frequency polygon connects the bin heights with a line, useful for comparing multiple distributions without the visual weight of bars:
# Frequency polygon instead of bars
ggplot(mtcars, aes(x = mpg, colour = factor(cyl))) +
geom_freqpoly(bins = 15)
geom_freqpoly() uses the same binning as geom_histogram() but draws a line through the bin centres rather than drawing bars. Frequency polygons shine when you have several overlapping distributions and need to keep the plot readable, because lines do not obscure one another the way filled bars can. If you return to the standard histogram, you can fine-tune where bins begin and end to align the bars with meaningful reference points on the data axis.
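To see that the polygon and the bars share the same binning, one option is to draw both layers on a single plot with the same bins value. This is an illustrative sketch; the colours and the names bars and line are arbitrary choices.

```r
library(ggplot2)

# Histogram and frequency polygon with identical binning
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 15, fill = "grey80", colour = "white") +
  geom_freqpoly(bins = 15, colour = "steelblue")

# Computed data for each layer: 1 = bars, 2 = line
bars <- layer_data(p, 1)
line <- layer_data(p, 2)

# Both layers count the same observations
sum(bars$count)
sum(line$count)
```

When plotted, the line should pass through the top centre of each bar. The polygon typically extends one empty bin past each end of the data so that it returns to zero.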
Adjusting bin boundaries
Control exactly where bins start and end with boundary and center:
# Bins start at multiples of 5
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(binwidth = 5, boundary = 5)
# Bins centred on round numbers
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(binwidth = 5, center = 0)
boundary sets the position of a boundary between two bins. center sets the position of the centre of one bin, so the bin edges fall half a bin width to either side of it. Specify only one of the two. With binwidth = 5, boundary = 5 puts the bin edges on multiples of 5, while center = 0 puts the bin centres on multiples of 5 and the edges at 2.5, 7.5, and so on. These parameters are subtle but can make a noticeable difference when you align bin edges with natural breakpoints in the data, such as multiples of ten or midnight boundaries in time series. With all of these histogram controls in hand, it is worth comparing the histogram approach to two related geoms, the frequency polygon and the density curve, so you can choose the right tool for a given distributional question.
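The effect of boundary and center on the bin positions can be verified from the computed layer data. The sketch below uses a hypothetical helper, mpg_bins(), which is not part of ggplot2, to return the bin limits and centres for a given set of arguments.

```r
library(ggplot2)

# Hypothetical helper: return the computed bins for mpg given
# a set of geom_histogram() arguments
mpg_bins <- function(...) {
  p <- ggplot(mtcars, aes(x = mpg)) +
    geom_histogram(...)
  layer_data(p)[, c("xmin", "x", "xmax", "count")]
}

on_boundary <- mpg_bins(binwidth = 5, boundary = 5)
on_center <- mpg_bins(binwidth = 5, center = 0)

on_boundary$xmin  # bin edges on multiples of 5
on_center$x       # bin centres on multiples of 5
on_center$xmin    # bin edges shifted by half a bin width
```

Comparing the two tables side by side makes it easier to decide which alignment matches the reference points you care about.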
geom_histogram vs geom_freqpoly vs geom_density
Three geoms for showing distribution:
# Histogram (bars)
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(bins = 15)
# Frequency polygon (line)
ggplot(mtcars, aes(x = mpg)) +
geom_freqpoly(bins = 15)
# Density estimate (smoothed area)
ggplot(mtcars, aes(x = mpg)) +
geom_density()
geom_histogram() shows actual counts. geom_freqpoly() shows the same counts as a line, which makes it easier to overlay multiple groups. geom_density() draws a smoothed kernel density estimate, so its y-axis is a density rather than a count; it works best for continuous variables with many data points.
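Because the histogram reports counts and the density curve reports a density, the two do not share a y-axis by default. A common way to overlay them, sketched here assuming a ggplot2 version that provides after_stat(), is to map the histogram's y to its computed density so the bar areas sum to 1.

```r
library(ggplot2)

# Histogram rescaled to a density, with the kernel density estimate on top
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(aes(y = after_stat(density)), bins = 15,
                 fill = "grey80", colour = "white") +
  geom_density(colour = "steelblue")

# On the density scale, bar height times bar width sums to 1
bars <- layer_data(p, 1)
total_area <- sum(bars$density * (bars$xmax - bars$xmin))
total_area
```

With both layers on the density scale, the curve can be read directly against the bars, which helps when judging whether the smoothing hides features that the histogram shows.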
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| mapping | aes() | NULL | Aesthetic mappings |
| data | data.frame | NULL | Layer data |
| stat | string | "bin" | Statistical transformation |
| position | string | "stack" | "stack", "dodge", "fill", "identity" |
| bins | integer | NULL (falls back to 30) | Number of bins; overridden by binwidth |
| binwidth | numeric | NULL | Width of bins in data units |
| center | numeric | NULL | Centre of one bin (set center or boundary, not both) |
| boundary | numeric | NULL | Boundary between two bins |
| na.rm | logical | FALSE | If TRUE, remove missing values silently; if FALSE, remove them with a warning |
| show.legend | logical/NA | NA | Show in legend |
| inherit.aes | logical | TRUE | Inherit global aesthetics |
See also
- /reference/tidyverse/ggplot2-geom-bar/, bar charts for categorical data (count or value per category)
- /reference/tidyverse/ggplot2-geom-line/, line charts for sequential or time-series data
- /reference/tidyverse/ggplot2-geom-point/, scatter plots for continuous vs continuous data
- /reference/tidyverse/ggplot2-aes/, aesthetic mapping system shared by all ggplot2 geoms