ggplot2::geom_histogram()

geom_histogram() divides a continuous variable into bins and draws bars whose height reflects the number of observations in each bin. It’s the standard tool for visualising the distribution of a numeric variable — showing where data clusters, where it’s sparse, and whether the distribution is skewed or symmetric.

Syntax

geom_histogram(mapping = NULL, data = NULL, stat = "bin", position = "stack", ...)
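| Argument | What it does |
| --- | --- |
| mapping | Aesthetic mappings created with aes() |
| data | Data frame for this layer |
| stat | "bin" (default); the statistical transformation that performs the binning |
| position | "stack" (default), "dodge", "fill", or "identity" |
| bins | Number of bins (alternative to binwidth; binwidth takes precedence if both are given) |
| binwidth | Width of each bin in data units |
| center | Centre of one of the bins; the other bins are offset from it by multiples of binwidth |
| boundary | Position of a boundary between two bins (specify either center or boundary, not both) |
| na.rm | If TRUE, remove missing values silently; if FALSE, remove them with a warning |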

Basic Usage

library(ggplot2)

# Simple histogram of car fuel efficiency
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram()

With neither bins nor binwidth supplied, geom_histogram() falls back to 30 bins, prints a message suggesting you pick a better value, and draws the distribution of mpg as dark grey bars. The height of each bar is the number of cars whose miles-per-gallon falls within that bin.
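To check what the binning actually produced, the computed values can be pulled out of the plot with layer_data(). This is a minimal sketch; the names p and bin_data are illustrative only.

```r
library(ggplot2)

# Default histogram: no bins or binwidth supplied
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram()

# layer_data() builds the plot and returns one row per bin
bin_data <- layer_data(p)

# Bin edges and the number of observations in each bin
bin_data[, c("xmin", "xmax", "count")]

# Every car lands in exactly one bin, so the counts add up to the row count
sum(bin_data$count) == nrow(mtcars)
```

Inspecting xmin and xmax this way is a quick check that the bin edges fall where you expect before you tune bins or binwidth.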

Choosing the Right Number of Bins

Too few bins obscure detail; too many create noise. Use bins for an exact count or binwidth to control the range each bin covers:

# Exactly 10 bins
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)

# Each bin covers 5 units of mpg
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 5)

# Narrow bins — more detail
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)

binwidth is often more interpretable than bins — a binwidth = 5 on a temperature scale means the same thing regardless of the data range.
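binwidth can also be given as a function that computes the width from the data. The sketch below uses the Freedman-Diaconis rule as one common heuristic. The rule and the helper name fd_width are assumptions brought in for illustration, not something this guide prescribes.

```r
library(ggplot2)

# Freedman-Diaconis rule: width = 2 * IQR / n^(1/3)
# (fd_width is a hypothetical helper name)
fd_width <- function(x) 2 * IQR(x) / length(x)^(1/3)

# Pass the function itself; it is called on the x values of the layer
p_fd <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = fd_width)

# The width the rule picks for mpg, in miles per gallon
fd_width(mtcars$mpg)
```

A data-driven width like this is a reasonable starting point, but it is still worth trying a few nearby values by hand.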

Filling and Colouring Bars

Control the bar appearance with fill (interior colour) and colour (border):

# Solid steel-blue bars with white borders
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(fill = "steelblue", colour = "white")

# Fill mapped to a variable — creates a stacked histogram
ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
  geom_histogram()

Mapping fill to a categorical variable stacks the bars by group within each bin, showing how each group’s distribution contributes to the overall count.

Overlay Multiple Histograms

For comparing distributions, position = "identity" draws each group's histogram from the baseline, so the groups overlap instead of stacking. Add alpha so the overlapping bars stay visible:

# Overlay histograms by cylinder count
ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
  geom_histogram(position = "identity", alpha = 0.6, bins = 15)

alpha = 0.6 makes the bars semi-transparent so overlapping regions are visible. position = "identity" prevents stacking so you can see each group clearly.
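The difference between stacking and overlaying shows up in the computed bar positions. As a rough sketch (the object names base, stacked, and overlaid are illustrative), compare where each bar starts on the y-axis:

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = mpg, fill = factor(cyl)))

# Default position = "stack": bars of later groups sit on top of earlier ones
stacked <- layer_data(base + geom_histogram(bins = 15))

# position = "identity": every bar starts at the baseline
overlaid <- layer_data(
  base + geom_histogram(position = "identity", alpha = 0.6, bins = 15)
)

# With "identity", all bars start at zero
all(overlaid$ymin == 0)

# With "stack", some bars start above zero because another group is beneath them
any(stacked$ymin > 0)
```

Because overlaid bars share a baseline, each bar's height can be read directly as that group's count, which is not true of the upper segments in a stacked histogram.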

Frequency Polygon

A frequency polygon connects the bin heights with a line — useful for comparing multiple distributions without the visual weight of bars:

# Frequency polygon instead of bars
ggplot(mtcars, aes(x = mpg, colour = factor(cyl))) +
  geom_freqpoly(bins = 15)

geom_freqpoly() uses the same binning as geom_histogram() but draws a line through the bin centres rather than drawing bars.
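One way to see that the two geoms share the same binning is to draw both layers in one plot and compare their computed data. This is a sketch only; the names p_both, hist_bins, and poly_pts are illustrative.

```r
library(ggplot2)

# Histogram bars with a frequency polygon drawn over them
p_both <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 15, fill = "grey80", colour = "white") +
  geom_freqpoly(bins = 15)

# Computed data for layer 1 (bars) and layer 2 (line)
hist_bins <- layer_data(p_both, 1)
poly_pts <- layer_data(p_both, 2)

# Both layers account for every observation
sum(hist_bins$count)
sum(poly_pts$count)
```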

Adjusting Bin Boundaries

Control exactly where bins start and end with boundary and center:

# Bin edges fall on multiples of 5
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 5, boundary = 5)

# Bins centred on round numbers
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 5, center = 0)

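boundary fixes the position of one bin edge; all other edges then fall at multiples of binwidth from it. center fixes the midpoint of one bin instead, so with binwidth = 5 and center = 0 the bins are centred on 0, 5, 10, and so on, with edges at 2.5, 7.5, 12.5. Specify one or the other, not both.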
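The effect of each argument is easiest to verify by looking at the computed bin edges. A minimal sketch, with illustrative object names:

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = mpg))

# Same bin width, two different alignments
edges_boundary <- layer_data(base + geom_histogram(binwidth = 5, boundary = 5))
edges_center <- layer_data(base + geom_histogram(binwidth = 5, center = 0))

# boundary = 5: the left edge of every bin is a multiple of 5
edges_boundary$xmin

# center = 0: bin midpoints (x) are multiples of 5, edges sit 2.5 either side
edges_center[, c("xmin", "x", "xmax")]
```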

geom_histogram vs geom_freqpoly vs geom_density

Three geoms for showing distribution:

# Histogram (bars)
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 15)

# Frequency polygon (line)
ggplot(mtcars, aes(x = mpg)) +
  geom_freqpoly(bins = 15)

# Density estimate (smoothed area)
ggplot(mtcars, aes(x = mpg)) +
  geom_density()

geom_histogram() shows actual counts. geom_freqpoly() shows the same counts as a line, which makes it easier to overlay multiple groups. geom_density() draws a smoothed kernel density estimate, so its y-axis is density rather than count; it works best for continuous variables with many data points.
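Counts and densities sit on different y scales, so drawing a histogram and a density curve in the same plot needs the histogram rescaled. One common approach, sketched here with illustrative names, is to map y to after_stat(density):

```r
library(ggplot2)

# Histogram rescaled to density, with the kernel density curve on top
p_overlay <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(aes(y = after_stat(density)), bins = 15,
                 fill = "grey80", colour = "white") +
  geom_density()

# On the density scale, the bar areas (height * width) sum to 1
hist_layer <- layer_data(p_overlay, 1)
sum(hist_layer$density * (hist_layer$xmax - hist_layer$xmin))
```

after_stat() requires ggplot2 3.3.0 or later.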

Parameters

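| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| mapping | aesthetic | NULL | Aesthetic mappings |
| data | data.frame | NULL | Layer data |
| stat | string | "bin" | Statistical transformation |
| position | string | "stack" | "stack", "dodge", "fill", "identity" |
| bins | integer | NULL (30 is used when binwidth is also NULL) | Number of bins |
| binwidth | numeric | NULL | Width of bins in data units |
| center | numeric | NULL | Centre of one of the bins |
| boundary | numeric | NULL | Position of a boundary between two bins |
| na.rm | logical | FALSE | If TRUE, drop missing values silently |
| show.legend | logical or NA | NA | Show in legend |
| inherit.aes | logical | TRUE | Inherit global aesthetics |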

See Also