ggplot2::geom_point()

geom_point() draws a point for each observation. It’s the go-to geom for scatter plots, showing the relationship between two continuous variables, or the distribution of one variable against another.

Syntax

geom_point(mapping = NULL, data = NULL, stat = "identity", position = "identity", ...)

Most arguments are shared across all geoms. The ones you’ll use most:

| Argument | What it does |
| --- | --- |
| mapping | Aesthetic mappings from aes() |
| data | Data frame for this layer (usually inherited from ggplot()) |
| stat | Statistical transformation, almost always "identity" |
| position | Position adjustment, "identity" by default, or "jitter" to spread overlapping points |
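
The data argument is easiest to see when one layer uses a different data frame from the rest of the plot. A minimal sketch, assuming you want to highlight a subset of points; the heavy data frame and its cutoff are hypothetical choices made for illustration:

```r
library(ggplot2)

# Hypothetical subset: cars heavier than 5 (wt is recorded in 1,000 lb)
heavy <- subset(mtcars, wt > 5)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # First layer inherits both data and mapping from ggplot()
  geom_point() +
  # Second layer keeps the inherited mapping but draws only the subset
  geom_point(data = heavy, colour = "red", size = 3)

p
```

The second layer still uses the x and y mappings from ggplot(); only its data changes.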

Aesthetics

geom_point() requires the position aesthetics x and y and accepts several optional visual ones. The most common:

| Aesthetic | What it controls |
| --- | --- |
| x | Position on x-axis |
| y | Position on y-axis |
| colour | Point colour (continuous or categorical) |
| size | Point size in mm |
| shape | Point shape (0–25) |
| alpha | Transparency (0–1) |
| fill | Fill colour for shapes 21–25 |
| stroke | Border width for shapes 21–25 |

Map aesthetics to variables to encode data in the plot. Set aesthetics to constants to style the points:

# Mapped: colour varies with a variable
ggplot(mtcars, aes(x = wt, y = mpg, colour = cyl)) + geom_point()

# Set: all points are the same size
ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point(size = 3)

Mapping happens inside aes() and ties a visual property to data values; setting happens outside aes(), as an argument to the geom, and applies one value to every observation. This distinction is the foundation for richer scatter plots that encode several variables in a single graphic.
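
A common slip is to put a constant inside aes(). The sketch below compares the two forms and inspects the colours ggplot2 actually computes with layer_data(); the object names p_set and p_mapped are illustrative only:

```r
library(ggplot2)

# Set: the literal colour "blue" is applied to every point
p_set <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = "blue")

# Mapped by mistake: "blue" is treated as a one-level categorical variable,
# so the discrete colour scale chooses the colour and a legend appears
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = "blue"))

# layer_data() returns the values computed for a layer after scales are applied
unique(layer_data(p_set)$colour)
unique(layer_data(p_mapped)$colour)
```

If a plot shows an unexpected legend with a single entry, a constant inside aes() is a likely cause.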

Basic scatter plot

library(ggplot2)

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

A simple two-variable scatter: the x-axis shows car weight (wt, in 1,000 lb) and the y-axis shows miles per gallon (mpg). The negative relationship is immediately visible: heavier cars get fewer miles per gallon. From this starting point, you can layer on additional variables using the colour, size, and shape aesthetics to show a third or even a fourth dimension without adding extra panels or facets.

Mapping additional variables

The power of geom_point() is encoding additional variables through aesthetics:

# Colour by a third variable
ggplot(mtcars, aes(x = wt, y = mpg, colour = hp)) +
  geom_point()

# Size and colour by different variables
ggplot(mtcars, aes(x = wt, y = mpg, size = hp, colour = factor(cyl))) +
  geom_point()

Continuous variables mapped to colour get a gradient scale. Categorical variables get a discrete colour palette. Use scale_colour_viridis_d() or similar for colourblind-safe palettes. Colour and size are the most commonly mapped aesthetics, but the point shape itself can also encode categorical information, giving you a third visual channel that stays distinguishable even when printed in greyscale.
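
The palette advice above can be sketched as follows, assuming the discrete viridis scale suits your grouping variable; the legend title "Cylinders" is an illustrative choice:

```r
library(ggplot2)

# factor(cyl) is categorical, so the discrete (_d) viridis scale applies
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 3) +
  scale_colour_viridis_d(name = "Cylinders")

p
```

For a continuous variable such as hp, the matching scale is scale_colour_viridis_c().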

Shape aesthetic

The shape aesthetic accepts integers 0–25. Different shapes have different capabilities:

# Use shape 21 (filled circle) to allow both colour AND fill
ggplot(mtcars, aes(x = wt, y = mpg, fill = factor(cyl), colour = factor(gear))) +
  geom_point(shape = 21, size = 3)

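Shapes 21–25 have both a border and an interior: colour sets the border colour, stroke sets the border width, and fill sets the interior colour. Shapes 0–14 are outlines only and shapes 15–20 are solid; both groups take their colour from colour and ignore fill.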
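
To see the three groups side by side, the sketch below draws every shape code in a grid with a black colour and an orange fill. The shapes data frame and its grid layout are made up for illustration; scale_shape_identity() tells ggplot2 to use the numbers as shape codes directly:

```r
library(ggplot2)

# One row per shape code, laid out on a 7-column grid
shapes <- data.frame(
  shape = 0:25,
  x = (0:25) %% 7,
  y = -((0:25) %/% 7)
)

p <- ggplot(shapes, aes(x = x, y = y)) +
  # fill is set for every point, but only shapes 21-25 display it
  geom_point(aes(shape = shape), size = 5, colour = "black", fill = "orange") +
  # Label each point with its shape code
  geom_text(aes(label = shape), nudge_y = -0.35, size = 3) +
  scale_shape_identity() +
  theme_void()

p
```

Only the last five shapes should show the orange interior.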

Overplotting and jitter

Overplotting happens when many points share the same coordinates, a common problem with discrete or categorical data. A common fix is position_jitter():

# Default: points at exact coordinates
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_point()

# Jittered: small random offset spreads the points
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_point(position = position_jitter(width = 0.3, height = 0))

width is the maximum offset in the x direction and height the maximum offset in the y direction. Offsets are applied in both the positive and negative direction, so the total spread is twice the value given. For a categorical x-axis, horizontal jitter (width) with height = 0 spreads points without changing their y position. position_jitter() gives you precise control over the random offset, but for a quick jittered scatter you can skip the verbose syntax and use the convenience wrapper instead.

geom_jitter() is a shorthand for geom_point(position = "jitter"):

# Equivalent to the jitter call above
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_jitter(width = 0.3, height = 0)
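
Because jitter is random, the same code produces a slightly different plot on each run. A minimal sketch of making it reproducible with the seed argument of position_jitter(); the seed value 42 is arbitrary:

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = factor(cyl), y = mpg))

# Same seed, same offsets: the two plots place points identically
p1 <- base +
  geom_point(position = position_jitter(width = 0.3, height = 0, seed = 42))
p2 <- base +
  geom_point(position = position_jitter(width = 0.3, height = 0, seed = 42))

# Compare the jittered x positions computed for each plot
all.equal(as.numeric(layer_data(p1)$x), as.numeric(layer_data(p2)$x))
```

Fixing the seed is mainly useful for reports and tests where the figure should not change between runs.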

Jittering works well for moderate numbers of points that share coordinates, but when you are plotting tens of thousands of observations the points themselves can merge into an unreadable solid mass. In those cases, reducing the opacity of each point lets the density of overlap emerge naturally as darker regions on the plot surface.

Alpha for large data

When you have thousands of points, use alpha to reveal density:

ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point(alpha = 0.3)

alpha = 0.3 makes each point 30% opaque. Where points overlap, the density shows through as darker regions. Common values for large datasets: 0.1 to 0.5.
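
The effect is clearer on a larger dataset. A sketch using diamonds, which ships with ggplot2 and has 53,940 rows; the alpha and size values here are illustrative starting points rather than recommendations:

```r
library(ggplot2)

# Low alpha plus small points: dense regions build up to dark patches
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.05, size = 0.8)

p
```

With this many points, lower alpha values than the 0.1 to 0.5 range may be needed before structure becomes visible.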

Size legend and scaling

When size is mapped to a continuous variable, ggplot2 creates a size legend showing the mapping:

ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
  geom_point()

The default range of point sizes may not suit your data, especially when the mapped variable spans a wide range that would otherwise produce tiny or enormous dots. Use scale_size_continuous() to set the smallest and largest point sizes and to control the legend's appearance:

ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
  geom_point() +
  scale_size_continuous(range = c(1, 6))
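
As a variation on the call above, the same scale can also label the legend and choose which values appear in it. A sketch, assuming the title "Horsepower" and the breaks 100, 200, and 300 are what you want to show:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
  geom_point(alpha = 0.6) +
  scale_size_continuous(
    name = "Horsepower",        # legend title
    range = c(1, 6),            # smallest and largest point size
    breaks = c(100, 200, 300)   # hp values shown as legend keys
  )

p
```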

range sets the minimum and maximum point sizes in mm. Once you have the points styled appropriately, a natural next step is to overlay a trend line that summarises the overall relationship in the data. The geom_smooth() function computes and draws a smoothed regression fit with a confidence band, turning a raw scatter plot into a more analytical graphic.

Combining with smoothers

A common pattern is points + a smoothing layer:

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth()
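
With no arguments, geom_smooth() chooses the method itself and prints a message saying which one it used. A sketch that names the method and formula explicitly instead; p_lm and p_loess are illustrative object names:

```r
library(ggplot2)

# Straight-line fit with the default confidence band
p_lm <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Local polynomial fit, with the confidence band switched off
p_loess <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)

p_lm
p_loess
```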

geom_smooth() adds a fitted line with a confidence band. method = "lm" forces a linear model; method = "loess" (the default when there are fewer than 1,000 observations) fits a local polynomial. When you colour both the points and the smoothed lines by the same grouping variable, ggplot2 automatically combines the two layers into a single legend, so the reader sees one colour key that applies to both the raw observations and the fitted trend.

Colour by group with legend merging

When multiple layers map the same aesthetic, ggplot2 merges legends intelligently:

ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 2) +
  geom_smooth(se = FALSE, method = "lm")

The colour legend shows both the point colours and the line colours from geom_smooth().
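
If you would rather have the legend keys show points only, one option is to exclude the smoothing layer from the legend with show.legend = FALSE. A sketch; the legend title is an illustrative choice:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 2) +
  # One fitted line per cyl group, left out of the legend
  geom_smooth(se = FALSE, method = "lm", formula = y ~ x, show.legend = FALSE) +
  labs(colour = "Cylinders")

p
```

The lines keep their group colours; only their contribution to the legend keys is dropped.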

Parameters

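| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| mapping | aesthetic | NULL | Aesthetic mappings from aes() |
| data | data.frame | NULL | Layer-specific data |
| stat | string | "identity" | Leave data as-is |
| position | string/position | "identity" | Position adjustment, such as "identity" or "jitter" |
| na.rm | logical | FALSE | If FALSE, missing values are removed with a warning; if TRUE, they are removed silently |
| show.legend | logical/NA | NA | Show legend for this layer; NA includes it if any aesthetics are mapped |
| inherit.aes | logical | TRUE | Inherit aesthetics from ggplot() |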

Additional parameters passed through ... go to layer().
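
The na.rm parameter only changes whether a warning is raised, not which points are drawn. A sketch with a small made-up data frame containing one missing y value:

```r
library(ggplot2)

# Hypothetical data with one missing y value
df <- data.frame(x = 1:5, y = c(2, 4, NA, 8, 10))

# Default (na.rm = FALSE): the incomplete row is dropped with a warning
# when the plot is drawn
p_warn <- ggplot(df, aes(x = x, y = y)) +
  geom_point()

# na.rm = TRUE: the incomplete row is dropped without a warning
p_quiet <- ggplot(df, aes(x = x, y = y)) +
  geom_point(na.rm = TRUE)
```

Printing p_warn raises the warning; printing p_quiet draws the same four points silently.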

See also