ggplot2::geom_point()
geom_point() draws one point for each observation. It is the go-to geom for scatter plots, showing the relationship between two continuous variables, or how one variable is distributed across the levels of another.
Syntax
```r
geom_point(mapping = NULL, data = NULL, stat = "identity", position = "identity",
           ..., na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)
```
Most arguments are shared across all geoms. The ones you’ll use most:
| Argument | What it does |
|---|---|
| mapping | Aesthetic mappings from aes() |
| data | Data frame for this layer (usually inherited from ggplot()) |
| stat | Statistical transformation, almost always "identity" |
| position | Position adjustment: "identity" by default, or "jitter" to spread overlapping points |
Aesthetics
geom_point() understands the aesthetics below; x and y are required.
| Aesthetic | What it controls |
|---|---|
| x | Position on x-axis (required) |
| y | Position on y-axis (required) |
| colour | Point colour (continuous or categorical); the border colour for shapes 21–25 |
| size | Point size in mm |
| shape | Point shape (0–25) |
| alpha | Transparency (0–1) |
| fill | Fill colour for shapes 21–25 |
| stroke | Outline width, e.g. the border on shapes 21–25 |
| group | Grouping of observations (rarely needed for points) |
Map aesthetics to variables to encode data in the plot. Set aesthetics to constants to style the points:
```r
library(ggplot2)

# Mapped: colour varies with a variable
ggplot(mtcars, aes(x = wt, y = mpg, colour = cyl)) + geom_point()
# Set: all points are the same size
ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point(size = 3)
```
Mapping an aesthetic inside aes() tells ggplot2 to vary point appearance with the data values; setting it as an argument to geom_point(), outside aes(), applies the same visual property to every point. This distinction is the foundation for scatter plots that encode several variables in a single graphic.
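A minimal sketch of what goes wrong when a constant is placed inside aes(). It uses layer_data(), a ggplot2 helper that returns the data a layer is drawn from, to compare the two cases; the object names p_set and p_mapped are only illustrative.

```r
library(ggplot2)

# Setting: the constant is an argument to geom_point(), outside aes()
p_set <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = "blue")

# Common mistake: inside aes(), "blue" is treated as a one-level variable,
# so ggplot2 draws it with a palette colour and adds a legend entry "blue"
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg, colour = "blue")) +
  geom_point()

# layer_data() returns the built data for a layer, including final colours
unique(layer_data(p_set)$colour)     # "blue"
unique(layer_data(p_mapped)$colour)  # a default palette colour, not "blue"
```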
Basic scatter plot
```r
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
```
A simple two-variable scatter: car weight on the x-axis, miles per gallon on the y-axis. The negative relationship is immediately visible. From here you can encode a third or fourth variable with colour, size, or shape instead of adding panels or facets.
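Because geom_point() draws one point per row, the built layer should have exactly as many rows as the data. A small check using layer_data() (the names p and pts are illustrative):

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# layer_data() returns the built data for a layer, one row per point drawn
pts <- layer_data(p)
nrow(pts)  # 32, the number of rows in mtcars
head(pts[, c("x", "y")])
```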
Mapping additional variables
The power of geom_point() is encoding additional variables through aesthetics:
```r
# Colour by a third variable
ggplot(mtcars, aes(x = wt, y = mpg, colour = hp)) +
  geom_point()
# Size and colour by different variables
ggplot(mtcars, aes(x = wt, y = mpg, size = hp, colour = factor(cyl))) +
  geom_point()
```
Continuous variables mapped to colour get a gradient scale; categorical variables get a discrete palette. Colour and size are the most commonly mapped aesthetics, but shape can also encode a categorical variable, and unlike colour it stays distinguishable when printed in greyscale. For colourblind-safe palettes, use scale_colour_viridis_d() for discrete variables or scale_colour_viridis_c() for continuous ones.
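A sketch of the two viridis variants, one for a discrete mapping and one for a continuous mapping. Both scales ship with ggplot2; the choice of variables is just for illustration.

```r
library(ggplot2)

# Discrete mapping: one viridis colour per level of factor(cyl)
p_d <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  scale_colour_viridis_d()

# Continuous mapping: a viridis gradient along hp
p_c <- ggplot(mtcars, aes(x = wt, y = mpg, colour = hp)) +
  geom_point() +
  scale_colour_viridis_c()

length(unique(layer_data(p_d)$colour))  # 3, one per cylinder count
```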
Shape aesthetic
The shape aesthetic accepts integers 0–25. Different shapes have different capabilities:
```r
# Use shape 21 (filled circle) to allow both colour AND fill
ggplot(mtcars, aes(x = wt, y = mpg, fill = factor(cyl), colour = factor(gear))) +
  geom_point(shape = 21, size = 3)
```
Shapes 21–25 have a border, drawn in colour with a width set by stroke, and a separate interior drawn in fill. Shapes 0–14 are hollow outlines and shapes 15–20 are solid; both use colour only and ignore fill.
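To see the differences side by side, a quick sketch that draws all 26 shape codes with the same colour, fill, and stroke. The grid layout and colours are arbitrary choices; scale_shape_identity() tells ggplot2 to use the numbers as shape codes directly rather than mapping them through a palette.

```r
library(ggplot2)

# One row per shape code 0-25, laid out on a 6-column grid
shapes <- data.frame(shape = 0:25)
shapes$x <- shapes$shape %% 6
shapes$y <- shapes$shape %/% 6

p <- ggplot(shapes, aes(x = x, y = y)) +
  # colour is the outline; fill only shows up on shapes 21-25
  geom_point(aes(shape = shape), size = 5, colour = "navy",
             fill = "gold", stroke = 1) +
  # Label each point with its shape code, just below it
  geom_text(aes(label = shape), nudge_y = -0.4, size = 3) +
  scale_shape_identity() +
  scale_y_reverse() +
  theme_void()
```

Rendering p shows gold interiors only on shapes 21–25.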
Overplotting and jitter
Overplotting happens when many points share the same coordinates, a common problem with discrete or categorical data. One fix is position_jitter(), which adds a small random offset to each point:
```r
# Default: points at exact coordinates
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_point()
# Jittered: small random offset spreads the points
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_point(position = position_jitter(width = 0.3, height = 0))
```
width and height set the amount of jitter in the x and y directions; if you omit them, each defaults to 40% of the resolution of the data. The offset is added in both the positive and negative direction, so the total spread is twice the value given. With a categorical x-axis, height = 0 keeps every y value exact while width spreads the points horizontally. If you just want jittered points, the convenience wrapper is shorter.
geom_jitter() is a shorthand for geom_point(position = "jitter"):
```r
# Equivalent to the jitter call above
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_jitter(width = 0.3, height = 0)
```
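A sketch that checks the jitter behaviour described above, assuming a ggplot2 version whose position_jitter() accepts a seed argument. The helper make_plot() is hypothetical; it just rebuilds the same plot twice so the fixed seed can be compared.

```r
library(ggplot2)

# A fixed seed makes the random offsets reproducible when the plot is recreated
make_plot <- function() {
  ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
    geom_point(position = position_jitter(width = 0.3, height = 0, seed = 42))
}

ld <- layer_data(make_plot())
x <- unclass(ld$x)  # plain numbers: the factor levels sit at x = 1, 2, 3

# Offsets are added in both directions, so they fall within -0.3 to 0.3
range(x - round(x))

# height = 0 leaves the y values untouched
all.equal(sort(ld$y), sort(mtcars$mpg))
```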
Jittering works well for moderate numbers of points that share coordinates, but when you are plotting tens of thousands of observations the points themselves can merge into an unreadable solid mass. In those cases, reducing the opacity of each point lets the density of overlap emerge naturally as darker regions on the plot surface.
Alpha for large data
When you have thousands of points, use alpha to reveal density:
```r
ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point(alpha = 0.3)
```
alpha = 0.3 makes each point 30% opaque. Where points overlap, the density shows through as darker regions. Common values for large datasets: 0.1 to 0.5.
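A sketch with simulated data (an assumption; any large data frame works) to show alpha at a scale where it matters. layer_data() confirms the opacity is applied to every point.

```r
library(ggplot2)

# Simulated data for illustration: 10,000 correlated points
set.seed(1)
x <- rnorm(10000)
big <- data.frame(x = x, y = x + rnorm(10000))

p <- ggplot(big, aes(x = x, y = y)) +
  geom_point(alpha = 0.1)

ld <- layer_data(p)
nrow(ld)          # 10000
unique(ld$alpha)  # 0.1
```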
Size legend and scaling
When size is mapped to a continuous variable, ggplot2 creates a size legend showing the mapping:
```r
ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
  geom_point()
```
The default range of point sizes may not suit your data: a mapped variable with a wide spread can produce dots that are too small or too large. scale_size_continuous() controls both the mapping and its legend, and its range argument sets the smallest and largest point sizes:
```r
ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
  geom_point() +
  scale_size_continuous(range = c(1, 6))
```
range sets the minimum and maximum point sizes in mm.
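scale_size_continuous() maps the variable to point area, which matches how readers perceive dot size; scale_radius() maps it linearly to size instead. A sketch comparing the two with the same range (the object names are illustrative):

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) + geom_point()

# Area scaling: size grows with the square root of the rescaled value
p_area <- base + scale_size_continuous(range = c(1, 6))
# Radius scaling: size grows linearly with the rescaled value
p_radius <- base + scale_radius(range = c(1, 6))

s_area <- layer_data(p_area)$size
s_radius <- layer_data(p_radius)$size

range(s_area)    # 1 6
range(s_radius)  # 1 6
```

Both scales share the same endpoints, but for intermediate values the area scale draws larger points than the radius scale.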
Combining with smoothers
A common pattern is points + a smoothing layer:
```r
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth()
```
geom_smooth() adds a smoothed fit with a confidence band (se = TRUE by default). method = "lm" forces a linear model. If you leave method unset, geom_smooth() uses "loess" (a local polynomial fit) for fewer than 1,000 observations and "gam" otherwise, and prints a message saying which method and formula it chose.
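A sketch that makes the smoother explicit and inspects what it computes. Passing formula = y ~ x alongside method = "lm" also avoids the message about the chosen formula; the check against lm() relies on the default n = 80 evaluation points.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # Explicit method and formula, so no "using formula" message is printed
  geom_smooth(method = "lm", formula = y ~ x)

# Layer 2 is the smoother: fitted values plus the confidence band
smooth <- layer_data(p, 2)
nrow(smooth)  # 80, the default number of evaluation points (n)
head(smooth[, c("x", "y", "ymin", "ymax")])

# The fitted line should match an ordinary lm() fit
fit <- lm(mpg ~ wt, data = mtcars)
all.equal(smooth$y, unname(predict(fit, data.frame(wt = smooth$x))))
```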
Colour by group with legend merging
When multiple layers map the same aesthetic to the same variable, ggplot2 merges their legends into one:
```r
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 2) +
  geom_smooth(se = FALSE, method = "lm")
```
The colour legend shows both the point colours and the line colours from geom_smooth().
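Both layers draw their colours from the same scale, which is why one legend can serve them. A sketch that checks the shared colours with layer_data() and renames the merged legend once with labs(); the title "Cylinders" is illustrative.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 2) +
  geom_smooth(se = FALSE, method = "lm", formula = y ~ x) +
  # One title change applies to the single merged legend
  labs(colour = "Cylinders")

pt_cols <- unique(layer_data(p, 1)$colour)
fit_cols <- unique(layer_data(p, 2)$colour)
setequal(pt_cols, fit_cols)  # TRUE: both layers draw from the same scale
```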
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
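| mapping | aesthetic | NULL | Aesthetic mappings from aes() |
| data | data.frame | NULL | Layer-specific data; NULL inherits the data from ggplot() |
| stat | string | "identity" | Statistical transformation; "identity" leaves the data as-is |
| position | string/position | "identity" | Position adjustment, e.g. "jitter" or position_jitter() |
| na.rm | logical | FALSE | If FALSE, missing values are removed with a warning; if TRUE, they are removed silently |
| show.legend | logical/NA | NA | Include this layer in legends: NA if any aesthetics are mapped, TRUE always, FALSE never |
| inherit.aes | logical | TRUE | Combine this layer's aesthetics with those from ggplot() |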
Additional arguments in ... are passed on to layer(). Most often these set an aesthetic to a constant, such as colour = "red" or size = 3.
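A sketch of the data, inherit.aes, and ... arguments working together: a second geom_point() layer draws a subset on top of the full data, inheriting x and y from ggplot(), while colour and size are set as constants through .... The subset efficient is a hypothetical example.

```r
library(ggplot2)

# Hypothetical subset to highlight: cars with mpg above 30
efficient <- subset(mtcars, mpg > 30)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = "grey60") +
  # data replaces the layer's data; x and y are still inherited (inherit.aes = TRUE)
  # colour and size go through ... as constant aesthetics
  geom_point(data = efficient, colour = "red", size = 3)

highlight <- layer_data(p, 2)
nrow(highlight)  # 4
```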
See also
- /tutorials/r-data-visualization/ggplot2-basics/, first steps with ggplot2
- /tutorials/r-data-visualization/introduction-to-ggplot2/, layered grammar of graphics concept
- /reference/tidyverse/ggplot2-aes/, the aesthetic mapping system geom_point uses