rguides

How to Create Scatter Plots with ggplot2 in R

Scatter plots display the relationship between two continuous variables. To create one in ggplot2, map data columns to the x and y positions with aes() and add geom_point() to draw the points; color, size, and transparency are optional aesthetics. ggplot2 chooses axis scales, legends, and a default theme automatically once the mappings are set.

library(ggplot2)

# Basic scatter: weight vs fuel efficiency
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Color by group and size by horsepower
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl), size = hp)) +
  geom_point(alpha = 0.7) +
  labs(
    title = "Car Weight vs Fuel Efficiency",
    x = "Weight (1000 lbs)",
    y = "Miles per Gallon",
    color = "Cylinders",
    size = "Horsepower"
  )

Set aesthetics inside aes() to map them to data columns; set them outside to apply a fixed value (e.g., geom_point(size = 3) gives every point the same size). Use alpha to reduce overplotting in dense regions, and overlay geom_smooth(method = "lm") for a linear trend line or method = "loess" for local smoothing.
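
The difference between mapped and fixed aesthetics can be sketched on the same mtcars columns. The object names (p_mapped, p_fixed, p_loess) and the "steelblue" color are illustrative choices, not part of the original examples.

```r
library(ggplot2)

# Mapped: color is driven by a data column, so ggplot2 builds a legend
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl)))

# Fixed: color and size are constants set outside aes(), so no legend appears
p_fixed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "steelblue", size = 3)

# Local smoothing: a loess curve instead of a straight line
p_loess <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)

# Print any of the objects to draw it, e.g. p_mapped
```

A common slip is writing aes(color = "steelblue"), which treats the string as a one-level data value and lets the color scale pick the actual color rather than drawing the points in steel blue.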

# Linear trend line
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# Facet by cylinder count
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl)
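
A two-variable layout can be sketched with facet_grid(). Here the transmission column am (0 = automatic, 1 = manual) is assumed as the row variable purely for illustration, and the second plot shows the scales argument letting each panel choose its own y-axis range.

```r
library(ggplot2)

# Grid of panels: rows by transmission type, columns by cylinder count
p_grid <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(am ~ cyl)

# Wrapped panels with an independent y axis per panel
p_free <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl, scales = "free_y")

# Print p_grid or p_free to draw the plot
```

Shared scales make panels directly comparable; free scales trade that comparability for more detail inside each panel.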

facet_wrap() splits the plot into panels by a categorical variable and wraps them into rows and columns. facet_grid() lays panels out in a grid whose rows and columns are defined by two faceting variables. Both share axis scales across panels by default; pass scales = "free" (or "free_x" / "free_y") to let them vary.

For scatter plots with thousands of points, geom_point(alpha = 0.3) reduces overplotting by making overlapping points semi-transparent. As an alternative, geom_bin2d() or geom_hex() (part of ggplot2, but requiring the hexbin package to be installed) summarise dense regions into rectangular or hexagonal bins coloured by count.
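
The overplotting options can be compared on a larger dataset. This sketch assumes the diamonds data that ships with ggplot2 (roughly 54,000 rows); the choice of carat and price as axes and bins = 40 are illustrative, not recommendations.

```r
library(ggplot2)

# Option 1: semi-transparent points, so dense regions appear darker
p_alpha <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.3)

# Option 2: rectangular bins, with fill showing the count in each bin
p_bin <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_bin2d(bins = 40)

# Option 3: hexagonal bins; only built if the hexbin package is available
if (requireNamespace("hexbin", quietly = TRUE)) {
  p_hex <- ggplot(diamonds, aes(x = carat, y = price)) +
    geom_hex(bins = 40)
}

# Print p_alpha, p_bin, or p_hex to draw the plot
```

Transparency keeps individual points visible but saturates once enough points overlap; binning gives up individual points in exchange for a readable count scale.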
