# ggplot2 Basics

· 6 min read · Updated March 27, 2026 · beginner
r ggplot2 data-visualization tidyverse

## The Grammar of Graphics

ggplot2 is an R package for building data visualizations from composable components. The “gg” in the name stands for the “Grammar of Graphics”, a system in which plots are built by stacking independent layers, each controlling a different aspect of how data maps to visual elements.

The key advantage of this approach is that simple plots and complex ones use the same underlying logic. You start with a base layer, add geometry, and layer on refinements. Everything composes with the `+` operator.
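To make the layering concrete, here is a minimal sketch using the `mpg` dataset that ships with ggplot2. The variable names (`base`, `scatter`, `labelled`) and the title text are illustrative only. Each `+` returns a new plot object and leaves the previous one unchanged.

```r
library(ggplot2)

# Base layer: data and aesthetic mappings only, nothing is drawn yet
base <- ggplot(mpg, aes(x = displ, y = hwy))

# Add geometry: one layer of points
scatter <- base + geom_point()

# Layer on a refinement: a title
labelled <- scatter + labs(title = "Displacement vs highway mpg")

labelled  # printing a plot object renders it
```

Because every intermediate step is an ordinary R object, you can store a base plot once and try different layers on top of it.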

If you have used base R’s `plot()` function, ggplot2 will feel different at first. Base R has a separate function for each chart type (`plot()`, `hist()`, `barplot()`). ggplot2 has one entry point, `ggplot()`, and many geom functions that you add on top. The ggplot2 approach pays off once you need to customize beyond the defaults.

## Three Things Every Plot Needs

Every ggplot2 plot has three required components:

  1. Data — a data frame or tibble
  2. Aesthetic mappings — how variables map to visual properties
  3. A geom — the geometric object that draws the data

The function `ggplot()` sets up the first two. You add geoms with `+`.

```r
library(ggplot2)

# Minimal complete plot
ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()
```

`ggplot()` takes a data frame as its first argument. The `aes()` function (short for aesthetic mapping) defines which columns map to the x-axis, y-axis, color, size, and other visual properties.

## Aesthetic Mappings with `aes()`

The `aes()` function is where you connect data columns to visual channels. The most common aesthetics are:

| Aesthetic | What it controls |
|-----------|-----------------|
| `x` | Horizontal position |
| `y` | Vertical position |
| `color` | Point and line color |
| `fill` | Interior fill color (bars, boxes) |
| `size` | Point size |
| `shape` | Point shape (categorical) |
| `alpha` | Transparency |

A critical distinction separates aesthetics defined *inside* `aes()` from those defined *outside*. Inside `aes()`, each aesthetic maps to a data variable, so its value varies by group or observation. Outside `aes()`, the aesthetic gets a fixed value that applies to the entire layer.

```r
# Inside aes() — value varies by data
ggplot(mpg, aes(x = displ, y = hwy, color = class)) + geom_point()

# Outside aes() — fixed value for all points
ggplot(mpg, aes(x = displ, y = hwy)) + geom_point(color = "navy")
```
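A common slip follows from this distinction: putting a constant inside `aes()`. The sketch below is illustrative (the names `mapped` and `fixed` are not part of ggplot2). Inside `aes()`, the string `"navy"` is treated as a one-level data variable, so ggplot2 assigns it a color from its default palette and adds a legend instead of painting the points navy.

```r
library(ggplot2)

# Constant inside aes(): "navy" is treated as data, not as a color name
mapped <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = "navy"))

# Constant outside aes(): every point is drawn in navy
fixed <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "navy")

# layer_data() returns the values ggplot2 computed for a layer
unique(layer_data(mapped)$colour)
unique(layer_data(fixed)$colour)
```

If a plot shows an unexpected color plus a one-entry legend, a constant inside `aes()` is a likely cause.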

## Scatter Plots with `geom_point()`

A scatter plot shows the relationship between two continuous variables. Here, engine displacement (`displ`) maps to the x-axis and highway fuel economy (`hwy`) to the y-axis, so each row of the `mpg` dataset appears as one point.

```r
ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()
```

Adding a color aesthetic splits the points by category. Here each point's color reflects the vehicle class:

```r
ggplot(mpg, aes(x = displ, y = hwy, color = class)) + geom_point()
```

ggplot2 automatically adds a legend. No extra code required.
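Several aesthetics can point at the same variable. As a small sketch, mapping `drv` (drive train) to both color and shape encodes the category twice, which can help when a plot is printed in grayscale. ggplot2 combines the two into a single legend. The name `drive_plot` is illustrative.

```r
library(ggplot2)

# drv has three levels: "4" (four-wheel), "f" (front), "r" (rear)
drive_plot <- ggplot(mpg, aes(x = displ, y = hwy, color = drv, shape = drv)) +
  geom_point()

drive_plot
```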

## Line Plots with `geom_line()`

Use `geom_line()` when the x-axis represents an ordered sequence, such as time:

```r
ggplot(economics, aes(x = date, y = pop)) + geom_line()
```

This shows how the US population changed over time. Line plots work best when the x-axis has a meaningful order — do not use them for categorical comparisons.
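Computed expressions are allowed inside `aes()`. In the `economics` dataset, `pop` is recorded in thousands, so a sketch like the following rescales it to millions and labels the axes to match. The label text and the name `pop_plot` are illustrative.

```r
library(ggplot2)

# Divide by 1000 inside aes() to convert thousands to millions
pop_plot <- ggplot(economics, aes(x = date, y = pop / 1000)) +
  geom_line() +
  labs(x = "Date", y = "US population (millions)")

pop_plot
```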

## Bar Charts with `geom_bar()`

`geom_bar()` has two different behaviors depending on how you use it.

With only an x aesthetic, `geom_bar()` counts the number of rows per category and draws a bar for each count:

```r
ggplot(mpg, aes(x = class)) + geom_bar()
```

ggplot2 counted the rows for you. Each bar represents one vehicle class, and its height is the number of rows in that class. Counting is the default behavior of `geom_bar()` (its default stat is `"count"`), so no manual tallying is needed.
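To confirm that the bar heights really are row counts, one option (a sketch, not something you need in everyday plotting) is to compare the values ggplot2 computes with a manual tally from base R's `table()`. The variable names are illustrative.

```r
library(ggplot2)

class_bars <- ggplot(mpg, aes(x = class)) + geom_bar()

# Heights computed by geom_bar()'s default counting stat
bar_heights <- layer_data(class_bars)$count

# The same tally done by hand
manual_counts <- as.vector(table(mpg$class))

all(bar_heights == manual_counts)
```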

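You can also provide pre-aggregated data. If each row already holds a summary value, map that column to `y` inside `aes()` and add `stat = "identity"` so the bar heights are taken directly from the data:

```r
# dplyr::count() pre-aggregates: one row per class, with the tally in column n
ggplot(dplyr::count(mpg, class), aes(x = class, y = n)) +
  geom_bar(stat = "identity")
```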
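The same idea applies to summaries other than counts. The sketch below computes the mean highway mpg per class with base R's `aggregate()` and draws it with `geom_col()`, which ggplot2 provides as a shorthand for `geom_bar(stat = "identity")`. The names `avg_hwy` and `avg_plot` are illustrative.

```r
library(ggplot2)

# One row per class, with the mean of hwy in a column also named hwy
avg_hwy <- aggregate(hwy ~ class, data = mpg, FUN = mean)

avg_plot <- ggplot(avg_hwy, aes(x = class, y = hwy)) +
  geom_col() +
  labs(y = "Mean highway MPG")

avg_plot
```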

## Histograms and Box Plots

A histogram divides a continuous variable into bins and shows the count per bin. The choice of `binwidth` matters — different widths reveal different patterns in your data:

```r
ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)
```
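To see how much the choice matters, a quick sketch is to draw the same variable with a narrow and a wide bin. Both plots count the same rows and only the grouping changes. The widths 1 and 5 are arbitrary values picked for contrast.

```r
library(ggplot2)

# Narrow bins: more detail, more noise
narrow <- ggplot(mpg, aes(x = hwy)) + geom_histogram(binwidth = 1)

# Wide bins: smoother shape, less detail
wide <- ggplot(mpg, aes(x = hwy)) + geom_histogram(binwidth = 5)

narrow
wide
```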

A box plot is better suited for comparing distributions across categories. It shows the median, quartiles, and potential outliers in a compact form:

```r
ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot()
```
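The line inside each box is the group median. As a sanity check (a sketch with illustrative variable names), the medians ggplot2 computes can be compared with ones calculated by hand using `tapply()`:

```r
library(ggplot2)

drive_boxes <- ggplot(mpg, aes(x = drv, y = hwy)) + geom_boxplot()

# Median per box as computed by ggplot2 (column "middle")
box_medians <- layer_data(drive_boxes)$middle

# Median highway mpg per drive type, computed by hand
manual_medians <- as.vector(tapply(mpg$hwy, mpg$drv, median))

all.equal(box_medians, manual_medians)
```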

## Labels and Titles

Label a plot with `labs()`, which sets the title, subtitle, axis labels, and caption in one call:

```r
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(
    title = "Engine Displacement vs Highway Fuel Economy",
    subtitle = "Data from 1999 to 2008 vehicles",
    x = "Engine Displacement (L)",
    y = "Highway MPG",
    caption = "Source: fueleconomy.gov"
  )
```
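`labs()` also accepts the name of any mapped aesthetic, which sets the title of that aesthetic's legend. A small sketch, with illustrative label text:

```r
library(ggplot2)

class_plot <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  labs(
    x = "Engine Displacement (L)",
    y = "Highway MPG",
    color = "Vehicle class"  # legend title for the color aesthetic
  )

class_plot
```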

## A Common Mistake: Vectors in ggplot()

`ggplot()` expects a data frame or tibble as its first argument. If you skip the data and pass `aes()` built from loose vectors in its place, the call fails with an error:

```r
x <- c(1, 2, 3)
y <- c(4, 5, 6)

# This will NOT work: aes() lands in the data argument and ggplot() errors
# ggplot(aes(x = x, y = y)) + geom_point()

# Wrap the vectors in a data frame first
ggplot(data.frame(x, y), aes(x = x, y = y)) + geom_point()
```

This error trips up a lot of people coming from base R, where `plot(x, y)` accepts vectors directly.

## Piping Data into ggplot()

The tidyverse pipe `%>%` lets you chain data transformations directly into your visualization. You filter or mutate your data first, then pipe the result straight into `ggplot()`:

```r
library(dplyr)
library(ggplot2)

mpg %>%
  filter(class == "compact") %>%
  ggplot(aes(x = displ, y = hwy, color = drv)) +
  geom_point(size = 3)
```

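The key insight is that `ggplot()` receives the tibble from the pipe as its first argument, so you do not name the data again. Note the switch in operators: data steps are joined with `%>%`, but once `ggplot()` is called, layers are still added with `+`. Data preparation and visualization live in one readable chain.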
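The same pattern works without dplyr. As a sketch, assuming R 4.1 or later (where the native pipe `|>` is available), base R's `subset()` can stand in for `filter()`. The name `compact_plot` is illustrative.

```r
library(ggplot2)

# Native pipe: the left-hand side becomes the first argument of the next call
compact_plot <- mpg |>
  subset(class == "compact") |>
  ggplot(aes(x = displ, y = hwy, color = drv)) +
  geom_point(size = 3)

compact_plot
```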

## Conclusion

ggplot2 builds visualizations by combining data, aesthetic mappings, and geometric layers through a consistent `+` operator interface. The three foundational pieces are your data frame, the `aes()` mappings that connect variables to visual properties, and the `geom_` functions that determine how those aesthetics appear.

Once you understand this layered approach, you can create anything from quick exploratory charts to polished publication graphics. The same mental model scales from a simple scatter plot to a multi-layer faceted figure.

To move beyond the basics, explore how to layer multiple geoms on the same plot, how facets split your data into subplots, and how themes control the non-data elements of your charts.

## See Also

- [/tutorials/ggplot2-advanced-geoms](/tutorials/ggplot2-advanced-geoms) — Layer multiple geoms and use statistical transformations
- [/tutorials/ggplot2-facets-and-themes](/tutorials/ggplot2-facets-and-themes) — Split data into subplots and customize the visual appearance
- [/tutorials/tidyverse-intro](/tutorials/tidyverse-intro) — Understand the tidyverse ecosystem ggplot2 belongs to