Introduction to ggplot2
ggplot2 is one of R's most widely used data visualization packages. Created by Hadley Wickham, it implements the grammar of graphics, a systematic way of building visualizations from reusable components. Once you understand the grammar, you can build a wide range of visualizations from the same small set of pieces.
Installing and Loading ggplot2
ggplot2 is part of the tidyverse, so you can install it individually or with the entire tidyverse:
# Install just ggplot2
install.packages("ggplot2")
# Or install the entire tidyverse
install.packages("tidyverse")
# Load the package
library(ggplot2)
# OR
library(tidyverse)
When you load tidyverse, ggplot2 loads automatically along with other essential packages like dplyr, tidyr, and tibble.
The Grammar of Graphics
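The key to understanding ggplot2 is the grammar of graphics. A plot is assembled from these core components; data, aesthetics, and at least one geometry are needed to draw anything, while facets and themes are optional: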
- Data: The dataset you want to visualize
- Aesthetics (aes): Visual properties mapped to data (position, color, size, shape)
- Geometries (geom): The geometric shapes used to represent data (points, lines, bars)
- Facets: Subplots that split data by a categorical variable
- Themes: Visual styling (colors, fonts, backgrounds)
The basic template is:
ggplot(data, aes(x = x_var, y = y_var)) +
geom_type()
Your First ggplot
Let us create a simple scatter plot using the built-in mtcars dataset:
# Load ggplot2
library(ggplot2)
# Create a basic scatter plot
ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point()
This code does three things. First, ggplot() initializes the plot with the data and the aesthetic mappings. Second, geom_point() adds a layer of points, one per observation. Third, the + operator combines the layers into a single plot.
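To make the layering idea concrete, the sketch below builds the same plot one step at a time and stores each stage in a variable. The object names (base, scatter, scatter_with_trend) are made up for this illustration, and geom_smooth() is used only as a convenient second layer.

```r
library(ggplot2)

# Step 1: initialize the plot; there are no layers yet, so only axes are drawn
base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Step 2: add a layer of points with +
scatter <- base + geom_point()

# Step 3: add a second layer on top of the first
scatter_with_trend <- scatter + geom_smooth(method = "lm", formula = y ~ x)

length(base$layers)                # 0
length(scatter$layers)             # 1
length(scatter_with_trend$layers)  # 2

print(scatter_with_trend)
```

Because each + returns a new plot object, the earlier stages stay unchanged and can be reused as the starting point for other plots.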
Adding Aesthetics
You can map additional variables to visual properties using aesthetics in your plot:
# Color points by number of cylinders
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
geom_point()
# Size points by horsepower
ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
geom_point()
# Combine multiple aesthetics
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl), size = hp)) +
geom_point()
Notice that we wrapped cyl in factor(). This treats the numeric cylinder count as categorical, giving you discrete colors instead of a continuous gradient.
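One way to see the difference is to draw both versions and compare their legends. This is a small sketch on the same mtcars data; the names p_continuous and p_discrete are illustrative only.

```r
library(ggplot2)

# Numeric mapping: cyl is treated as continuous, so the legend is a color gradient
p_continuous <- ggplot(mtcars, aes(x = wt, y = mpg, color = cyl)) +
  geom_point()

# Categorical mapping: factor(cyl) gives one distinct color per level
p_discrete <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

# The factor has exactly three levels, one per cylinder count
levels(factor(mtcars$cyl))  # "4" "6" "8"

print(p_continuous)
print(p_discrete)
```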
Using Colors and Fills
Beyond mapping colors to variables, you can set static colors for all points in your visualization:
# Set a static color for all points
ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point(color = "steelblue", size = 3)
# For filled shapes (bars, etc.), use fill instead
ggplot(mtcars, aes(x = factor(cyl))) +
geom_bar(fill = "coral")
You can also use hexadecimal color codes like "#FF5733" or any of R's built-in named colors, which you can list with colors().
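As a quick check of the color options, the sketch below confirms that the two names used in this section are built-in R colors and then draws the scatter plot with a hex code. Note that a static color goes outside aes(); the variable name p_hex is just for illustration.

```r
library(ggplot2)

# colors() returns the names of all built-in R colors
"steelblue" %in% colors()  # TRUE
"coral" %in% colors()      # TRUE

# A hex code works anywhere a named color does
p_hex <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "#FF5733", size = 3)

print(p_hex)
```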
Different Geometries
ggplot2 offers many geometry functions for different chart types. Each geometry expects specific aesthetic mappings to properly display your data.
Bar Charts
Bar charts are useful for showing counts or aggregated values across categories:
# Count cars by cylinder
ggplot(mtcars, aes(x = factor(cyl))) +
geom_bar()
# Horizontal bar chart
ggplot(mtcars, aes(x = factor(cyl))) +
geom_bar() +
coord_flip()
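The examples above show counts. For aggregated values, one common approach is to compute the summary first and draw it with geom_col(), which uses the y values as bar heights. The sketch below assumes the mean mpg per cylinder count is the summary of interest; avg_mpg and p_col are illustrative names.

```r
library(ggplot2)

# Pre-compute the mean mpg for each cylinder count (base R aggregate)
avg_mpg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# geom_col() takes bar heights from y as given,
# whereas geom_bar() counts rows by default
p_col <- ggplot(avg_mpg, aes(x = factor(cyl), y = mpg)) +
  geom_col()

print(p_col)
```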
Histograms
Histograms display the distribution of a continuous variable by binning the data:
# Distribution of miles per gallon
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(binwidth = 5)
# Set the number of bins and outline the bars
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(bins = 10, fill = "white", color = "black")
Box Plots
Box plots show the distribution of data through quartiles and are excellent for comparing groups:
# MPG by number of cylinders
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
geom_boxplot()
# Violin plots show distribution shape
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
geom_violin()
Line Charts
Line charts are ideal for showing trends over continuous variables, particularly time series data:
# Plot tree growth over time using the built-in Orange dataset
ggplot(Orange, aes(x = age, y = circumference, color = Tree)) +
geom_line()
Customizing Labels and Titles
Make your plots informative with proper labeling so viewers understand what they are seeing:
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
geom_point() +
labs(
title = "Car Weight vs. Miles Per Gallon",
subtitle = "Data from 1974 Motor Trend Magazine",
x = "Weight (1000 lbs)",
y = "Miles Per Gallon",
color = "Cylinders"
) +
theme_minimal()
The labs() function sets the title, subtitle, axis labels, and legend title, while theme_minimal() gives you a clean look by removing the gray panel background and border and keeping only light grid lines.
Adding Smooth Curves
When you want to see the overall trend in your data, add a smoothing layer to reveal patterns:
# Scatter plot with a smoothed trend line
ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point() +
geom_smooth(method = "lm")
The geom_smooth() function fits a trend to the data and draws it, by default with a shaded confidence band. The method = "lm" argument fits a linear model. Other options include "loess" for local regression and "gam" for generalized additive models.
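For comparison with the linear fit, here is a minimal sketch of the loess option on the same data. Passing formula = y ~ x makes the model explicit, and se = FALSE hides the confidence band; p_loess is an illustrative name.

```r
library(ggplot2)

p_loess <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # loess fits a local regression, so the curve can bend with the data
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)

print(p_loess)
```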
Facets: Multiple Plots at Once
Facets let you create multiple subplots split by a categorical variable, making comparisons easy:
# Separate plots for each cylinder count
ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point() +
facet_wrap(~factor(cyl))
This creates three panels, one for each cylinder count (4, 6, and 8), which makes it easy to compare groups side by side.
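The panel layout and strip labels can be adjusted through arguments to facet_wrap(). The sketch below is one possible variation: ncol = 1 stacks the panels vertically, and labeller = label_both writes the variable name into each strip. The name p_facets is illustrative.

```r
library(ggplot2)

p_facets <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # label_both shows "cyl: 4" instead of "4" in each panel strip;
  # ncol = 1 places the three panels in a single column
  facet_wrap(~ cyl, ncol = 1, labeller = label_both)

print(p_facets)
```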
Saving Your Plots
Once you have created a plot, you will often want to save it to a file for reports or presentations:
# Save the last plot to a PNG file
ggsave("my-plot.png", width = 6, height = 4, dpi = 300)
# Save to PDF for high-quality printing
ggsave("my-plot.pdf", width = 6, height = 4)
# Save a specific plot object to a file
my_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
ggsave("scatter-plot.png", plot = my_plot, width = 6, height = 4)
The ggsave() function infers the file format from the extension and lets you control the dimensions (in inches by default), the resolution, and other output settings.
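If you prefer metric dimensions, ggsave() accepts a units argument. The sketch below writes to a temporary file so it leaves nothing behind in your working directory; the 15 by 10 cm size is an arbitrary choice for the example, and out_file is an illustrative name.

```r
library(ggplot2)

my_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Use a temporary path so the example does not clutter the working directory
out_file <- tempfile(fileext = ".png")

# units switches width and height from inches to centimeters
ggsave(out_file, plot = my_plot, width = 15, height = 10, units = "cm", dpi = 300)

file.exists(out_file)  # TRUE if the save succeeded
```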
Summary
ggplot2 transforms data visualization from a frustrating chore into a creative, systematic process. The key concepts to remember are as follows.
- Start with ggplot() and specify your data and aesthetic mappings.
- Add layers with geom_*() functions for different visual representations.
- Customize with themes, labels, and facets for polished results.
- Save with ggsave() to export your visualizations.
In the next tutorial, we will dive deeper into customizing ggplot2 charts with colors, themes, and annotations to make your visualizations truly stand out.