Base R Plotting: Scatter, Line, Bar, and Box Plots
R’s base graphics system gives you a capable toolkit for creating plots without installing anything: scatter plots, line plots, histograms, bar charts, box plots, mosaic plots, and many other plot types ship with every R installation. The graphics engine has been refined over decades, making it reliable and flexible. This guide walks through the most useful plotting functions and shows how to customize your output.
Base R’s plot() function
The workhorse of base R graphics is plot(). It is a generic function that adapts its behavior based on the data type you provide.
# Simple scatter plot
x <- 1:10
y <- c(2, 4, 3, 7, 6, 9, 8, 12, 11, 15)
plot(x, y)
The function chooses defaults for point type, axis labels, and axis limits (the limits come from the data range), but the result is rarely suitable for presentations or reports. The call below overrides the most visible defaults using named arguments, turning the bare exploratory output into a polished, self-documenting figure. It adds a descriptive title with main, labels both axes with xlab and ylab, sets the point color with col, switches to filled circles with pch = 19, and scales the points with cex = 1.5, so the plot communicates the relationship without requiring the viewer to read the surrounding text:
plot(x, y,
     main = "My First Plot",
     xlab = "X Axis Label",
     ylab = "Y Axis Label",
     col = "steelblue",
     pch = 19,
     cex = 1.5)
Key parameters:
- main, plot title
- xlab, ylab, axis labels
- col, color (use hex codes or named colors)
- pch, point symbol (0-25, where 19 is a filled circle)
- cex, size multiplier
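Both col and pch accept vectors, so each point can be styled by group. The following is a minimal sketch using the built-in mtcars data; the palette and symbol choices are arbitrary assumptions made for illustration:

```r
# Assumed palette and symbols, one per cylinder count (4, 6, 8).
group_cols <- c("steelblue", "coral", "darkgreen")
group_pch  <- c(15, 17, 19)

# as.integer() on a factor gives the codes 1, 2, 3 in level order.
cyl_code <- as.integer(factor(mtcars$cyl))

# Look up one color and one symbol per row of the data.
point_cols <- group_cols[cyl_code]
point_pch  <- group_pch[cyl_code]

plot(mtcars$wt, mtcars$mpg,
     col = point_cols,
     pch = point_pch,
     cex = 1.2,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per Gallon")
```

Because the lookup vectors are built before the call, the same point_cols and point_pch can be reused in a later legend() call.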
You can also plot columns of a data frame by extracting them with $:
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per Gallon",
     main = "Car Weight vs. Fuel Efficiency")
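plot() also has a formula method that resolves column names through a data argument, which avoids repeating mtcars$. A short sketch of the same figure written that way, with with() shown as an alternative:

```r
# Formula method: response ~ predictor, columns looked up inside `data`.
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per Gallon",
     main = "Car Weight vs. Fuel Efficiency")

# with() is an alternative that keeps the plot(x, y) argument order.
with(mtcars, plot(wt, mpg))
```

Note that the formula puts the y variable on the left of the tilde, the reverse of the plot(x, y) argument order.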
Line plots
When your x variable has a natural ordering, like time steps, dosage levels, or sequential measurements, connecting the points with lines reveals trends and trajectories that a scatter plot alone obscures. The type = "l" argument in plot() draws line segments between consecutive data points in the order they appear, so you must ensure your data is sorted by the x variable beforehand. This plot type is the base R equivalent of a line chart and is particularly effective for time series data where the rate and direction of change between observations matters more than the individual point values:
# Time series data (a simulated random walk)
set.seed(42) # make the random values reproducible
time <- 1:20
values <- cumsum(rnorm(20))
plot(time, values,
     type = "l",
     col = "darkred",
     lwd = 2)
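Because the segments follow row order, unsorted data produces a line that doubles back on itself. One way to guard against that is to reorder both vectors with order() before plotting. The dose and response values below are made up purely for illustration:

```r
# Hypothetical measurements recorded out of order.
dose     <- c(30, 10, 50, 20, 40)
response <- c(6.1, 2.3, 9.8, 4.0, 8.2)

# order() returns the positions that sort dose in ascending order.
ord <- order(dose)

# Apply the same ordering to both vectors so the pairs stay matched.
plot(dose[ord], response[ord],
     type = "l",
     lwd = 2,
     xlab = "Dose",
     ylab = "Response")
```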
Setting type = "b" combines points and lines in a single plot, drawing both symbols at each data coordinate and line segments connecting consecutive observations. This dual display is useful when individual observations carry meaning, such as monthly sales figures where each point represents actual revenue, and you still want to communicate the overall trend clearly. The filled-circle markers from pch = 16 draw attention to each measurement while the connecting lines guide the eye along the sequence, making this more informative than a plain line chart for sparse datasets where individual values are as important as the trajectory between them:
plot(time, values,
     type = "b",
     pch = 16,
     col = "darkred")
Histograms with hist()
A histogram bins a continuous numeric variable into equal-width intervals and counts how many observations fall into each bin, revealing the shape, center, spread, and any skewness or multimodality in your data. Unlike a scatter plot that shows individual x-y pairs, a histogram summarizes an entire variable in one view, making it the go-to diagnostic for understanding distributions before running statistical tests or building models. The hist() function takes a numeric vector and produces a frequency histogram with sensible defaults, including automatic selection of the number of bins (Sturges' rule by default, which depends on the number of observations):
# Simple histogram
hist(mtcars$mpg)
# More control
hist(mtcars$mpg,
     breaks = 10,
     col = "steelblue",
     border = "white",
     main = "Distribution of MPG",
     xlab = "Miles per Gallon")
A single number passed to breaks is treated as a suggestion for the number of bins: hist() places the break points at "pretty" round values, so the actual count may differ from the number you requested. The choice directly shapes how the distribution appears: too few bins smooth away important features like small gaps or secondary modes, while too many bins introduce noise that makes the overall shape harder to read. The second call also demonstrates col for bar fill color and border = "white" for separating adjacent bars visually, which improves readability when bars share the same fill color and might otherwise blur together.
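To see how breaks is interpreted without drawing anything, you can call hist() with plot = FALSE and inspect the returned object. The sketch below compares the default, a suggested bin count, and an explicit vector of break points; the 2.5-wide intervals are an arbitrary choice for illustration:

```r
# plot = FALSE returns the binning without drawing anything.
h_default <- hist(mtcars$mpg, plot = FALSE)
h_suggest <- hist(mtcars$mpg, breaks = 10, plot = FALSE)

# An explicit vector of break points is used exactly as given.
h_exact <- hist(mtcars$mpg, breaks = seq(10, 35, by = 2.5), plot = FALSE)

length(h_default$counts)  # bins chosen by the default rule
length(h_suggest$counts)  # may differ from the 10 requested
length(h_exact$counts)    # one bin per interval between break points

h_suggest$breaks          # the break points hist() actually used
```

When the exact bin edges matter, for example to keep several histograms comparable, passing a vector of break points is the more predictable option.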
Bar charts with barplot()
Bar charts display categorical data by mapping each category to a bar whose height represents a numeric value such as count, sum, or average. Unlike histograms, which bin a continuous variable, bar charts compare discrete groups and are the standard tool for presenting survey results, sales by product line, or any breakdown where the x-axis represents unordered or ordered categories:
# Simple bar chart
categories <- c("A", "B", "C", "D")
values <- c(23, 45, 32, 58)
barplot(values,
names.arg = categories,
col = c("#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"),
main = "Sales by Category")
When you need to display how multiple groups break down across the same categories, passing a matrix to barplot() with beside = TRUE creates grouped bars rather than stacked bars. Each column of the matrix becomes a category on the x-axis, and each row becomes a distinct group whose bars appear side by side, making it straightforward to compare the same group across categories or different groups within a single category at a glance. The col vector assigns a distinct color to each group row, and a legend, added for example through the legend.text argument or a separate legend() call, identifies which color corresponds to which group. Grouped bars excel when the primary question is which group leads in each category, while stacked bars are better when the total across groups is the focus:
# Grouped bar chart
# matrix() fills column by column: Q1 = (10, 20), Q2 = (15, 25), Q3 = (12, 18)
data_matrix <- matrix(c(10, 20, 15, 25, 12, 18), nrow = 2, ncol = 3)
barplot(data_matrix,
        beside = TRUE,
        names.arg = c("Q1", "Q2", "Q3"),
        col = c("steelblue", "coral"))
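The grouped and stacked variants can be compared side by side in code. The sketch below reuses the same numbers but attaches row and column names to the matrix so that barplot() can label the bars and build a legend itself; the names "Group 1" and "Group 2" are placeholders, not part of the original data:

```r
# Hypothetical group labels; quarter labels as in the example above.
sales <- matrix(c(10, 20, 15, 25, 12, 18), nrow = 2,
                dimnames = list(c("Group 1", "Group 2"),
                                c("Q1", "Q2", "Q3")))

# Grouped bars; legend.text = TRUE takes the labels from the row names.
mids <- barplot(sales,
                beside = TRUE,
                col = c("steelblue", "coral"),
                legend.text = TRUE,
                args.legend = list(x = "topleft"))

# Stacked bars: the default (beside = FALSE) emphasizes column totals.
barplot(sales,
        col = c("steelblue", "coral"),
        legend.text = TRUE,
        args.legend = list(x = "topleft"))

colSums(sales)  # heights of the stacked bars
```

barplot() returns the bar midpoints (here stored in mids), which is useful if you later want to place text() labels above individual bars.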
Box plots with boxplot()
A box plot condenses a numeric distribution into five summary statistics: the lower whisker end, first quartile, median, third quartile, and upper whisker end. The box itself spans the interquartile range (IQR), with a thick line at the median, and each whisker extends to the most extreme value that is not flagged as an outlier, which is the minimum or maximum when no outliers are present. This compact representation makes box plots ideal for comparing distributions across multiple groups side by side, as overlapping boxes, differing box heights, and median positions are immediately visible. The formula interface mpg ~ cyl tells boxplot() to split the mpg values by the distinct values of cyl, producing one box per cylinder group:
# Single box plot
boxplot(mtcars$mpg,
        main = "MPG Distribution")
# Box plot by group
boxplot(mpg ~ cyl, data = mtcars,
        main = "MPG by Cylinder Count",
        xlab = "Cylinders",
        ylab = "Miles per Gallon",
        col = "lightgray")
Individual points plotted outside the whiskers are flagged as potential outliers; they fall more than 1.5 times the IQR beyond either quartile. In a multi-group box plot, comparing the height and position of the boxes across cylinder counts reveals whether fuel efficiency varies with the number of cylinders, which is a common exploratory question before fitting a linear model.
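The 1.5 x IQR rule can be checked by hand. The sketch below uses boxplot.stats(), which computes the same numbers boxplot() draws, and rebuilds the outlier fences from the box edges. Note that R uses hinges here, which are close to but not always identical to the quartiles returned by quantile():

```r
v <- mtcars$mpg

# $stats holds whisker ends, hinges, and median; $out holds flagged points.
bs <- boxplot.stats(v)

hinges <- bs$stats[c(2, 4)]  # lower and upper edge of the box
box_h  <- diff(hinges)       # height of the box
fences <- c(hinges[1] - 1.5 * box_h,
            hinges[2] + 1.5 * box_h)

# Points beyond the fences are the ones drawn individually.
flagged <- v[v < fences[1] | v > fences[2]]

# Per-group statistics without drawing: one column per cylinder count.
by_cyl <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)
by_cyl$stats
```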
Adding elements to plots
Base R plots are built incrementally: you first call a high-level function like plot(), hist(), or boxplot() to create the plotting region with axes, and then use lower-level functions to overlay additional graphical elements on top. This layered approach gives you precise control: you can add reference lines, highlight specific points, overlay a fitted model, and annotate regions of interest, all within the same coordinate system. The example below starts with an empty plot using type = "n" to set up the axes without drawing data, then systematically adds each element:
# Start with a plot
plot(x, y, type = "n") # type = "n" creates empty plot
# Add points
points(x, y, col = "steelblue", pch = 16)
# Add a regression line
abline(lm(y ~ x), col = "red", lwd = 2)
# Add a horizontal line
abline(h = mean(y), col = "gray50", lty = 2)
# Add a vertical line
abline(v = 5, col = "green", lty = 3)
# Add text to the left of the data point at (8, 12)
text(x = 8, y = 12, labels = "Important Point", pos = 2)
# Add a legend
legend("topleft", legend = c("Data", "Trend"),
       col = c("steelblue", "red"), pch = c(16, NA), lty = c(NA, 1))
Multiple plots
When you need to display several related plots together for comparison, base R provides easy control over the panel layout. The par(mfrow = c(2, 2)) call divides the graphics device into a 2-row by 2-column grid, and each subsequent plotting command fills the panels in row-major order: left to right across the top row, then left to right across the bottom. This approach is far more efficient than generating four separate plot files, because the reader sees all views at once and can immediately compare patterns across different variable relationships. The code below combines a scatter plot, histogram, box plot, and bar chart in a single figure to give a comprehensive overview of a dataset:
# 2x2 layout
par(mfrow = c(2, 2))
plot(x, y)
hist(mtcars$mpg)
boxplot(mpg ~ cyl, data = mtcars)
barplot(values, names.arg = categories)
Multi-panel layout settings persist for all subsequent plots on the same graphics device, so every plot created after your panel figure will appear shrunken, occupying one cell of a fresh multi-panel grid, unless you explicitly restore single-panel mode. This is one of the most common pitfalls in base R scripting; forgetting to reset mfrow results in mysteriously tiny plots that can waste considerable debugging time. The call below returns the layout to a single plot per page so that all output that follows renders at full size:
par(mfrow = c(1, 1))
Plot parameters with par()
The par() function sets graphical parameters that apply globally to all subsequent plots on the current device. Unlike inline arguments passed directly to plot(), parameters set via par() persist until you close the device or overwrite them, which makes par() the right tool for establishing a consistent visual style across multiple plots in the same script. The code below demonstrates the recommended pattern of saving current settings before making changes and restoring them afterward, which prevents unwanted side effects on plots created later in the session or by other functions that share the graphics device:
# Save current settings
old_par <- par(no.readonly = TRUE)
# Change settings
par(mar = c(5, 4, 2, 2), # margins in lines (bottom, left, top, right)
    mgp = c(3, 1, 0), # margin lines for axis title, tick labels, axis line
    cex.axis = 0.8) # axis text size
# Your plots here
# Restore
par(old_par)
Useful parameters:
- mar, margin sizes in lines
- mfrow, number of rows and columns for multiple plots
- oma, outer margin size
- bg, background color
- cex.main, cex.lab, cex.axis, text size multipliers
For safely restoring graphical state inside functions, combine par() with on.exit().
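A minimal sketch of that pattern follows. The function name plot_pair and its choice of panels are hypothetical; the part that matters is capturing the value par() returns and registering the restore with on.exit():

```r
# Hypothetical helper: draws two panels, then restores the caller's settings.
plot_pair <- function(df) {
  # par() returns the previous values of the parameters it changes.
  old_par <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))
  # Runs when the function exits, even if a plotting call fails.
  on.exit(par(old_par), add = TRUE)

  hist(df$mpg, main = "MPG", xlab = "Miles per Gallon")
  boxplot(mpg ~ cyl, data = df, main = "MPG by Cylinders")
  invisible(NULL)
}

before <- par("mfrow")
plot_pair(mtcars)
after <- par("mfrow")  # same layout as before the call
```

Restoring only the parameters the function changed keeps the side effects narrow, and add = TRUE avoids overwriting any exit handlers registered earlier in the same function.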
Saving plots
Calling pdf("output.pdf", width = 7, height = 5) opens a PDF device with dimensions given in inches. Draw your plots, then call dev.off() to close the device and write the file. The same pattern works for png(), svg(), and tiff(). For a high-resolution PNG, png("plot.png", width = 2100, height = 1400, res = 300) gives a 7 x 4.67 inch plot at 300 DPI, since 2100 / 300 = 7 and 1400 / 300 is roughly 4.67.
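cairo_pdf() and cairo_ps() often handle fonts better than the standard pdf() device on builds of R with cairo support. The ragg package, which is not part of base R, provides agg_png(), which is widely regarded as giving high-quality text rendering for PNG output. Always call dev.off() to flush and close the file; forgetting to do so can leave an incomplete file.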
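The open, draw, close sequence might look like the sketch below. It writes to a temporary directory so it leaves no stray files behind; the file name and dimensions are arbitrary choices for illustration:

```r
# Pixel dimensions for a raster device follow from inches * resolution.
width_in  <- 7
height_in <- 5
dpi       <- 300
c(width_px = width_in * dpi, height_px = height_in * dpi)

# Assumed output location: a file inside the session's temporary directory.
out_file <- file.path(tempdir(), "mpg_by_cyl.pdf")

pdf(out_file, width = width_in, height = height_in)  # sizes in inches
boxplot(mpg ~ cyl, data = mtcars,
        main = "MPG by Cylinder Count",
        xlab = "Cylinders",
        ylab = "Miles per Gallon")
dev.off()  # flushes and closes the file

file.exists(out_file)
```

For png() you would pass the pixel values computed at the top as width and height, together with res = dpi.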
Choosing between graphics systems
Base R graphics are faster for simple plots, require no packages, and are more flexible for non-standard layouts. They are the right choice for quick exploratory plots, plots in packages where you want minimal dependencies, and specialized visualization types not supported by ggplot2.
ggplot2 is better for consistent multi-layer plots, faceting, complex legends, and polished publication graphics. Its declarative grammar makes complex visualizations more maintainable and modifications more predictable. For a team producing many reports, ggplot2’s consistency reduces cognitive overhead. Base R remains the better fit for quick data checks during analysis and for situations where package availability is uncertain.
See also
- ggplot2, the tidyverse plotting package for more complex visualizations than base R
- Interactive Plots with plotly, an interactive, web-based alternative to static plots
- graphics, the underlying graphics package documentation