
Data Visualization Best Practices in R

Effective data visualization transforms complex datasets into insights that drive decisions. In R, the tidyverse ecosystem, particularly ggplot2, provides powerful tools for creating publication-quality graphics. This guide covers the principles and practices that separate good visualizations from great ones.

Choose the right chart type

The foundation of effective visualization is selecting the appropriate chart for your data and message.

Comparing categories

For comparing values across discrete categories:

  • Bar charts work best for comparing a small number of categories (5-15)
  • Lollipop charts reduce visual clutter when values are similar in magnitude
  • Heatmaps handle larger category comparisons through color intensity

library(ggplot2)

# Bar chart for category comparison: one bar per class at the median highway MPG
# (stat = "identity" would stack every row and plot the sum instead)
ggplot(mpg, aes(x = reorder(class, hwy, median), y = hwy)) +
  geom_bar(stat = "summary", fun = median, fill = "#4C72B0") +
  coord_flip() +
  labs(x = "Car Class", y = "Median Highway MPG")

Showing distributions

When your goal is to convey the shape of a distribution:

  • Histograms for raw continuous data with many unique values
  • Box plots for comparing distributions across groups
  • Violin plots reveal distribution shape that box plots obscure
  • Ridgeline plots work well for comparing many distributions over time

# Violin plot with box plot overlay
ggplot(mpg, aes(x = class, y = hwy, fill = class)) +
  geom_violin(alpha = 0.7) +
  geom_boxplot(width = 0.2, fill = "white") +
  theme_minimal()

Displaying relationships

For showing relationships between variables:

  • Scatter plots for two continuous variables
  • Line charts for trends over time
  • Connected scatter plots combine both approaches
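
A minimal sketch of the first two options, using the mpg and economics datasets that ship with ggplot2. The variables chosen here are illustrative assumptions, not something this guide prescribes.

```r
library(ggplot2)

# Scatter plot: relationship between two continuous variables
scatter <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.6) +
  labs(x = "Engine displacement (L)", y = "Highway MPG")

# Line chart: a single series tracked over time
trend <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  labs(x = NULL, y = "Unemployed (thousands)")

scatter
trend
```

Swapping geom_line() for geom_path() plus geom_point() on two continuous variables is one way to get the connected scatter plot mentioned above.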

Proportions and part-to-whole

Avoid pie charts because humans are poor at comparing angles accurately. Instead:

  • Stacked bar charts for proportions across categories
  • Waffle charts for absolute counts
  • Treemaps for hierarchical proportions
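
As a rough illustration of the stacked bar option, the sketch below shows the share of each drive type within every car class. The mpg dataset and the choice of drv as the fill variable are assumptions made for the example.

```r
library(ggplot2)

# position = "fill" rescales every bar to a total of 1, so segments show
# shares within a class rather than raw counts
shares <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = function(x) paste0(x * 100, "%")) +
  labs(x = "Car class", y = "Share of models", fill = "Drive type")

shares
```

Because every bar has the same height, this form compares composition only; use the default stacking when totals also matter.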

The grammar of graphics

ggplot2 implements Leland Wilkinson’s grammar of graphics, which builds visualizations from reusable components.

Core components

Every ggplot has three essential elements:

  1. Data, your tibble or data frame
  2. Aesthetic mappings, which variables map to visual properties
  3. Geoms, the geometric objects that represent the data

ggplot(data = diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.5) +
  scale_y_log10()

Layering

Add complexity through layers:

  • Facets split data into subplots
  • Stats transform data before plotting
  • Scales control how aesthetics map to values
  • Themes handle non-data visual elements

Each layer adds a dimension to the plot without changing the underlying data. The example below demonstrates how adding facet_wrap() splits one dense scatter plot into a grid of subplots by diamond cut, while scale_y_log10() compresses the price range so the relationship between carat and price remains readable across several orders of magnitude.

ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.3) +
  facet_wrap(~ cut, ncol = 2) +
  scale_y_log10() +
  theme_minimal()

Color that works

Color is the most powerful and the most commonly misused aesthetic in data visualization.

Choosing a palette is not just about making the chart look nice: the wrong palette can distort data, exclude colorblind readers, or suggest a ranking where none exists. For categorical variables, the goal is to use colors that are equally distinct without implying an order.

Color for categorical data

Use distinct colors that do not imply order:

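# Discrete viridis scale: colorblind-friendly and perceptually uniform.
# Its dark-to-light ramp can still read as an order, so for strictly
# unordered groups a qualitative palette such as Okabe-Ito is an alternative.
ggplot(mpg, aes(x = class, y = hwy, fill = class)) +
  geom_bar(stat = "summary", fun = median) +
  scale_fill_viridis_d()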

Color for sequential data

When color represents a continuous value:

  • Use a single hue that varies in lightness or saturation
  • The viridis, scico, and rcartocolor palettes are perceptually uniform

Categorical palettes highlight differences between groups, but when your variable is a continuous measurement like density, temperature, or error rate, a sequential palette communicates the gradient more naturally. The hexbin plot below uses viridis to map point density to color, with darker regions indicating where observations cluster.

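# Sequential palette for continuous data
# geom_hex() needs the hexbin package; density is computed by the stat,
# so it is referenced with after_stat() rather than as a column of faithful
ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_hex(aes(fill = after_stat(density))) +
  scale_fill_viridis_c(direction = -1)  # reversed so darker means denser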

Color for diverging data

When you have a meaningful midpoint (zero, average, target):

  • Use a diverging palette with distinct colors for above and below
  • The diverging palette should be balanced around your midpoint

Sequential palettes work well when values run from low to high, but many datasets have a natural center point, such as zero profit, average temperature, or a target threshold. A diverging palette uses two opposing hues that meet at the midpoint, making it immediately clear which observations fall above or below the reference value. The bar chart below applies this principle to the median unemployment duration in the economics dataset, using a blue-red split around the series median.

# Diverging palette centered on the series median
# scale_fill_gradient2() ships with ggplot2; midpoint sets where the hues meet
ggplot(economics, aes(x = date, y = uempmed, fill = uempmed)) +
  geom_col() +
  scale_fill_gradient2(
    low = "#2166AC", mid = "#F7F7F7", high = "#B2182B",
    midpoint = median(economics$uempmed)
  )

Design principles

Reduce cognitive load

Every visual element should earn its place:

  • Remove chart junk: gridlines, borders, and shading that do not encode data
  • Direct labeling beats legends when feasible
  • Eliminate 3D effects, which distort perception
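
One possible way to apply these points is sketched below: gridlines are removed and values are printed directly on the bars. The class_mpg helper table and the styling choices are assumptions for illustration only.

```r
library(ggplot2)

# Median highway MPG per class, computed up front so it can be labeled
class_mpg <- aggregate(hwy ~ class, data = mpg, FUN = median)

decluttered <- ggplot(class_mpg, aes(x = reorder(class, hwy), y = hwy)) +
  geom_col(fill = "#4C72B0") +
  # Direct labels mean readers do not have to trace values back to an axis
  geom_text(aes(label = hwy), hjust = -0.3) +
  # Leave room on the right so the longest bar's label is not clipped
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  coord_flip() +
  theme_minimal() +
  theme(
    panel.grid = element_blank(),  # gridlines encode no data here
    axis.text.x = element_blank()  # value axis is redundant with the labels
  ) +
  labs(x = NULL, y = NULL, title = "Median highway MPG by class")

decluttered
```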

Maintain proportion

Size visual elements proportionally to the data they represent:

  • Scale the area of a symbol, not its diameter, to the value it represents
  • Avoid truncated axes that exaggerate differences
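
In ggplot2, scale_size_area() is one way to keep symbol area proportional to the data. The sketch below uses the midwest dataset bundled with ggplot2; the variable choices are assumptions for the example.

```r
library(ggplot2)

# scale_size_area() maps the value to point area and maps zero to zero size,
# so a county with twice the population gets twice the ink
bubbles <- ggplot(midwest, aes(x = percollege, y = percbelowpoverty, size = poptotal)) +
  geom_point(alpha = 0.4) +
  scale_size_area(max_size = 12) +
  labs(
    x = "College educated (%)",
    y = "Below poverty line (%)",
    size = "Population"
  )

bubbles
```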

Consider your audience

Adapt complexity to context:

  • Dashboards need immediate clarity, so favor simplicity
  • Technical reports can include more detail
  • Exploratory plots prioritize speed over polish

Common mistakes to avoid

Misleading axes

Truncated axes create false impressions. Always start y-axes at zero for bar charts, but line charts can start at non-zero values when the message is about change, not absolute magnitude.
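
A small sketch of both cases, assuming the mpg and economics datasets from ggplot2; the zoom limits are arbitrary values chosen to bracket the series.

```r
library(ggplot2)

class_mpg <- aggregate(hwy ~ class, data = mpg, FUN = mean)

# Bars: keep the zero baseline so bar length stays proportional to the value
bars <- ggplot(class_mpg, aes(x = class, y = hwy)) +
  geom_col() +
  expand_limits(y = 0)

# Lines: coord_cartesian() zooms the view without discarding any data
zoomed <- ggplot(economics, aes(x = date, y = uempmed)) +
  geom_line() +
  coord_cartesian(ylim = c(4, 26))

bars
zoomed
```

Zooming with coord_cartesian() rather than ylim() matters because ylim() removes out-of-range observations before any statistics are computed.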

Too many variables

Cluttered charts confuse rather than clarify:

  • Limit to 4-5 aesthetics maximum
  • Consider small multiples (facets) instead of layering everything
  • When in doubt, simplify

Ignoring accessibility

Approximately 8% of men and 0.5% of women have color vision deficiency:

  • Use colorblind-friendly palettes (viridis, scico)
  • Pair color with shape or pattern when possible
  • Test your visualizations with simulators
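
Pairing color with shape can be sketched as follows. Mapping the same variable to both aesthetics is an illustrative choice, and the end = 0.9 trim is an assumption made to avoid the palest yellow on a light background.

```r
library(ggplot2)

# The same variable drives color and shape, so groups stay distinguishable
# even when the colors cannot be told apart
accessible <- ggplot(mpg, aes(x = displ, y = hwy, color = drv, shape = drv)) +
  geom_point(size = 2.5) +
  scale_color_viridis_d(end = 0.9) +
  labs(
    x = "Engine displacement (L)",
    y = "Highway MPG",
    color = "Drive type",
    shape = "Drive type"  # identical titles merge the two legends
  )

accessible
```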

Saving your work

Export graphics at appropriate resolution and dimensions:

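# Vector output for print. dpi is omitted because it only affects raster formats.
ggsave(
  "my-plot.pdf",
  width = 8,
  height = 6,
  units = "in"
)

# For web, use PNG (raster, where dpi sets the pixel dimensions) or SVG
ggsave(
  "my-plot.png",
  width = 8,
  height = 6,
  units = "in",
  dpi = 300
)

# SVG output requires the svglite package
ggsave(
  "my-plot.svg",
  width = 8,
  height = 6,
  units = "in"
)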

Beyond ggplot2

While ggplot2 handles most visualization needs, R has specialized tools:

  • plotly for interactive web graphics
  • leaflet for maps
  • gganimate for animations
  • patchwork for combining multiple plots
  • gt for tables that look like visualizations

Choosing the right chart type

The chart type should match the data structure and the question being answered. Bar charts compare quantities across categories. Line charts show change over time or continuous variables. Scatter plots reveal relationships between two continuous variables. Histograms show the distribution of one variable. Box plots compare distributions across groups. Heatmaps show values at intersections of two categorical variables.

Avoid pie charts for more than two categories: human perception of angles is less accurate than perception of bar lengths. Avoid 3D charts entirely: they distort proportions and add no information that a 2D chart cannot convey with better accuracy.

Typography and annotation

Text in charts serves two purposes: labeling the data and guiding interpretation. Direct labels on bars or lines are usually more readable than a legend, which requires the reader to look back and forth between the legend and the chart. ggrepel::geom_text_repel() labels points without overlap. annotate() adds specific annotations to draw attention to key features.

Font sizes matter: axis labels need to be readable at the intended output size. For a full-page figure in a paper, a font size of 8–10pt reads clearly. For a small figure in a presentation, use 12–14pt. theme(text = element_text(size = 12)) sets a global baseline in ggplot2.
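
The sketch below combines annotate() with a global text size. The highlighted point (the series maximum) and the label wording are assumptions chosen for illustration.

```r
library(ggplot2)

# Locate the observation to call out
peak <- economics[which.max(economics$uempmed), ]

annotated <- ggplot(economics, aes(x = date, y = uempmed)) +
  geom_line() +
  annotate("point", x = peak$date, y = peak$uempmed,
           color = "#D55E00", size = 3) +
  # hjust > 1 places the label to the left of the marked point
  annotate("text", x = peak$date, y = peak$uempmed,
           label = "Series peak", hjust = 1.1) +
  labs(x = NULL, y = "Median unemployment duration (weeks)") +
  theme_minimal() +
  theme(text = element_text(size = 12))  # global baseline for all text

annotated
```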

Reproducible color palettes

Use colorblind-safe palettes by default: the Okabe-Ito palette and the viridis scale are designed to be distinguishable by people with the most common forms of color blindness. scale_color_viridis_d() applies viridis to discrete variables. scale_color_manual(values = c("#E69F00","#56B4E9","#009E73")) applies three of the Okabe-Ito colors (orange, sky blue, and bluish green) manually.
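
Rather than typing hex codes by hand, the palette can be pulled from base R. This sketch assumes R 4.0.0 or later, where grDevices::palette.colors() provides Okabe-Ito; the okabe_ito name is just a local variable for the example.

```r
library(ggplot2)

# palette.colors() returns a named vector; unname() stops ggplot2 from
# trying to match those color names against the factor levels
okabe_ito <- unname(palette.colors(palette = "Okabe-Ito"))

p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  scale_color_manual(values = okabe_ito[2:4])  # skip black, the first entry

p
```

Storing the palette in one object and reusing it across plots keeps group colors consistent throughout a report.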

Aspect ratio and white space

The aspect ratio of a chart affects how trends appear. A wide, short chart makes time series trends look flatter; a tall, narrow chart makes them look steeper. Cleveland’s banking to 45 degrees is the classic recommendation for line charts: adjust the aspect ratio so the average slope of lines is close to 45 degrees. In ggplot2, coord_fixed(ratio = n) fixes the ratio between y and x data units, while theme(aspect.ratio = h/w) sets the height-to-width ratio of the plotting panel.

White space is a design element, not wasted space. Cramped charts are harder to read. theme(plot.margin = margin(t = 10, r = 20, b = 10, l = 20, unit = "pt")) sets chart margins. labs(title = ...) and labs(caption = ...) add context without cluttering the data area.
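
A brief sketch that puts these settings together. The 1:3 panel ratio is an arbitrary assumption for a long time series, not a computed banking result.

```r
library(ggplot2)

shaped <- ggplot(economics, aes(x = date, y = uempmed)) +
  geom_line() +
  labs(
    title = "Median unemployment duration",
    caption = "Source: economics dataset bundled with ggplot2",
    x = NULL,
    y = "Weeks"
  ) +
  theme_minimal() +
  theme(
    aspect.ratio = 1 / 3,  # panel height divided by panel width
    plot.margin = margin(t = 10, r = 20, b = 10, l = 20, unit = "pt")
  )

shaped
```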

For interactive exploration, plotly::ggplotly() converts most ggplot2 charts to interactive charts with tooltips and zoom in one line.

Summary

Effective data visualization in R starts with ggplot2 and the grammar of graphics: understanding aes(), geom_*, and scale_* gives you a composable system that handles most visualization needs. Augment with extension packages as needed: patchwork for composition, ggrepel for non-overlapping labels, ggiraph for interactivity. The underlying principles (show the data, choose scales that preserve relative magnitudes, label axes clearly) do not change with the tool. Visualization quality improves most from critique and iteration, not from switching packages.
