rguides

Introduction to Spatial Data in R

Spatial data represents information about locations and geographic features on Earth. Whether you are analyzing customer locations, environmental measurements, or election results, spatial data lets you uncover patterns that would be invisible in traditional tabular data. This tutorial introduces you to working with spatial data in R using modern packages.

What you’ll learn

This tutorial covers the key concepts and practical techniques for working with Introduction to Spatial Data in R. By the end, you will know how to apply the core functions in real data analysis workflows.

Why spatial data matters

Geographic context adds powerful dimensions to your analysis:

  • Location matters: Real estate prices, disease rates, and election results all vary by geography
  • Spatial relationships: Proximity to hospitals, schools, or pollution sources affects outcomes
  • Visualization: Maps communicate patterns more intuitively than tables

R has become a powerful platform for spatial analysis, with packages that integrate smoothly with the tidyverse.

Installing required packages

Two packages form the foundation of modern spatial data in R:

# Install sf for vector data (points, lines, polygons)
install.packages("sf")

# Install terra for raster data (grids, images)
install.packages("terra")

# Install tidyverse for data manipulation
install.packages("tidyverse")

# Load them
library(sf)
library(terra)
library(tidyverse)

The sf package implements Simple Features—a standardized way to represent geographic vector data. The terra package handles raster data (gridded data like satellite imagery or climate layers).

Understanding vector data

Vector data represents geographic features using points, lines, and polygons:

Geometry TypeExamplesUse Case
PointsCities, sampling locationsDiscrete locations
LinesRoads, rivers, bordersLinear features
PolygonsCountries, counties, parcelsBounded areas

The sf package stores vector data in data frames with a special geometry column:

# Create a simple point
point <- st_point(c(-122.4194, 37.7749))  # San Francisco
print(point)
# POINT (-122.4194 37.7749)

# Create a geometry list-column
sf_points <- st_sfc(list(point), crs = 4326)
sf_df <- st_as_sf(data.frame(name = "San Francisco"), geometry = sf_points)
print(sf_df)
# Simple feature collection with 1 feature and 1 field
# Geometry type: POINT
# Dimension: XY
# Bounding box: -122.42 -122.42 37.77 37.77
# Geodetic CRS: WGS 84

Creating spatial data from scratch

You can create spatial data directly in R:

# Create multiple points (cities)
cities <- tibble(
  name = c("San Francisco", "Los Angeles", "San Diego"),
  lat = c(37.7749, 34.0522, 32.7157),
  lon = c(-122.4194, -118.2437, -117.1611)
)

# Convert to sf object
cities_sf <- st_as_sf(cities, coords = c("lon", "lat"), crs = 4326)

print(cities_sf)
# Simple feature collection with 3 features and 1 field
# Geometry type: POINT
# Bounding box: -118.24 -117.16 32.72 37.77
# Geodetic CRS: WGS 84

Working with polygons

Polygons represent areas—countries, states, counties, or any bounded region:

# Create a simple rectangle (bounding box)
bbox <- st_bbox(c(xmin = -125, ymin = 24, xmax = -66, ymax = 50), crs = 4326)
rectangle <- st_as_sfc(bbox)

# Or use st_polygon to define vertices manually
polygon_coords <- list(rbind(
  c(-122, 37),
  c(-122, 38),
  c(-121, 38),
  c(-121, 37),
  c(-122, 37)
))
polygon <- st_polygon(polygon_coords)
print(polygon)
# POLYGON ((-122 37, -122 38, -121 38, -121 37, -122 37))

Reading spatial data

In practice, you will read spatial data from files. The sf package supports many formats:

# Read a shapefile (common GIS format)
# counties <- st_read("counties.shp")

# Read GeoJSON
# cities <- st_read("cities.geojson")

# Read from a URL
# url <- "https://raw.githubusercontent.com/ropensci/geojsonio/main/inst/examples/us_cities.geojson"
# cities <- st_read(url)

# For this example, load built-in NC data (North Carolina counties)
library(geosphere)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

print(nc)
# Simple feature collection with 100 features and 14 fields
# Geometry type: MULTIPOLYGON
# Dimension: XY
# Bounding box: -84.32 33.84 -75.47 36.59
# Geodetic CRS: NAD27

Basic spatial operations

The sf package provides many spatial operations:

# Calculate area (requires projected CRS)
nc_albers <- nc_albers %>%
  mutate(area_km2 = as.numeric(st_area(geometry) / 1e6))

print(nc_albers %>% select(NAME, area_km2))
# # A tibble: 100 × 2
#    NAME         area_km2
# 1  Ashe          1155. 
# 2  Alleghany     251. 
# ...

# Calculate centroid (center point of each polygon)
centroids <- st_centroid(nc_albers)

# Buffer around points (create polygons around points)
buffer_100km <- st_buffer(nc_albers, dist = 100000)  # 100 km in meters

Spatial relationships

You can test relationships between spatial objects:

# Create a point (Raleigh, NC - the capital)
raleigh <- st_point(c(-78.6382, 35.7796)) %>%
  st_sfc(crs = 4326) %>%
  st_as_sf()

# Find which county Raleigh is in
raleigh_county <- nc %>%
  filter(st_within(raleigh, ., sparse = FALSE))

print(raleigh_county$NAME)
# "Wake"

# Calculate distance from Raleigh to each county centroid
centroids <- st_centroid(nc)
raleigh <- st_transform(raleigh, st_crs(centroids))

distances <- st_distance(raleigh, centroids)
nc <- nc %>% mutate(distance_to_raleigh_km = as.numeric(distances) / 1000)

print(nc %>% select(NAME, distance_to_raleigh_km) %>% arrange(distance_to_raleigh_km) %>% head(5))
# # A tibble: 5 × 2
#   NAME    distance_to_raleigh_km
# 1 Wake                     0.0000028
# 2 Durham                  28.4   
# 3 Orange                  39.7   
# ...

Introduction to raster data

Raster data represents continuous surfaces as grids of cells. Each cell holds a value (temperature, elevation, land cover):

# Create a simple raster from scratch
r <- rast(nrows = 10, ncols = 10, 
          xmin = 0, xmax = 10, ymin = 0, ymax = 10,
          crs = "EPSG:4326")

# Fill with values (temperature in Celsius)
values(r) <- runif(ncell(r), 10, 30)

print(r)
# class       : SpatRaster 
# dimensions : 10, 10, 1  (nrow, ncol, nlyr)
# resolution : 1, 1  (x, y)
# extent     : 0, 10, 0, 10  (xmin, xmax, ymin, ymax)
# coord. ref. : EPSG:4326 
# source     : memory 
# names      : layer 
# values     : 10.1, 29.9  (min, max)

# Raster statistics
global(r, "mean")
#         mean
# layer 20.03

# Read elevation data (example with built-in raster)
elevation <- rast(system.file("ex/elev.tif", package = "terra"))
print(elevation)

Basic spatial visualization

Create maps with ggplot2 using geom_sf():

# Map North Carolina counties
ggplot(nc) +
  geom_sf(aes(fill = AREA)) +
  scale_fill_viridis_c(name = "Area (sq. deg)") +
  labs(title = "North Carolina Counties by Area",
       subtitle = "Using sf and ggplot2") +
  theme_minimal()

You can also plot sf objects directly:

# Quick plot with sf
plot(nc["AREA"])

Combine vector and raster data:

# Crop raster to vector extent
nc_ext <- ext(nc)
elevation_nc <- crop(elevation, nc_ext)

# Plot both together
plot(elevation_nc, main = "Elevation in NC region")
plot(st_geometry(nc), add = TRUE, col = NA, border = "white")

What you have learned

This tutorial covered foundational concepts for spatial data in R:

ConceptDescription
Vector dataPoints, lines, polygons (sf package)
Raster dataGridded data (terra package)
Coordinate reference systemsHow coordinates map to locations
Spatial operationsBuffer, centroid, distance
Spatial relationshipsWithin, intersects, distance tests
Visualizationggplot2 with geom_sf()

Coordinate systems

Every spatial dataset has a coordinate reference system (CRS) that defines how coordinates correspond to locations on Earth. WGS 84 (EPSG:4326) uses longitude/latitude degrees, the system GPS devices use. Projected CRS (e.g., UTM, EPSG:32632) use meters, required for accurate distance and area calculations. st_crs(data)$epsg shows the EPSG code. st_transform(data, crs = 32632) reprojects.

Vector data with sf

The sf package represents points, lines, and polygons as a data frame with a geometry column. Each feature is a row; each attribute is a column; the geometry column holds the coordinates. Because it is a data frame, all dplyr verbs work on sf objects. st_read("file.gpkg") reads common formats. plot(sf_obj$geometry) renders the shapes quickly.

Spatial operations

st_distance(a, b) computes pairwise distances between geometries. st_buffer(pts, 1000) creates 1000-meter buffers. st_intersection(x, y) clips features to the overlap. st_union(x) dissolves all features into a single geometry. st_area(polygons) returns area in CRS units, use a projected CRS in meters for meaningful area values.

Raster data with terra

Rasters represent gridded data, satellite imagery, elevation, temperature grids. rast("file.tif") reads a GeoTIFF. Each cell has a value; the grid has a resolution and extent. Raster algebra is element-wise: ndvi <- (nir - red) / (nir + red) computes NDVI for an entire raster. extract(raster, points) retrieves raster values at point locations.

Vector vs raster data

Spatial data comes in two fundamental types. Vector data represents discrete geographic features, countries (polygons), roads (lines), cities (points), as geometric objects with attributes. Raster data represents continuous fields, elevation, temperature, satellite imagery, as a grid of cells, each with a value.

The choice between vector and raster depends on the phenomenon. Administrative boundaries, transportation networks, and point locations are inherently vector. Elevation, precipitation, and land cover are inherently raster (though they can be approximated as vector polygons). Vector formats compress well for sparse features; raster formats handle continuous variation efficiently.

R handles vector data with the sf package and raster data with terra. Both are necessary for complete spatial analysis workflows. Many real analyses combine the two: extract raster values at vector point locations (terra::extract()), rasterize vector polygons (terra::rasterize()), or mask rasters to polygon boundaries.

Coordinate reference systems

Every spatial dataset has a coordinate reference system (CRS) that defines how numbers relate to locations on Earth. Without a CRS, coordinates are meaningless. Two datasets with different CRSs cannot be compared directly, even if the numbers look similar.

Geographic CRSs use latitude and longitude in degrees, referenced to an ellipsoid model of the Earth. WGS84 (EPSG:4326) is the most common, used by GPS. Projected CRSs flatten the Earth to a flat plane with x/y coordinates in meters or feet. Projection inevitably distorts some properties (area, shape, distance, direction), different projections preserve different properties.

Use a geographic CRS for global data and data display. Use a projected CRS for distance and area calculations, st_distance() on geographic coordinates gives geodesic distance (correct but slow); on projected coordinates in meters it gives Euclidean distance (fast, approximate). For a local study area, choose a projected CRS appropriate to that region (national grid, UTM zone).

Reading and writing spatial files

sf::read_sf("file.geojson") reads GeoJSON. read_sf("file.shp") reads a shapefile (which is actually four or more files sharing a basename). read_sf("file.gpkg") reads GeoPackage. read_sf(dsn = "directory", layer = "layername") reads from a directory of files or a database.

sf::write_sf(sf_obj, "output.gpkg") writes GeoPackage. GeoPackage is the preferred modern format: it stores all layers in one SQLite file, has no column name length limit (unlike shapefiles), and supports many data types. GeoJSON is preferred for web interchange. Shapefile is the legacy format used by many tools.

terra::rast("file.tif") reads GeoTIFF rasters. terra::writeRaster(rast_obj, "output.tif") writes. GeoTIFF is the standard raster format, supporting multiple bands, compression, and embedded CRS metadata.

Spatial data sources

geodata package provides R functions to download country boundaries, elevation, climate data, and more. geodata::world(resolution = 5) downloads low-resolution world boundaries. geodata::elevation_30s("country_code") downloads 30-arc-second elevation data for a country.

rnaturalearth and rnaturalearthdata provide Natural Earth boundaries, detailed, free, public domain vector data for countries, coastlines, and physical features at multiple scales. ne_countries(scale = "medium", returnclass = "sf") returns country polygons as an sf object.

osmdata downloads OpenStreetMap data. opq(bbox = c(lon1, lat1, lon2, lat2)) %>% add_osm_feature(key = "highway") %>% osmdata_sf() returns road networks for a bounding box. OSM is the richest source of detailed geographic features globally.

Visualizing spatial data

ggplot2::geom_sf() renders sf objects in ggplot2. Stack multiple geom_sf() calls with different data for overlay maps. coord_sf(xlim, ylim) zooms to an extent. scale_fill_viridis_c() applies perceptually uniform color scales to continuous polygon attributes.

mapview::mapview(sf_obj) creates an interactive map with one function call, the fastest way to visually inspect spatial data during analysis. tmap::qtm(sf_obj) is the tmap equivalent. Both are suitable for exploratory work; switch to ggplot2 or leaflet for production visualizations.

Next steps

Continue your spatial analysis journey:

  • Working with sf and terra, Deep dive into both packages
  • Geocoding and Mapping in R — Convert addresses to coordinates and create maps
  • Spatial Joins in R — Combine spatial datasets based on location

These tutorials build on the foundation established here to open up more advanced spatial analysis techniques.

See also