Spatial Data Analysis with sf and terra in R
Spatial data analysis in R has matured considerably with the sf and terra packages. sf provides a tidyverse-compatible framework for vector data (points, lines, polygons), while terra handles both raster and vector operations with a consistent, memory-efficient interface. Whether you are reprojecting coordinate systems, joining attributes by location, or computing zonal statistics on satellite imagery, these two packages handle the heavy lifting.
Prerequisites
Make sure you have the required packages installed:
install.packages(c("sf", "terra", "tidyverse", "spData"))
library(sf)
library(terra)
library(tidyverse)
Advanced vector operations with sf
Transforming coordinate reference systems
Transforming between coordinate systems is a common task. The key is understanding when to use geographic versus projected CRS:
# Load example data
library(spData)
data(us_states)
# Check current CRS
st_crs(us_states)
# Geographic: NAD83 / US (latitude and longitude)
# Transform to a projected CRS (Albers Equal Area for USA)
us_albers <- st_transform(us_states, "ESRI:102003")
# Verify transformation
st_crs(us_albers)
# Projected CRS: USA Contiguous Albers Equal Area
# Now area calculations are in square meters
us_albers <- us_albers %>%
mutate(area_km2 = as.numeric(st_area(geometry)) / 1e6)
print(us_albers %>% select(NAME, area_km2) %>% head())
# # A tibble: 6 × 2
# NAME area_km2
# 1 Alabama 127095.
# 2 Alaska ...
Buffer operations
Buffers create polygons around geometries at a specified distance. They are useful for defining service areas (e.g., all locations within 10 km of a store), creating exclusion zones, or measuring proximity. Because buffer distance is expressed in the units of the CRS, you must first transform your data to a projected coordinate system — geographic coordinates (degrees) produce meaningless buffers.
# Create points for cities
cities <- tibble(
name = c("Denver", "Phoenix", "Salt Lake City"),
lat = c(39.7392, 33.4484, 40.7608),
lon = c(-104.9903, -112.0740, -111.8910)
) %>%
st_as_sf(coords = c("lon", "lat"), crs = 4326)
# Create buffers in meters (requires projected CRS)
cities_albers <- st_transform(cities, "ESRI:102003")
# Buffer: 100 km, 200 km, and 300 km
buffers_100 <- st_buffer(cities_albers, dist = 100000)
buffers_200 <- st_buffer(cities_albers, dist = 200000)
buffers_300 <- st_buffer(cities_albers, dist = 300000)
# Visualize
plot(st_geometry(buffers_300), col = "lightblue", border = NA)
plot(st_geometry(buffers_200), col = "lightgreen", border = NA, add = TRUE)
plot(st_geometry(buffers_100), col = "lightyellow", border = NA, add = TRUE)
plot(st_geometry(cities_albers), pch = 19, col = "red", add = TRUE)
Geometric operations
The sf package provides many geometric operations:
# Union: combine polygons into one
library(spData)
data(us_states)
us_west <- us_states %>% filter(REGION == "West")
west_union <- st_union(us_west)
plot(west_union, main = "Western United States")
# Intersection: keep only overlapping areas
# Create a bounding box for Colorado region
bbox <- st_bbox(c(xmin = -115, ymin = 32, xmax = -100, ymax = 45), crs = 4326)
bbox_sfc <- st_as_sfc(bbox)
# Intersect with western states
west_crop <- st_intersection(us_west, bbox_sfc)
plot(west_crop$geometry, main = "Western States cropped to region")
# Difference: subtract one geometry from another
denver <- cities_albers %>% filter(name == "Denver")
denver_buffer <- st_buffer(denver, dist = 200000)
us_albers <- st_transform(us_states, "ESRI:102003")
colorado <- us_albers %>% filter(NAME == "Colorado")
outside_denver <- st_difference(denver_buffer, st_union(colorado))
plot(outside_denver, main = "200km around Denver, excluding Colorado")
Working with geometry collections
Some operations return geometry collections that need special handling:
# Create lines
line1 <- st_linestring(rbind(c(0, 0), c(1, 1), c(2, 0)))
line2 <- st_linestring(rbind(c(0, 1), c(1, 2), c(2, 1)))
# Union lines
lines <- st_sfc(line1, line2, crs = 4326)
combined <- st_line_merge(st_cast(st_union(lines), "LINESTRING"))
print(combined)
# LINESTRING (0 0, 1 1, 2 0, 2 1, 1 2, 0 1)
# Cast between geometry types
points <- st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), crs = 4326)
multipoint <- st_cast(points, "MULTIPOINT")
# Polygons from points (convex hull)
hull <- st_convex_hull(st_union(points))
Advanced raster operations with terra
Creating and manipulating rasters
The terra package provides efficient raster operations:
# Create a multi-layer raster
r1 <- rast(nrows = 10, ncols = 10,
xmin = 0, xmax = 10, ymin = 0, ymax = 10,
crs = "EPSG:4326")
r2 <- rast(nrows = 10, ncols = 10,
xmin = 0, xmax = 10, ymin = 0, ymax = 10,
crs = "EPSG:4326")
# Fill with values
values(r1) <- runif(ncell(r1), 0, 100)
values(r2) <- runif(ncell(r2), 0, 50)
# Stack rasters
r_stack <- c(r1, r2)
names(r_stack) <- c("temperature", "precipitation")
print(r_stack)
# class : SpatRaster
# dimensions : 10, 10, 2
Raster algebra
Raster algebra lets you perform cell-by-cell calculations across one or more rasters. Arithmetic operations like addition and multiplication are vectorized across every cell in the grid. Conditional operations produce binary rasters (TRUE/FALSE) that are useful for thresholding — for example, identifying all cells where temperature exceeds 50 degrees.
# Arithmetic operations
r_combined <- r1 + r2
r_scaled <- r1 * 2
# Conditional operations
r_hot <- r1 > 50 # TRUE/FALSE raster
# Focal operations (moving window)
r_smooth <- focal(r1, w = 3, fun = mean)
# Apply function to each cell
r_log <- app(r1, function(x) log(x + 1))
Cropping and masking
Cropping and masking let you extract the portion of a raster that falls within a specific extent or polygon boundary. Cropping trims the raster to the bounding box of your area of interest, while masking sets cells outside the polygon to NA — preserving the original raster dimensions but blanking out irrelevant areas. These two operations are frequently chained: crop first to reduce memory usage, then mask to isolate the study region.
# Load example raster
elevation <- rast(system.file("ex/elev.tif", package = "terra"))
# Create a spatial extent
ext <- ext(700000, 710000, 4800000, 4810000)
# Crop to extent
elev_crop <- crop(elevation, ext)
# Or use an sf object for cropping
library(spData)
data(us_states)
colorado <- us_states %>% filter(NAME == "Colorado")
colorado_sf <- st_geometry(colorado) %>% st_transform(crs(elevation))
# Crop to Colorado
elev_colorado <- crop(elevation, colorado_sf)
# Mask: set values to NA outside polygon
elev_masked <- mask(elev_colorado, colorado_sf)
# Inverse mask: keep only values inside polygon
elev_inside <- mask(elev_colorado, colorado_sf, inverse = TRUE)
Aggregating and disaggregating
Change raster resolution:
# Aggregate: combine cells (increase cell size, decrease resolution)
elev_agg <- aggregate(elevation, fact = 2, fun = mean)
# Disaggregate: split cells (decrease cell size, increase resolution)
elev_disagg <- disaggregate(elevation, fact = 2)
# Resample: change to match another raster
target <- rast(nrows = 50, ncols = 50,
ext(elevation), crs = crs(elevation))
elev_resampled <- resample(elevation, target, method = "bilinear")
Zonal statistics
Calculate statistics for zones defined by polygons:
# Load data
elevation <- rast(system.file("ex/elev.tif", package = "terra"))
library(spData)
data(us_states)
# Get only Alabama
alabama <- us_states %>% filter(NAME == "Alabama") %>%
st_geometry() %>%
st_transform(crs(elevation))
# Crop elevation to Alabama
elev_al <- crop(elevation, alabama)
# Calculate zonal statistics
# Note: terra uses SpatVector, so we convert
al_vect <- as(alabama, "SpatVector")
zonal_stats <- zonal(elev_al, al_vect, fun = c("mean", "min", "max", "sd"))
print(zonal_stats)
# layer mean min max sd
# 1 1 267.419 10.1667 572.67 118.0869
Combining sf and terra
The real power comes from combining vector and raster operations:
# Extract raster values at points
library(spData)
data(us_states)
# Get centroid points for states
centroids <- st_centroid(us_states) %>%
st_transform(4326)
# Sample raster values at these points
elevation <- rast(system.file("ex/elev.tif", package = "terra"))
coords <- st_coordinates(centroids)
elevation_values <- extract(elevation, coords[, c("X", "Y")])
# Add to dataframe
us_states <- us_states %>%
mutate(elevation = elevation_values$elevation)
# Plot
ggplot(us_states) +
geom_sf(aes(fill = elevation)) +
scale_fill_viridis_c(name = "Elevation (m)") +
labs(title = "State Centroids Elevation") +
theme_minimal()
# Rasterize vector data
# Create a template raster
template <- rast(nrows = 100, ncols = 100,
ext(us_states), crs = st_crs(us_states)$proj4string)
# Rasterize state areas
us_raster <- rasterize(st_geometry(us_states), template,
field = as.numeric(st_area(us_states)) / 1e6,
fun = "first")
plot(us_raster, main = "State Areas (km²)")
Performance tips
Working with large spatial datasets requires optimization:
# Use spatial indices for faster operations
# sf uses spatial indexing automatically in most operations
# For very large data, consider:
# 1. Only load needed columns
us_states_small <- us_states %>% select(NAME, geometry)
# 2. Use spatial indexing explicitly
idx <- st_make_index(us_states, "bbox")
# 3. Parallel processing for large operations
library(future)
plan(multisession)
# 4. Write intermediate results
# st_write(us_states, "us_states.gpkg")
What you have learned
This tutorial covered advanced spatial operations:
| Topic | Package | Description |
|---|---|---|
| CRS transformation | sf | Transform between geographic and projected CRS |
| Buffers | sf | Create polygons around geometries |
| Geometric operations | sf | Union, intersection, difference |
| Raster algebra | terra | Perform calculations on rasters |
| Crop and mask | terra | Extract portions of raster data |
| Zonal statistics | terra | Calculate stats by zone |
| Extract values | terra + sf | Get raster values at points |
| Rasterize | terra + sf | Convert vectors to rasters |
Combining vector and raster
The most common workflow combining sf and terra is extracting raster values at point or polygon locations. terra::extract(raster, vect(sf_points)) converts an sf object to terra’s SpatVector format and extracts values. For zonal statistics (mean raster value within each polygon), terra::zonal(raster, zones, fun = "mean") computes statistics per zone. The exactextractr package provides faster and more accurate extraction for polygon zones by handling partial cell coverage.
Coordinate reference systems
CRS management is the most common source of errors in spatial analysis. Every spatial dataset has a CRS, a definition of how coordinates map to locations on Earth. WGS 84 (EPSG:4326) uses longitude/latitude in degrees; Web Mercator (EPSG:3857) is used by web maps; local projected CRS (e.g., UTM zones) use meters. st_crs(layer) shows the current CRS. For distance and area calculations, always reproject to a projected CRS first, angular coordinates (degrees) give incorrect distances.
Reading and writing spatial data
st_read("file.gpkg") reads GeoPackage, Shapefile, GeoJSON, and dozens of other formats. st_write(sf_obj, "output.gpkg") writes them. GeoPackage is the modern replacement for Shapefile, it stores all components in a single file and supports long column names. For web map data, GeoJSON is standard. rast("file.tif") reads GeoTIFF rasters; writeRaster(r, "output.tif") writes them with optional compression.
The sf package
sf (simple features) represents vector spatial data, points, lines, polygons, as a data frame where one column holds the geometry. Every row is a feature; non-geometry columns are attributes. This design makes spatial data manipulable with all standard dplyr verbs: filter(), mutate(), left_join() all work on sf objects, and the geometry column travels along.
st_read("file.shp") reads ESRI shapefiles, GeoJSON, GeoPackage, and any format supported by GDAL. st_write(sf_obj, "output.gpkg") writes to GeoPackage (the preferred modern format over shapefile because it stores everything in one file with no 10-character column name limit). read_sf() and write_sf() are tidyverse-style aliases that return tibbles and silently overwrite files.
The geometry column type reflects the feature type: POINT, MULTIPOINT, LINESTRING, MULTILINESTRING, POLYGON, MULTIPOLYGON, or GEOMETRYCOLLECTION. st_geometry_type(x) returns the type of each feature. Mixed-type collections use GEOMETRYCOLLECTION but are harder to work with, cast to a single type with st_cast(x, "POLYGON").
The terra package
terra handles raster data and also provides vector operations. rast() creates or reads rasters. vect() reads vector files into SpatVector objects, terra’s vector class. The two classes do not mix directly, use sf::st_as_sf() to convert SpatVector to sf, and terra::vect() to convert sf to SpatVector.
Multi-layer rasters (stacks) replace the old raster::stack() and raster::brick() workflow. c(r1, r2) combines SpatRaster objects into a multi-layer stack. terra::subset(r, 2:4) extracts layers 2 through 4. Layer names are set with names(r) <- c("red", "green", "blue").
terra is significantly faster than the older raster package and handles rasters larger than RAM through out-of-memory support using disk-backed operations. For very large raster analysis, terra reads only the portions needed for each operation.
Coordinate reference systems in practice
Both sf and terra store CRS as WKT2 (Well-Known Text version 2) strings. EPSG codes are the standard shorthand: st_crs(4326) returns the WGS84 CRS. crs(r) on a terra raster returns the CRS string.
Checking CRS compatibility before spatial operations: st_crs(layer1) == st_crs(layer2) for sf; both layers must match for sf operations like st_join() and st_intersects(). For terra, compareGeom(r1, r2) checks CRS, extent, and resolution compatibility.
Reprojection: st_transform(sf_obj, crs = 3857) for vector; terra::project(rast_obj, "EPSG:3857") for raster. Always project to a metric CRS before computing distances and areas in meters.
Spatial predicates and operations
Binary spatial predicates return logical matrices: st_intersects(x, y) returns a sparse matrix where entry [i, j] is TRUE if feature i of x intersects feature j of y. lengths(st_intersects(x, y)) > 0 gives a logical vector indicating which features in x intersect anything in y, an efficient spatial filter.
Unary geometric operations modify features: st_centroid(x) returns the centroid of each feature. st_buffer(x, dist = 1000) creates a buffer of 1000 CRS units around each feature. st_union(x) merges all features into a single geometry; st_union(x, by_feature = FALSE) dissolves all overlapping polygons.
Set operations: st_intersection(x, y) clips x to the intersection with y. st_difference(x, y) removes the overlap. st_union(x, y) combines both layers. These are the vector equivalents of raster masking and overlay operations.
Two complementary frameworks
sf (simple features) and terra address different aspects of spatial data in R. sf focuses on vector data, points, lines, and polygons that represent discrete features with attributes. terra focuses on raster data, gridded continuous surfaces. The two frameworks are designed to work together: you can convert between sf geometries and terra rasters, use each where appropriate, and combine results in the same analysis.
The choice between sf and terra for vector operations is effectively settled: sf implements the Simple Features standard, which is used by every major GIS system, and its objects integrate directly with dplyr for attribute operations. For raster operations, terra replaced the older raster package with a faster, more memory-efficient implementation. New analyses should use sf for vector data and terra for raster data.
Performance characteristics
sf operations on vector data are generally fast because the underlying GEOS library (spatial predicates, intersections, buffers) is implemented in optimized C++. The bottleneck for large vector datasets is usually the attribute join or spatial join, not the geometry operations. Using sf with dplyr produces readable code; for operations on millions of features, the overhead of dplyr’s grouped operations can be significant and may warrant more direct sf function calls.
terra raster operations are optimized for large rasters that do not fit in memory. It processes rasters in chunks, allowing computation on rasters larger than available RAM. Operations that would require loading a full raster in the older raster package can complete with terra on memory-constrained systems. For small rasters that fit in memory, the performance is comparable.
Coordinate reference system practices
All spatial operations require that input datasets are in the same CRS. sf’s operations that combine two datasets — spatial joins, intersections, within() tests — raise errors if the CRS does not match, which prevents silently incorrect results. terra’s operations similarly require matching CRS. Always transform data to a common CRS at the start of an analysis using st_transform (sf) or project (terra).
The choice of CRS affects measurement accuracy. Geographic CRS (latitude/longitude) stores coordinates as degrees; distance and area calculations on degree-based coordinates are approximate unless using geodesic calculations. Projected CRS stores coordinates in meters or feet on a flat plane; distance and area calculations are straightforward Euclidean geometry. For analyses that measure distances or areas, project to an appropriate equal-area or equidistant CRS for the geographic region of interest.
Next steps
Now that you understand working with sf and terra, explore these related topics to deepen your knowledge and apply these techniques in more complex scenarios.
See also
- Introduction to Spatial Data in R — foundational concepts for spatial work
- Spatial Joins in R — joining data by location with sf
- Raster Analysis in R — deeper dive into terra raster operations