Spatial Joins in R

· 5 min read · Updated March 16, 2026 · intermediate
spatial sf dplyr geospatial gis r intermediate

Spatial joins combine two datasets based on their geographic relationship rather than a common identifier. This is essential for tasks like finding all crimes within each police district, counting restaurants near each transit station, or assigning census tracts to neighborhood boundaries.

Prerequisites

Load the required packages:

install.packages(c("sf", "tidyverse", "spData"))
library(sf)
library(tidyverse)
library(spData)

Understanding Spatial Joins

Unlike traditional joins that match on equal values (like id or name), spatial joins match based on geometric relationships:

  • Intersects: geometries overlap at any point
  • Within: one geometry is completely inside another
  • Contains: one geometry completely contains another
  • Nearest: closest geometry from one layer to another

When to Use Each Predicate

PredicateUse Case
st_intersects()General overlap detection
st_within()Points completely inside polygons
st_contains()Polygons containing points
st_nearest_feature()Closest match

Your First Spatial Join

Let’s work with example data from the spData package:

# Load example datasets
data("us_states")
data("us_cities")

# Check the structure
print(head(us_states))
# Simple feature collection with 10 features and 14 fields
# Geometry type: MULTIPOLYGON
# CRS: 4326

print(head(us_cities))
# # A tibble: 6 × 10
#   NAME      pop2010    CAPITAL
# 1 Abilene     115500        0  
# 2 Akron       198549        0  
# 3 Albany      93920         1

Converting Cities to sf

Cities need coordinates to become proper sf objects:

# Convert cities to sf
cities_sf <- us_cities %>%
  st_as_sf(coords = c("lng", "lat"), crs = 4326)

print(cities_sf)
# Simple feature collection with 249 features and 9 fields
# Geometry type: POINT
# CRS: 4326

Performing the Join

Now join cities to states based on which state contains each city:

# Spatial join: which state contains each city?
cities_with_states <- us_cities %>%
  st_as_sf(coords = c("lng", "lat"), crs = 4326) %>%
  st_join(us_states, join = st_within)

print(cities_with_states %>%
        select(NAME.x, NAME.y, pop2010) %>%
        head(10))
# # A tibble: 10 × 3
#   NAME.x        NAME.y   pop2010
# 1 Abilene      Texas      115500
# 2 Akron        Ohio       198549
# 3 Albany    New York      93920

Spatial Join Types in Detail

st_intersects()

The most common join — returns all overlapping features:

# Create sample point data
points <- tibble(
  id = 1:3,
  value = c(10, 20, 30),
  geometry = st_sfc(
    st_point(c(-95, 40)),  # Kansas
    st_point(c(-85, 40)),  # Indiana  
    st_point(c(-75, 40)),  # Pennsylvania
    crs = 4326
  )
) %>% st_sf()

# Which states do these points intersect?
points %>%
  st_join(us_states, join = st_intersects) %>%
  select(id, value, NAME)
# Simple feature collection with 3 features
#   id value        NAME
# 1  1    10     Kansas
# 2  2    20      Ohio
# 3  3    30 Pennsylvania

st_within()

Only returns matches where one geometry is completely inside another:

# Create a bounding box polygon
bbox_polygon <- st_as_sfc(st_bbox(
  c(xmin = -90, xmin = -80, ymin = 35, ymax = 45),
  crs = 4326
))

# Filter cities strictly within the bounding box
us_cities %>%
  filter(pop2010 > 500000) %>%
  st_as_sf(coords = c("lng", "lat"), crs = 4326) %>%
  st_join(bbox_polygon, join = st_within) %>%
  filter(!is.na(join))

st_nearest_feature()

Finds the closest geometry regardless of overlap:

# Create sample locations
facilities <- tibble(
  name = c("Facility A", "Facility B"),
  geometry = st_sfc(
    st_point(c(-87.6, 41.9)),  # Chicago area
    st_point(c(-87.9, 41.8)),
    crs = 4326
  )
) %>% st_sf()

# Find nearest city to each facility
facilities %>%
  st_join(us_cities %>% st_as_sf(coords = c("lng", "lat"), crs = 4326),
          join = st_nearest_feature) %>%
  select(name, nearest_city = NAME)
# Simple feature collection with 2 features
#   name   nearest_city
# 1 Facility A      Chicago
# 2 Facility B      Chicago

Practical Example: Restaurant Analysis

Combine multiple datasets for meaningful analysis:

# Get New York state boundary
ny_state <- us_states %>% filter(NAME == "New York")

# Get cities in or near NY (within 1 degree buffer)
ny_cities <- us_cities %>%
  st_as_sf(coords = c("lng", "lat"), crs = 4326) %>%
  st_filter(ny_state, .predicate = st_intersects)

# Create sample restaurant locations
restaurants <- tibble(
  name = c("Joe's Pizza", "Le Bernardin", "Gramercy Tavern"),
  type = c("Pizza", "French", "American"),
  geometry = st_sfc(
    st_point(c(-74.0, 40.7)),    # Joe's Pizza
    st_point(c(-73.9, 40.8)),    # Le Bernardin
    st_point(c(-73.98, 40.74)),  # Gramercy Tavern
    crs = 4326
  )
) %>% st_sf()

# Find which city each restaurant is in
restaurants %>%
  st_join(ny_cities, join = st_nearest_feature) %>%
  select(restaurant = name, type, city = NAME.x)
# Simple feature collection with 3 features
#   restaurant         type      city
# 1    Joe's Pizza      Pizza    New York
# 2   Le Bernardin     French   New York
# 3 Gramercy Tavern   American New York

Aggregation After Spatial Joins

One powerful pattern combines spatial join with grouping:

# Count cities per state
city_counts <- us_cities %>%
  st_as_sf(coords = c("lng", "lat"), crs = 4326) %>%
  st_join(us_states, join = st_within) %>%
  group_by(NAME) %>%
  summarize(city_count = n(), 
            total_pop = sum(pop2010, na.rm = TRUE))

print(city_counts %>%
        arrange(desc(city_count)) %>%
        head(10))
# # A tibble: 10 × 3
#   NAME          city_count total_pop
# 1 California         26      32000000
# 2 Texas              19      25000000
# 3 New York           16      15000000

Handling CRS Mismatches

Always ensure consistent coordinate reference systems:

# Check CRS of both layers
st_crs(us_cities)
# Coordinate Reference System:
#   User input: EPSG:4326 
#   wkt: GEOGCRS["WGS 84", ...]

# If different, transform one layer to match
# Example: transform to projected CRS (meters) for more accurate distances
us_cities_proj <- us_cities %>%
  st_as_sf(coords = c("lng", "lat"), crs = 4326) %>%
  st_transform(32618)  # UTM zone 18N

us_states_proj <- us_states %>%
  st_transform(32618)

# Now distance calculations will be in meters

Common Pitfalls

1. Join Returns Unexpected Large Results

# Problem: points join to multiple overlapping polygons
us_cities %>%
  st_as_sf(coords = c("lng", "lat"), crs = 4326) %>%
  st_join(us_states, join = st_intersects) %>%
  nrow()
# [1] 280  # More than 249 cities!

# Solution: use st_filter or keep = FALSE
cities_in_states <- us_cities %>%
  st_as_sf(coords = c("lng", "lat"), crs = 4326) %>%
  st_filter(us_states, .predicate = st_within)

nrow(cities_in_states)
# [1] 248  # One city excluded (probably DC/MD issue)

2. Empty Results

# Problem: CRS mismatch causes empty join
# Always verify CRS before joining
st_crs(us_cities) == st_crs(us_states)
# [1] TRUE  # Good

# If FALSE, transform:
# us_cities <- st_transform(us_cities, st_crs(us_states))

Summary

You have learned how to:

  • Perform spatial joins using st_join() with different predicates
  • Choose the right join predicate for your use case
  • Aggregate data after spatial joins
  • Handle CRS mismatches between datasets
  • Avoid common spatial join pitfalls

Spatial joins are fundamental to geospatial analysis in R, enabling you to combine datasets based on geographic relationships.

See Also

  • dplyr::join — Traditional joins for combining data by key columns
  • sf::st_as_sf — Converting data frames to sf objects
  • dplyr::filter — Filtering data before spatial operations