Spatial Data with sf

· 6 min read · Updated March 11, 2026 · advanced
r spatial sf gis geospatial mapping coordinates

The sf package is the modern standard for working with spatial data in R. It provides a tidy approach to geographic information systems (GIS) by representing spatial features as simple features within data frames. This guide walks you through creating, manipulating, and visualizing spatial data using sf.

What Are Simple Features?

Simple Features (SF) is a standard for representing geographic vector data. The sf package implements this standard in R, storing spatial data as a special type of data frame with a geometry list-column. Each row represents a feature (like a city, road, or country boundary), and the geometry column contains the spatial information.

The package connects to GDAL for reading and writing spatial file formats, GEOS for geometric operations on projected coordinates, and PROJ for coordinate reference system transformations. This makes sf powerful yet dependent on system libraries.

Installing and Loading sf

Install sf from CRAN along with ggplot2 for visualization:

install.packages(c("sf", "ggplot2", "dplyr"))

library(sf)
library(ggplot2)
library(dplyr)

The package should load without errors. If you see warnings about missing system libraries, you will need to install GDAL, GEOS, and PROJ on your system first.

Creating Spatial Data

The fundamental building blocks are simple feature geometries (sfc) and simple features (sf). You create them from coordinates using functions like st_point(), st_linestring(), and st_polygon().

# Create a single point
point <- st_point(c(-122.4194, 37.7749))  # San Francisco
print(point)

# Create a linestring (a road, for example)
road <- st_linestring(matrix(c(
  -122.4194, -122.4050, -122.3900,
  37.7749, 37.7700, 37.7650
), ncol = 2, byrow = TRUE))
print(road)

# Create a polygon (a bounding box, for example)
bbox <- st_polygon(list(matrix(c(
  -122.5, -122.3, -122.3, -122.5, -122.5,
  37.7, 37.7, 37.85, 37.85, 37.7
), ncol = 2, byrow = TRUE)))
print(bbox)

The output shows the geometry type and coordinates. Each geometry gets a class attribute indicating its type (POINT, LINESTRING, POLYGON).

Now wrap these geometries in an sfc object with a coordinate reference system:

# Create sfc with WGS84 (EPSG:4326)
sfc_point <- st_sfc(point, crs = 4326)
sfc_linestring <- st_sfc(road, crs = 4326)
sfc_polygon <- st_sfc(bbox, crs = 4326)

print(sfc_point)

The crs argument sets the coordinate reference system using an EPSG code. The code 4326 represents WGS84, the standard system for latitude and longitude coordinates.

Finally, create an sf object by combining sfc with a data frame:

# Create sf object with attributes
cities <- data.frame(
  name = c("San Francisco", "Oakland", "Berkeley"),
  population = c(873965, 433031, 124321)
)

sf_cities <- st_sf(
  cities,
  geometry = st_sfc(
    st_point(c(-122.4194, 37.7749)),
    st_point(c(-122.2711, 37.8044)),
    st_point(c(-122.2727, 37.8716))
  ),
  crs = 4326
)

print(sf_cities)

The result looks like a regular data frame with an extra geometry column. This is the key insight of sf: spatial data is just data with a special column.

Reading and Writing Spatial Data

Sf connects to GDAL, giving you access to dozens of spatial file formats. The main functions are st_read() and st_write().

# Read a shapefile
# nc <- st_read("path/to/counties.shp")

# Read a GeoJSON file
# nc <- st_read("counties.geojson")

# Write to GeoJSON
# st_write(sf_cities, "cities.geojson", driver = "GeoJSON")

# Write to shapefile
# st_write(sf_cities, "cities.shp", driver = "ESRI Shapefile")

The package automatically detects the driver from the file extension. Common formats include GeoJSON, ESRI Shapefile, and PostGIS databases.

Geometry Operations

The sf package provides many geometric operations through GEOS. These include buffer, union, intersection, and difference operations.

# Create a buffer around a point (1 degree radius)
buffered <- st_buffer(sfc_point, dist = 0.1)
plot(buffered)

# Find the intersection of two polygons
poly1 <- st_polygon(list(matrix(c(
  0, 0, 1, 1, 0, 0,
  0, 0, 0, 1, 1, 0
), ncol = 2, byrow = TRUE)))

poly2 <- st_polygon(list(matrix(c(
  0.5, 0.5, 1.5, 1.5, 0.5, 0.5,
  0.5, 0.5, 0.5, 1.5, 1.5, 0.5
), ncol = 2, byrow = TRUE)))

intersection <- st_intersection(
  st_sfc(poly1, crs = 4326),
  st_sfc(poly2, crs = 4326)
)
plot(intersection)

Other useful functions include st_union() to combine geometries, st_difference() to subtract one geometry from another, and st_sym_difference() for the symmetric difference.

Coordinate Reference Systems

Coordinate reference systems (CRS) define how coordinates map to locations on Earth. The sf package handles both projected (Cartesian) and geographic (spherical) coordinate systems.

# Check the CRS of an object
st_crs(sf_cities)

# Transform to a projected CRS (UTM zone 10N for California)
sf_cities_utm <- st_transform(sf_cities, crs = 32610)

# Create a new CRS from scratch
st_crs("+proj=longlat +datum=WGS84")

# EPSG codes are the easiest way to specify CRS
# 4326 = WGS84 (lat/lon)
# 32610 = UTM zone 10N
# 3857 = Web Mercator (for mapping)

Transforming to a projected CRS is necessary for accurate distance and area calculations. Geographic coordinates (like WGS84) use degrees, not meters, so buffer distances in degrees are approximate.

Working with dplyr

Because sf objects are data frames, you can use dplyr verbs to manipulate them. The geometry column is preserved through standard operations.

# Filter spatial data
sf_cities |> filter(population > 400000)

# Select columns (including geometry)
sf_cities |> select(name, population)

# Add new attributes with mutate
sf_cities |> mutate(
  size_category = ifelse(population > 500000, "large", "small")
)

# Group and summarize
# Note: geometry is dropped by default in summarise
sf_cities |> summarise(
  total_pop = sum(population),
  avg_pop = mean(population)
)

There is one gotcha: most dplyr summarise operations drop the geometry column. To keep geometry through aggregation, use st_union() or st_collect().

Visualizing with ggplot2

The ggplot2 package has built-in support for sf through geom_sf(). This makes creating maps straightforward.

# Basic point map
ggplot(sf_cities) +
  geom_sf(aes(size = population, color = population)) +
  theme_minimal()

# Map with custom fill
ggplot(sf_cities) +
  geom_sf(aes(fill = name), alpha = 0.7) +
  geom_sf_text(aes(label = name), nudge_y = 0.01) +
  theme_minimal()

The geom_sf() function automatically handles different geometry types. You can layer multiple geom_sf() calls for complex visualizations.

Common Issues and Solutions

A few issues come up frequently when working with sf:

The most common error is mixing coordinate systems. Buffer operations with geographic CRS give approximate results because degrees vary in size. Always transform to a projected CRS first for accurate measurements.

Invalid geometries cause unexpected results in spatial operations. Use st_is_valid() to check and st_make_valid() to repair them.

Large files can be slow to process. Use st_read() with the query argument for databases, or filter spatially with st_filter() before loading all data.

Wrapping Up

The sf package brings the power of GIS to R in a way that integrates naturally with the tidyverse. Spatial data becomes just another tibble with a geometry column. You can filter, transform, and visualize it using tools you already know.

For further learning, explore the stars package for raster data, the sfnetworks package for network analysis, and the leaflet package for interactive maps.

See Also