# Install the package `sf`
# install.packages("sf")
# Load the package
library(sf)
R Workshop: Geographic and Demographic Data Analysis and Visualization
Welcome! 🗺️
Before we get started, please go ahead and request a Census API key, which will allow us to access Census data for our analyses. The key takes about 2-5 minutes to arrive by email, so let’s kick this off now while we chat a little about spatial data 😊
Step-by-Step Instructions for Getting a Census API Key for tidycensus
The U.S. Census Bureau provides an API that allows users to access various datasets, including the American Community Survey (ACS) and the Decennial Census. To use tidycensus in R, you’ll need to obtain and register an API key.
1. Request a Census API Key
2. Fill out the form:
- Enter your name
- Enter your email address
- Agree to the terms of service
3. Submit the request.
4. You will receive an API key via email (a long alphanumeric string).
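Once the key arrives, it can be registered with tidycensus so that later requests pick it up automatically. The following is a minimal sketch, assuming tidycensus is installed; "YOUR_API_KEY_HERE" is a placeholder for the string from your email, not a real key.

```r
# install.packages("tidycensus")
library(tidycensus)

# Placeholder: paste the key from your email in place of this string
census_api_key("YOUR_API_KEY_HERE", install = TRUE)

# install = TRUE saves the key to your .Renviron file for future sessions;
# reload that file so the key is also available in the current session
readRenviron("~/.Renviron")

# Check that the key is set (this prints the key)
Sys.getenv("CENSUS_API_KEY")
```

If a key is already stored in your .Renviron file, the call will stop with an error unless you also pass overwrite = TRUE.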
Introduction to Geospatial Data in R
What is Geospatial Data?
Geospatial data refers to any data that has a geographic component, meaning it is tied to specific locations on the Earth’s surface. This data is used in mapping, spatial analysis, and geographic visualization.
There are two primary types of geospatial data:
1. Vector Data: Represents geographic features as points, lines, or polygons.
Examples:
- Points: Locations of schools, stores, or crime incidents
- Lines: Roads, rivers, or flight paths
- Polygons: State boundaries, land parcels, Census tracts
2. Raster Data: Represents spatial data as a grid of pixels, commonly used for continuous data like elevation, temperature, and satellite imagery.
- Example: A satellite image of land cover, where each pixel represents vegetation, water, or urban areas.
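To make the two data types concrete, here is a small sketch that builds one point, one line, and one polygon with sf, and uses a plain matrix as a stand-in for a raster grid. All coordinates and elevation values are made up for illustration.

```r
library(sf)

# Vector data: one point, one line, and one polygon (made-up coordinates)
pt   <- st_point(c(-77.04, 38.91))
ln   <- st_linestring(rbind(c(-77.10, 38.90), c(-77.00, 38.95)))
poly <- st_polygon(list(rbind(
  c(-77.10, 38.85), c(-77.00, 38.85),
  c(-77.00, 38.95), c(-77.10, 38.95),
  c(-77.10, 38.85)   # a polygon ring must end where it starts
)))

# Bundle them into a geometry column with a longitude/latitude CRS (EPSG:4326)
geoms <- st_sfc(pt, ln, poly, crs = 4326)
st_geometry_type(geoms)

# Raster data: conceptually a grid of cells, each holding one value
# (a plain matrix of made-up elevations stands in for a real raster here)
elevation <- matrix(c(120, 125, 131,
                      118, 122, 128,
                      115, 119, 124),
                    nrow = 3, byrow = TRUE)
dim(elevation)
```

Real raster files are usually handled with a dedicated raster package rather than a bare matrix; the matrix only illustrates the grid idea.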
Common Geospatial Data Formats
Different formats are used to store and exchange geospatial data. Some of the most common include:
- Shapefiles (.shp) – A widely used vector format consisting of multiple files that store geometry and attribute data.
- GeoJSON (.geojson) – A JSON-based format for encoding spatial data, commonly used for web mapping.
- KML (.kml) – A format developed by Google for geographic visualization in Google Earth and Maps.
- Census Data (via API or TIGER/Line Shapefiles) – Demographic and boundary data provided by the U.S. Census Bureau.
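As a rough illustration of moving between these formats, the sketch below reads the North Carolina shapefile that ships with the sf package, writes it out as GeoJSON to a temporary file, and reads it back. The variable names are just for this example.

```r
library(sf)

# Read a shapefile: point st_read() at the .shp file and the companion
# files (.dbf, .shx, .prj) are picked up automatically
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Write the same data as GeoJSON; the driver is guessed from the extension
geojson_path <- tempfile(fileext = ".geojson")
st_write(nc, geojson_path, quiet = TRUE)

# Read the GeoJSON back in and confirm the number of features
nc_geojson <- st_read(geojson_path, quiet = TRUE)
nrow(nc_geojson)
```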
Overview of Key R Packages for Geospatial Data
R provides powerful packages for working with geospatial data. Here are the key ones we’ll use in this workshop:
sf (Simple Features)
- Provides a modern approach to handling vector geospatial data in R.
- Replaces older packages like sp and rgdal.
- Supports reading, writing, and manipulating spatial data.
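Here is a short sketch of what manipulating spatial data with sf can look like, again using the North Carolina sample data bundled with the package. The choice of EPSG:5070 (an equal-area projection for the contiguous U.S.) is an assumption made for illustration.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# An sf object is a data frame with an extra geometry column
class(nc)

# Inspect the coordinate reference system (CRS) the data is stored in
st_crs(nc)

# Reproject to an equal-area CRS (EPSG:5070, chosen here for illustration)
nc_albers <- st_transform(nc, 5070)

# Compute each county's area from the projected geometry, in square km
nc_albers$area_km2 <- as.numeric(st_area(nc_albers)) / 1e6
head(nc_albers[, c("NAME", "area_km2")])
```

Because sf objects are data frames, the usual column operations work on them, and the geometry column comes along automatically.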
tigris
- Retrieves geographic boundary data from the U.S. Census Bureau (e.g., states, counties, tracts).
- Works with sf for mapping and spatial analysis.
Example: Download state boundaries
# install.packages('tigris')
library(tigris)
# We can download boundary data for the US states
states_sf <- states(cb = TRUE,
                    resolution = "20m",
                    progress_bar = FALSE)
maps
- Provides simple built-in maps for U.S. states, counties, and world boundaries.
- Useful for quick visualizations.
Example: Plot a basic map of the U.S.
# Install 'maps'
# install.packages('maps')
# Load maps along with tidyverse to access ggplot
library(tidyverse)
library(maps)
# Load U.S. map data
us_states <- map_data("state")
# Plot the U.S. map
ggplot(us_states,
       aes(long, lat,
           group = group)) +
  geom_polygon(fill = "lavender",
               color = "black") +
  theme_void()
This is great, but we’re missing Alaska, Hawaii, Puerto Rico, etc. We can use tigris instead for the same purposes.
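One quick way to confirm what the maps data covers is to list the regions it contains. This is a small sketch, assuming the maps package is installed (map_data() from ggplot2 depends on it).

```r
library(ggplot2)  # map_data() comes from ggplot2 and needs the maps package

us_states <- map_data("state")

# List the regions included in the "state" map
regions <- unique(us_states$region)
length(regions)

# Check for states that are not part of the contiguous U.S.
c("alaska", "hawaii") %in% regions
```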
ggplot2’s geom_sf for Geospatial Visualization
- geom_sf() allows for powerful mapping of spatial data.
- Enables custom styling and integration with non-spatial data.
# Let's plot this using the `sf` object from tigris
ggplot(states_sf) +
geom_sf(fill = "lavender", color = "black")
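To illustrate the integration with non-spatial data mentioned above, the sketch below joins an ordinary data frame to an sf object and maps one of its columns to the fill color. It uses the North Carolina sample data from sf so it runs without a download; the county_scores table and its score values are hypothetical placeholders, not real statistics.

```r
library(sf)
library(dplyr)
library(ggplot2)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Hypothetical non-spatial table: one placeholder value per county
county_scores <- data.frame(
  NAME  = nc$NAME,
  score = seq_len(nrow(nc))
)

# Join by county name; the result keeps its geometry column
nc_joined <- left_join(nc, county_scores, by = "NAME")

# Map the joined column to the fill color
p <- ggplot(nc_joined) +
  geom_sf(aes(fill = score), color = "white") +
  theme_void()
p
```

The same pattern applies to Census data: join a table of estimates to boundary geometries by a shared identifier, then pass the variable to aes(fill = ...).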
But notice how our map looks very tiny and spread out 😱. That is because the shapefile uses raw longitude and latitude, so Alaska, Hawaii, and Puerto Rico are drawn at their true positions, far from the contiguous states. Since this plot is very hard to read, we can use a transformation to shift Alaska, Hawaii, and Puerto Rico near the rest of the U.S.:
# We'll use the `tigris` function `shift_geometry` when saving the states file
states_sf <- states(cb = TRUE,
                    resolution = "20m") %>%
  shift_geometry()
# Now we can try plotting again:
ggplot(states_sf) +
geom_sf(fill = "lavender",
color = "black")
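shift_geometry() also has a position argument that controls where the shifted areas end up. As a hedged sketch (it needs an internet connection for the download, and the object name states_outside is just for this example), the "outside" option places Alaska, Hawaii, and Puerto Rico around the contiguous U.S. rather than below it:

```r
library(tigris)
library(ggplot2)

# Same download as above, but place Alaska, Hawaii, and Puerto Rico
# outside the contiguous U.S., roughly in their relative directions
states_outside <- shift_geometry(
  states(cb = TRUE, resolution = "20m", progress_bar = FALSE),
  position = "outside"
)

ggplot(states_outside) +
  geom_sf(fill = "lavender", color = "black") +
  theme_void()
```

Which layout reads better depends on the map; the default ("below") is more compact, while "outside" keeps a rough sense of direction.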