R Workshop: Geographic and Demographic Data Analysis and Visualization

Author

Michelle Bueno Vásquez

Published

April 28, 2025

Welcome! 🗺️

Setting up a Census API Key

Before we get started, please go ahead and request a Census API key, which will let us access Census data for our analyses. The key takes about 2-5 minutes to arrive by email, so let’s submit the request now and chat a little about spatial data while we wait 😊

Step-by-Step Instructions for Getting a Census API Key for tidycensus

The U.S. Census Bureau provides an API that allows users to access various datasets, including the American Community Survey (ACS) and the Decennial Census. To use tidycensus in R, you’ll need to obtain and register an API key.

Request a Census API Key

  1. Go to the U.S. Census API Key Request Page:

  2. Fill out the form:

    • Enter your name
    • Enter your email address
    • Agree to the terms of service
  3. Submit the request.
    You will receive an API key via email (a long alphanumeric string).
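
Once the key arrives, it needs to be registered so tidycensus can find it. The snippet below is a minimal sketch, assuming the tidycensus package is already installed; the key string is a placeholder for illustration, not a real key.

```r
library(tidycensus)

# Placeholder only (assumption): replace with the key from your email
my_key <- "YOUR_API_KEY_HERE"

# Register the key for the current R session
census_api_key(my_key)

# To store the key in your .Renviron for future sessions instead, run:
# census_api_key(my_key, install = TRUE)

# Check which key R will send with Census requests
Sys.getenv("CENSUS_API_KEY")
```

With `install = TRUE`, the key is saved once and you should not need to register it again in later sessions.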


Introduction to Geospatial Data in R

What is Geospatial Data?

Geospatial data refers to any data that has a geographic component, meaning it is tied to specific locations on the Earth’s surface. This data is used in mapping, spatial analysis, and geographic visualization.

There are two primary types of geospatial data:

  1. Vector Data: Represents geographic features as points, lines, or polygons.
    • Points: Locations of schools, stores, or crime incidents
    • Lines: Roads, rivers, or flight paths
    • Polygons: State boundaries, land parcels, Census tracts
  2. Raster Data: Represents spatial data as a grid of pixels, commonly used for continuous data like elevation, temperature, and satellite imagery.
    • Example: A satellite image of land cover, where each pixel represents vegetation, water, or urban areas.
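
To make the two data types concrete, here is a small sketch that builds one point, one line, and one polygon with sf, plus a plain matrix standing in for a raster grid. All coordinates and elevation values are made up for illustration, and the object names are hypothetical.

```r
library(sf)

# Vector data: one geometry of each kind (illustrative coordinates, lon/lat)
pt <- st_point(c(-87.63, 41.88))
ln <- st_linestring(rbind(c(-87.63, 41.88), c(-87.90, 43.04)))
pg <- st_polygon(list(rbind(
  c(-88, 41), c(-87, 41), c(-87, 42), c(-88, 42), c(-88, 41)  # ring must close
)))

# Combine the geometries with an attribute column into an sf object
features <- st_sf(
  kind = c("point", "line", "polygon"),
  geometry = st_sfc(pt, ln, pg, crs = 4326)
)
st_geometry_type(features)

# Raster data: conceptually a grid of cell values (made-up elevations in meters)
elevation <- matrix(c(180, 182, 185,
                      179, 181, 184,
                      178, 180, 183), nrow = 3, byrow = TRUE)
dim(elevation)
```

We will only work with vector data in this workshop; the matrix is just a reminder that a raster stores one value per grid cell rather than a shape.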

Common Geospatial Data Formats

Different formats are used to store and exchange geospatial data. Some of the most common include:

  • Shapefiles (.shp) – A widely used vector format consisting of multiple files that store geometry and attribute data.
  • GeoJSON (.geojson) – A JSON-based format for encoding spatial data, commonly used for web mapping.
  • KML (.kml) – An XML-based format originally developed by Keyhole (later acquired by Google) for geographic visualization in Google Earth and Maps; it is now an OGC standard.
  • Census Data (via API or TIGER/Line Shapefiles) – Demographic and boundary data provided by the U.S. Census Bureau.
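
As a rough illustration of how these formats behave in practice, the sketch below writes a tiny made-up point layer to GeoJSON and to a shapefile with sf, then reads the GeoJSON back. The school names and coordinates are hypothetical, and files are written to a temporary folder so nothing is left behind in your project.

```r
library(sf)

# A tiny illustrative layer (made-up names and coordinates)
schools <- st_sf(
  name = c("School A", "School B"),
  geometry = st_sfc(st_point(c(-87.63, 41.88)),
                    st_point(c(-87.65, 41.90)),
                    crs = 4326)
)

# GeoJSON: a single text file; sf picks the driver from the extension
geojson_path <- tempfile(fileext = ".geojson")
st_write(schools, geojson_path, quiet = TRUE)
schools_back <- st_read(geojson_path, quiet = TRUE)

# Shapefile: writing one .shp also creates companion files next to it
shp_path <- tempfile(fileext = ".shp")
st_write(schools, shp_path, quiet = TRUE)
sidecars <- list.files(dirname(shp_path),
                       pattern = basename(tools::file_path_sans_ext(shp_path)))
sidecars
```

Listing `sidecars` shows why a shapefile should always be shared as a whole set of files rather than the .shp alone.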

Overview of Key R Packages for Geospatial Data

R provides powerful packages for working with geospatial data. Here are the key ones we’ll use in this workshop:

sf (Simple Features)

  • Provides a modern approach to handling vector geospatial data in R.

  • Replaces older packages like sp and rgdal.

  • Supports reading, writing, and manipulating spatial data.

# Install the package `sf`
# install.packages("sf")

# Load the package
library(sf)
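
Here is a short sketch of the read, transform, and manipulate workflow, using the North Carolina counties shapefile that ships with sf (so no download is needed). The choice of EPSG:32119 (a North Carolina projection in meters) and the 2,000 square kilometer cutoff are assumptions made just for this example.

```r
library(sf)

# Read the example shapefile bundled with the sf package
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Inspect the coordinate reference system the file was stored in
st_crs(nc)

# Reproject to a CRS measured in meters (EPSG:32119, chosen for illustration)
nc_proj <- st_transform(nc, 32119)

# Compute each county's area and convert from square meters to square kilometers
nc_proj$area_km2 <- as.numeric(st_area(nc_proj)) / 1e6

# sf objects behave like data frames: subset rows and columns as usual
large_counties <- nc_proj[nc_proj$area_km2 > 2000, c("NAME", "area_km2")]
head(large_counties)
```

Notice that the geometry column comes along automatically when we subset, which is what makes sf objects convenient to use with ordinary data frame tools.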

tigris

  • Retrieves geographic boundary data from the U.S. Census Bureau (e.g., states, counties, tracts).

  • Works with sf for mapping and spatial analysis.

Example: Download state boundaries

# install.packages('tigris')
library(tigris)

# We can download boundary data for the US states
states_sf <- states(cb = TRUE,
                    resolution = "20m",
                    progress_bar = FALSE)
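
The same pattern works for smaller geographies. The sketch below pulls counties and Census tracts with tigris; Illinois and Cook County are arbitrary choices for illustration, and the calls need an internet connection to download from the Census Bureau.

```r
library(tigris)

# County boundaries for one state (Illinois chosen only as an example)
il_counties <- counties(state = "IL",
                        cb = TRUE,
                        progress_bar = FALSE)

# Census tract boundaries for one county within that state
cook_tracts <- tracts(state = "IL",
                      county = "Cook",
                      cb = TRUE,
                      progress_bar = FALSE)

# Both results are sf objects with one row per geography
nrow(il_counties)
class(cook_tracts)
```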

maps

  • Provides simple built-in maps for U.S. states, counties, and world boundaries.

  • Useful for quick visualizations.

Example: Plot a basic map of the U.S.

# Install 'maps'
# install.packages('maps')

# Load maps along with tidyverse to access ggplot
library(tidyverse)
library(maps)

# Load the U.S. state outlines bundled with the maps package
us_states <- map_data("state")

# Plot the U.S. map
ggplot(us_states, 
       aes(long, lat, 
           group = group)) + 
  geom_polygon(fill = "lavender", 
               color = "black") +
  theme_void()

This is great, but map_data("state") only covers the lower 48 states and the District of Columbia, so we’re missing Alaska, Hawaii, Puerto Rico, etc. We can use tigris instead to get boundaries that include them.
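
If you want to confirm which places the maps data contains, a quick check like the following works. It is a small sketch that assumes ggplot2 and maps are installed; region names in this dataset are stored in lowercase.

```r
library(ggplot2)

# Load the state outlines from the maps package as a data frame
us_states <- map_data("state")

# List the distinct regions included in the data
regions <- unique(us_states$region)
head(regions)

# Check whether Alaska and Hawaii are present
c("alaska", "hawaii") %in% regions
```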

ggplot2’s geom_sf for Geospatial Visualization

  • geom_sf() allows for powerful mapping of spatial data.
  • Enables custom styling and integration with non-spatial data.

# Let's plot this using the `sf` object from tigris
ggplot(states_sf) +
  geom_sf(fill = "lavender", color = "black")

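But notice how our map looks very tiny and spread out 😱. That is because the shapefile uses raw longitude and latitude: Alaska, Hawaii, and Puerto Rico sit far from the contiguous states, and Alaska’s Aleutian Islands cross the 180° meridian, so the plot stretches across nearly the whole globe. Since this plot is very hard to read, we can use a transformation that moves and rescales Alaska, Hawaii, and Puerto Rico so they sit near the contiguous U.S.: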

# We'll use the `tigris` function `shift_geometry` when saving the states file
states_sf <- states(cb = TRUE,
                    resolution = "20m") %>% 
  shift_geometry()

# Now we can try plotting again:
ggplot(states_sf) +
    geom_sf(fill = "lavender", 
            color = "black")
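
To see what shift_geometry() changed, we can compare the bounding boxes of the raw and shifted layers. This is a rough sketch that downloads the states again (so it needs an internet connection); the object names `states_raw` and `states_shifted` are hypothetical and used only here.

```r
library(sf)
library(tigris)

# Raw boundaries in longitude/latitude
states_raw <- states(cb = TRUE,
                     resolution = "20m",
                     progress_bar = FALSE)

# Shifted and rescaled version used for the readable map
states_shifted <- shift_geometry(states_raw)

# The raw layer spans almost the full range of longitudes
st_bbox(states_raw)

# The shifted layer is in a projected CRS, so its extent is in meters, not degrees
st_bbox(states_shifted)
st_is_longlat(states_shifted)
```

Because the shifted layer moves Alaska, Hawaii, and Puerto Rico away from their true positions, it is meant for display; use the raw layer for any distance or overlay calculations.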