# Install the package `sf`
# install.packages("sf")
# Load the package
library(sf)
R Workshop: Geographic and Demographic Data Analysis and Visualization
Welcome! 🗺️
Before we get started, I will have you all go ahead and request a Census API key which will allow us to access Census data for our analyses. It takes about 2-5 minutes to send, so let’s get started with this while we chat a little about spatial data 😊
Step-by-Step Instructions for Getting a Census API Key for tidycensus
The U.S. Census Bureau provides an API that allows users to access various datasets, including the American Community Survey (ACS) and the Decennial Census. To use tidycensus
in R, you’ll need to obtain and register an API key.
Request a Census API Key
Fill out the form:
- Enter your name
- Enter your email address
- Agree to the terms of service
Submit the request.
You will receive an API key via email (a long alphanumeric string).
Introduction to Geospatial Data in R
What is Geospatial Data?
Geospatial data refers to any data that has a geographic component, meaning it is tied to specific locations on the Earth’s surface. This data is used in mapping, spatial analysis, and geographic visualization.
There are two primary types of geospatial data:
1. Vector Data: Represents geographic features as points, lines, or polygons.
Examples:
- Points: Locations of schools, stores, or crime incidents
- Lines: Roads, rivers, or flight paths
- Polygons: State boundaries, land parcels, Census tracts
- Raster Data: Represents spatial data as a grid of pixels, commonly used for continuous data like elevation, temperature, and satellite imagery.
- Example: A satellite image of land cover, where each pixel represents vegetation, water, or urban areas.
Common Geospatial Data Formats
Different formats are used to store and exchange geospatial data. Some of the most common include:
- Shapefiles (.shp) – A widely used vector format consisting of multiple files that store geometry and attribute data.
- GeoJSON (.geojson) – A JSON-based format for encoding spatial data, commonly used for web mapping.
- KML (.kml) – A format developed by Google for geographic visualization in Google Earth and Maps.
- Census Data (via API or TIGER/Line Shapefiles) – Demographic and boundary data provided by the U.S. Census Bureau.
Overview of Key R Packages for Geospatial Data
R provides powerful packages for working with geospatial data. Here are the key ones we’ll use in this workshop:
sf
(Simple Features)
Provides a modern approach to handling vector geospatial data in R.
Replaces older packages like
sp
andrgdal
.Supports reading, writing, and manipulating spatial data.
tigris
Retrieves geographic boundary data from the U.S. Census Bureau (e.g., states, counties, tracts).
Works with
sf
for mapping and spatial analysis.
Example: Download state boundaries
# install.packages('tigris')
library(tigris)
# We can download boundary data for the US states
<- states(cb = TRUE,
states_sf resolution = "20m",
progress_bar = FALSE)
maps
Provides simple built-in maps for U.S. states, counties, and world boundaries.
Useful for quick visualizations.
Example: Plot a basic map of the U.S.
# Install 'maps'
# install.packages('maps')
# Load maps alone with tidyverse to access ggplot
library(tidyverse)
library(maps)
# Download U.S. map data
<- map_data("state")
us_states
# Plot the U.S. map
ggplot(us_states,
aes(long, lat,
group = group)) +
geom_polygon(fill = "lavender",
color = "black") +
theme_void()
This is great, but we’re missing Alaska, Hawaii, Puerto Rico, etc. We can use tigris
instead for the same purposes.
ggplot2
’s geom_sf
for Geospatial Visualization
geom_sf()
allows for powerful mapping of spatial data.- Enables custom styling and integration with non-spatial data.
# Let's plot this using the `sf` object from tigris
ggplot(states_sf) +
geom_sf(fill = "lavender", color = "black")
But notice how our map looks very tiny and spread out. 😱: That is because the shape file is using raw longitude and latitude. Since this plot is very hard to read, we can use a transformation to shift the non-continental U.S. territories near the rest of the U.S.:
# We'll use the `tigris` function `shift_geometry` when saving the states file
<- states(cb = TRUE,
states_sf resolution = "20m") %>%
shift_geometry()
# Now we can try plotting again:
ggplot(states_sf) +
geom_sf(fill = "lavender",
color = "black")
I like to add theme_void
to get rid of those coordinate lines:
# We'll use the `tigris` function `shift_geometry` when saving the states file
<- states(cb = TRUE,
states_sf resolution = "20m") %>%
shift_geometry()
# Now we can try plotting again:
ggplot(states_sf) +
geom_sf(fill = "lavender",
color = "black") +
theme_void()
Summary
- Geospatial data is any data with a geographic component, stored in vector (points, lines, polygons) or raster (grids) formats.
- Common formats include Shapefiles, GeoJSON, KML, and Census data.
- Key R packages:
sf
for spatial data manipulation.tidycensus
for retrieving Census data.tigris
for downloading geographic boundaries.maps
for built-in map data.ggplot2
for visualization.
Exercise: plot your home state and city
I’ll do the same for Illinois and Chicago
# get the data for IL counties
# get data
<- map_data("county", "illinois") %>%
il_counties select(lon = long, lat, group, id = subregion)
# Chicago and Evanston lon and lat (looked it up on Google)
# Chicago: 41.8781° N, -87.6298° W
# Evanston: 42.0521° N, -87.6848° W
<- tibble(
il_city_data city_name = c("Chicago", "Evanston"),
lon = c(-87.6298, -87.6848),
lat = c(41.8781, 42.0521)
)
# Make map with cities
ggplot(il_counties, aes(lon, lat)) +
geom_polygon(aes(group = group),
fill = "slateblue",
col = 'grey50') +
::geom_star(data = il_city_data,
ggstarsize = 4,
fill = "gold",
mapping = aes(fill = city_name)) +
geom_label(data = il_city_data,
aes(label = city_name),
nudge_x = c(-.5, -.8),
nudge_y = c(-.3, 0),
size = 3.2) +
coord_quickmap() +
theme_void() +
ggtitle("Illinois: the Lincoln State")
YOUR TURN 🙌 :
Using my code above as a guide, plot your home state/any state of your choosing and its capital and/or your home town.
Using tidycensus for Census Data Retrieval and Visualization
What is tidycensus?
The tidycensus package is an R interface to the U.S. Census Bureau API. It is useful for mapping, analyzing, and visualizing demographic data in R.
It allows users to retrieve demographic and socioeconomic data from:
- The American Community Survey (ACS) (yearly estimates)
- The Decennial Census (10-year population counts)
- Other Census datasets
Here are a few of the commands you can use to access Census Bureau products:
get_acs()
, which requests data from the 1-year and 5-year American Community Survey samples. Data are available from the 1-year ACS back to 2005 and the 5-year ACS back to 2005-2009.get_estimates()
, an interface to the Population Estimates APIs. These datasets include yearly estimates of population characteristics by state, county, and metropolitan area, along with components of change demographic estimates like births, deaths, and migration rates.get_flows()
, an interface to the ACS Migration Flows APIs. Includes information on in- and out-flows from various geographies for the 5-year ACS samples, enabling origin-destination analyses.get_pums()
, which accesses data from the ACS Public Use Microdata Sample APIs. These samples include anonymized individual-level records from the ACS organized by household and are highly useful for many different social science analyses.1
Install and Load tidycensus
in R
You should have received a Census API key to the email you entered earlier.
# Install the `tidycensus`
# install.packages("tidycensus")
# Load the package
library(tidycensus)
### Register API key
## Option 1: temporary, save it only for this session
## (delete # in front of the following line when ready to run code)
# census_api_key("YOUR_API_KEY_HERE")
# Option 2 (my recommendation): Save the API Key for Future Use
## to avoid entering the key every time, store it permanently:
## (delete # in front of the following line when ready to run code)
# census_api_key("YOUR_API_KEY_HERE", install = TRUE)
Replace "YOUR_API_KEY_HERE"
with the key you received.
Option 2 will save the key to your .Renviron
file, so it loads automatically in future R sessions. Restart R for the change to take effect.
Getting ACS Data
The ACS provides yearly estimates on income, education, race, and other demographics. Use get_acs()
to retrieve ACS data.
- Example: Get median household income by state for 2024
Let’s get some ACS data and test the API Key
To verify that the key is working, try fetching some ACS (American Community Survey) data.
First, let’s load the variables in the ACS for the year we’d like to gather data from. We use the call load_variables
We can also find them listed on the Census website
# Load variable list
<- load_variables(2023, "acs5", cache = TRUE)
vars23
# We can take a look and search for variables of interest
view(vars23)
But as you can see, it can be overwhelming to search. Here is a trick to quicken our search:
# Let's filter all variables with 'median household income' in the label column
<- vars23 %>%
income_vars filter(grepl('median household income', label, ignore.case = TRUE))
view(income_vars)
The variable B19013_001
with label Estimate!!Median household income in the past 12 months (in 2023 inflation-adjusted dollars)
and concept Median Household Income in the Past 12 Months (in 2023 Inflation-Adjusted Dollars)
looks perfect for our needs.
Now that we know our variable of interest, let’s define our ACS call by state for 2023:
# Using the get_acs function, we can get Census data for specific geogprahies and specific variables
<- get_acs(geography = "state",
df variables = "B19013_001", # Median Household Income
year = 2023,
progress_bar = FALSE)
# Let's take a sneak peak of our data frame
head(df)
# A tibble: 6 × 5
GEOID NAME variable estimate moe
<chr> <chr> <chr> <dbl> <dbl>
1 01 Alabama B19013_001 62027 400
2 02 Alaska B19013_001 89336 1374
3 04 Arizona B19013_001 76872 414
4 05 Arkansas B19013_001 58773 503
5 06 California B19013_001 96334 298
6 08 Colorado B19013_001 92470 483
If the data loads without errors, your API key is successfully set up!
The dataset includes:
- GEOID (geographic ID)
- NAME (state name)
- estimate (median income value)
- moe (margin of error)
Combining Census Data with Geospatial Data
We can skip a step by requesting geographic data simultaneously with Census data, using the argument geometry = TRUE
<- get_acs(geography = "state",
income_sf variables = c(median_income = "B19013_001"),
year = 2023,
geometry = TRUE,
progress_bar = FALSE) %>% # Get spatial data
shift_geometry() # Don't forget or else we get a weird map!
Now we can create different types of visualizations
# Install viridis color package
# install.packages('viridis')
library(viridis)
# Plot
ggplot(income_sf) +
geom_sf(aes(fill = estimate),
color = "white") +
scale_fill_viridis_c(option = "magma",
name = "Median Income") +
theme_void() +
labs(title = "Median Household Income by State (2023)",
caption = "Source: U.S. Census Bureau ACS 2023\n*note: Puerto Rico not included")
Key Features of this Plot:
aes(fill = estimate)
colors states by incomescale_fill_viridis_c()
improves color contrasttheme_minimal()
gives a clean look
Okay, the map looks good but that scaling on the legend is kind of ugly.
Let’s fix that:
# Install and load the 'scales' library
# install.packages('scales')
library(scales)
# Plot with pretty numbers
options(scipen = 999) # this will prevent the plots from producing a default scientific notation
ggplot(income_sf) +
geom_sf(aes(fill = estimate),
color = "white") +
scale_fill_viridis_c(option = "magma",
name = "Median Income",
labels = scales::comma) + # this will add a comma for thousands
theme_void() +
labs(title = "Median Household Income by State (2023)",
caption = "Source: U.S. Census Bureau ACS 2023\n*note: Puerto Rico not included")
Top states by income
<- income_sf %>%
top_income arrange(desc(estimate)) %>%
slice(1:10)
ggplot(top_income, aes(x = reorder(NAME, estimate), y = estimate)) +
geom_col(fill = "steelblue") +
coord_flip() + # Flip for readability
theme_minimal() +
scale_y_continuous(labels = scales::comma) + # same scale trick, but note we use `scale_y_continuous` since this time we are targeting the x-axis
labs(title = "Top 10 States by Median Income (2023)",
x = "State", y = "Median Income",
caption = "Source: U.S. Census Bureau ACS 2023")
We can also use scatter plots along with maps to display population
For this, gathering population per county and mapping it onto states may be best
# Get county-level population data
<- get_acs(
county_pop geography = "county",
variables = "B01003_001", # Total population variable
year = 2023,
geometry = TRUE,
progress_bar = FALSE
%>%
) shift_geometry()
# Extract centroid coordinates for plotting as points
<- county_pop %>%
county_pop mutate(
lon = st_coordinates(st_centroid(geometry))[,1], # Longitude
lat = st_coordinates(st_centroid(geometry))[,2] # Latitude
) # filter(estimate >= 50000) # only keep counties with more than 50k people
# Check data structure
head(county_pop)
Simple feature collection with 6 features and 7 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 139720.3 ymin: -780463.1 xmax: 1044933 ymax: -368685.2
Projected CRS: USA_Contiguous_Albers_Equal_Area_Conic
GEOID NAME variable estimate moe
1 01003 Baldwin County, Alabama B01003_001 239945 NA
2 01069 Houston County, Alabama B01003_001 107628 NA
3 01005 Barbour County, Alabama B01003_001 24757 NA
4 01119 Sumter County, Alabama B01003_001 12020 NA
5 05091 Miller County, Arkansas B01003_001 42588 NA
6 05133 Sevier County, Arkansas B01003_001 15797 NA
geometry lon lat
1 MULTIPOLYGON (((765298 -780... 789795.1 -722243.2
2 MULTIPOLYGON (((975268 -651... 1014495.8 -652117.0
3 MULTIPOLYGON (((966818.1 -6... 997016.3 -573843.3
4 MULTIPOLYGON (((709323.3 -5... 727423.5 -519270.6
5 MULTIPOLYGON (((180424 -440... 194967.8 -466481.6
6 MULTIPOLYGON (((139720.4 -3... 161278.0 -390636.1
Aggregate Population Data to State Level
<- county_pop %>%
state_pop group_by(GEOID = substr(GEOID, 1, 2)) %>% # Use first 2 digits of GEOID (State FIPS)
summarise(total_population = sum(estimate, na.rm = TRUE))
# Load state geometries
<- get_acs(
states_sf geography = "state",
variables = "B01003_001", # Total population variable
year = 2023,
geometry = TRUE,
progress_bar = FALSE
%>%
) shift_geometry()
Now we plot:
ggplot() +
# Draw state boundaries
geom_sf(data = states_sf, fill = "gray90", color = "white", size = 0.3) +
# Overlay county population as proportional dots using geom_point()
geom_point(data = county_pop, aes(x = lon, y = lat, size = estimate),
color = "coral", alpha = 0.2) +
# Scale dot sizes to be visually proportional
scale_size_continuous(
range = c(1, 10), # Adjust min/max dot sizes
labels = comma, # Format population labels with commas
name = "County Population"
+
)
# Titles and theme
theme_void() +
labs(
title = "County Population Distribution in the U.S.",
subtitle = "Larger circles represent higher county populations",
caption = "Source: U.S. Census Bureau ACS 2022"
)
We can also adjust color to the population size:
# Plot
ggplot() +
# Draw state boundaries
geom_sf(data = states_sf,
fill = "gray70",
color = "white",
size = 0.3) +
# Overlay county population as proportional dots, with adjusted transparency
geom_point(data = county_pop,
aes(x = lon,
y = lat,
size = estimate,
color = estimate,
alpha = estimate)) +
# Adjust alpha (transparency): Larger values = more opaque
scale_alpha_continuous(range = c(0.4, 1),
guide = "none") + # Avoid cluttering the legend
# Unified legend for size and color (ordered ascending)
scale_size_area(
max_size = 10, # Adjust max dot size
labels = comma, # Format population labels with commas
name = "County Population",
guide = guide_legend(order = 1) # Ensure size legend comes first
+
) scale_color_viridis_c(
#option = "mako", # Color scheme (magma = best for ordered data)
labels = comma, # Format labels
direction = -1, # reverse so darker = larger population
name = "County Population",
guide = guide_colorbar(order = 2) # Ensure color legend comes second
+
)
# Titles and theme
theme_void() +
labs(
title = "County Population Distribution (50K+ Population)",
subtitle = "Larger and darker circles represent more populous counties (more opaque)",
caption = "Source: U.S. Census Bureau ACS 2022"
)
YOUR TURN 🙌 :
Choose another variable of interest and gather Census ACS data & plot it using the above code as a guide
Common ACS Variables
Here are a few frequently used variables:
Variable Name | Description |
---|---|
B19013_001 | Median Household Income |
B17001_002 | Population Below Poverty Level |
B25077_001 | Median Home Value |
B01003_001 | Total Population |
B03002_003 | Non-Hispanic White Population |
B03002_004 | Black Population |
B03002_012 | Hispanic or Latino Population |
Here are other examples of useful variables:
- Demographic Analysis (Population by Race, Ethnicity, Age, Gender)
Variable Name | Description |
---|---|
B01003_001 | Total Population |
B01002_001 | Median Age |
B01001_002 | Male Population |
B01001_026 | Female Population |
B03002_003 | Non-Hispanic White Population |
B03002_004 | Black or African American Population |
B03002_012 | Hispanic or Latino Population |
B03002_006 | Asian Population |
- Socioeconomic Status (SES) & Income Inequality
Variable Name | Description |
---|---|
B19013_001 | Median Household Income |
B19001_002 – B19001_017 | Household Income Brackets |
B17001_002 | Population Below Poverty Level |
B19301_001 | Per Capita Income |
B25077_001 | Median Home Value |
B25064_001 | Median Gross Rent |
- Education & Employment
Variable Name | Description |
---|---|
B15003_022 | Bachelor’s Degree or Higher |
B15003_017 | High School Graduates |
B23025_002 | Civilian Labor Force |
B23025_005 | Unemployment Count |
B23025_007 | Unemployment Rate |
- Housing & Urban Development
Variable Name | Description |
---|---|
B25077_001 | Median Home Value |
B25064_001 | Median Gross Rent |
B25034_001 | Housing Units by Year Built |
B25002_003 | Vacant Housing Units |
B25024_002 | Single-Family Homes |
B25024_010 | Mobile Homes |
- Health & Disability
Variable Name | Description |
---|---|
B18101_001 | Population with a Disability |
B27010_001 | Population with Health Insurance Coverage |
B27010_017 | Population Uninsured |
B25099_001 | Households with No Plumbing Facilities |
Using state bins
This is another approach for data visualization of state data where we ditch the state’s natural shape to emphasize comparisons among them.
# Install and load 'statebins'
# install.packages('statebins')
library(statebins)
# Using the income dataframe from earlier, let's add income levels
<- mutate(
US_income
income_sf,income_bins = cut(
ifelse(is.na(estimate), 25000, estimate),
breaks = c(0, 40000, 50000, 60000, 70000, 80000, 150000),
labels = c("< $40k", "$40k to $50k",
"$50k to $60k", "$60k to $70k", "$70k - $80k", " > $80k"),
right = FALSE
)
)
# Plotting
%>%
US_income ggplot() +
::geom_statebins(aes(geometry = geometry,
statebinsstate = NAME,
fill = income_bins)) +
scale_fill_viridis_d() +
theme_statebins() +
labs(title = "Median Household Income by State (2023)",
caption = "Source: U.S. Census Bureau ACS 2023. Note: Puerto Rico not included")
Advanced example: using multiple variables at once
# Let's gather data for income and poverty for states in 2023
<- get_acs(geography = "state",
data variables = c(income = "B19013_001",
poverty = "B17001_002",
population = "B01003_001"),
year = 2023,
geometry = TRUE,
progress_bar = FALSE) %>%
shift_geometry()
head(data)
Simple feature collection with 6 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -1233579 ymin: -614973.5 xmax: -35340.83 ymax: 970427.9
Projected CRS: USA_Contiguous_Albers_Equal_Area_Conic
GEOID NAME variable estimate moe geometry
1 35 New Mexico population 2114768 NA MULTIPOLYGON (((-1231344 -5...
2 35 New Mexico poverty 375381 7817 MULTIPOLYGON (((-1231344 -5...
3 35 New Mexico income 62125 608 MULTIPOLYGON (((-1231344 -5...
4 46 South Dakota population 899194 NA MULTIPOLYGON (((-633765.6 8...
5 46 South Dakota poverty 104511 3107 MULTIPOLYGON (((-633765.6 8...
6 46 South Dakota income 72421 805 MULTIPOLYGON (((-633765.6 8...
Notice how our variables are all listed under the variable column. For our analyses, it would be fruitful to have the variables as their own columns.
To do this, we will pivot the data long-ways so that each variable is its own column.
# Transpose the data so each variable has its own column}
<- data %>%
data_wide select(GEOID,
state = NAME,
variable,
estimate, %>% # Keep relevant columns
geometry) distinct(GEOID,
state,
variable,
estimate, .keep_all = TRUE) %>% # Remove duplicate rows
pivot_wider(
names_from = variable,
values_from = estimate
%>%
) mutate(poverty_rate = (poverty / population) * 100) # Compute Poverty Rate
# Let's look at our new data set:
head(data_wide)
Simple feature collection with 6 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -2356114 ymin: -782290.7 xmax: 1419556 ymax: 970427.9
Projected CRS: USA_Contiguous_Albers_Equal_Area_Conic
# A tibble: 6 × 7
GEOID state geometry population poverty income poverty_rate
<chr> <chr> <MULTIPOLYGON [m]> <dbl> <dbl> <dbl> <dbl>
1 35 New Me… (((-1231344 -588235.8, -… 2114768 375381 62125 17.8
2 46 South … (((-633765.6 865552.6, -… 899194 104511 72421 11.6
3 06 Califo… (((-2066923 -203083.1, -… 39242785 4610600 96334 11.7
4 21 Kentuc… (((584560 -88603.17, 585… 4510725 707480 62417 15.7
5 01 Alabama (((760323.7 -749311.5, 7… 5054253 768185 62027 15.2
6 13 Georgia (((1390722 -584139.2, 13… 10822590 1423159 74664 13.1
# Install package`RColorBrewer`
# install.packages('RColorBrewer')
library(RColorBrewer) # For color scales
# Define bivariate color scale using ColorBrewer
<- c(
bivariate_palette "Low-Low" = brewer.pal(9, "PuBuGn")[1], # Light green
"Low-Medium" = brewer.pal(9, "PuBuGn")[3],
"Low-High" = brewer.pal(9, "PuBuGn")[5],
"Medium-Low" = brewer.pal(9, "Purples")[3],
"Medium-Medium" = brewer.pal(9, "Purples")[5],
"Medium-High" = brewer.pal(9, "Purples")[7],
"High-Low" = brewer.pal(9, "Oranges")[3],
"High-Medium" = brewer.pal(9, "Oranges")[5],
"High-High" = brewer.pal(9, "Oranges")[7] # Dark orange
)
# Assign colors to the dataset
<- data_wide %>%
data_wide mutate(
income_cat = cut(income,
breaks = quantile(income,
probs = seq(0, 1, length.out = 4),
na.rm = TRUE),
labels = c("Low", "Medium", "High"),
include.lowest = TRUE),
poverty_cat = cut(poverty_rate,
breaks = quantile(poverty_rate,
probs = seq(0, 1, length.out = 4),
na.rm = TRUE),
labels = c("Low", "Medium", "High"),
include.lowest = TRUE),
category = paste(income_cat,
poverty_cat, sep = "-") # Create bivariate categories
)
How This Works
Each state (or region) falls into one of nine possible categories based on:
- Income level: Low, Medium, or High
- Poverty rate: Low, Medium, or High
Each category is assigned a unique color using three different ColorBrewer palettes:
- PuBuGn (Blue-Green shades) → Used for Low Income
- Purples (Purple shades) → Used for Medium Income
- Oranges (Orange shades) → Used for High Income
Darker colors indicate higher poverty rates, making the map intuitive:
- “Low-Low” (Low income, Low poverty) → Lightest Green
- “Low-High” (Low income, High poverty) → Darker Green
- “High-Low” (High income, Low poverty) → Light Orange
- “High-High” (High income, High poverty) → Darkest Orange
Breakdown of Color Assignments
I chose the colors using the link for the Brewer Color Palettes.
Category (Income-Poverty) | Color Palette Used | Shade |
---|---|---|
Low-Low (Low income, Low poverty) | PuBuGn (Blue-Green) | Lightest Green ([1]) |
Low-Medium (Low income, Medium poverty) | PuBuGn | Medium Green ([3]) |
Low-High (Low income, High poverty) | PuBuGn | Darker Green ([5]) |
Medium-Low (Medium income, Low poverty) | Purples | Light Purple ([3]) |
Medium-Medium (Medium income, Medium poverty) | Purples | Medium Purple ([5]) |
Medium-High (Medium income, High poverty) | Purples Darker Purple ([7]) | |
High-Low (High income, Low poverty) | Oranges | Light Orange ([3]) |
High-Medium (High income, Medium poverty) | Oranges | Medium Orange ([5]) |
High-High (High income, High poverty) | Oranges | Darkest Orange ([7]) |
# Create plot
ggplot(data_wide) +
geom_sf(aes(fill = category),
color = "white",
size = 0.2) +
scale_fill_manual(values = bivariate_palette,
name = "Income-Poverty Scale") + # Add legend
theme_void() +
labs(
title = "Income & Poverty Rate by U.S. State (2023)",
subtitle = "Bivariate Choropleth Map",
caption = "Source: U.S. Census Bureau ACS 2023"
)
Using Census data with other data sources
Census data can be used alongside other political data sources (such as ANES, election data, or even your own survey or qualitative data)!
Let’s use the income-poverty data and compare it to red vs blue states.
First let’s make a data frame with the states and their respective 2024 vote
# Using data from Politico on 2024 Presidential election
#red_states ~ "#fb5c44"
#blue_states ~ "#336fba"
<- c("Idaho" = "#fb5c44",
state_colors "Montana" = "#fb5c44",
"North Dakota" = "#fb5c44",
"South Dakota" = "#fb5c44",
"Wyoming" = "#fb5c44",
"Utah" = "#fb5c44",
"Nebraska" = "#fb5c44",
"Kansas" = "#fb5c44",
"Oklahoma" = "#fb5c44",
"Texas" = "#fb5c44",
"Louisiana" = "#fb5c44",
"Mississippi" = "#fb5c44",
"Alabama" = "#fb5c44",
"Iowa" = "#fb5c44",
"Missouri" = "#fb5c44",
"Arkansas" = "#fb5c44",
"Tennessee" = "#fb5c44",
"Kentucky" = "#fb5c44",
"Indiana" = "#fb5c44",
"Ohio" = "#fb5c44",
"Nevada" = "#fb5c44",
"Arizona" = "#fb5c44",
"Pennsylvania" = "#fb5c44",
"Georgia" = "#fb5c44",
"Michigan" = "#fb5c44",
"Wisconsin" = "#fb5c44",
"West Virginia" = "#fb5c44",
"North Carolina" = "#fb5c44",
"South Carolina" = "#fb5c44",
"Florida" = "#fb5c44",
"Alaska" = "#fb5c44",
"Washington" = "#336fba",
"Oregon" = "#336fba",
"California" = "#336fba",
"Colorado" = "#336fba",
"New Mexico" = "#336fba",
"Minnesota" = "#336fba",
"Massachusetts" = "#336fba",
"Illinois" = "#336fba",
"Virginia" = "#336fba",
"Maryland" = "#336fba",
"District Of Columbia" = "#336fba",
"Connecticut" = "#336fba",
"Rhode Island" = "#336fba",
"Delaware" = "#336fba",
"Maine" = "#336fba",
"New York" = "#336fba",
"New Jersey" = "#336fba",
"Hawaii" = "#336fba",
"New Hampshire" = "#336fba",
"Vermont" = "#336fba"
)
<- as.data.frame(state_colors)
color_vote
<- tibble::rownames_to_column(color_vote, "state") color_vote
Next, let’s modify the data set to sort by top poverty rate and income, respectively, and make two different plots
# Let's plot poverty rate and income
<-
poverty_colors # let's join the data sets while also sorting by top poverty rate
%>%
data_wide inner_join(color_vote, by = "state") %>%
filter(state != "Puerto Rico") %>%
mutate_if(is.numeric, round, 2) %>%
top_n(10, poverty_rate) %>%
# Now we can plot
ggplot(aes(x = reorder(state, poverty_rate),
y = poverty_rate,
fill = state_colors)) +
geom_col() +
scale_fill_manual(
name = "2024 Presidential Election Result",
labels = c("Blue", "Red"),
values = c("#336fba","#fb5c44"),
guide = "none"
+
) coord_flip() +
geom_text(aes(label = poverty_rate),
color = "white",
hjust = 1.1,
size = 3) +
theme_bw() +
labs(
y = "Poverty Rate",
title = 'Top 10 states by poverty rate\n& 2024 Presidential vote'
+
) theme(
axis.title.y = element_blank(),
axis.ticks.x = element_blank())
<-
income_colors # let's join the data sets while also sorting by top income
%>%
data_wide inner_join(color_vote, by = "state") %>%
filter(state != "Puerto Rico") %>%
mutate_if(is.numeric, round, 2) %>%
top_n(10, income) %>%
# Now we can plot
ggplot(aes(x = reorder(state, income),
y = income,
fill = state_colors)) +
geom_col() +
scale_fill_manual(
name = "2024 Presidential Election Result",
labels = c("Blue", "Red"),
values = c("#336fba","#fb5c44"),
guide = "none"
+
) scale_y_continuous(labels = scales::comma) +
coord_flip() +
geom_text(aes(label = scales::comma(income)),
color = "white",
hjust = 1.1,
size = 3) +
theme_bw() +
labs(
y = "Median Household Income",
title = 'Top 10 states by income\n& 2024 Presidential vote',
caption = "\nU.S. Census Bureau 2023 ACS 5-Year Population Estimates \n Politico 2024 Election Mapping."
+
) theme(
axis.title.y = element_blank(),
axis.ticks.x = element_blank())
Finally, let’s view these plots side by side and see if there are interesting conclusions we can make about the interactions between poverty/income and 2024 vote choice.
# Install patchwork to view these side by side
# install.packages('patchwork')
library(patchwork)
+ income_colors poverty_colors
YOUR TURN 🙌 : Using the color_vote data from the 2024 election, find another variable you are interested in to visualize along with vote choice.
Footnotes
Source: Analyzing US Census Data: Methods, Maps, and Models in R by Kyle Walker↩︎