I Like It When You Call Me Large SpatialPolygonsDataFrame

Posted on March 19, 2019 | 8 minute read

Welcome to the wonderful world of R-Spatial! I really enjoy playing around with spatial data in R and for some reason find it much easier than ArcGIS (#sorrynotsorry Esri). Please keep in mind, as with everything I do in R, I am self taught and rely heavily on publicly available resources. My way of doing things may not always be the best or perfect, it is just how I was able to make sense of it.

One of the beautiful things about R is that there are many ways to get to your destination (in most cases). If you stumble across something in my code that is just wrong, could be way more efficient, and/or makes you cry into your cereal - please let me know! You definitely won’t hurt my feelings. I just want to get better!

In this post I am going to walk through some things that I have found myself doing regularly. A majority of the data I come across does not have a spatial component to it already or may not be associated with the particular geography that I need. Hopefully, I can demonstrate for you one approach to working through this in R.

The flow of this generally should be:

  1. Pull down data with coordinates
  2. Pull down polygon data
  3. Verify overlap and matching projections
  4. Combine points and polygons
  5. Map aggressively
  6. Huzzah!

Load Packages

We will start by doing the obvious and loading our packages. You may notice me references some packages this way: cowplot::plot_grid(). That just means that I opted not to load the entire package since I may only be using it one time. You may already know that but some of may not so just bear with me.

library(sf)
library(tidyverse)
library(sp)

Create Custom Theme

Here, I am quickly defining some thematic elements for my plots so I don’t have to repeat them. I really need to get better about doing this!

jason_theme <- theme(plot.title = element_text(size = 14),
        panel.background = element_blank(),
        axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks = element_blank(),
        axis.line = element_blank())

Download Police Incidents

I use the GeoJSON link on the Raleigh Open Data Portal to download police incidents.

police_incidents <- read_sf("https://opendata.arcgis.com/datasets/24c0b37fa9bb4e16ba8bcaa7e806c615_0.geojson")

Filter Missing Coordinates

Quickly filter records that are missing coordinates.

police_incidents <- police_incidents %>%
  filter(is.na(latitude) != TRUE)

Test Plot

Let’s just give it a try to make sure everything is working as intended. I am using base R here simply because it is faster.

plot(police_incidents$geometry)

Download Wake Census Tracts

wake_tracts <- read_sf("https://opendata.arcgis.com/datasets/21d4ff44498a4007beefdcbde6fcd2a9_0.geojson")

Test Plot

Another base R test plot just to be sure.

plot(wake_tracts$geometry)

Basic Density Plot

Great! Looks like everything is working as intended. Before I dive in to combining data, I want to show you a couple cool things you can do with these items in isolation.

Create Tibble for Density Plotting

In this first step, I am using st_coordinates() to strip the coordinates out of my police incidents simple feature object. I am then storing those coordinates as a tibble that I can pass to my plot. This is a necessary step for the density plotting that I want to do.

dens_dat <- police_incidents %>%
  st_coordinates() %>%
  as_tibble()

Density Map

I have everything I need now to do a quick map of the density of police incident points. Notice I use ..level.. to access that statistic computed by the call to stat_density_2d. You can read more about what is happening behind the scenes here. Gotta love the R community!

dens_dat %>%
  ggplot() +
  geom_sf(data = wake_tracts, color = "grey", fill = NA) +
  stat_density_2d(aes(X, Y, fill = ..level..), geom = "polygon", alpha = 0.6) +
  scale_fill_viridis_c("Density", option = "magma", direction = -1) +
  jason_theme

Hexbin Map

Maybe that doesn’t do it for you. Well strap yourself in for a hexbin map!

dens_dat %>%
  ggplot() +
  geom_sf(data = wake_tracts, color = "grey", fill = NA) +
  geom_hex(aes(X, Y), alpha = 0.6) +
  scale_fill_viridis_c("Count", option = "magma", direction = -1) +
  jason_theme

A Step Further

Those are a couple of quick maps you can generate with the data in isolation. Let’s take it a step further and see what we can do if we combine the data.

Convert to Spatial Objects

The first thing I need to do is convert both of my simple feature objects to Spatial*DataFrame.

police_spatial <- police_incidents %>%
  filter(latitude != 0) %>%
  as_Spatial()


wake_spatial <- wake_tracts %>%
  as_Spatial()

Check Projections

Now I need to verify that my projections (Coordinated Reference Systems (CRS)) are matching.

ral_crs <- proj4string(police_spatial)

wake_crs <- proj4string(wake_spatial)


print(ral_crs == wake_crs)
## [1] TRUE

Create Overlay

True means we are good to go! Now I can use over() to overlay my police incident points and Wake County census tracts. It will return the census tract attributes that are associated with each police incident record. After I perform the overlay, I am simply binding everything together.

overlay <- over(police_spatial, wake_spatial) %>%
  select(GEOID10, NAMELSAD10) %>%
  cbind(police_spatial@data) %>%
  select(OBJECTID, GEOID10, NAMELSAD10)

Join Data

Now to create a nice clean simple featur object, I join by the GEOID. It is really helpful for you to learn more about these identifiers and how they are constructed if you want to play more with Census data.

You can learn more about them here - Understanding Geographic Identifiers (GEOIDs). You will see there how the identifiers are constructed from large to small components. If you are wondering whey this might be important for you, I have lost cost of how many times I have had to piece my own identifiers together from scattered data elements.

police_incidents_final <- police_spatial %>%
  st_as_sf() %>%
  left_join(overlay)

Test Plot

Now I am going to do a quick base R test plot to make sure that everything matched the way I intended.

plot(police_incidents_final["GEOID10"])

What Can You Do With This?

Basic Relationship Plot

I now have some census data combined with my police incident records. Here, I am grouping my data by census tract and then creating a count of incidents by tract. After that, I throw together a quick exploratory plot of the relationship between Vacant Housing and count of police incidents. This is in no way explanatory - just exploratory. Please always do your due diligence before jumping to assumptions in exploratory analysis.

police_incidents_final %>%
  as_tibble() %>%
  group_by(GEOID10) %>%
  summarise(inc_count = n()) %>%
  ungroup() %>%
  select(GEOID10, inc_count) %>%
  right_join(wake_tracts) %>%
  filter(inc_count > 100) %>%
  ggplot(aes(VACANT, inc_count)) +
  geom_point() +
  geom_smooth() +
  theme_minimal() +
  labs(x = "Vacant Housing", y = "Police Incident Count", title = "Relationship Between Police Incidents and Vacant Housing")

Incident Count by Census Tract

Now for the good stuff (or at least I think so)! I want to create a choropleth map of Wake County that is colored based on the count of police incidents in each Census Tract.

police_incidents_final %>%
  as_tibble() %>%
  group_by(GEOID10) %>%
  summarise(inc_count = n()) %>%
  ungroup() %>%
  select(GEOID10, inc_count) %>%
  right_join(wake_tracts) %>%
  st_as_sf() %>%
  ggplot() +
  geom_sf(aes(fill = inc_count)) +
  scale_fill_viridis_c("Incident Count") +
  jason_theme

Assault Count by Census Tract

Now, I am going to go one level deeper and create a choropleth map of Wake County that is colored based on the count of police incidents categorized as ASSAULT.

police_incidents_final %>%
  as_tibble() %>%
  filter(crime_category == "ASSAULT") %>%
  group_by(GEOID10) %>%
  summarise(assault_count = n()) %>%
  ungroup() %>%
  select(GEOID10, assault_count) %>%
  right_join(wake_tracts) %>%
  st_as_sf() %>%
  ggplot() +
  geom_sf(aes(fill = assault_count)) +
  scale_fill_viridis_c("Assault Count") +
  jason_theme

Spatial Comparison

What if I want to pul out two features and plot them side by side? Look no further than the cowplot package!

map_1 <- police_incidents_final %>%
  as_tibble() %>%
  filter(crime_category == "ASSAULT") %>%
  group_by(GEOID10) %>%
  summarise(assault_count = n()) %>%
  ungroup() %>%
  select(GEOID10, assault_count) %>%
  right_join(wake_tracts) %>%
  st_as_sf() %>%
  ggplot() +
  geom_sf(aes(fill = assault_count)) +
  scale_fill_viridis_c("Assault Count") +
  jason_theme

map_2 <- wake_tracts %>%
  ggplot() +
  geom_sf(aes(fill = VACANT)) +
  scale_fill_viridis_c("Vacant Housing") +
  jason_theme

cowplot::plot_grid(map_1, map_2, labels = "AUTO", nrow = 1, ncol = 2)


Share via

Tags:
comments powered by Disqus