Data Innovation

Streamlining Broadband Insights with cori.data.fcc

Olivier Leroy

Center On Rural Innovation

2024-10-20

How are you getting your data? Are you downloading it? Is it hard?

Overview

  1. Introduction
  2. Research using the latest broadband service data
  3. Challenges to using the latest broadband service data
    i.e., more data, more problems
  4. Our broadband data package: cori.data.fcc
    i.e., How we accelerate innovation by making this complex,
    ever-changing broadband data more accessible and usable for research

👋 Hi, I’m Olivier, a Senior Data Engineer at CORI


Meet the people who can coax treasure out of messy, unstructured data [1]

 


  • Working with source data (often referred to as “big”, “messy”, or “unstructured” data) is a growing challenge

CORI’s Mission

What Broadband means to Rural America

 


We believe that small towns are home to big ideas, and that combining new models of economic development with strategic investments in new infrastructure can empower rural communities across the U.S. to participate in and benefit from the nation’s growing tech economy.

Broadband knowledge is an important part of our work

Out of necessity we had to become experts in broadband data

We’ve packaged our expertise into tools…

… and research

Broadband research is not only about the housing market


The growth rate of new businesses is 213% higher for rural communities with high broadband utilization

Weinstein, A., Erouart, M., & Dewbury, A. (2024)

Broadband research is evolving
… and so is the data!

Broadband data is more detailed than ever before, but at a cost

                       Form 477         National Broadband Map
US Census Boundaries   2010             2020
Granularity            Census blocks    Locations
Timeframe              2014-2021        2022 - Ongoing
Releases               Twice a year     Twice a year
Records                416,447,807      3,488,191,994
Size                   400MB/year       22GB/year

 


  • Data volume has grown roughly 50-fold (from 400MB/year to 22GB/year) from one data product to the next

  • Requires domain knowledge

  • Evolving data landscape
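
The growth figures can be checked with a little arithmetic. The sketch below, in base R, uses only the numbers from the comparison table; treating 1GB as 1000MB is an assumption, and the variable names are illustrative.

```r
# Figures copied from the comparison table
f477 <- list(records = 416447807, size_mb_per_year = 400)
nbm  <- list(records = 3488191994, size_mb_per_year = 22 * 1000)  # assumes 1GB = 1000MB

# Ratio of yearly size and of total records between the two products
size_ratio   <- nbm$size_mb_per_year / f477$size_mb_per_year
record_ratio <- nbm$records / f477$records

round(size_ratio)       # yearly size: 55 times larger
round(record_ratio, 1)  # records: about 8.4 times more
```

The yearly size is where the "50 fold" order of magnitude shows up; the record count grows more slowly because the two products cover different timeframes.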

How you would use the FCC’s data/platform…

 


 


  • 🔁 Repeat for every state and territory (56)
  • 🔁 Repeat for every version (??)
  • Error prone (500 clicks)

FCC data does not meet the growing needs of researchers

 


  • I wish I didn’t have to manually download FCC data

  • I wish I could share my analysis (and code) with colleagues

  • I wish I could easily perform quality checks on the raw FCC data

  • There’s a steep learning curve in working with this data … I wish I had more time to focus on interesting research and analysis

cori.data.fcc

… data packaged as code

  1. Address Data Challenges
  • Data is packaged as code to simplify data access, reduce errors, and promote collaboration.
  • Low-level data transformations are codified: easy for others to reproduce.
  2. Accelerate Innovation
  • Broadband data is packaged so that researchers can focus on analysis and insights, not data wrangling.
  3. Unlock Deeper Insights Faster
  • cori.data.fcc provides fast access to granular details, essential for understanding broadband challenges across multiple geographic scales.

440 files in 12 lines of code

library(cori.data.fcc)

dir <- "data_swamp/nbm"

# list the available NBM releases
get_nbm_release()

# list every file available for download
nbm_data <- get_nbm_available()

# create the target directory if it does not exist yet
dir.create(dir, recursive = TRUE, showWarnings = FALSE)

dl_nbm(
  path_to_dl = dir,
  release_date = "June 30, 2023",
  data_type = "Fixed Broadband",
  data_category = "Nationwide"
)
# check that the download was successful:
# the number of zip files on disk should match the number of files listed
num_files <- nbm_data |>
  dplyr::filter(release == "June 30, 2023" &
                data_type == "Fixed Broadband" &
                data_category == "Nationwide") |>
  nrow()

# list.files() expects a regular expression, not a glob
files_dl <- length(list.files(dir,
                              pattern = "\\.zip$"))

identical(num_files, files_dl)
# TRUE

 


  • Created quality checks to reduce errors

  • Complexity is handled in our upstream process and abstracted so that users can focus on what brings value!

  • Added DuckDB

How to use the package > Choose your own adventure!

  • Broadband data at the census block (or tract, county, etc.) level is perfect for my research:
    Download the transformed data for NBM from CORI (ISP / County)

  • I need source data but working with hundreds of CSV files is not for me:
    Download raw data as tables from CORI (NBM / Form 477)

  • I need to inspect the source (raw) data:
    Download raw data files directly from the FCC

  • The guides linked above can help you with each step!

Example use cases

Unlock analysis with 3 repeatable lines of code

library(cori.data.fcc); library(dplyr); library(sf); library(tigris)
library(ggplot2); library(cori.charts); library(basemapR)

cori.charts::load_fonts()

caledonia_co_nbm <- cori.data.fcc::get_nbm_bl(geoid_co = "50005")
essex_co_nbm <- cori.data.fcc::get_nbm_bl(geoid_co = "50009")
orleans_co_nbm <- cori.data.fcc::get_nbm_bl(geoid_co = "50019")

nek_nbm <- dplyr::bind_rows(caledonia_co_nbm, essex_co_nbm, orleans_co_nbm)

# tigris to get places and block
vt_blocks <- tigris::blocks("VT", progress_bar = FALSE)
vt_places <- tigris::places(state = "VT", progress_bar = FALSE)

# wrangling
nek_bb_blocks <- inner_join(
    vt_blocks,
    nek_nbm,
    by = c("GEOID20" = "geoid_bl")
  ) |>
  mutate(
    pct_100_20 = cnt_100_20 / cnt_total_locations,
    pct_fiber = cnt_fiber_locations / cnt_total_locations
  )

# Get major NEK Place centroids for map labeling
vt_places_centroids <- vt_places[lengths(sf::st_intersects(vt_places, nek_bb_blocks)) > 0, ] |>
  st_centroid()

# Map
bbox <- sf::st_bbox(nek_bb_blocks) |>
        cori.charts::fit_bbox_to_aspect_ratio(target_aspect_ratio = 2)

fig <- ggplot(data = nek_bb_blocks) +
  base_map(
    bbox,
    increase_zoom = 3,
    basemap = 'voyager'
  ) +
  geom_sf(aes(fill = pct_100_20), color = "dimgray", linewidth = 0.1, alpha = 0.9) +
  scale_fill_cori(
    discrete = FALSE,
    palette = "ctg2pu",
    labels = scales::label_percent(),
    reverse = TRUE
  ) +
  geom_sf_label(data = vt_places_centroids, aes(label = NAME), size = 2, color = "black", family = "Lato", fontface = "bold") +
  coord_sf(
    expand = TRUE,
    xlim = c(bbox['xmin'], bbox['xmax']),
    ylim = c(bbox['ymin'], bbox['ymax'])
  ) +
  theme_cori_map() +
  theme(
    legend.key.width = unit(50, "pt")
  ) +
  labs(
    title = "Broadband service in the Northeast Kingdom",
    subtitle = "Percent of locations with access to 100/20 Mbps service by census block",
    caption = "Data source: 2023 FCC National Broadband Map\nMap source: © OpenStreetMap contributors © CARTO",
    x = NULL,
    y = NULL
  )

 


  • 3 lines of code to get the data…

  • … avoiding thousands of lines of code or clicks

  • Use census blocks: easy match with other data sources (ACS, BEA, etc.)
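
Matching on census blocks works because a 2020 block GEOID nests the larger geographies: 2 digits for the state, 3 for the county, 6 for the tract, and 4 for the block. A minimal sketch in base R, using made-up block IDs and counts (the column names mirror the ones used in the map code, but the values are illustrative only):

```r
# Hypothetical block-level rows; IDs and counts are invented for illustration
blocks <- data.frame(
  geoid_bl            = c("500059570001000", "500059570001001", "500099501002000"),
  cnt_total_locations = c(10, 30, 20),
  cnt_fiber_locations = c(5, 0, 20),
  stringsAsFactors    = FALSE
)

# A 15-character block GEOID starts with the county (5) and tract (11) GEOIDs
blocks$geoid_co <- substr(blocks$geoid_bl, 1, 5)
blocks$geoid_tr <- substr(blocks$geoid_bl, 1, 11)

# Roll counts up to counties, then compute the share from the summed counts
county <- aggregate(
  cbind(cnt_total_locations, cnt_fiber_locations) ~ geoid_co,
  data = blocks, FUN = sum
)
county$pct_fiber <- county$cnt_fiber_locations / county$cnt_total_locations
county
```

Summing the counts before dividing avoids averaging block-level percentages, which would weight a 10-location block the same as a 30-location one.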

Dive deeper, faster!

fig <- ggplot(data = nek_bb_blocks) +
  base_map(
    bbox,
    increase_zoom = 3,
    basemap = 'voyager'
  ) +
  geom_sf(aes(fill = pct_fiber), color = "dimgray", linewidth = 0.1, alpha = 0.6) +
  scale_fill_cori(
    discrete = FALSE,
    palette = "ctg2pu",
    labels = scales::label_percent(),
    reverse = TRUE
  ) +
  geom_sf_label(data = vt_places_centroids, aes(label = NAME), size = 2, color = "black", family = "Lato", fontface = "bold") +
  coord_sf(
    expand = TRUE,
    xlim = c(bbox['xmin'], bbox['xmax']),
    ylim = c(bbox['ymin'], bbox['ymax'])
  ) +
  theme_cori_map() +
  theme(
    legend.key.width = unit(50, "pt")
  ) +
  labs(
    title = "Fiber access in the Northeast Kingdom",
    subtitle = "Percent of locations with access to fiber by census block",
    caption = "Data source: 2023 FCC National Broadband Map\nMap source: © OpenStreetMap contributors © CARTO",
    x = NULL,
    y = NULL
  )

 


  • From a simple analysis: good service on Main Street, poor service farther away

  • To a deeper analysis:

    • Only 11.5% of locations have fiber access in the Northeast Kingdom

    • 6 of 25 ISPs provide fiber

    • We see that NEK Broadband provides the best (most inclusive) coverage in the area. This insight relates to our recent broadband research

cori.data.fcc supports spatial and graph data at multiple scales

library(tigris);library(cori.data.fcc);library(igraph);library(dplyr)
library(crosstalk);library(DT);library(threejs)

oh <- tigris::counties(state = "39") # tigris is a great example of data as code

talk_to_me <- function(x) {
    message(sprintf("Love Ohio: %s", x))
    cori.data.fcc::get_nbm_bl(x)
}
oh_nbm <- lapply(oh$GEOID, talk_to_me) |> dplyr::bind_rows()

oh2_nbm <- oh_nbm[!is.na(oh_nbm$combo_frn), ]

od_me <- function(x) {

  temp <- oh2_nbm[x, "array_frn"][[1]]
  geoid_bl <- oh2_nbm[x, "geoid_bl"]
  if (length(temp) == 1L)
  {
    return(data.frame(V1 = temp, V2 = NA, geoid_bl = geoid_bl))
  }
  bob <- as.data.frame(t(combn(temp, 2)))
  bob$geoid_bl <- geoid_bl
  return(bob)
}

od <- lapply(1:nrow(oh2_nbm), od_me) |> dplyr::bind_rows()
od <- od[!is.na(od$V2),]

bob <- rbind(data.frame(frn = od$V1, geoid_bl = od$geoid_bl),
             data.frame(frn = od$V2, geoid_bl = od$geoid_bl))

cnt_bl <- summarise(bob, cnt_bl = n_distinct(geoid_bl), cnt_rel = n(), .by = frn)


od$combo <- paste(od$V1, od$V2, sep = " - ")
od$count <- 1

rel <- od |> dplyr::summarize(n = sum(count), .by = combo)
give_me_from <- function(x) unlist(strsplit(x, " - "))[1]
give_me_to <- function(x) unlist(strsplit(x, " - "))[2]
rel$from <- sapply(rel$combo , give_me_from)
rel$to <- sapply(rel$combo, give_me_to)

fcc_slim <- cori.data.fcc::fcc_provider[, c("frn", "provider_name")]
frn <- data.frame( frn = unique(c(rel$from, rel$to)))
frn <- merge(frn, fcc_slim, by.x = "frn", by.y = "frn")
frn <- merge(frn, cnt_bl, by.x = "frn", by.y = "frn")

oh_graph <- graph_from_data_frame(rel[,c("from", "to")], directed = FALSE, vertices = frn)

draw_me_a_graph <- function(x, ...) {
  threejs::graphjs(x,
          vertex.label = V(x)$provider_name,
          vertex.color = rep(2, vcount(x)),
          vertex.size = .1,
          edge.color = "grey",
          edge.width = 3, ...)
}

g <- draw_me_a_graph(oh_graph, brush=TRUE)
points3d(g, vertices(g), color="black", pch=V(oh_graph)$provider_name, size=1.5)

 


Mapping the ISP market in Ohio?

  • 1,321 ISPs operating in Ohio

  • How to interpret:

    • Providers operating in competitive markets are at the center

    • Providers with few competitors are on the edges

  • This type of analysis may be helpful in telling a story about ISP speeds and services, and ultimately general service quality.
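
This reading of the graph rests on counting, for each provider, how many other providers it shares at least one census block with. A toy sketch in base R, with invented provider IDs and blocks (not real FRNs), pairing providers within each block using combn() as the Ohio code does:

```r
# Hypothetical providers serving each block; IDs are invented
providers_by_block <- list(
  block_1 = c("A", "B", "C"),
  block_2 = c("A", "B"),
  block_3 = c("A", "D"),
  block_4 = c("E")          # a block with a single provider creates no pair
)

# All provider pairs that share a block
pairs_per_block <- lapply(providers_by_block, function(p) {
  if (length(p) < 2) return(NULL)
  as.data.frame(t(combn(sort(p), 2)), stringsAsFactors = FALSE)
})
pairs <- unique(do.call(rbind, pairs_per_block))  # one edge per distinct pair

# Degree: number of distinct competitors per provider
degree <- sort(table(c(pairs$V1, pairs$V2)), decreasing = TRUE)
degree
```

In this toy data, provider A has the most competitors and would tend to sit near the center of a force-directed layout, D would sit toward the edge, and E, which never shares a block, would not appear in the edge list at all.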

How to get the package

  1. The source code is hosted on GitHub (Version control)

  2. You need the R package {remotes}.

install.packages("remotes")
remotes::install_github("ruralinnovation/cori.data.fcc")
  3. Load it!
library(cori.data.fcc)
  4. 🚧 Check the version! 🚧
packageVersion("cori.data.fcc")
[1] '0.1.0'

Open source and supported by CORI; we welcome any collaboration!


Summary

Our response to the changing broadband research landscape

 


  • We hope to reduce the entry costs associated with broadband research by making this a publicly available product.

  • We’ve packaged the data (and expertise) to add an extra “team member” to your research team

  • And in return we hope to:

    • Support the expansion of exciting broadband research, positively impacting rural communities.

    • Increase the visibility and accountability of FCC products.

    • Increase awareness of the important role broadband plays in rural areas.


Contact Us:


Thank you!