Working with Primary data

cori.data.fcc:: an example of data as code

Olivier Leroy

Center On Rural Innovation

2024-10-20

PLAN

CORI

  • Me
  • CORI

Me

CORI

Center On Rural Innovation (CORI)

  • one image (Doritos?)

  • Why data, open source, and open data are important at CORI

FCC’s Data

Our needs

Connect humanity project

BEAD tools

Previous workflow:

  • What FCC data look like in the wild
  • Our previous ingestion process
  • Limits of it

Manual vs. …

  • 🔁 Repeat for every state
  • 🔁 Repeat for every version
  • 🔴 Error prone

… Code Automated …

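library(cori.data.fcc)

# Folder where the NBM zip files will be stored
dir <- "data_swamp/nbm/"

# List the releases published by the FCC
get_nbm_release()

# List every file available for download
nbm_data <- get_nbm_available()

# Create the target folder if it does not exist yet (portable, no shell call)
dir.create(dir, recursive = TRUE, showWarnings = FALSE)

# Download every nationwide Fixed Broadband file for one release
dl_nbm(
  path_to_dl = "data_swamp/nbm",
  release_date = "June 30, 2023",
  data_type = "Fixed Broadband",
  data_category = "Nationwide"
)

# Sanity check: number of files listed vs. number of files downloaded
num_files <- get_nbm_available() |>
  dplyr::filter(release == "June 30, 2023" &
                data_type == "Fixed Broadband" &
                data_category == "Nationwide") |>
  nrow()

# `pattern` is a regular expression, not a glob
files_dl <- length(list.files(dir,
                              pattern = "\\.zip$"))

identical(num_files, files_dl)
# TRUE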
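
When the count check returns FALSE, the next question is which files are missing. A minimal sketch of that follow-up, using a hypothetical helper `missing_files()` and made-up file names; it assumes the expected names are listed without the `.zip` extension, which may not match the real listing:

```r
# Hypothetical helper (not part of cori.data.fcc):
# which expected files are absent from the download folder?
missing_files <- function(expected, dir) {
  downloaded <- list.files(dir, pattern = "\\.zip$")
  # Assumption: expected names carry no ".zip" extension
  setdiff(expected, tools::file_path_sans_ext(downloaded))
}

# Toy demonstration in a fresh temporary folder (made-up file names)
tmp <- tempfile("nbm_demo")
dir.create(tmp)
invisible(file.create(file.path(tmp, c("file_a.zip", "file_b.zip"))))

missing <- missing_files(c("file_a", "file_b", "file_c"), tmp)
missing
```

An empty result means every expected file is on disk; anything else is the list to download again.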

… Code Packaged

cori.data.fcc

New technologies

  • DuckDB and S3
  • Parquet and interoperability
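
A minimal sketch of the DuckDB + Parquet pattern, assuming the `DBI` and `duckdb` R packages are installed. The table and its column names are hypothetical toy data, not the real NBM schema, and the file is written locally rather than to S3 (reading from S3 would additionally need DuckDB's httpfs extension and credentials):

```r
library(DBI)
library(duckdb)

# In-memory DuckDB database
con <- dbConnect(duckdb::duckdb())

# Hypothetical toy table (illustrative columns only)
toy <- data.frame(
  state_abbr = c("VT", "VT", "NH"),
  frn        = c("0001", "0002", "0001"),
  locations  = c(10L, 5L, 7L)
)
duckdb_register(con, "toy", toy)

# Write the table to a Parquet file
parquet_path <- normalizePath(tempfile(fileext = ".parquet"),
                              winslash = "/", mustWork = FALSE)
dbExecute(con, sprintf(
  "COPY (SELECT * FROM toy) TO '%s' (FORMAT PARQUET)", parquet_path))

# Query the Parquet file directly with SQL, without loading it first
by_state <- dbGetQuery(con, sprintf(
  "SELECT state_abbr, SUM(locations) AS locations
   FROM read_parquet('%s')
   GROUP BY state_abbr
   ORDER BY state_abbr", parquet_path))

dbDisconnect(con, shutdown = TRUE)
by_state
```

The same Parquet file can be read from Python, SQL clients, or other tools, which is the interoperability point.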

Packaging code and data

internally developed packages are […] extra expert team members.¹

TL;DR: R package / pkgdown / GitHub

  • ?get_nbm_raw

  • articles

  • vignettes

Example use cases

Potential ideas

  • Blocks covered (BEAD-like)

  • ISP networks

  • Just download data

Download files

  • see above

Use CORI curated files

Custom

  • ISPs over time?
  • State-level analysis?

References

TODO:

McBain, M. (2024, March 11). Before I Sleep: Patterns and anti-patterns of data analysis reuse. Retrieved from https://milesmcbain.com/posts/data-analysis-reuse/

Riederer, E. Building a team of internal R packages. Retrieved from https://www.emilyriederer.com/post/team-of-packages/#collaboration

Contacts

Website: https://ruralinnovation.us

LinkedIn | Twitter | Facebook | Instagram | YouTube