Working with SQL in R

Suggested Code Snippets

TOC


Connect to Postgres

Follow instructions on the cori_db wiki to set up your .Renviron file.

con <- connect_to_db('sch_layer')

The cori_db wiki also documents several helper functions for DB interaction in the coriverse.

Read SQL files

query <- cori.db::read_sql('path/to/sql.file')

# read_sql() is vectorized, so you can read multiple files into a vector of queries

queries <- cori.db::read_sql(c('path/to/file1.sql', 'path/to/file2.sql'))

Execute a query on Postgres (no return value)

execute_on_postgres(query)

# execute_on_postgres() is vectorized, so you can execute multiple queries sequentially

execute_on_postgres(queries)

Read data from a query

con <- connect_to_db('schema')
dta <- DBI::dbGetQuery(con, "select * from table limit 5")
DBI::dbDisconnect(con) # important !


Read Data

CSV

  • Use readr::read_csv() for files with < 100K rows. For larger files, use data.table::fread()

Google Sheet


library(googlesheets4)

options(httr_oob_default = TRUE)
gs4_auth()

read_sheet("URL")

.SHP, .GPKG, or other spatial file

  • sf::st_read()

ESRI Server

remotes::install_github('yonghah/esri2sf')

library(esri2sf)

esri2sf(url)

Airtable

remotes::install_github('matthewjrogers/rairtable')

set_airtable_api_key('MY_KEY_HERE')

atbl <- airtable('Table Name', "BASE_ID")

dta <- read_airtable(atbl)

Write Data

NON-SPATIAL

To CSV

  • Use readr::write_csv() for data sets of <100K rows, data.table::fwrite() for larger data sets

To Postgres

DBI::dbWriteTable(con, "name", df)

Google Sheets

library(googlesheets4)

options(httr_oob_default = TRUE)
gs4_auth()

# make a new workbook
gs4_create("name", sheets = list(sheet_name = df, ...))

# existing workbook

write_sheet(df, ss = "workbook_url", sheet = "worksheet_name_to_write_to)

To Airtable


remotes::install_github('matthewjrogers/rairtable')

set_airtable_api_key('MY_KEY_HERE')

# requires that table already exists with correct column names and types
insert_records()

# same requirements as insert, plus you need data with airtable record ids
update_records()

SPATIAL

To Postgres

sf::st_write(df, con, "name")

To File

sf::st_write(df, "path/name.gpkg")

CARTO

library(coriverse)

write_carto(sf_df, 'name_of_layer_on_carto', Rto::carto())

Disconnect from a database

library(DBI)

con = cori.db::connect_to_db("sch_layer")

# Analysis code here

DBI::dbDisconnect(con)

Validate a left join

# check that the result of the join has the same number of rows as the left hand side of the join
# If the numbers are identical, the join was performed on a unique key or set of keys

new_df = dplyr::left_join(old_df1, old_df2, by = 'key_var')
stopifnot(nrow(old_df1) == nrow(new_df))

Explore a dataset

  • Use dplyr::glimpse() to get an overview of large data sets (many columns)
  • head() will work well for data with fewer columns
  • For numeric variables of interest, summary() provides a good overview
  • For categorical variables, use group_by() and count() from dplyr to get counts by group
  • Get count of NA values across all columns with lapply(df, function(c) sum(is.na(c)))

Visualize data on a map

Compare 2 dataframes

(Useful when updating data and you want to see what changed)

library(arsenal)
# st_drop_geometry() only needed for spatial data frames
summary(comparedf(old_table %>% st_drop_geometry(), new_table %>% st_drop_geometry()), by = "id")