EDA on 0/0 BSL

Published

July 25, 2024

TODO: update the SQL query

Code
source("R/table_with_options.R")
# very lazish function, col should be a string 
agg_count <- function(dat, col) {
    agg <- aggregate(cbind(count = dat$count),
                     list(name_col = dat[[col]]),
                      sum)
    colnames(agg) <- c(col, "count")
    return(agg)
}

The goals of this page is storing a quick EDA about broadband services locations with 0 MBps uploads and 0 MBps downloads. To be concise we are going to call them 0/0 speeds.

We have counted every services that have been declared with 0/0 speeds and associated with their States, ISP and technology. To clarify that does not mean a location have 0/0 speeds only but that one “ISP x technology” is provided with this kind of service in this location.

The data used to provide most of the analysis was done with this 2 SQL queries. They were saved and stored in data/

Code
SELECT 
    state_abbr,
    brand_name,
    technology,
    count(brand_name)
FROM
    staging.june23
WHERE
(max_advertised_download_speed = 0 AND
    max_advertised_upload_speed = 0) = true
GROUP BY brand_name, state_abbr, technology;

-- first get all 0/0 then get all the non 0/0

SELECT 
    state_abbr,
    brand_name,
    technology,
    count(brand_name)
FROM 
    staging.june23
WHERE
(max_advertised_download_speed = 0 AND
    max_advertised_upload_speed = 0) = false
GROUP BY brand_name, state_abbr, technology;
Code
zero_loc <- read.csv("data/zero_dl_up.csv")
not_zero <- read.csv("data/not_zero_dl.csv")

Summary by technologies:

Code
agg <- agg_count(zero_loc, "technology") 
agg_not <- agg_count(not_zero, "technology")

technology <- merge(agg, agg_not, by.x = "technology", 
                    by.y = "technology", all.x = TRUE, all.y = TRUE) 
colnames(technology) <- c("technology",  "cnt_zero_dl", "cnt_non_zero")
technology$rate_zero <- round(technology$cnt_zero_dl / 
                (technology$cnt_zero_dl +  technology$cnt_non_zero), 4)

table_with_options(technology)


We do not mind too much 70 (Unlicensed Terrestrial Fixed Wireless) because we are filtering it out but we are keeping 71 (Licensed Terrestrial Fixed Wireless) , 72 (Licensed-by-Rule Terrestrial Fixed Wireless)and 10 (Copper Wire).

To take that into account I will filter out Unlicensed Terrestrial Fixed Wireless for the rest of this document. I also filtered out 60 and 61 to be consistant with our pipelines.

Summary by ISP

Code
filter_sat <- c(60, 61, 70)
zero_loc <- zero_loc[which(! zero_loc$technology %in% filter_sat), ]
not_zero <- not_zero[which(! not_zero$technology %in% filter_sat), ]

agg <- agg_count(zero_loc, "brand_name")
agg_not <- agg_count(not_zero, "brand_name")

rate_zero <- merge(agg, agg_not, 
                   by.x = "brand_name", by.y = "brand_name"
                   , all.x = TRUE) 

colnames(rate_zero) <- c("brand_name", "cnt_zero_dl", "cnt_non_zero")
rate_zero$rate_zero <- round(rate_zero$cnt_zero_dl /
                 (rate_zero$cnt_zero_dl +  rate_zero$cnt_non_zero),
                            4)

table_with_options(rate_zero[
                    order(rate_zero$cnt_zero_dl, decreasing = TRUE),])

402 ISPs are declaring services with 0/0 MBips (We have 2902 ISPs registered in FCC NBM)

Sumamry by States

Code
st_agg_zero <- agg_count(zero_loc, "state_abbr")
st_agg_not <- agg_count(not_zero, "state_abbr")

st_agg <- merge(st_agg_zero, st_agg_not, 
                by.x = "state_abbr", by.y = "state_abbr",
                all.x = TRUE, all.y = TRUE)
colnames(st_agg) <- c("ST", "cnt_zero_dl", "cnt_non_zero")
st_agg$rate_zero <- round(st_agg$cnt_zero_dl /
                    (st_agg$cnt_zero_dl +  st_agg$cnt_non_zero),
                            4) 
library(ggplot2)

ggplot(st_agg[!is.na(st_agg$rate_zero),], aes(rate_zero)) +
  geom_boxplot(orientation = "y",
  fill='#A4A4A4', color="black") +
  coord_flip() +
  theme_bw()

Code
table_with_options(st_agg[order(st_agg$rate_zero, decreasing = TRUE), ])


One point of concern is that services with 0/0 speeds could be generated for various reasons. One could be that some technology offer very low downloads/uploads and that is rounding to 0 an other could be that the location is not actually deserved but the ISP think it can do it.