Friday, March 23rd, 2012
The Federal Communications Commission (FCC) releases data on broadband subscribership rates at the city and county level in broad ranges — like 20 percent to 40 percent — rather than exact percentages. By using a statistical method called “Monte Carlo simulation,” the Workshop was able to estimate rates for entire cities (and other groups of census tracts) by estimating the sum of all contained tracts (and calculating an uncertainty of each estimate). The accuracy of this method was verified by estimating subscribership rates for entire states and comparing these estimates to values published by the FCC.
The FCC doesn't release numeric subscribership estimates for geographies smaller than states. The commission does, however, provide county-level and census-tract-level data that assign a category to each area. Specifically, a "0" means there are no broadband connections; a "1" means 0-20 percent of households have broadband; a "2" means 20-40 percent; a "3" means 40-60 percent; a "4" means 60-80 percent; and a "5" means 80-plus percent of households have broadband.
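The category codes above can be written down as a small lookup table. This is only an illustrative sketch: the names `CATEGORY_RANGES` and `rate_range` are hypothetical, and the 100 percent upper bound for category 5 is an assumption, since the FCC publishes only "80-plus" for that category.

```python
# Hypothetical lookup: FCC category code -> (low, high) share of households.
CATEGORY_RANGES = {
    0: (0.0, 0.0),  # no broadband connections
    1: (0.0, 0.2),
    2: (0.2, 0.4),
    3: (0.4, 0.6),
    4: (0.6, 0.8),
    5: (0.8, 1.0),  # upper bound is an assumption; FCC lists only "80-plus"
}


def rate_range(code: int) -> tuple:
    """Return the (low, high) subscribership range for a category code."""
    if code not in CATEGORY_RANGES:
        raise ValueError(f"unknown FCC category code: {code!r}")
    return CATEGORY_RANGES[code]


print(rate_range(2))  # (0.2, 0.4)
```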
The FCC collects information directly from broadband providers via Form 477. The broadband definition used for this report is a connection faster than 96 KB/s (768 Kbps) download and 25 KB/s (200 Kbps) upload. That’s the speed identified in the 2009 economic stimulus legislation, which funded broadband projects. While this rate would seem grossly inadequate for most broadband users today, it is the fastest threshold for which the FCC releases census-tract data. The FCC's own definition, 200 Kbps upload or download, is even slower.
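The two ways of quoting each speed differ only by the factor of eight between bits and bytes. A quick check of that arithmetic (the function name is hypothetical):

```python
BITS_PER_BYTE = 8


def kbps_to_kilobytes_per_sec(kbps: float) -> float:
    """Convert kilobits per second to kilobytes per second."""
    return kbps / BITS_PER_BYTE


print(kbps_to_kilobytes_per_sec(768))  # 96.0
print(kbps_to_kilobytes_per_sec(200))  # 25.0
```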
To obtain estimates for states and metropolitan statistical areas, we ran a “Monte Carlo simulation” that repeatedly draws plausible values for the number of housing units and the number of broadband subscribers in each tract. (For more about this general topic, see the Wikipedia article on the Monte Carlo method.) Demographic data for this project come from the American Community Survey 2005-2009, which is based on a one-in-six sample, so the number of households is not precisely known.
Household counts are simulated on the assumption that the true number of units is normally distributed around the ACS estimate, with the spread taken from the 90 percent margin of error given by the ACS. Broadband rates for each tract are simulated on the assumption that actual subscribership rates are uniformly distributed within their stated range. Because there is no published upper bound for the '5' category (it's just listed as having more than 800 residential broadband connections per 1,000 households), we've capped our assumed distribution at 100 percent. A published distribution of tracts for the less stringent FCC standard clearly shows a substantial number of tracts with more residential broadband connections than households, but no such chart is available for the BTOP standard.
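The simulation described above might be sketched roughly as follows. This is not the Workshop's code. It is a minimal illustration that assumes the normal distribution's standard deviation is the ACS margin of error divided by 1.645 (the usual z-score for a 90 percent interval), clips negative household draws to zero, and uses made-up tract values. All function and variable names are hypothetical.

```python
import random
import statistics

Z_90 = 1.645  # assumed z-score for a two-sided 90 percent interval

# Category code -> (low, high) rate; the 1.0 cap on category 5 follows the text.
RATE_RANGES = {0: (0.0, 0.0), 1: (0.0, 0.2), 2: (0.2, 0.4),
               3: (0.4, 0.6), 4: (0.6, 0.8), 5: (0.8, 1.0)}


def simulate_area_rate(tracts, n_sims=1000, seed=0):
    """Simulate the combined subscribership rate for a group of tracts.

    tracts: list of (household_estimate, moe_90, category_code) tuples.
    Returns one simulated area-wide rate per simulation run.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        households_total = 0.0
        subscribers_total = 0.0
        for estimate, moe, code in tracts:
            # Household count: normal around the ACS estimate, never negative.
            households = max(0.0, rng.gauss(estimate, moe / Z_90))
            # Subscribership rate: uniform within the tract's category range.
            low, high = RATE_RANGES[code]
            rate = rng.uniform(low, high)
            households_total += households
            subscribers_total += households * rate
        if households_total > 0:
            draws.append(subscribers_total / households_total)
    return draws


def summarize(draws):
    """Return the 5th percentile, median and 95th percentile of the draws."""
    cuts = statistics.quantiles(draws, n=20)  # 19 cut points, 5% apart
    return cuts[0], statistics.median(draws), cuts[-1]


# Made-up tracts, for illustration only.
example_tracts = [(1200, 150, 3), (800, 120, 5), (1500, 200, 2)]
draws = simulate_area_rate(example_tracts)
p5, median, p95 = summarize(draws)
print(f"5th: {p5:.3f}  median: {median:.3f}  95th: {p95:.3f}")
```

Summing simulated subscribers and households across tracts before dividing weights each tract by its size, which is why the area-wide interval is narrower than any single tract's 20-point range.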
The FCC does not appear to provide regular state-by-state subscribership breakdowns using the BTOP/BIP standard (96 KB/sec). Instead, the biannual reports that accompany the Form 477 data release use the laxer FCC standard (25 KB/sec download). However, a report on the June 2009 data does include a state-by-state chart of broadband subscribership using the BTOP/BIP standard; a comparison of the Investigative Reporting Workshop estimates and the FCC's measurements follows below. We followed the convention, used in the American Community Survey, of calculating 90 percent confidence intervals.
June 2009 state-by-state comparison, BTOP/BIP standard (1000 simulations)
| State | IRW estimate, 5th percentile | IRW estimate, median | IRW estimate, 95th percentile | FCC reported value | Difference, FCC reported value minus IRW median |
|---|---|---|---|---|---|
| District of Columbia | 59.66 | 60.24 | 60.96 | 59 | -1.24 |
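The last column can be reproduced from the others. The short sketch below recomputes the difference for the District of Columbia row and also checks whether the FCC value falls inside the simulated 90 percent interval. The function name is hypothetical, and the FCC figure appears to be rounded to a whole percent, so the interval check should be read loosely.

```python
def compare_row(p5, median, p95, fcc_value):
    """Return (FCC value minus IRW median, whether FCC value is in [p5, p95])."""
    difference = round(fcc_value - median, 2)
    inside_interval = p5 <= fcc_value <= p95
    return difference, inside_interval


# District of Columbia row from the table above.
print(compare_row(59.66, 60.24, 60.96, 59))  # (-1.24, False)
```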
There's good agreement between the Workshop's estimates and the official FCC figures for states in June 2009, but it's not clear how well our simulations predict values in later years.
Overall, the FCC's subscribership numbers are slightly larger than the Workshop's — which makes sense given that the Workshop's estimate is based on an assumption that tracts have a subscribership rate of 100 percent or less.
It's natural to assume that a greater discrepancy between the Workshop's estimates and the "real" quantities would be present when comparing Metropolitan Statistical Areas, because their populations are smaller than those of states. The smallest of the top 100 MSAs has a population of just over 500,000. It's worth noting, however, that the Workshop's estimates are fairly close to the FCC's numbers for the five states with populations of less than 800,000: Alaska, the District of Columbia, North Dakota, Wyoming and Vermont.
What impact does capping the data at 100 percent of households have on estimates?
To examine these effects at higher reported rates, we also estimated state-by-state subscribership rates using the less stringent FCC standard (25 KB/sec download or upload). The tendency for error to grow with larger reported subscribership rates appears to be slightly more pronounced under this standard.
It's hard to know precisely how the bias in the simulated values for the less stringent standard (25 KB/sec download or upload) relates to the bias in the more stringent standard (96 KB/sec download and 25 KB/sec upload), but it seems apparent that the overall error in the stricter standard is lower simply because the reported values are lower. Moreover, at comparable published NTIA values it appears that BTOP-standard measurement error is slightly lower, which would make sense if there are fewer tracts 'maxed out' at 100 percent.
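A back-of-the-envelope way to see why the cap pushes estimates down: if rates in the top category are really spread over a range that extends past 100 percent, a uniform draw capped at 100 percent has a lower mean, and the shortfall scales with the share of households in such tracts. The sketch below works this out analytically. The 120 percent "true" upper bound and the 10 percent household share are purely hypothetical numbers chosen for illustration, not figures from the FCC or the Workshop.

```python
def capping_bias(share_top_households, assumed_true_upper, cap=1.0, lower=0.8):
    """Approximate bias in an area-wide rate caused by capping category 5.

    share_top_households: fraction of the area's households in category-5 tracts.
    assumed_true_upper: hypothetical real upper bound of the category-5 range.
    Returns capped estimate minus uncapped estimate (negative = underestimate).
    """
    capped_mean = (lower + cap) / 2              # mean of uniform(lower, cap)
    uncapped_mean = (lower + assumed_true_upper) / 2
    return share_top_households * (capped_mean - uncapped_mean)


# Hypothetical: 10% of households in category-5 tracts, true range up to 120%.
print(round(capping_bias(0.10, 1.2), 4))  # -0.01
```

Under these assumptions the area-wide estimate would run about one percentage point low, and the shortfall would shrink as fewer households fall in top-category tracts.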