Data Documentation

Below is a high-level description of the currently available datasets in the US Covid Atlas. For further documentation, please see the detailed data descriptions menu below. For data access, use the bulk CSV downloader at the bottom of this page.

CURRENT RELEASE


Confirmed COVID Cases and Deaths

We incorporate multiple datasets from multiple sources to allow for comparisons and opportunities to address uncertainty in the data.

USAFacts. This dataset is provided by a non-profit organization. The data are aggregated from CDC, state- and local-level public health agencies. County-level data is confirmed by referencing state and local agencies directly.

1P3A. This was the initial, crowdsourced data project, a volunteer effort led by 1P3acres.com and Dr. Yu Gao, Head of Machine Learning Platform at Uber. We access this data stream using a token provided by the group.

New York Times. The New York Times has made available data aggregated by dozens of journalists who collect and monitor figures from news conferences. They communicate with public officials to clarify and categorize cases.

Testing

CDC. The Centers for Disease Control and Prevention (CDC) provides county-level historic testing data as well as case and death data. Currently, we include tests performed and testing positivity rates as map variables. Total tests conducted and confirmed cases per testing percent, a measure of testing coverage, are available in the Context panel for selected states or counties.

HHS. The Department of Health and Human Services provides state-level historic testing data.

Vaccination

CDC. The Centers for Disease Control and Prevention continues to release new snapshot vaccination data, including daily doses distributed and administered. As the available vaccine manufacturers continue to change and the distribution pipeline evolves, we continue to explore how best to capture the state of vaccination efforts. Currently, no robust county-level vaccination datasets are available, but we continue to actively seek new data.

Health System Capacity

COVIDCareMap. Healthcare System Capacity includes Staffed beds, Staffed ICU beds, Licensed Beds by County. The data is from 2018 facility reports with additions/edits allowed in real-time.

Community Characteristics, Health Context, and Health Factors

American Community Survey. We incorporate population data used to generate rates and occupation estimates for essential worker percentages. We will add more information as needed in future iterations.

County Health Rankings & Roadmaps (CHR&R). The CHR&R is a collaboration between the Robert Wood Johnson Foundation and the University of Wisconsin Population Health Institute. The goal is to improve health outcomes for all and to close the health gaps between those with the most and least opportunities for good health.
Based on our discussion with CHR&R, we include the following focus areas and related measures in the Atlas: income and economic hardship, children living in poverty, food insecurity, median household income, income inequality, access to health care, uninsured, preventable hospital stays, ratio of population to primary care physicians, housing cost and quality, Black/White residential segregation, percentage of 65 and older, obesity and diabetes prevalence, adult smoking, excessive drinking, drug overdose deaths, life expectancy and self-rated health condition.

Forecasting statistics

Hospital Severity Index. The Yu Group at UC Berkeley Statistics and EECS has compiled, cleaned and continues to update a large corpus of hospital- and county-level data from a variety of public sources to aid data science efforts to combat COVID-19 (see covidseverity.com).

At the hospital level, their data include the location of the hospital, the number of ICU beds, the total number of employees, and the hospital type. At the county level, their data include COVID-19 cases/deaths from USA Facts and NYT, automatically updated every day, along with demographic information, health resource availability, COVID-19 health risk factors, and social mobility information.

An overview of each data set in this corpus is provided here. We will be adding more relevant data sets as they are found. We prepared this data to support healthcare supply distribution efforts through short-term (days) prediction of COVID-19 deaths (and cases) at the county level. We are using the predictions and hospital data to arrive at a covid Pandemic Severity Index (c-PSI) for each hospital. This project is in partnership with Response4Life. A paper on the current approaches can be found at this link. More detailed information, with data source descriptions, is provided on GitHub.

Mobility Data

Safegraph Social Distancing. Safegraph has provided Census Block Group level data on mobile phone device activity, reported from apps that collect location data. This data has been made available for COVID-19 related research projects. The data is generated from a series of location pings throughout the day, which determine various behaviors, such as staying completely home, full time work (at a workplace outside of home for 6-8 hours), part time work (at a workplace outside of home for 3-6 hours), and delivery (multiple, short visits). Access to the data consortium is available here.

Detailed Data Descriptions


Meta Data Name: USAFacts

Last Modified: 2/28/2021

Author: Stephanie Yang

Data Location:

GeoDaCenter/covid/public/csv

Data Source(s) Description:

USAFacts publishes COVID-19 confirmed case and death data at the county and state level. All data are updated on a daily basis. USAFacts also provides various data visualizations here.

Direct links: Confirmed Cases | Deaths

Description of Data Processing:

  • Data are directly downloaded from USAFacts.

Key Variable and Definitions:

covid_confirmed_usafacts_state.csv

Variable | Variable ID in .csv | Description
State | State | State abbreviation
State FIPS | StateFIPS | State level fips code to join to county geospatial data (2-digit)
Confirmed Cases (Time series) | ISO Format Date (e.g. 2020-01-22) | Cumulative cases for given geography

covid_deaths_usafacts_state.csv

Variable | Variable ID in .csv | Description
State | State | State abbreviation
State FIPS | StateFIPS | State level fips code to join to county geospatial data (2-digit)
Deaths (Time series) | ISO Format Date (e.g. 2020-01-22) | Cumulative deaths for given geography

covid_confirmed_usafacts.csv

Variable | Variable ID in .csv | Description
County Name | County Name | County Name
County FIPS | countyFIPS | County level fips code to join to county geospatial data (5-digit)
State | State | State abbreviation
State FIPS | StateFIPS | State level fips code to join to county geospatial data (2-digit)
Confirmed Cases (Time series) | ISO Format Date (e.g. 2020-01-22) | Cumulative cases for given geography

covid_deaths_usafacts.csv

Variable | Variable ID in .csv | Description
County Name | County Name | County Name
County FIPS | countyFIPS | County level fips code to join to county geospatial data (5-digit)
State | State | State abbreviation
State FIPS | StateFIPS | State level fips code to join to county geospatial data (2-digit)
COVID-19 Deaths (Time series) | ISO Format Date (e.g. 2020-01-22) | Cumulative deaths attributed to COVID for given geography (county)

Description of Data Source Tables:

See the Detailed Methodology and Sources: COVID-19 Data for additional information.

Data Limitations:

No limitations to report.

Comments/Notes:

n/a

Meta Data Name: New York Times

Last Modified: 2/23/2021

Author: Dylan Halpern

Data Location:

GeoDaCenter/covid/public/csv

Data Source(s) Description:

The New York Times is publishing their on-going COVID-19 data, available here.

Direct links: States | Counties

Description of Data Processing:

  • State and county confirmed case and death data are taken directly from the NYT data and transposed into time-series.
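
The transposition described above can be sketched as a pivot from long rows (one row per place per date) to one row per FIPS code with a column per date. This is an illustration only: it assumes the input rows carry date, fips, and cases fields, and the function name to_time_series and the sample values are hypothetical.

```python
import csv
import io
from collections import defaultdict


def to_time_series(long_rows, value_field):
    """Pivot long rows into one record per FIPS code with one key per date."""
    by_fips = defaultdict(dict)
    for row in long_rows:
        by_fips[row["fips"]][row["date"]] = int(row[value_field])
    dates = sorted({date for series in by_fips.values() for date in series})
    wide = []
    for fips in sorted(by_fips):
        record = {"fips": fips}
        for date in dates:
            # Leave dates with no report empty rather than guessing a value.
            record[date] = by_fips[fips].get(date, "")
        wide.append(record)
    return dates, wide


# Made-up long-format sample.
sample = io.StringIO(
    "date,county,state,fips,cases,deaths\n"
    "2020-01-21,Example A,Washington,53061,1,0\n"
    "2020-01-22,Example A,Washington,53061,1,0\n"
    "2020-01-22,Example B,Illinois,17031,1,0\n"
)
dates, wide = to_time_series(csv.DictReader(sample), "cases")
```

Passing "deaths" as value_field produces the corresponding deaths table from the same input.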

Key Variable and Definitions:

covid_confirmed_nyt.csv

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | fips | County level fips code to join to county geospatial data
Confirmed Cases (Time series) | ISO Format Date (e.g. 2020-01-22) | Cumulative cases for given geography (county)

covid_confirmed_nyt_state.csv

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | fips | State level fips code to join to state geospatial data
Confirmed Cases (Time series) | ISO Format Date (e.g. 2020-01-22) | Cumulative cases for given geography (state)

covid_deaths_nyt.csv

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | fips | County level fips code to join to county geospatial data
COVID-19 Deaths (Time series) | ISO Format Date (e.g. 2020-01-22) | Cumulative deaths attributed to COVID for given geography (county)

covid_deaths_nyt_state.csv

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | fips | State level fips code to join to state geospatial data
COVID-19 Deaths (Time series) | ISO Format Date (e.g. 2020-01-22) | Cumulative deaths attributed to COVID for given geography (state)

Description of Data Source Tables:

See the New York Times Repo for additional information.

Data Limitations:

No limitations to report.

Comments/Notes:

n/a

Meta Data Name: 1 Point 3 Acres (1P3A)

Last Modified: 2/28/2021

Author: Stephanie Yang

Data Source(s) Description:

1P3A is one of the earliest organizations to collect and publish COVID-19 data. The data is not publicly available, but researchers can submit a request for access here.

No direct links available.

Description of Data Processing:

  • Data are directly downloaded from 1P3A

Key Variable and Definitions:

covid_confirmed_1p3a_state.csv

Variable | Variable ID in .csv | Description
State Name | Name | State Name
GEOID (same as State FIPS) | GEOID | State level fips code to join to county geospatial data (2-digit)
Confirmed Cases (Time series) | ISO Format Date (e.g. 2020-01-22) | Single-day increased cases for given geography

covid_deaths_1p3a_state.csv

Variable | Variable ID in .csv | Description
State Name | Name | State Name
GEOID (same as State FIPS) | GEOID | State level fips code to join to county geospatial data (2-digit)
Deaths (Time series) | ISO Format Date (e.g. 2020-01-22) | Single-day increased deaths for given geography

covid_confirmed_1p3a.csv

Variable | Variable ID in .csv | Description
County Name | Name | County Name
County FIPS | COUNTYFP | County level fips code to join to county geospatial data (3-digit)
State FIPS | STATEFP | State level fips code to join to county geospatial data (2-digit)
GEOID | GEOID | County level fips code to join to county geospatial data (Combination of County FIPS and State FIPS, 5-digit)
GEOID with Country Code | AFFGEOID | American FactFinder summary level code + geovariant code + '00US' + GEOID (more details)
Legal/Statistical Area Description | LSAD | Current legal/statistical area description code for county (more details)
Confirmed Cases (Time series) | ISO Format Date (e.g. 2020-01-22) | Single-day increased cases for given geography

covid_deaths_1p3a.csv

Variable | Variable ID in .csv | Description
County Name | Name | County Name
County FIPS | COUNTYFP | County level fips code to join to county geospatial data (3-digit)
State FIPS | STATEFP | State level fips code to join to county geospatial data (2-digit)
GEOID | GEOID | County level fips code to join to county geospatial data (Combination of County FIPS and State FIPS, 5-digit)
GEOID with Country Code | AFFGEOID | American FactFinder summary level code + geovariant code + '00US' + GEOID (more details)
Legal/Statistical Area Description | LSAD | Current legal/statistical area description code for county (more details)
Deaths (Time series) | ISO Format Date (e.g. 2020-01-22) | Single-day increased deaths for given geography
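
Two details of the 1P3A files lend themselves to a short illustration: the 5-digit GEOID is the 2-digit state code followed by the 3-digit county code, and the date columns hold single-day increases, so a cumulative series comparable to the USAFacts and New York Times files is a running sum. The sketch below assumes those layouts; the helper names build_geoid and cumulative_from_daily are hypothetical.

```python
from itertools import accumulate


def build_geoid(statefp, countyfp):
    """Join STATEFP (2-digit) and COUNTYFP (3-digit) into a 5-digit GEOID."""
    # zfill restores leading zeros that are lost when codes are read as integers.
    return str(statefp).zfill(2) + str(countyfp).zfill(3)


def cumulative_from_daily(daily_counts):
    """Running total of single-day increases, in date order."""
    return list(accumulate(daily_counts))


example_geoid = build_geoid(6, 37)
example_cumulative = cumulative_from_daily([0, 2, 3])
```

The daily counts must already be sorted by date before the running sum is taken.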

Description of Data Source Tables:

See 1P3A's FAQ Board for additional information.

Data Limitations:

Researchers should request data access directly on 1P3A's page.

Comments/Notes:

n/a

Meta Data Name: Centers for Disease Control and Prevention (CDC) COVID Data

Last Modified: 2/23/2021

Author: Dylan Halpern

Data Location:

GeoDaCenter/covid/public/csv

Data Source(s) Description:

This data is sourced from the CDC's Covid Data Tracker on the County and Vaccination views. The CDC publishes 7-day rolling average aggregations of testing, case, and death data and daily snapshots of vaccination data.

Both state and county datasets can be joined to Census Cartographic Boundary Files. The Atlas uses a resolution of 20M.
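
A join of this kind usually matches the FIPS column of a CSV against the GEOID property of each boundary feature. The sketch below is one hedged way to do that with plain dictionaries; it assumes the boundary identifiers are zero-padded strings and that the CSV column may have lost leading zeros. The names normalize_fips and join_on_fips are hypothetical, and the sample values are made up.

```python
def normalize_fips(value, width=5):
    """Return a FIPS code as a zero-padded string (width 5 for counties, 2 for states)."""
    return str(value).strip().zfill(width)


def join_on_fips(data_rows, boundary_geoids, fips_field="fips_code", width=5):
    """Map each boundary GEOID to its data row, or None when no row matches."""
    by_fips = {normalize_fips(row[fips_field], width): row for row in data_rows}
    return {geoid: by_fips.get(geoid) for geoid in boundary_geoids}


# Made-up sample: one data row whose leading zero was dropped, two boundary GEOIDs.
sample_rows = [{"fips_code": "1001", "2021-02-01": "12.5"}]
joined = join_on_fips(sample_rows, ["01001", "01003"])
```

Keeping unmatched boundaries in the result with a None value makes gaps in coverage easy to spot.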

Key Variable and Definitions:

County

covid_testing_cdc.csv

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | fips_code | County level fips code to join to county geospatial data
Tests Conducted (Time series) | ISO Format Date (e.g. 2020-01-22) | 7-day rolling average of total tests completed

covid_tcap_cdc.csv

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | fips_code | County level fips code to join to county geospatial data
Tests Conducted Per 100k Population (Time series) | ISO Format Date (e.g. 2020-01-22) | 7-day rolling average of tests completed per 100k population in the county

covid_wk_pos_cdc.csv

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | fips_code | County level fips code to join to county geospatial data
Test Positivity Percentage (0-1 scale) (Time series) | ISO Format Date (e.g. 2020-01-22) | 7-day rolling average percentage of tests conducted that produced a positive result

covid_ccpt_cdc.csv

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | fips_code | County level fips code to join to county geospatial data
Confirmed Cases Per Testing (Time series) | ISO Format Date (e.g. 2020-01-22) | 7-day rolling average of percentage of cases divided by total tests. A high discrepancy between CCPT and Positivity indicates many cases are missed in testing.

covid_confirmed_cdc.csv

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | fips_code | County level fips code to join to county geospatial data
Covid Cases (Time series) | ISO Format Date (e.g. 2020-01-22) | 7-day rolling average of new confirmed cases of Covid-19.

covid_deaths_cdc.csv

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | fips_code | County level fips code to join to county geospatial data
Covid Deaths (Time series) | ISO Format Date (e.g. 2020-01-22) | 7-day rolling average of new deaths attributed to Covid-19.

Vaccination

vaccine_admin1_cdc.csv

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | fips | State level fips code to join to county geospatial data
First doses administered | ISO Format Date (e.g. 2020-01-22) | Daily snapshot of total first doses administered in this state.

vaccine_admin2_cdc.csv

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | fips | State level fips code to join to county geospatial data
Second doses administered | ISO Format Date (e.g. 2020-01-22) | Daily snapshot of total second doses (full vaccinations) administered in this state.

vaccine_dist_cdc.csv

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | fips | State level fips code to join to county geospatial data
Doses distributed and not administered | ISO Format Date (e.g. 2020-01-22) | Daily snapshot of doses distributed, but not administered in this state.

Description of Data Processing:

  • Cases and Deaths: 7-day rolling averages of Cases and Deaths are taken directly from the CDC endpoint and transposed into time-series data.
  • Testing Count: Testing count is taken from the 7-day rolling average of new test results (new_test_results_reported_7_day_rolling_average).
  • Testing Capacity: Testing capacity is taken from the testing count above divided by the county population.
  • Testing Positivity: Testing positivity is taken from the 7-day rolling average of the percentage of new test results that are positive (percent_new_test_results_reported_positive_7_day_rolling_average).
  • Testing Confirmed Cases Per Testing (CCPT): Testing CCPT is taken from the 7-day average of new confirmed cases divided by the 7-day average testing count.
  • Vaccination First and Second Doses Administered: The first and second doses administered are taken directly from the Administered_Dose1 and Administered_Dose2 fields and transposed into time-series. On the frontend of the Atlas, this is presented as a numerator over the state population.
  • Vaccination Doses Distributed but Not Administered: This metric is the total doses distributed (Doses_Distributed) minus the total doses administered (Doses_Administered). This gives an estimate of the number of doses "on hand" for each state. On the frontend of the Atlas, this is presented as a numerator per 100,000 population.
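
The arithmetic behind the derived metrics above can be sketched in a few small functions. This is an illustration under stated assumptions, not the Atlas pipeline: the function names are hypothetical, population figures are supplied by the caller, and a missing or zero denominator is reported as None rather than raising an error.

```python
def per_100k(count, population):
    """Scale a count to a rate per 100,000 population."""
    if not population:
        return None
    return count * 100_000 / population


def confirmed_cases_per_test(new_cases_7_day_avg, new_tests_7_day_avg):
    """CCPT: 7-day average of new cases divided by 7-day average of tests."""
    if not new_tests_7_day_avg:
        return None
    return new_cases_7_day_avg / new_tests_7_day_avg


def doses_on_hand(doses_distributed, doses_administered):
    """Doses distributed to a state that have not yet been administered."""
    return doses_distributed - doses_administered


# Made-up inputs for illustration.
example_capacity = per_100k(500, 250_000)
example_ccpt = confirmed_cases_per_test(50, 500)
example_on_hand_rate = per_100k(doses_on_hand(1_000, 750), 50_000)
```

Multiplying before dividing in per_100k keeps the result exact for whole-number inputs such as those shown.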

Description of Data Source Tables:

County Data: CDC County data is available from two API endpoints. One for the latest county snapshot and another for state-specific historical time-series data. The URL format for the second endpoint is https://covid.cdc.gov/covid-data-tracker/COVIDData/getAjaxData?id=integrated_county_timeseries_state_{**STATE-2-LETTER-CODE**}_external.
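
As a small illustration of the URL format above, the helper below fills the state placeholder for one state. It only builds the string and does not make a request. The function name is hypothetical, and upper-casing the code is an assumption based on the capitalised placeholder; the endpoint's actual case sensitivity is not documented here.

```python
CDC_COUNTY_TIMESERIES_URL = (
    "https://covid.cdc.gov/covid-data-tracker/COVIDData/getAjaxData"
    "?id=integrated_county_timeseries_state_{state}_external"
)


def county_timeseries_url(state_code):
    """Build the historical county time-series URL for a two-letter state code."""
    code = state_code.strip().upper()  # assumption: the endpoint expects upper case
    if len(code) != 2 or not code.isalpha():
        raise ValueError(f"expected a two-letter state code, got {state_code!r}")
    return CDC_COUNTY_TIMESERIES_URL.format(state=code)


example_url = county_timeseries_url("il")
```

Looping this helper over a list of state codes yields one URL per state, matching the state-specific design of the endpoint.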

The below data fields are available in the historical data from each state endpoint.

  • fips_code County FIPS code geographic identifier
  • state Two-letter state abbreviation
  • state_name Full state name
  • cbsa_code Core-Based Statistical Areas name
  • county County name
  • new_cases_week_over_week_percent_change Week over week percent change in cases
  • new_cases_7_day_rolling_average 7-day rolling average of new cases
  • new_cases_per_100k_7_day_rolling_average 7-day rolling average of new cases per 100,000 population
  • new_deaths_7_day_rolling_average 7-day rolling average of new deaths
  • new_deaths_week_over_week_percent_change Week over week percent change in deaths
  • new_deaths_per_100k_7_day_rolling_average 7-day rolling average of new deaths per 100,000 population
  • daily_cli_7_day_rolling_average Daily, 7-day rolling average of "COVID like illnesses"
  • daily_cli_percentage_7_day_rolling_average Usage Unclear. Percentage of daily, 7-day rolling average of "COVID like illnesses"
  • daily_ili_percentage_7_day_rolling_average Usage Unclear. Percentage of daily, 7-day rolling average of "Influenza like illnesses"
  • new_test_results_reported New test results reported in this data window
  • new_test_results_reported_7_day_rolling_average 7-day rolling average of new tests reported
  • percent_new_test_results_reported_positive_7_day_rolling_average 7-day rolling average of positive test results reported on new tests
  • percent_positive_7_day Overall positivity for tests in the past 7 days
  • total_test_results_reported_week_over_week_count_change Week over week change in test results reported
  • testing_suppressed Usage Unclear. Potentially tests suppressed for anonymity or inconclusive tests.
  • total_hospitals_reporting Number of hospital facilities reporting data in this county
  • admissions_covid_confirmed_last_7_days Total hospital admissions for COVID-19 in the past 7 days
  • admissions_covid_confirmed_7_day_rolling_average 7-day rolling average of confirmed hospital COVID admissions
  • admissions_covid_confirmed_last_7_days_per_100_beds Number of confirmed hospital COVID admissions in the past 7 days per 100 hospital beds
  • admissions_covid_confirmed_week_over_week_percent_change Week over week percent change of COVID admissions to hospitals
  • percent_adult_inpatient_beds_used_confirmed_covid Percentage of adult inpatient hospital beds confirmed occupied by COVID patients
  • percent_adult_inpatient_beds_used_confirmed_covid_week_over_week_absolute_change Change in number of hospital beds used for COVID-19 patients
  • hospitals_included_in_percent_adult_inpatient_beds_used_confirmed_covid Data coverage for number of hospitals reporting data adult inpatient bed usage
  • percent_adult_icu_beds_used_confirmed_covid Percent of adult ICU beds used for COVID-19 patients
  • percent_adult_icu_beds_used_confirmed_covid_week_over_week_absolute_change Week over week change in number of ICU hospital beds used for COVID-19 patients
  • hospitals_included_in_percent_adult_icu_beds_used_confirmed_covid Data coverage for number of hospitals reporting data on adult ICU bed usage
  • cbsa_daily_cli_7_day_rolling_average 7-day rolling average of Covid-like illnesses reported in the Core-Based Statistical Areas
  • cbsa_daily_cli_percentage_7_day_rolling_average Usage Unclear. Percent of 7-day rolling average of Covid-like illnesses reported in the Core-Based Statistical Areas
  • cbsa_daily_ili_7_day_rolling_average 7-day rolling average of Influenza-like illnesses reported in the Core-Based Statistical Areas
  • cbsa_daily_ili_percentage_7_day_rolling_average Usage Unclear. Percent of 7-day rolling average of influenza-like illnesses reported in the Core-Based Statistical Areas
  • date Date reported
  • report_date_window_start ISO date format start of reporting window
  • report_date_window_end ISO date format end of reporting window

Vaccination Data: The most recent CDC Vaccination data reports across 4 dimensions. They report:

  • Total Doses vs People: Total Doses includes doses administered (shots given) or distributed (delivered) in that state, but not necessarily to residents of that state. People includes only people from that state.
  • 1st or 2nd dose: Number of first or second doses administered.
  • Total Population or 18 years or older: Population normalization for the whole population or only individuals 18 years or older.
  • Count of doses vs percent of population: Doses administered as a count (or population normalized count) or as a percentage of the state population.

Field descriptions are inferred from CDC descriptions on the Covid Data Tracker and variable names in the page's source bundle.

For direct access to the data, see the CDC Api Endpoint.

Note: In the future, the CDC may make available time-series vaccination data, which should be used instead of these snapshots. See also Our World in Data's repo here.

Current Field and Descriptions

  • Date Date for data report in ISO Format
  • Location Two-letter state abbreviation
  • ShortName Three-letter state abbreviation
  • LongName Full state name
  • Census2019 2019 Census population count
  • Doses_Distributed Total doses distributed to this state
  • Doses_Administered Total doses administered in this state
  • Dist_Per_100K Doses distributed in this state per 100,000 population
  • Admin_Per_100K Doses administered in this state per 100,000 population
  • Administered_Dose1 Total number of first doses administered in this state
  • Administered_Dose1_Per_100K First doses administered in this state per 100,000 population
  • Administered_Dose2 Total number of second doses administered in this state
  • Administered_Dose2_Per_100K Second doses administered in this state per 100,000 population
  • Administered_Dose1_Pop_Pct Percent of population in this state who have received the first dose
  • Administered_Dose2_Pop_Pct Percent of population in this state who have received the second dose
  • date_type Type of date for this entry, usually "Report"
  • Recip_Administered Doses administered to people from this state
  • Administered_Dose1_Recip First doses administered to people from this state
  • Administered_Dose2_Recip Second doses administered to people from this state
  • Administered_Dose1_Recip_18Plus First doses administered to people from this state 18 years or older
  • Administered_Dose2_Recip_18Plus Second doses administered to people from this state 18 years or older
  • Administered_Dose1_Recip_18PlusPop_Pct Percent of population of this state who have received a first dose aged 18 years or older
  • Administered_Dose2_Recip_18PlusPop_Pct Percent of population of this state who have received a second dose aged 18 years or older
  • Census2019_18PlusPop Population in this state 18 years or older as of the 2019 Census
  • Distributed_Per_100k_18Plus Doses distributed to this state per 100,000 population 18 years or older
  • Administered_18Plus Doses administered in this state to people 18 years or older
  • Admin_Per_100k_18Plus Doses administered in this state per 100,000 population 18 years or older

Data Limitations:

The data is pre-aggregated to 7-day rolling averages. Currently, we utilize the state dose totals and not the people totals, as the available data history is longer.

Comments/Notes:

n/a

Meta Data Name: Health and Human Services Data

Last Modified: 2/23/2021

Author: Dylan Halpern

Data Location:

GeoDaCenter/covid/public/csv

Data Source(s) Description:

HHS State Level COVID-19 Diagnostic Laboratory Testing (PCR Testing) Time Series is available here. It is updated daily and sourced from CDC COVID-19 Electronic Laboratory Reporting (CELR), Commercial Laboratories, State Public Health Labs, In-House Hospital Labs. A full data dictionary is available here.

Description of Data Processing:

  • Testing: Total testing volume is taken from a 7-day rolling average of total testing figures each day.
  • Testing Capacity: Testing capacity is calculated based on testing divided by population, then multiplied by 100,000.
  • Testing Positivity: Testing positivity is taken from the reported positive tests divided by total tests.
  • Testing Confirmed Cases Per Testing: USA Facts State-level confirmed cases are divided by total testing volume.
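
The processing steps above can be sketched from the HHS fields date, overall_outcome, and new_results_reported. This is a hedged illustration rather than the Atlas implementation: the function names are hypothetical, counting inconclusive results toward the total is an assumption, and the sample rows are made up.

```python
from collections import defaultdict


def daily_test_totals(rows):
    """Sum new_results_reported per date, tracking positive results separately."""
    totals = defaultdict(lambda: {"positive": 0, "total": 0})
    for row in rows:
        count = int(row["new_results_reported"])
        day = totals[row["date"]]
        day["total"] += count  # assumption: inconclusive results count as tests
        if row["overall_outcome"] == "Positive":
            day["positive"] += count
    return dict(totals)


def positivity(day):
    """Positive tests divided by total tests, or None when no tests were reported."""
    return day["positive"] / day["total"] if day["total"] else None


def rolling_mean(values, window=7):
    """Trailing mean; the first window-1 entries average over fewer days."""
    means = []
    for end in range(1, len(values) + 1):
        chunk = values[max(0, end - window):end]
        means.append(sum(chunk) / len(chunk))
    return means


# Made-up rows for a single state and date.
sample_rows = [
    {"date": "2021-02-01", "overall_outcome": "Positive", "new_results_reported": "10"},
    {"date": "2021-02-01", "overall_outcome": "Negative", "new_results_reported": "88"},
    {"date": "2021-02-01", "overall_outcome": "Inconclusive", "new_results_reported": "2"},
]
sample_totals = daily_test_totals(sample_rows)
sample_positivity = positivity(sample_totals["2021-02-01"])
```

Testing capacity then follows by dividing the rolling daily total by the state population and multiplying by 100,000.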

Key Variable and Definitions:

covid_testing_cdc_state.csv

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | state_fips | State level fips code to join to state geospatial data
Tests Conducted (Time series) | ISO Format Date (e.g. 2020-01-22) | 7-day rolling average of total tests completed

covid_tcap_cdc_state.csv

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | state_fips | State level fips code to join to state geospatial data
Tests Conducted Per 100k Population (Time series) | ISO Format Date (e.g. 2020-01-22) | 7-day rolling average of tests completed per 100k population in the state

covid_wk_pos_cdc_state.csv

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | state_fips | State level fips code to join to state geospatial data
Test Positivity Percentage (0-1 scale) (Time series) | ISO Format Date (e.g. 2020-01-22) | 7-day rolling average percentage of tests conducted that produced a positive result

covid_ccpt_cdc_state.csv

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | state_fips | State level fips code to join to state geospatial data
Confirmed Cases Per Testing (Time series) | ISO Format Date (e.g. 2020-01-22) | 7-day rolling average of percentage of cases divided by total tests. A high discrepancy between CCPT and Positivity indicates many cases are missed in testing.

Description of Data Source Tables:

Descriptions via HHS / HealthData.gov.

  • state (string) - Abbreviation of state associated with the test. Typically patient's state of residence, but provider or lab state used when patient is unavailable.
  • state_name (string) - Name of state associated with the test. Typically patient's state of residence, but provider or lab state used when patient is unavailable.
  • state_fips (string) - Numerical identifier of state associated with the test. Typically patient's state of residence, but provider or lab state used when patient is unavailable.
  • fema_region (string) - Region associated with the test. Typically that of patient's state of residence, but provider or lab state used when patient is unavailable.
  • overall_outcome (string) - Outcome of test -- Positive, Negative or Inconclusive.
  • date (date) - Typically the date the test completed or the date that the result was reported back to the patient. If neither are available, it can be the date the specimen was collected, arrived at the testing facility, or the date the test was ordered.
  • new_results_reported (long) - The number of tests completed with the specified outcome in the specified state on the listed date. (Large spikes may result from states submitting tests for several preceding days at once with a single date).
  • total_results_reported (long) - The cumulative number of tests completed with the specified outcome in the specified state up through the listed date.

Data Limitations:

No limitations to report.

Comments/Notes:

n/a

Meta Data Name: County Health Rankings & Roadmaps

Last Modified: 2/23/2021

Author: Dylan Halpern

Data Location:

GeoDaCenter/covid/public/csv

Data Source(s) Description:

County Health Rankings and Roadmaps publishes data at the state and county level including "how health is influenced by where we live, learn, work, and play." Data can be accessed via their website.

Description of Data Processing:

n/a

Key Variable and Definitions:

chr_health_context / chr_health_context_state

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | FIPS | County and state level fips code to join to county geospatial data
State Name | State | Long name of state
County Name | County | Long name of county (county-level only)
65 Years Old Percent | OVer65YearsPrc | Share of the population for a given area older than 65 years of age
Adult Obesity Percent | AdObPrc | Share of the adult population (20+) with a body mass index (BMI) greater than or equal to 30 kg/m2
Diabetes Prevalence | AdDibPRc | Share of adult population (20+) with diagnosed diabetes
Percent Smokers | SmkPrc | Share of the population who smoke every day or most days and have smoked at least 100 cigarettes
Excess Drinking Percentage | ExcDrkPrc | Percentage of adults reporting binge or heavy drinking
Drug Overdose Mortality Rate | DrOverdMrtRt | Number of drug poisoning deaths per 100,000 population

chr_health_factors / chr_health_factors_state

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | FIPS | County and state level fips code to join to county geospatial data
State Name | State | Long name of state
County Name | County | Long name of county (county-level only)
Childhood in Poverty | PovChldPrc | Percentage of people under the age of 18 in poverty
Income Inequality | IncRt | A ratio of the 80th percentile income to the 20th percentile income
Median Household Income | MedianHouseholdIncome | The median income of a county or state for households
Food Insecurity | FdInsPrc | Percent of people without adequate food access
Unemployment Percent | UnEmplyPrc | Percent of people currently unemployed
Uninsured Percent | UnInPrc | Percent of people who do not have health insurance
Primary Care Physician Ratio | PrmPhysRt | A ratio of the total population to primary care physicians
Preventable Hospital Stays | PrevHospRt | A rate of hospital stays per 100,000 Medicare participants in the state or county
Racial Segregation | RsiSgrBlckRt | An index reflecting segregation in the state or county (higher value is more segregated)
Severe Housing Problems | SvrHsngPrbRt | Percent of households experiencing at least one of the following (via County Health Rankings): "overcrowding, high housing costs, lack of kitchen facilities, or lack of plumbing facilities."

chr_life / chr_life_state

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | FIPS | County and state level fips code to join to county geospatial data
State Name | State | Long name of state
County Name | County | Long name of county (county-level only)
Life Expectancy | LfExpRt | Average life expectancy of residents in years.
Self-Rated Health | SlfHlthPrc | Percent of residents self reporting fair or poor health quality.

Description of Data Source Tables:

Detailed descriptions of the included variables with methodology are available from County Health Rankings & Roadmaps.

Data Limitations:

No limitations to report.

Comments/Notes:

n/a

Meta Data Name: Yu Group at UC Berkeley

Last Modified: 02/28/2021

Author: Laura Chen

Data Location:

Data Source(s) Description:

The Yu group at UC Berkeley Statistics and EECS has compiled, cleaned and documented a large corpus of hospital- and county-level data from a variety of public sources to aid data science efforts to combat COVID-19.

Description of Data Processing:

Data are taken directly from the output of the Yu Group at UC Berkeley's model.

Key Variable and Definitions:

berkeley_predictions.csv

Variable | Variable ID in .csv | Description
FIPS (Join Column) | fips | County geographic identifier to join to geospatial data
COVID Hospital Severity Index | severity_index | A scale of 1-3 indicating the severity of COVID currently in each county.
Projected Deaths (5 day forecast) | deaths_YYYY_MM_DD (time-series) | The number of deaths estimated to occur in the next five days.
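
Because the projected-death columns follow the deaths_YYYY_MM_DD naming pattern, selecting a forecast for a particular date means parsing the column names. The sketch below assumes that pattern holds exactly; the sample header and the `forecast_columns` helper are hypothetical.

```python
from datetime import date, datetime

PREFIX = "deaths_"

def forecast_columns(header):
    """Map each deaths_YYYY_MM_DD column name to its parsed date."""
    dated = {}
    for name in header:
        if not name.startswith(PREFIX):
            continue
        try:
            dated[name] = datetime.strptime(name[len(PREFIX):], "%Y_%m_%d").date()
        except ValueError:
            # Skip columns that share the prefix but do not end in a date.
            continue
    return dated

# Hypothetical header row from berkeley_predictions.csv.
header = ["fips", "severity_index",
          "deaths_2021_02_27", "deaths_2021_02_28", "deaths_2021_03_01"]

columns = forecast_columns(header)
latest = max(columns, key=columns.get)  # column holding the most recent forecast date
```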

Description of Data Source Tables:

At the hospital level, the data includes the location of the hospital, the number of ICU beds, the total number of employees, and the hospital type. At the county level, the data includes COVID-19 cases/deaths from USA Facts and NYT, automatically updated every day, along with demographic information, health resource availability, COVID-19 health risk factors, and social mobility information.

See the Yu-Group/covid19-severity-prediction repository for more information.

  • Hospital Level Data
  • Nursing Homes Level Data
    • nyt_nursinghomes: number of COVID-19-related cases and deaths from nursing homes, as reported by NYT
    • hifld_nursinghomes: database of nursing homes/assisted living facilities, populated via open source authoritative sources
  • County Level Data
    • COVID-19 Cases/Deaths Data
      • nytimes_infections: COVID-19-related death/case counts per day per county from NYT
      • usafacts_infections: COVID-19-related death/case counts per day per county from USA Facts
      • ccd_daily: COVID-19-related deaths, cases, hospitalizations, and testing statistics
    • Demographics and Health Resource Availability
      • ahrf_health: contains county-level information on health facilities, health professions, measures of resource scarcity, health status, economic activity, health training programs, and socioeconomic and environmental characteristics from Area Health Resources Files
      • cdc_svi: Social Vulnerability Index for counties from CDC
      • hpsa_shortage: information on areas with shortages of primary care, as designated by the Health Resources & Services Administration (HRSA)
      • khn_icu: information on number of ICU beds and hospitals per county from Kaiser Health News
      • usda_poverty: county-level poverty estimates from the United States Department of Agriculture, Economic Research Service
    • Health Risk Factors
      • chrr_health: contains estimates of various health outcomes and health behaviors (e.g., percentage of adult smokers) for each county from County Health Rankings & Roadmaps
      • dhdsp_heart: cardiovascular disease mortality rates from CDC DHDSP
      • dhdsp_stroke: stroke mortality rates from CDC DHDSP
      • ihme_respiratory: chronic respiratory disease mortality rates from IHME
      • medicare_chronic: Medicare claims data for 21 chronic conditions
      • nchs_mortality: overall mortality rates for each county from National Center for Health Statistics
      • usdss_diabetes: diagnosed diabetes in each county from CDC USDSS
      • kinsa_ili: measures of anomalous influenza-like illness incidence (ILI) outbreaks in real-time using Kinsa’s county-level illness signals, developed from real-time geospatial thermometer data (private data)
      • cmu_covidcast: epidemiological data from the CMU Delphi COVIDcast, which includes data on COVID-like symptoms from Facebook surveys, estimated COVID-related doctor visits and hospital admissions, and other indicators
    • Social Distancing and Mobility/Miscellaneous
      • nytimes_masks: mask-wearing survey data from NYT and Dynata
      • google_mobility: community mobility reports from Google
      • apple_mobility: mobility trends from Apple maps direction requests
      • unacast_mobility: county-level estimates of the change in mobility from pre-COVID-19 baseline from Unacast (private data)
      • streetlight_vmt: estimates of total vehicle miles travelled (VMT) by residents of each county, each day; provided by Streetlight Data (private data)
      • safegraph_socialdistancing: aggregated daily views of USA foot-traffic summarizing movement between counties from SafeGraph (private data)
      • safegraph_weeklypatterns: place foot-traffic and demographic aggregations that answer: how often people visit, where they came from, where else they go, and more; from SafeGraph (private data)
      • jhu_interventions: contains the dates that counties (or states governing them) took measures to mitigate the spread by restricting gatherings (e.g., travel bans, stay at home orders)
      • mit_voting: county-level returns for presidential elections from 2000 to 2016 according to official state election data records
  • Miscellaneous Data
    • bts_airtravel: survey data including origin, destination, and itinerary details from a 10% sample of airline tickets from the Bureau of Transportation Statistics
    • fb_socialconnectedness: an anonymized snapshot of all active Facebook users and their friendship networks as a measure of social connectedness between two different places

Data Limitations:

No limitations to report.

Comments/Notes:

n/a

Meta Data Name: Safegraph Social Distancing Data

Last Modified: 3/03/2021

Author: Andres Crucetta Nieto

Data Location:

SafeGraph Social Distancing data can be accessed via their COVID-19 data consortium signup.

Data Source(s) Description:

Source data are provided by SafeGraph.

Description of Data Source Tables:

The data was generated using a panel of GPS pings from anonymous mobile devices. We determine the common nighttime location of each mobile device over a 6 week period to a Geohash-7 granularity (~153m x ~153m). For ease of reference, we call this common nighttime location, the device's "home". We then aggregate the devices by home census block group and provide the metrics set out below for each census block group. [1]

Description of Data Processing:

The Social Distancing data were distilled from their raw form into daily and weekday formats. We also created a percent-change-from-2019 dataset to compare trends in the workplace.
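
The exact formula behind the percent change from 2019 is not given here. One common reading, sketched below under that assumption, compares a value with the 2019 value for the matching period as (current - baseline) / baseline x 100. The function name and the sample numbers are hypothetical.

```python
def percent_change(current, baseline):
    """Percent change of `current` relative to `baseline`.

    Returns None when either value is missing or the baseline is zero,
    since the ratio is undefined in those cases.
    """
    if current is None or baseline is None or baseline == 0:
        return None
    return (current - baseline) / baseline * 100.0

# Hypothetical share of full-time workers in one county for matching weeks.
change = percent_change(current=4.5, baseline=6.0)
```

Returning None instead of raising keeps counties with a missing or zero 2019 baseline from interrupting a batch calculation; whether that matches the Atlas pipeline is not stated in this document.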

Key Variable and Definitions:

Percentage of Delivery Workers (Daily)

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | fips_code | County-level FIPS code to join to county geospatial data
Day | YYYY-MM-DD | Percentage of delivery workers for that day

Percentage of Full-Time Workers (Daily)

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | fips_code | County-level FIPS code to join to county geospatial data
Day | YYYY-MM-DD | Percentage of full-time workers for that day

Percentage of Full-Time Workers (Weekday)

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | fips_code | County-level FIPS code to join to county geospatial data
Week Day | YYYY-MM-DD | Percentage of full-time workers for that weekday

Percentage of Home Dwellers (Daily)

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | fips_code | County-level FIPS code to join to county geospatial data
Day | YYYY-MM-DD | Percentage of home dwellers for that day

Percentage of Part-Time Workers (Daily)

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | fips_code | County-level FIPS code to join to county geospatial data
Day | YYYY-MM-DD | Percentage of part-time workers for that day

Percentage of Part-Time Workers (Weekday)

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | fips_code | County-level FIPS code to join to county geospatial data
Week Day | YYYY-MM-DD | Percentage of part-time workers for that weekday

Change from 2019 - Full-Time

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | fips_code | County-level FIPS code to join to county geospatial data
Week | YYYY-MM-DD | Percentage change of full-time workers from 2019 for that week

Change from 2019 - Home Dwellers

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | fips_code | County-level FIPS code to join to county geospatial data
Week | YYYY-MM-DD | Percentage change of home dwellers from 2019 for that week

Change from 2019 - Part-Time

Variable | Variable ID in .csv | Description
FIPS Code (Join Column) | fips_code | County-level FIPS code to join to county geospatial data
Week | YYYY-MM-DD | Percentage change of part-time workers from 2019 for that week
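
The daily, weekday, and change-from-2019 files described above are wide: one row per county and one YYYY-MM-DD column per period. For time-series work it is often easier to reshape them to one row per county and date. The sketch below assumes the file contains only the fips_code column plus date-named columns; the sample values and the `wide_to_long` helper are hypothetical.

```python
import csv
import io

# Hypothetical excerpt of a wide file; the values are placeholders.
WIDE_CSV = """fips_code,2021-02-22,2021-03-01
17031,-12.5,-10.0
01001,-3.0,
"""

def wide_to_long(csv_text, id_column="fips_code"):
    """Return a list of (fips_code, date, value) tuples, skipping empty cells."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, raw in row.items():
            if column == id_column or raw in ("", None):
                continue
            rows.append((row[id_column], column, float(raw)))
    return rows

long_rows = wide_to_long(WIDE_CSV)
```

Reading fips_code as text, as `csv.DictReader` does, preserves leading zeros such as the one in 01001.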

Percent of People at Home

Variable | Variable ID in .csv | Description
Week | YYYY-MM-DD | Percentage of home dwellers for that week
FIPS Code (Join Column) | fips_code | County-level FIPS code to join to county geospatial data
Percent of people home | pct_home | Percentage of people home for that week

Percent of People Full-Time

Variable | Variable ID in .csv | Description
Week | YYYY-MM-DD | Percentage of full-time workers for that week
FIPS Code (Join Column) | fips_code | County-level FIPS code to join to county geospatial data
Percent of people full-time | pct_fulltime | Percentage of people working full-time for that week

Percent of People Part-Time

Variable | Variable ID in .csv | Description
Week | YYYY-MM-DD | Percentage of part-time workers for that week
FIPS Code (Join Column) | fips_code | County-level FIPS code to join to county geospatial data
Percent of people part-time | pct_parttime | Percentage of people working part-time for that week

Data Limitations:

To preserve privacy, we apply differential privacy to all of the device count metrics other than the device_count. This may cause the exact sum of devices to not equal device_count, especially for sparsely populated origin_census_block_group. Differential privacy is applied to all of the following columns: completely_home_device_count, part_time_work_behavior_devices, full_time_work_behavior_devices, delivery_behavior_devices, at_home_by_each_hour, bucketed_away_from_home_time, bucketed_distance_traveled, bucketed_home_dwell_time, bucketed_percentage_time_home. [1]

If as a result of the differential privacy applied:

device_count < part_time_work_behavior_devices + full_time_work_behavior_devices + completely_home_device_count or
device_count < sum(counts in bucketed_distance_traveled) or
device_count < sum(counts in bucketed_home_dwell_time),

we then increase the device_count to the applicable sum (this only occurs in census_block_groups with small device_counts).
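
The adjustment rule above can be sketched as follows. This illustrates the stated rule and is not SafeGraph's code: the function name is hypothetical, the bucketed columns are assumed to arrive as dictionaries mapping a bucket label to a count, and taking the largest sum when more than one condition holds is an assumption, since the text only says "the applicable sum".

```python
def adjusted_device_count(device_count, part_time, full_time, completely_home,
                          distance_buckets, home_dwell_buckets):
    """Raise device_count to the largest noisy sum that exceeds it.

    If no sum exceeds device_count, the original value is returned unchanged.
    """
    candidate_sums = [
        part_time + full_time + completely_home,
        sum(distance_buckets.values()),
        sum(home_dwell_buckets.values()),
    ]
    return max([device_count] + candidate_sums)

# Hypothetical census block group with a small device_count.
count = adjusted_device_count(
    device_count=10,
    part_time=3, full_time=4, completely_home=5,  # sums to 12
    distance_buckets={"1-1000": 6, ">50000": 2},  # sums to 8
    home_dwell_buckets={"<60": 4, ">1080": 5},    # sums to 9
)
```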

Comments/Notes:

References:

[1] https://docs.safegraph.com/docs/social-distancing-metrics

Meta Data Name: American Community Survey

Last Modified: 2/28/21

Author: Kenna Camper

Data Location:

GeoDaCenter/covid/data

Data Source(s) Description:

Source data are provided by the United States Census Bureau.

Description of Data Processing:

Population counts are taken directly from 2019 ACS 5-year estimates and joined to geospatial data. Essential worker estimates are generated from 2019 ACS 5-year estimates of workers by "essential" occupations over total workers in each county. We currently use the occupation categories from the Chicago Metropolitan Agency for Planning.
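
The essential worker estimate described above is a ratio of workers in "essential" occupations to total workers. A minimal sketch of that calculation follows; the occupation labels and counts are hypothetical placeholders, not the actual Chicago Metropolitan Agency for Planning categories or ACS figures.

```python
def essential_worker_share(essential_workers, total_workers):
    """Share of workers in essential occupations on a 0-1 scale.

    Returns None when a county reports no workers, since the share is undefined.
    """
    if total_workers <= 0:
        return None
    return essential_workers / total_workers

# Hypothetical worker counts by occupation group for one county.
occupation_counts = {"healthcare": 1200, "food_service": 800, "office": 3000}
# Hypothetical set of groups treated as "essential".
essential_categories = {"healthcare", "food_service"}

essential = sum(n for name, n in occupation_counts.items()
                if name in essential_categories)
pct_essential = essential_worker_share(essential, sum(occupation_counts.values()))
```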

Key Variable and Definitions:

context_essential_workers_acs

Variable | Variable ID in .csv | Description
FIPS (Join Column) | fips | County geographic identifier to join to geospatial data
Percent of essential workers | pct_essential | Share of workers in essential occupations on a scale of 0-1.

county_pop.csv

Variable | Variable ID in .csv | Description
GEOID | GEOID | County Geographical ID number
County and state | NAME | County and state name
Total population | total_population | Total population of a county
Males | male | Number of males in a county
Females | female | Number of females in a county
Males above 50 | male_50above | Number of males above age 50 in a county
Females above 50 | female_50above | Number of females above age 50 in a county

Description of Data Source Tables:

See the American Community Survey Data for additional information.

Data Limitations:

No limitations to report.

Comments/Notes:

n/a

Meta Data Name: Hospital and Clinic Locations

Last Modified: 3/3/2021

Author: Dylan Halpern

Data Location:

GeoDaCenter/covid/public/csv

Data Source(s) Description:

Federally qualified health clinic locations and testing status are sourced from HRSA's online location finder. Hospital locations and information are sourced from the CovidCareMap project.

Description of Data Source Tables:

n/a

Description of Data Processing:

Data from HRSA are taken directly and filtered for the columns listed below. CovidCareMap hospital data is included completely.

Key Variable and Definitions:

context_fqhc_clinics_hrsa.csv

Variable | Variable ID in .csv | Description
Clinic Name | name | Name of the clinic in HRSA's database
State Abbreviation | st_abbr | 2-letter state abbreviation
City Name | city | City where the clinic is located
Street Address | address | Clinic street address
Phone Number | phone | Contact phone number for clinic
COVID Testing Availability | testing_status | Last queried testing availability status
Longitude | lon | Clinic longitude value in WGS84
Latitude | lat | Clinic latitude value in WGS84
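
Since clinic coordinates are stored as WGS84 latitude and longitude, a great-circle distance is one simple way to find the clinic nearest a given point. The sketch below uses the haversine formula with a spherical Earth, which is an approximation; the sample records and the helper names are hypothetical, and only the lat and lon column names come from the table above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_clinic(lat, lon, clinics):
    """Return the clinic record closest to (lat, lon)."""
    return min(
        clinics,
        key=lambda c: haversine_km(lat, lon, float(c["lat"]), float(c["lon"])),
    )

# Hypothetical records; coordinates are read as text, as they would be from a CSV.
clinics = [
    {"name": "Clinic A", "lat": "41.88", "lon": "-87.63"},
    {"name": "Clinic B", "lat": "34.05", "lon": "-118.24"},
]
closest = nearest_clinic(41.79, -87.60, clinics)
```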

context_hospitals_covidcaremap.csv

Variable | Variable ID in .csv | Description
Hospital Name | Name | Hospital name
Hospital Type | Hospital Type | Hospital category (e.g., Long Term Care, Short Term, Acute)
Street Address | Address | Local street address
Street Address (continued) | Address_2 | Local street address (suite, number, etc.)
City | City | Hospital city
State | State | Hospital state
ZIP Code | Zipcode | Hospital ZIP code
County | County | Hospital county
Latitude | Latitude | Hospital latitude value in WGS84
Longitude | Longitude | Hospital longitude value in WGS84

Additional fields describe hospital bed capacity and occupancy.

Data Limitations:

n/a

Comments/Notes:

n/a

Meta Data Name: Geographies

Last Modified: 3/3/2021

Author: Dylan Halpern

Data Location:

All geospatial data used in the Atlas are available under GeoDaCenter/covid/public/geojson.

Data Source(s) Description:

County and state boundaries are sourced from the US Census Cartographic Boundary Files at the 20m resolution.

Native American or American Indian reservation boundaries come from the TIGER/Line 2017 dataset.

Congressional district boundaries come from the 2018 National Congressional District Boundaries.

Hypersegregated cities are based on work by Massey and Tannen, 2015 (press release, article)

Description of Data Source Tables:

Source geospatial data provide boundaries and a geospatial identifier (GEOID or FIPS code).

Description of Data Processing:

State and county boundaries are joined with basic information for normalization (population and beds). Highlight layers are generated using a symmetrical difference operation against a dissolved US geography.

Key Variable and Definitions:

Variable | Variable ID in .csv | Description
Geographic ID (Join Column) | GEOID | County- and state-level GEOID code to join to tabular data
Population Count | population | 2019 ACS 5-year estimate of population in each county or state
Licensed Beds | beds | Number of licensed hospital beds in each county or state
Testing Criteria (Deprecated) | criteria | No longer used: State or county COVID testing criteria

Data Limitations:

Comments/Notes:

n/a