The Pace of Data: Understanding data source timeliness on the Dashboard
Mar. 15, 2021
Samantha Breslin & Taylor Lampe
Mental health concerns have been front and center recently, with the toll of the COVID-19 pandemic weighing heavily on many. If you are interested in exploring mental health in your community, you can look at frequent mental distress on the Dashboard. You’ll notice, though, that the most recent year of data is 2018, which is now 3 years ago!
Users frequently ask us for more current data. This is especially true now, when fast-paced COVID-19 metrics have raised our expectations for speedy public health data. This blog explains why the Dashboard’s data is often lagged by a few years and how it can be useful for your city – even amidst rapid changes, like what we’re seeing now in the COVID-19 pandemic.
Why are many of our metrics 2-3 years old?
Ideally, we would know the exact number or proportion of people in a particular place, or across the United States, who have completed high school or received a diagnosis for a condition like diabetes. Unfortunately, gathering data from every single person would be extremely expensive and time-consuming, and therefore isn’t possible for most public health or social determinant measures. Instead, organizations like the U.S. Census Bureau or the Centers for Disease Control and Prevention (CDC) administer surveys to a smaller number of U.S. residents – a sample that is considered “representative” of, or similar to, the total population. They apply statistics to these survey results in order to approximate the number of people with a certain diagnosis or experience. Using our recent release of 2018 PLACES Project data as an example, we illustrate below how these data get from a survey question to our site and why that process can take some time.
2018: Data are collected in an annual survey called BRFSS (Behavioral Risk Factor Surveillance System). BRFSS is overseen by the CDC, but surveys are conducted by state health departments. Over 500,000 BRFSS phone interviews were conducted during 2018.
Early 2019: After the 50 states sent their interview data to the CDC, data were checked for accuracy, weighted, and analyzed before being released to the public for the nation, all states, and select metropolitan areas.
Late 2019 - 2020: PLACES Project used BRFSS survey data to create metric estimates for geographies smaller than states (e.g., cities, census tracts). This process required analyzing individual, county, and state data, running complex statistical models, and validating the estimates.
Early 2021: Once PLACES publicly released their census tract and city estimates, the Dashboard analytic team performed our own internal data preparation for the website, bringing us to our most recent release earlier in March.
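To make the "weighted, and analyzed" step in the timeline above a little more concrete, here is a minimal sketch of how a weighted survey estimate can be computed. The function name, responses, and weights are all hypothetical and chosen only for illustration; the CDC's actual weighting and the PLACES small-area models are considerably more involved than this.

```python
def weighted_prevalence(responses, weights):
    """Estimate the share of a population reporting a condition.

    responses: True/False survey answers, one per respondent.
    weights:   how many people in the population each respondent
               is assumed to represent (hypothetical values here).
    """
    if len(responses) != len(weights):
        raise ValueError("responses and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Add up the weights of respondents who answered "yes",
    # then divide by the total weight of everyone surveyed.
    yes_weight = sum(w for r, w in zip(responses, weights) if r)
    return yes_weight / total_weight


# Hypothetical sample of five respondents (not real BRFSS data).
responses = [True, False, False, True, False]
weights = [1200, 800, 1500, 500, 1000]

estimate = weighted_prevalence(responses, weights)
print(f"Estimated prevalence: {estimate:.1%}")
```

In this toy example, two of five respondents (40%) answered "yes," but because those respondents stand in for a smaller share of the population, the weighted estimate is lower than the raw share. Adjustments like this are one reason survey data need processing time before release.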
As you can see, it takes a lot of time and effort to get these data ready. For most public health practitioners, these data lags are a tolerable trade-off to ensure that data are accurate and closely represent the entire population. In other words, the pros outweigh the cons. Plus, under most circumstances, public health measures do not change drastically over a three-year period, so we can expect data from 2017 or 2018 to be similar to current patterns. But what happens during a crisis…or a global pandemic?
How has COVID-19 impacted our data sources?
The COVID-19 pandemic has drastically transformed our lives. Given so much change, we should expect that some metrics from 2017 and 2018 no longer accurately represent health and its drivers in 2021.
For example, unemployment is at its highest levels since the Great Depression, and many have reported struggling to afford mortgage or rent payments. Relatedly, many people’s health insurance status might be at risk due to job loss. And with over 500,000 lives already taken by COVID-19 across all age groups and demographics, it seems likely that the level of premature deaths is higher than we’d expect.
How can users interpret and use lagged data?
When users visit the site today and see, for example, excessive housing cost estimates from 2018, we suggest they view these data with a critical lens. These data reflect something true about cities and neighborhoods in the recent past, and users can still use this information in combination with their first-hand knowledge about how COVID has impacted different communities. Users should ask questions based on what they know about their city:
“Are there areas that have been affected more by COVID than others?”
“How might COVID deepen existing disparities in my city?” For many measures, like life expectancy, we are already starting to see existing disparities deepen across income and race.
“Where would I expect economic impacts to be more or less severe?”
National datasets are most useful when they support local data and residents’ understandings of a place or community. This was true prior to the pandemic and is still true today.
What is the Dashboard doing to improve data timeliness?
In response to the pandemic, the Dashboard team is thinking critically about our “normal” ways of operating by exploring new metrics and more timely data sources. Here are the first steps we’ve taken to enhance our data offerings:
Adding and updating the COVID Local Risk Index, which estimates city- and neighborhood-level risk of COVID infection and illness severity.
Adding Unemployment, current, city-level, a measure responding to COVID-related changes in unemployment. This measure is updated monthly and has a lag of only 3-4 months instead of the 2-3 year lag in our Unemployment, annual, neighborhood-level metric.
Broadly, though, improving the timeliness of public health data is a goal across the discipline, and advancements in data collection and processing technology are making that goal more attainable. For example, the Dashboard is collaborating on a new grant that seeks to find more timely data streams for health metrics, which we hope to then bring to the site. If you have any ideas for data sources you think we should explore or if you have any questions about how we present and analyze data on the site, please let us know.