In March—which feels like forever ago—the spread of COVID-19 in the U.S. was undeniable and many were left wondering what to do. At that point, there was a lot of information circulating about COVID-19, but not a ton of centralized resources to understand the broad patterns. I wasn’t sure where to start, but I decided I wanted to help however I could. This blog post is a reflection on the past six months and highlights some of the data visualizations I used to help amidst the pandemic.

Visualizing Growth instead of Cases

Even though I didn’t quite know where to start, I got an early glimpse of what data scientists were thinking about during the pandemic through the American Family Insurance Data Science Institute’s COVID-19 research group (side note: this group has an excellent collection of tools and visualizations for understanding the pandemic). Professor Brian Yandell started this group in March; I quickly found out about it since I was taking Statistical Consulting with him. At that point, many sites were beginning to show COVID-19 trends across the US, including Johns Hopkins, 1point3acres, Coronadatascraper, r/dataisbeautiful, and others. The problem was that many of these sources focused on state- and country-level data.

It became clear that we could add value by looking at smaller regions within Wisconsin and displaying that data to show important trends.

The spark for this idea was an in-depth data analysis by the New York Times, “How Severe Are Coronavirus Outbreaks Across the U.S.? Look Up Any Metro Area”. There, they looked at the growth rate in metropolitan areas such as Chicago, New York City, and New Orleans. Both aspects (the metro-area focus and the growth rate) were novel, but looking at growth rate seemed particularly important for areas that were newly affected by COVID-19. The visualization below (from the CDC), a style that was common at the time, makes it easy to see the areas with large numbers of cases, but new hotspots are very difficult to spot because they get lost in clusters in the bottom-left corner of the plot. When growth is plotted on the y-axis instead, new hotspots jump out.

Srikanth Aravamuthan, Steve Goldstein, and I set out to apply the same idea specifically to Wisconsin metropolitan areas. We built a document with interactive visualizations (made in Plotly) of COVID-19 growth that updates daily and is hosted on an R Server backend:

https://data-viz.it.wisc.edu/wi-metro-growth-rate/

For our purposes, we measured growth as the average daily change in the last week:

\[ \text { avg daily change}_{t}=\left(\frac{\operatorname{cases}_{t}}{\operatorname{cases}_{t-7}}\right)^{(1 / 7)}-1 \]
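As a minimal sketch of the formula above, the snippet below computes the average daily change from two cumulative case counts taken seven days apart. The function name and the sample counts are hypothetical and only there for illustration; this is not the code behind the published plots.

```python
def avg_daily_change(cases_today, cases_week_ago, days=7):
    """Geometric-mean daily growth rate over the last `days` days."""
    if cases_week_ago <= 0:
        # Growth relative to zero cases is undefined, so return None.
        return None
    return (cases_today / cases_week_ago) ** (1 / days) - 1


# Hypothetical cumulative case counts, for illustration only.
rate = avg_daily_change(cases_today=200, cases_week_ago=100)
print(f"{rate:.1%}")  # cases doubling in a week is roughly 10.4% per day
```

Taking the seventh root rather than dividing the weekly change by seven treats growth as compounding, which matches how case counts behave early in an outbreak.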

With this simple metric, we looked at the average daily change against two measures of progression: time and total cases. A sample of the growth rate plots is shown below from back in early May.

Growth rate of confirmed cases vs time

Here we see that the Green Bay area was seeing very high and sustained growth and that confirmed cases in Janesville-Beloit were just starting to grow in the last week or so. A place like Madison, which had seen numbers rise in March, was then showing very little growth. This view of growth against time has some clear strengths and weaknesses.

Pros of this measure: Growth rates help us judge whether the epidemic is getting better or worse in a given place right now.

Cons: The timing of different outbreaks can make comparisons difficult. Case data quality varies a lot by place.

Growth rate of confirmed cases vs cases

Now, looking at growth versus the number of cases, a few new patterns emerge. For Green Bay, the high and sustained growth resulted in that area having the highest confirmed cases per capita of any Wisconsin metropolitan area. Compare this to Milwaukee, which saw a lot of growth in March and low-to-moderate growth in April, and had about 2.5 confirmed cases per thousand. I view this plot as a “race to the bottom” of sorts because when growth is 0, the line stops moving to the right. Areas that are traveling quickly to the right may indicate a lack of “flattening the curve.” Again, there are pros and cons to this plot.

Pros of this measure: Helps distinguish between places where cases are growing fast with few cases and places where cases are numerous and still growing fast.

Cons: Hard to read at first. Relies on case data.

Working with health systems

These plots, and maps of the metro areas, gave an important look at how COVID-19 was spreading in Wisconsin, and they were often looked at by health system officials, members of the research group, and others. Two health systems, Gundersen and Marshfield Clinic, expressed interest in customizing these visualizations to work with their service areas. We included county-level information since many rural counties they served didn’t fall into the metropolitan areas and also looked at new cases per week as a measure of growth.
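To make the "new cases per week" measure concrete, here is a small sketch that derives it from a cumulative case series. The function name and the county counts are made up for illustration, and it assumes one cumulative value per day with no gaps.

```python
def new_cases_per_week(cumulative, window=7):
    """New cases in the trailing `window` days, for each day with enough history."""
    return [
        cumulative[i] - cumulative[i - window]
        for i in range(window, len(cumulative))
    ]


# Hypothetical cumulative confirmed cases for one county over ten days.
cumulative = [3, 3, 4, 6, 6, 9, 12, 15, 15, 21]
weekly = new_cases_per_week(cumulative)
print(weekly)  # [12, 12, 17]
```

Unlike a growth rate, this count stays well defined when a county starts from zero cases, which matters for small rural counties.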

Looking at some of those plots today reveals major recent growth in northeastern Wisconsin, in counties like Shawano (purple), Forest (pink), and Waupaca (tan).

The feedback that our contacts at Gundersen and Marshfield provided was very positive, and they indicated we were able to help them spot problem areas very early on in the pandemic.

This work ended up winning one of nine Accelerator Grants from the Wisconsin Alumni Research Foundation to help COVID-19 research. It was also a great way to learn more about creating interactive visualizations in Plotly and publishing them on the web so that anyone could use the visualization. Working directly with stakeholders who were interested in adapting the visualizations was also a valuable learning process.

Geo-faceting for Spatial Understanding

After the data visualization work with growth rates that Srikanth and I did, it became clear to us that understanding spatial patterns in the data was also important. Looking at maps of confirmed cases was useful, but after a few months, it didn’t tell the whole story. Places like Green Bay started to see dramatic drops in growth in June and July, along with the surrounding areas, but their overall case count was still high. Additionally, we started to realize that new cases and average daily changes weren’t always the best metrics for understanding growth.

A really useful measure of growth is captured in the instantaneous \(R_0\) or reproduction number. \(R_0\) measures, on average, how many individuals someone with COVID-19 infects, and this changes over time as adherence to social distancing, mask usage, and other factors change. We wanted to see how \(R_0\) varied over time—and more importantly—across Wisconsin.

To that end, we sought to replicate the idea of geo-faceting, inspired by the R package geofacet, which does it in ggplot2. Using some code from collaborators Dorte Doepfer and Francisco Mandujano to estimate \(R_0\) with confidence intervals from an SEIR model, we got to work.

Our first attempt at this is implemented at https://data-viz.it.wisc.edu/instantaneous-r0-geofacet-wi-county/. The document goes into more detail, but the process is simple: make a grid of plots that closely represent the spatial structure of Wisconsin counties, and show a time series of \(R_0\) in each plot of the grid. From there, we added details such as background colors to represent whether the current \(R_0\) is low, medium, or high, and an interactive hover feature that shows more detail in an individual plot.
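The grid idea described above can be sketched in a few lines. The county positions below are hypothetical, chosen only to roughly mimic geography, and are not the layout used in the linked document; the sketch just shows how a county-to-cell mapping becomes a grid of panels.

```python
# Hypothetical grid positions (row, col) for a handful of Wisconsin counties,
# chosen only to roughly mimic geography; not the layout used in the post.
GRID = {
    "Forest": (0, 2),
    "Shawano": (1, 2),
    "Waupaca": (2, 2),
    "Brown": (2, 3),
    "Dane": (3, 1),
    "Milwaukee": (3, 3),
}


def build_facets(grid):
    """Arrange county names in a 2-D list; None marks an empty panel."""
    n_rows = max(r for r, _ in grid.values()) + 1
    n_cols = max(c for _, c in grid.values()) + 1
    facets = [[None] * n_cols for _ in range(n_rows)]
    for county, (row, col) in grid.items():
        facets[row][col] = county
    return facets


facets = build_facets(GRID)
# Print a text preview of the layout, with "." for empty cells.
for row in facets:
    print(" ".join(f"{(name or '.'):<10}" for name in row))
```

In a real plot, each non-empty cell would hold one small time series of \(R_0\) for that county, and empty cells would simply be left blank.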

geo-facet of reproduction number 1

We thought this visualization was particularly unique, and so we took the time to re-implement it in Python with some additional details. The plot below shows the modifications, which include a more detailed color scheme that considers the uncertainty of the estimate. The key idea is that if the confidence interval spans multiple categories (e.g. low and medium), we change the color to be something between low and medium. Intervals that span the entire range are greyed out, and this happened often with small counties that only had a few cases.
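One way to express the uncertainty-aware coloring rule is sketched below. The cutoffs separating low, medium, and high are assumptions made for this example (the post does not state the thresholds that were used), and the function names are hypothetical.

```python
# Hypothetical cutoffs separating low / medium / high; the post does not
# state the thresholds that were actually used.
LOW_MAX = 0.9
HIGH_MIN = 1.1


def category(value):
    """Bucket a single reproduction-number value."""
    if value < LOW_MAX:
        return "low"
    if value < HIGH_MIN:
        return "medium"
    return "high"


def interval_color_class(ci_lower, ci_upper):
    """Label an estimate by the categories its confidence interval touches."""
    lower_cat, upper_cat = category(ci_lower), category(ci_upper)
    if lower_cat == upper_cat:
        return lower_cat                  # interval sits inside one category
    if lower_cat == "low" and upper_cat == "high":
        return "uncertain"                # spans the whole range: grey out
    return f"{lower_cat}-{upper_cat}"     # straddles two adjacent categories


print(interval_color_class(0.5, 0.8))  # low
print(interval_color_class(0.8, 1.0))  # low-medium
print(interval_color_class(0.4, 2.5))  # uncertain
```

Each returned label would then map to one background color, with the in-between labels getting a blend of their two neighbors and "uncertain" getting grey.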

geo-facet of reproduction number 2

The whole plot gives a high-level view of how COVID-19 is reproducing in Wisconsin. For example, in late May, the Green Bay area greatly improved, whereas Madison and Milwaukee were seeing higher growth. We can also see that there is still not enough information in many rural counties in northern Wisconsin to determine \(R_0\).

Looking in more detail at a few plots, we can start to see the time trends of COVID-19 reproduction and how they can vary between nearby counties. Hovering provides additional details like the number of cases and actual \(R_0\) estimates.

The Python implementation was submitted to the John Hunter Excellence in Plotting Contest, which is featured at SciPy 2020. We were humbled to win an honorable mention for this visualization, particularly among a large number of entries visualizing COVID-19 data.

Screening Strategies for College Re-opening

As summer started to come to an end, the big question on a lot of people’s minds was whether schools could safely resume activities in the fall. Essentially all colleges switched to remote learning in Spring 2020, but it was unclear whether they could re-open for in-person classes in the fall. One important paper that sought to answer this question was “Assessment of SARS-CoV-2 Screening Strategies to Permit the Safe Reopening of College Campuses in the United States” by A. David Paltiel, Amy Zheng, and Rochelle P. Walensky in JAMA Network Open. The paper and authors received a lot of press and suggested that frequent testing of students could allow for a safe re-opening under some moderate assumptions.

Brian Yandell and Steve Goldstein suggested that it could be useful to have a dashboard to test out the model in the paper and see how it might apply to UW-Madison. Sri and I got to work and implemented a dashboard in R using shiny and shinydashboard within a few days: https://data-viz.it.wisc.edu/covid-19-screening/

paltiel dashboard

This dashboard was modeled on a Google spreadsheet that the original paper authors put together, and our implementation was so well received by the paper authors that their spreadsheet now links directly to our dashboard. It has a range of input parameters that can be changed to see how they affect the amount of testing, number of infections, and isolation capacity. These metrics are all important for understanding whether a college can safely re-open, since changing a few inputs can dramatically change the results.

paltiel dashboard updated

Overall, this dashboard was useful for the COVID-19 modeling team at UW-Madison and beyond to understand how changes to inputs like \(R_0\) and frequency of testing might affect the spread on campus. It has received over a thousand views from users across the country, and the source code is readily available so that other researchers can modify the paper assumptions. As I reflect on the last six months, it seems that this may have been the most directly useful visualization that we built.

Wrap Up

I can’t say that working on visualizations to understand the spread of COVID-19 is how I imagined part of my spring and summer going, but looking back I believe that the work we did made a material impact. Our early growth rate visualizations helped contextualize how the spread was different in metropolitan areas in Wisconsin, and the collaboration with Gundersen Health and Marshfield Clinic extended that understanding to rural counties. Looking at geo-faceted \(R_0\) plots furthered our understanding of spatial trends in COVID-19 growth and won honors in JHEPC at SciPy 2020. Building an interactive dashboard for college COVID-19 screening was an amazing collaborative effort to extend the work of a popular paper and make it usable to researchers across the country.

Along the way, I was able to improve my visualization skills in finding effective encodings of publicly available data. We implemented them in R and Python through ggplot2, Plotly, and other open-source packages. Most importantly, we learned how to host these visualizations on publicly available sites so that the public could interact with and use them. It was a summer of growth (for me, not just for COVID-19) and a summer well spent in my eyes.


Copyright 2017 Sean Kent All Rights Reserved | Design By W3layouts