This blog post is all about personal growth. Last year, I was playing around with some data I scraped from BeerAdvocate’s website. In particular, I was looking at their Top 250 Rated Beers list, which contains information on the beers that BeerAdvocate users rate highest. From that data, I created the visualization in the following tweet:


To make it easier to read, I’ve recreated the same chart using more recent data, pulled on September 29, 2019.

Now, as I’m currently enrolled in a class focused on Data Visualization (CS 765), I went back through some of my old visualizations and quickly realized several problems with this one:

It doesn’t take being in a data vis class to see these issues. Nevertheless, the class has helped to create a framework for critiquing the visualization and redesigning it. The number one takeaway for me so far is that most good visualizations start with a clear task or purpose. With that concept in mind, let’s talk about…

Task-based Visualization

The purpose of the visualization was to show that (at that time) American IPAs and American Imperial Stouts were far more prevalent in the top 250 list than any other style of beer. A similar trend exists now, but with American Imperial Stouts and New England (sometimes called Hazy) IPAs.

With this purpose defined, the information about the score becomes less important. For each style, the score was the average (among that style’s beers in the top 250) of each beer’s average review score, on a scale from 0-5. Since we are only looking at the top 250 beers by rating, we expect these scores to be very high anyway. A style can also have a high average simply because it has only a few beers in the top 250, whereas beers from the more common styles are spread throughout the list. Realizing this, it made much more sense to use color to encode more useful information about the style.
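A minimal sketch of that average-of-averages calculation in base R. The ratings below are made up for illustration and are not the scraped BeerAdvocate data; the column names are assumptions as well.

```r
# Hypothetical rows: one per beer in the list.
# `rating` stands for that beer's average review score (0-5).
beers <- data.frame(
  style  = c("American Imperial Stout", "American Imperial Stout",
             "American Imperial Stout", "New England IPA",
             "New England IPA", "Belgian Quadrupel"),
  rating = c(4.70, 4.55, 4.48, 4.62, 4.50, 4.68)
)

# Average of averages: mean beer rating within each style, plus the count.
mean_rating <- tapply(beers$rating, beers$style, mean)
n_beers     <- tapply(beers$rating, beers$style, length)

by_style <- data.frame(
  style       = names(mean_rating),
  mean_rating = as.vector(mean_rating),
  n_beers     = as.vector(n_beers)
)

# Sort by score to see which style "wins" on average rating alone.
by_style[order(-by_style$mean_rating), ]
```

In this toy data the style with a single beer comes out on top by score, even though it is the least represented style, which is why the score is a weak use of color here.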

In the following redesign, I group the beer styles by family (using BeerAdvocate’s definitions) and match each family as closely as possible to the typical beer color in that family (using this SRM chart and an SRM to hex code mapping).
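One way this kind of lookup could be wired up in base R is sketched below. The three rows, the SRM numbers, and the hex codes are placeholders chosen for illustration; they are not taken from the SRM chart or the hex mapping referenced above.

```r
# Hypothetical lookup: style -> family (only a few illustrative rows).
style_family <- data.frame(
  style  = c("American Imperial Stout", "New England IPA", "American Wild Ale"),
  family = c("Stouts", "India Pale Ales", "Wild/Sour Beers")
)

# Hypothetical lookup: family -> typical SRM value -> hex color.
# These SRM numbers and hex codes are placeholders, not real chart values.
family_color <- data.frame(
  family = c("Stouts", "India Pale Ales", "Wild/Sour Beers"),
  srm    = c(40, 8, 5),
  hex    = c("#1A0F0A", "#C9822B", "#B8860B")
)

# Attach the family color to every style.
styled <- merge(style_family, family_color, by = "family")

# A named vector is the shape that ggplot2::scale_fill_manual(values = ...) accepts.
family_palette <- setNames(family_color$hex, family_color$family)
family_palette
```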



This plot does a lot of things right. It helps show that while a few styles still have a large number of beers in the top 250 list, the color of these beers is quite diverse and many families of beer are competing for these top spots (Stouts, IPAs, Wild/Sour Beers). The color confusion is nearly eliminated, and overall it serves the purpose much more effectively.

However, the last two critiques from before still aren’t fully addressed. Can we address them without losing the message?

Enter Treemap

Treemaps tend to be polarizing graphics. There are great examples where treemaps work (see In Praise of Treemaps for an example with data from the 2012 US Presidential Election) and examples where they don’t (see 10 Lessons in Treemap Design for a lot of bad examples).

As discussed in An Alternative To Treemaps, there are appropriate uses for treemaps, especially when:

  1. You want to visualize a part-to-whole relationship amongst a large number of categories.
  2. Precise comparisons between categories are not important.
  3. The data is hierarchical.

This is almost exactly the situation with the beer data. We care about what fraction of the 250 total beers each style makes up, and there are many styles in this list. Precise comparisons aren’t important, and there is a hierarchy in that each style belongs to a family. We might also care whether that family is popular or not, which is very hard to determine, even from the redesign.
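To make the two levels of the part-to-whole relationship concrete, here is a small base R sketch. The counts are invented for illustration and are not the real top 250 numbers.

```r
# Hypothetical counts of listed beers per style (not the real numbers).
styles <- data.frame(
  family = c("Stouts", "Stouts", "India Pale Ales",
             "India Pale Ales", "Wild/Sour Beers"),
  style  = c("American Imperial Stout", "Russian Imperial Stout",
             "New England IPA", "American Imperial IPA", "American Wild Ale"),
  n      = c(60, 10, 50, 20, 15)
)

# Part-to-whole at the style level: each style's share of all listed beers.
styles$style_share <- styles$n / sum(styles$n)

# Part-to-whole at the family level: sum the styles within each family first.
family_n     <- tapply(styles$n, styles$family, sum)
family_share <- family_n / sum(family_n)

round(sort(family_share, decreasing = TRUE), 2)
```

A bar chart of styles only shows the first of these two shares directly; the family share has to be added up by eye, which is the part a treemap makes visible.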

Fortunately, I found a great package, treemapify, designed by David Wilkins (and available on CRAN). It allows one to make treemaps within a ggplot2 framework. After a bit of work on the details, I landed on the following adapted redesign, which I am quite happy with.
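For a sense of what a treemapify call looks like, here is a minimal sketch. It is not the code behind the final graphic (that lives in the repository linked at the end of the post); the counts and hex colors are placeholders, and it assumes ggplot2 and treemapify are installed.

```r
library(ggplot2)
library(treemapify)

# Hypothetical counts per style; not the real top 250 data.
beer_counts <- data.frame(
  family = c("Stouts", "Stouts", "India Pale Ales",
             "India Pale Ales", "Wild/Sour Beers"),
  style  = c("American Imperial Stout", "Russian Imperial Stout",
             "New England IPA", "American Imperial IPA", "American Wild Ale"),
  n      = c(60, 10, 50, 20, 15)
)

# Placeholder family colors (illustrative hex codes only).
family_palette <- c(
  "Stouts"          = "#1A0F0A",
  "India Pale Ales" = "#C9822B",
  "Wild/Sour Beers" = "#B8860B"
)

# Tile area encodes the count, fill encodes the family, and
# `subgroup` keeps styles from the same family next to each other.
p <- ggplot(beer_counts,
            aes(area = n, fill = family, label = style, subgroup = family)) +
  geom_treemap(colour = "white") +
  geom_treemap_subgroup_border(colour = "grey20") +
  geom_treemap_text(colour = "white", place = "centre", reflow = TRUE) +
  scale_fill_manual(values = family_palette, name = "Family")

p
```

The subgroup aesthetic is what carries the hierarchy: each family becomes one contiguous block, so its total share can be read from the block's area.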



The original purpose is well demonstrated by the treemap, and it seems to address all of the critiques of the original design. We lost the ability to know exactly how many beers are in each style, but visualizing the part-whole relationship is much clearer. I’m proud of the “bar chart legend” on the side which shows which colors map to each beer family, while also giving some detail on the count each family has in the top 250 list.

Overall, I think this visualization is miles ahead of my original design. I was surprised at how effective thinking about a specific task and critiquing a visualization could be. I’m always encouraged to see my growth and I hope this post inspires you to take a look at your visualizations with some of the techniques I describe.

If you’re curious how I made the visualizations, all of the R code and data are available on GitHub: https://github.com/skent259/beer-data. Feel free to check out my Twitter, @Sean__Kent, where I infrequently post about data visualization and cool things from the statistics community and the R community. More blog posts to come soon!




Copyright 2017 Sean Kent All Rights Reserved