Data visualization guidelines and a case study

Slides from a recent talk on data visualization

Yesterday I gave a virtual lecture on data visualization at GMU. Here I’m posting the slides I used for that talk and including my discussion notes for the portion of the talk where I discussed guidelines for data visualization.

At the beginning of the talk I spoke a bit about data visualization guidelines. I framed this part of my talk around Jon Schwabish’s five guidelines from his new book Better Data Visualizations (see it on Amazon, and see here for a blog summary).

I then went over some charts I’ve used recently in talks I’ve given and discussed how I used (or didn’t use) the guidelines in that chart.

See a pdf version of the slides below, and a .pptx file (with animations) at this link.

Here are the notes I used for my introductory remarks on the five guidelines.

My interpretation of the five guidelines

I’m going to base my remarks here on the five guidelines that Jon Schwabish lays out in his new book, Better Data Visualizations. I think these five guidelines are excellent for data visualization beginners and old hands alike. They are:

  1. Show the Data
  2. Reduce the Clutter
  3. Integrate Graphics and Text
  4. Small Multiples
  5. Start with Gray

Show the Data

The first guideline is to show your data. Now this might seem like a very simple and obvious guideline, but it’s amazing how often visualizations go off the rails by ignoring this one. The data should be the star of the show. As you add embellishments, reference lines, annotations, colors, and additional aesthetics, you can easily get lost and forget about the data itself. Clear communication dictates that you should emphasize the data-driven message above all else. But what is the data you want to show? In many cases you cannot possibly show all the data, so you must focus in on a subset, or do some sort of transformation and aggregation. Implicit in these steps are many assumptions, normalizations, and even moral implications about what is important, what is measured, and what is not.

Which data, aggregation, or transformation is most relevant depends critically on your audience, their needs, and careful consideration of the factors driving the data. In economics and finance there are often numerous theoretical considerations behind the construction of the data. It’s quite challenging to know precisely what aspect of the data is most important. I typically take an iterative approach, where I rapidly try out different aggregations and transformations. This procedure is a way to separate the signal from the noise. As we go along I’ll give you some concrete examples of the choices I made to show the data.
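One way to make that iteration concrete is to line up a few candidate transformations of the same series and compare them side by side. The sketch below is a minimal illustration in Python using only the standard library. The index values are made up for illustration, and `pct_change` and `moving_average` are hypothetical helper names, not something from the talk.

```python
from statistics import mean

# Made-up monthly index levels, for illustration only (not real data).
index = [100.0, 101.0, 100.5, 102.0, 103.5, 103.0,
         104.5, 106.0, 105.5, 107.0, 108.5, 108.0,
         110.0, 111.5, 110.5, 113.0]


def pct_change(values, lag):
    """Percent change versus the value `lag` periods earlier."""
    return [100.0 * (values[i] / values[i - lag] - 1.0)
            for i in range(lag, len(values))]


def moving_average(values, window):
    """Trailing moving average over `window` periods."""
    return [mean(values[i - window + 1:i + 1])
            for i in range(window - 1, len(values))]


monthly = pct_change(index, lag=1)      # noisy month-over-month view
annual = pct_change(index, lag=12)      # year-over-year view
smoothed = moving_average(monthly, 3)   # 3-month average of monthly changes

# Print the last four observations of each candidate view for comparison.
for name, series in [("month-over-month %", monthly),
                     ("3-month average %", smoothed),
                     ("year-over-year %", annual)]:
    print(f"{name:>20}: " + " ".join(f"{x:6.2f}" for x in series[-4:]))
```

Each view answers a different question about the same numbers, so which one to chart still depends on the audience and the message.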

Reduce the Clutter

A closely related idea is to reduce the clutter. If the data is the star of the show, then all the other embellishments should be reduced. Taken to an extreme, this can lead to minimalism. Certainly, a segment of the data visualization community adheres to minimalism. In fact, one of the pioneers of the data visualization field, Edward Tufte, has been a famous advocate of maximizing what he calls the data-ink ratio. On a piece of paper, each bit of ink that is not representing data (legends, axes, ticks, gridlines) has the potential to dilute your message. As is often the case with absolutes, taking minimalism to the extreme (or equivalently, maximizing the data-ink ratio) can itself obscure the message. Annotations, legends, and tick marks can be important tools to enhance the signal.

But a good guideline is to start with a minimalist approach. Remove all the extraneous, non-essential, non-data elements. Then gradually add them back only if they enhance other aspects of communication.

The goal is not necessarily to eliminate all non-data ink, but to deliberately include only the ink that enhances the message.

Integrate Graphics and Text

To enhance the message, a good data visualization will integrate graphics with text. Charts rarely speak for themselves and are almost always improved with thoughtful annotation. We can learn from journalists and see how they accompany headlines with figures. Rather than hope that your audience can go on a learning journey and discover the data insights you’ve found yourself, it’s often much more efficient to explicitly tell your audience.

One of the most effective ways to integrate graphics and text is to use an active headline. Instead of saying something like “house price growth”, use a title like “house price growth is the fastest since the housing boom”. By including your message in a title you can often greatly enhance the reader’s understanding.

Often in a business setting, the folks reading your charts have little time to figure out all the implications of your analysis. By explicitly stating it in a prominent title or annotation, you can reduce the frustration that busy executives experience when consuming data analysis.

Small Multiples

Another very powerful tool is the use of small multiples. Small multiples are a great alternative to “spaghetti charts”. You all know what I mean by spaghetti charts, right? Someone loads up some data in a spreadsheet, say 50 time series, and plots them all on a single line chart. The result is a tangled mess of lines that looks like spaghetti. With small multiples you repeat the same visual multiple times, say once for each time series, and then arrange the plots in a panel or grid. This is an extremely powerful technique. Even better, you can combine this with a clever arrangement of the grid to further enhance the effectiveness of this technique. We’ll see an example later.
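To make the idea concrete without committing to a plotting library, here is a minimal sketch in plain Python that draws one text sparkline per series and lays the panels out in a grid. The series names and values are made up, and `sparkline` and `small_multiples` are hypothetical helpers written for this illustration. The design choice to notice is the shared scale: every panel uses the same minimum and maximum, so panels can be compared directly.

```python
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values, lo, hi):
    """Map each value to one of eight block characters on the scale [lo, hi]."""
    span = (hi - lo) or 1.0  # avoid dividing by zero for a flat scale
    out = []
    for v in values:
        # Round to the nearest of the eight levels (0 through 7).
        level = int((v - lo) / span * (len(BLOCKS) - 1) + 0.5)
        out.append(BLOCKS[level])
    return "".join(out)


def small_multiples(series, ncols=2):
    """Return grid rows of 'name sparkline' panels that share one scale."""
    lo = min(min(v) for v in series.values())
    hi = max(max(v) for v in series.values())
    width = max(len(name) for name in series)
    panels = [f"{name:<{width}} {sparkline(vals, lo, hi)}"
              for name, vals in series.items()]
    # Chunk the panels into rows of `ncols` panels each.
    return ["   ".join(panels[i:i + ncols])
            for i in range(0, len(panels), ncols)]


# Made-up series, for illustration only.
data = {
    "North": [1, 2, 3, 5, 8, 8, 7, 6],
    "South": [4, 4, 3, 3, 2, 2, 1, 1],
    "East":  [2, 3, 2, 3, 2, 3, 2, 3],
    "West":  [1, 1, 2, 4, 6, 7, 8, 8],
}

for row in small_multiples(data, ncols=2):
    print(row)
```

The same structure carries over to a real charting tool: one panel per series, identical axes, and a grid whose ordering is chosen deliberately.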

Start with Gray

The final guideline refers to color. Color is a very powerful visual tool. Advertisers know this, toy manufacturers know this. But because of its potency color can often thwart well-intentioned data viz. An excellent guideline is to start your chart out in grayscale and only add color deliberately.

You should also be mindful of accessibility and consider making your data visualizations readable for a color-blind audience. You can do this by choosing a color scale that works with different forms of color blindness. Alternatively, you could use color only for emphasis, or dual-encode color with another aesthetic (for example, line type or shape).
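As a rough sketch of how “start with gray” and dual encoding might look in practice, the plain-Python helper below assigns every series a gray default and gives an accent color only to the series you explicitly highlight. The function `series_styles`, the series names, and the style keys (`color`, `linewidth`, `linestyle`) are assumptions made for this illustration; adapt the keys to whatever your plotting tool expects. The two accent colors are taken from the Okabe-Ito palette, which is often recommended as color-blind friendly.

```python
GRAY = "#9E9E9E"
# Two colors from the Okabe-Ito palette (blue, vermillion).
ACCENTS = ["#0072B2", "#D55E00"]
# A second encoding paired with each accent color.
DASHES = ["solid", "dashed"]


def series_styles(names, highlight=()):
    """Return a style dict per series: gray by default, accents only on request."""
    if len(highlight) > len(ACCENTS):
        raise ValueError("highlight at most %d series" % len(ACCENTS))
    styles = {name: {"color": GRAY, "linewidth": 1.0, "linestyle": "solid"}
              for name in names}
    for i, name in enumerate(highlight):
        if name not in styles:
            raise KeyError(name)
        # Dual encoding: color plus a heavier line (and a dash pattern for the
        # second highlight), so the emphasis does not rely on color alone.
        styles[name] = {"color": ACCENTS[i], "linewidth": 2.5,
                        "linestyle": DASHES[i]}
    return styles


# Hypothetical series names, for illustration only.
names = ["Region A", "Region B", "Region C", "Region D"]
for name, style in series_styles(names, highlight=["Region C"]).items():
    print(name, style)
```

Capping the number of highlighted series is a deliberate constraint here: if everything is emphasized, nothing is.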