I took some time off over the summer, away from data visualizations. It’s good to get away from time to time, but oh boy did I miss out.
I wasn’t gone long, but in that short time people came up with some wonderful things.
Let me dive back into it with some joyful dataviz.
Claus Wilke authored a new R package, ggjoy, for creating joy plots (see the ggjoy vignette). See also the post from Revolution Analytics with some other joyplot examples and some more background.
As David Smith over at Revolution Analytics points out, the plots can obscure some data. Nevertheless, I find them evocative enough that some data obfuscation might be worth the interest they create. I dunno, let’s try it out.
We’ll pick up with a data frame called df.metro that looks like so:
The variable hpi is the house price index (normalized so that January 2000 = 100). Two further variables, including hpa12, hold the one-month and 12-month percent changes in the house price index. The other variables tell us the date, the metro name (geo), the primary state for the metro area, the year and the month.
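To make the percent-change variables concrete, here is a minimal base R sketch on a made-up index for a single metro. The column name hpa for the one-month change, the helper pct_change(), and the choice to store changes as fractions (0.05 for 5 percent) are assumptions for illustration, not taken from the original data.

```r
# Toy index for one hypothetical metro; values are illustrative, not real data.
toy <- data.frame(
  date = seq(as.Date("2000-01-01"), by = "month", length.out = 25),
  hpi  = 100 * 1.005^(0:24)   # starts at 100 in January 2000
)

# Change over k months: hpi[t] / hpi[t - k] - 1, with NA for the first k rows.
pct_change <- function(x, k) {
  n <- length(x)
  stopifnot(n > k)
  c(rep(NA_real_, k), x[(k + 1):n] / x[1:(n - k)] - 1)
}

toy$hpa   <- pct_change(toy$hpi, 1)    # assumed name: one-month change
toy$hpa12 <- pct_change(toy$hpi, 12)   # 12-month change

head(toy, 3)
```

With many metros in one data frame, the same calculation would have to be applied within each metro rather than down the whole column.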
Let’s construct a joyplot showing how the 12-month appreciation in house prices varies across metro areas by year.
This plot shows the time series history of metro house price appreciation. We can see the wide dispersion during the housing bust, when some metros saw house prices decline by more than 20 percent annually.
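A plot along these lines could be sketched with ggjoy as follows. This is not the original code: the data are simulated stand-ins for df.metro, and the column names year and hpa12 plus the scale and rel_min_height settings are assumptions chosen for illustration.

```r
library(ggplot2)
library(ggjoy)

set.seed(1)
# Simulated stand-in for df.metro: 50 hypothetical metros per year,
# with 12-month appreciation stored as a fraction.
sim <- data.frame(
  year  = rep(2005:2012, each = 50),
  hpa12 = rnorm(8 * 50, mean = 0.02, sd = 0.06)
)

# One density ridge per year; y must be discrete, hence factor(year).
p <- ggplot(sim, aes(x = hpa12, y = factor(year))) +
  geom_joy(scale = 2, rel_min_height = 0.01) +  # overlap ridges, trim thin tails
  theme_joy() +
  labs(x = "12-month house price appreciation", y = NULL)
p
```

Larger scale values make the ridges overlap more, which is a big part of the look but also where data can get hidden.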
Let’s compare the distributions across two large states: metros in California and metros in Texas.
We can see that while Texas house prices held up pretty well during the Great Recession, many California markets saw big declines. Since then, California has rebounded and in recent years California metros have had faster house price growth than Texas metros.
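A two-state comparison like this can be sketched by mapping fill to the state and making the ridges semi-transparent. Again the data are simulated, so the shapes will not match the real California and Texas patterns; the column names state, year and hpa12 are assumptions.

```r
library(ggplot2)
library(ggjoy)

set.seed(2)
# Simulated metros: 25 hypothetical metros per state and year (made-up numbers).
sim2 <- expand.grid(metro = 1:25, year = 2005:2012,
                    state = c("CA", "TX"), stringsAsFactors = FALSE)
sim2$hpa12 <- rnorm(
  nrow(sim2),
  mean = ifelse(sim2$state == "CA", 0.00, 0.03),
  sd   = ifelse(sim2$state == "CA", 0.10, 0.03)
)

# fill = state gives two overlaid ridges per year; alpha keeps both visible.
p2 <- ggplot(sim2, aes(x = hpa12, y = factor(year), fill = state)) +
  geom_joy(alpha = 0.5, scale = 1.5, rel_min_height = 0.01) +
  theme_joy() +
  labs(x = "12-month house price appreciation", y = NULL, fill = "State")
p2
```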
How about that crazy plot?
In my first plot, I intentionally left off the labels. But it’s just the CA vs TX plot above with all 50 states + D.C. included. Let’s recreate it with a few labels.
I think joyplots work well if there are some important differences across groups. For example, I think they work to highlight seasonal patterns. The graph below shows monthly existing home sales, not seasonally adjusted.
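For a seasonal view, one ridge per calendar month works well. The sketch below uses simulated sales with a summer peak built in; the data frame sales, its columns, and the magnitudes are all hypothetical.

```r
library(ggplot2)
library(ggjoy)

set.seed(3)
# Simulated monthly sales (thousands, not seasonally adjusted), 2000-2016.
sales <- expand.grid(month = 1:12, year = 2000:2016)
sales$sales <- 400 + 120 * sin((sales$month - 3.5) * pi / 6) +
  rnorm(nrow(sales), sd = 30)

# Ordered factor so the ridges run from January (bottom) to December (top).
sales$month_name <- factor(month.abb[sales$month], levels = month.abb)

p_season <- ggplot(sales, aes(x = sales, y = month_name)) +
  geom_joy(scale = 2, rel_min_height = 0.01) +
  theme_joy() +
  labs(x = "Existing home sales (simulated, thousands)", y = NULL)
p_season
```

Each ridge here is the distribution of one month's sales across years, so a seasonal pattern shows up as ridges that shift left and right as you move up the chart.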
Is it useful?
Joyplots certainly are useful insofar as they make an impression. Other chart types are probably better for many applications. For example, if you really want to compare distributions, good old boxplots are hard to beat unless you have a very odd distribution.
But without a doubt, making joyplots is a joyful exercise. And when is joy not useful?
Don’t discount the importance of being able to resonate with your intended audience. It might well be worth it to sacrifice some clarity if it buys us joy.