23 May 2017

SO WE HAVE BEEN PLOTTING A LOT OF INDEX VALUES LATELY. It’s been great. But you have questions. Great questions.

I got an interesting response to my house price dot chart over Twitter regarding the house price index we were plotting. User @chrisschnabel wondered how the choice of starting point influenced how the house price dot chart looked.

The choice of index starting point does indeed influence how the index looks. Consider this visualization:

This visualization shows how the choice of starting point influences how the house price index (plotted on a log scale, naturally) looks. Each line is an individual state’s house price index, normalized so that a particular date is equal to 100. The plots look quite different depending on the choice of normalizing date.

Let’s build up to this plot. Per usual I will include R code to construct the visualizations.

## Get data

As in some previous posts (check out here to see ribbon charts, here for dot charts and here for an interactive flexdashboard) we will use the Freddie Mac House Price Index.

While we shared the data wrangling bits before, it’s short enough that I can include them here.

You can download the Excel spreadsheet with state house price index values here. Note that this code is based on the release with data through March 2017; future releases may shift the exact location of the cells. Using the range argument of readxl we can reach into the spreadsheet and get our data ready.

Just save the Excel file in your own data directory.

Then:

After completing this code you’ll have a data file ready for use. Let’s take a quick peek.

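Our data frame df.state

|    | date       | geo                                   | hpi | type  |
|----|------------|---------------------------------------|-----|-------|
| 1  | 2017-03-01 | TX                                    | 185 | state |
| 2  | 2017-03-01 | UT                                    | 176 | state |
| 3  | 2017-03-01 | VA                                    | 181 | state |
| 4  | 2017-03-01 | VT                                    | 166 | state |
| 5  | 2017-03-01 | WA                                    | 204 | state |
| 6  | 2017-03-01 | WI                                    | 135 | state |
| 7  | 2017-03-01 | WV                                    | 144 | state |
| 8  | 2017-03-01 | WY                                    | 185 | state |
| 9  | 2017-03-01 | United States not seasonally adjusted | 171 | state |
| 10 | 2017-03-01 | United States seasonally adjusted     | 171 | state |

Source: Freddie Mac House Price Index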

For the moment all we need are the various state indices. The data come normalized so that December of 2000 is equal to 100. There’s nothing particularly special about that date. The index is most useful for calculating the percentage change in average house values between two points in time (see for example this FAQ). Because growth rates across states differ over time, the choice of the points you compare will influence how the plot of an index looks.

To see how, let’s build up to our gif.

Let’s start by plotting just the index for one state, my current home of Virginia.

We’ve marked the last value (for March 2017) with a red dot and label. The Virginia index is at 181, which means that relative to December 2000, house prices in Virginia in March 2017 are up 81%.
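The arithmetic behind that 81% is just a ratio of two index values. Here is a minimal sketch in base R; `pct_change` is a hypothetical helper name introduced for illustration, not something from the original code.

```r
# Percent change implied by two values of the same index.
# The normalization base cancels in the ratio, so any base date works.
# pct_change is a hypothetical helper name used only for this illustration.
pct_change <- function(index_start, index_end) {
  100 * (index_end / index_start - 1)
}

# Virginia: 100 in December 2000 (by construction), 181 in March 2017
va_change <- pct_change(100, 181)
print(va_change)
```

The same function works for any pair of dates, which is why the index is handy for comparing two points in time regardless of where the 100 sits.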

If we were interested in how much prices have risen since March 2016, we could renormalize the index so that March 2016 is equal to 100 and replot the index. I’ve got a dplyr trick for this.

Using group_by and mutate plus brackets and a filtering statement such as mutate(hpi.0316=100*hpi/hpi[date=="2016-03-01"]) below enables us to index the time series data. Because I like to normalize data often, this little pattern is of great use to me.
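To make the pattern concrete, here is a small self-contained sketch. The data frame `toy` and its index values are made up for illustration (they are not Freddie Mac figures), and I convert the reference date explicitly with `as.Date` to keep the comparison unambiguous.

```r
library(dplyr)

# Toy data with made-up index values (NOT actual Freddie Mac figures)
toy <- data.frame(
  date = as.Date(rep(c("2015-03-01", "2016-03-01", "2017-03-01"), times = 2)),
  geo  = rep(c("AA", "BB"), each = 3),
  hpi  = c(150, 160, 168, 90, 100, 110)
)

# Within each geography, divide every value by that geography's
# March 2016 value and scale so that March 2016 equals 100
toy_0316 <- toy %>%
  group_by(geo) %>%
  mutate(hpi.0316 = 100 * hpi / hpi[date == as.Date("2016-03-01")]) %>%
  ungroup()

print(toy_0316)
```

The bracket expression picks out exactly one value per group, so this assumes each geography has one and only one row for the reference date.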

The general shape of the index looks the same (particularly on a log scale), but the index value is now 102.9, indicating that house prices in Virginia have risen 2.9 percent from March 2016 to March 2017.

## Comparing many states

Let’s make some spaghetti. We’ll plot each of the 50 states plus the District of Columbia on a single plot. We’ll restrict our attention to just values from the year 2000 forward.
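A spaghetti plot of this kind can be sketched with ggplot2 roughly as follows. This is an illustrative sketch rather than the exact plotting code: `toy` is a made-up stand-in for the state data frame, with invented values and only three geographies.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for the state data (NOT actual Freddie Mac figures)
toy <- data.frame(
  date = rep(seq(as.Date("1999-03-01"), by = "year", length.out = 5), times = 3),
  geo  = rep(c("AA", "BB", "CC"), each = 5),
  hpi  = c(95, 100, 108, 115, 121,
           97, 100, 103, 105, 110,
           90, 100, 115, 130, 128)
)

# Keep values from 2000 forward and draw one line per geography on a log scale
p <- toy %>%
  filter(date >= as.Date("2000-01-01")) %>%
  ggplot(aes(x = date, y = hpi, group = geo)) +
  geom_line(alpha = 0.6) +
  scale_y_log10() +
  theme_minimal() +
  labs(x = "", y = "Index (log scale)")

print(p)
```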

Now we can see quite a lot of variation across states.

Let’s renormalize so that our last data point (March 2017) is equal to 100 and plot it:

Same data, but doesn’t quite look the same.

Before we get to the animation, how about one more static plot?

Let’s make a small multiple showing how the plots differ as we let the reference date vary from March of 2000 to March of 2017, one year at a time. We’ll also use purrr’s map_df to help us. For more see this post on nested recursions.

Here we can see that although renormalizing the index affects the level of the index, it doesn’t really change the shape, especially if we plot the index on a log scale.
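One way to build the stacked data for such a small multiple is to wrap the renormalization in a function and map it over the reference dates. The sketch below uses made-up values; `renormalize`, `toy` and `stacked` are hypothetical names chosen for illustration.

```r
library(dplyr)
library(purrr)

# Toy data with made-up index values (NOT actual Freddie Mac figures)
toy <- data.frame(
  date = as.Date(rep(c("2015-03-01", "2016-03-01", "2017-03-01"), times = 2)),
  geo  = rep(c("AA", "BB"), each = 3),
  hpi  = c(150, 160, 168, 90, 100, 110)
)

# Hypothetical helper: renormalize each geography so that `ref` equals 100,
# and record which reference date was used
renormalize <- function(df, ref) {
  df %>%
    group_by(geo) %>%
    mutate(hpi.ref  = 100 * hpi / hpi[date == as.Date(ref)],
           ref_date = as.Date(ref)) %>%
    ungroup()
}

# One renormalized copy of the data per reference date, stacked row-wise
ref_dates <- c("2015-03-01", "2016-03-01", "2017-03-01")
stacked <- map_df(ref_dates, function(d) renormalize(toy, d))

print(stacked)
```

With the data in this long form, something like `facet_wrap(~ref_date)` in ggplot2 would give one panel per reference date.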
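The reason the shape survives on a log scale is that renormalizing divides the whole series by a constant, and on a log scale dividing by a constant is just a vertical shift. A quick numeric check in base R, using a made-up index path:

```r
# Made-up index path for one geography (NOT actual Freddie Mac figures)
hpi <- c(100, 120, 150, 135, 181)

# Two normalizations of the same series: first point = 100 vs. last point = 100
renorm_a <- 100 * hpi / hpi[1]
renorm_b <- 100 * hpi / hpi[5]

# In levels, the gap between the two versions changes over time...
gap_level <- renorm_a - renorm_b

# ...but in logs the gap is the same constant at every date
gap_log <- log(renorm_a) - log(renorm_b)

print(gap_level)
print(gap_log)
```

For a single series the two log curves are parallel. With many states, each line gets its own shift, which is why the bundle of lines looks different even though no individual line changes shape.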

## Make an animation

Now, we can use animation and the functions we’ve created to generate the animation we opened with.

Running this gives our original animated plot:

It kind of looks like you’ve got a wire-tie and a bunch of cords and you’re sliding the knot.

## Conclusion

Whenever you look at index values, ratios, or any type of number that doesn’t have any units you should use a skeptical eye. Usually folks do not intend any evil, but careless use of such metrics can lead to spurious conclusions.

I like to look at data in a variety of ways. As I said earlier today, if different visualizations of the same data tell different stories, then you might have found a compelling and completely false narrative. It happens, but active, vigorous visualization can help decrease its likelihood.