08 May 2016

THIS IS PART TWO of my series of meditations on house prices. In our earlier post we covered the data collection, wrangling, and some useful transformations.

## Recap

For convenience, the data files you’ll need to replicate these results are right here:

The data preparation code is in the previous post.

Now we’ll load and prepare the data, using the data.table package to set it up.

Let’s take a peek at the data:

Above are the first 13 rows of the state data. The key variable is hpi, which tracks the house price index (normalized so that December 2000 = 100). We computed differences, and I also added pmin and pmax: pmin is the minimum of hpi after 2007 by state, and pmax is the maximum of hpi before 2008 by state. I happened to compute these values in Excel and merged them back onto our data before saving it as a text file, but we could pretty easily replicate these calculations in R.
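As a rough illustration of how the pmin and pmax calculations could be replicated in R, here is a minimal sketch using data.table. The states and index values are made up, and the rows are yearly rather than monthly to keep the example short.

```r
library(data.table)

# Synthetic stand-in for the state data (made-up values, yearly rows;
# the real file is monthly).
dt <- data.table(
  state = rep(c("AA", "BB"), each = 6),
  year  = rep(c(2005, 2006, 2007, 2008, 2010, 2015), times = 2),
  hpi   = c(150, 170, 180, 160, 120, 175,
            110, 115, 118, 112, 105, 140)
)

# Maximum of hpi before 2008 and minimum of hpi after 2007, by state.
dt[, pmax := max(hpi[year < 2008]), by = state]
dt[, pmin := min(hpi[year > 2007]), by = state]

# One row per state with its pre-2008 peak and post-2007 trough.
unique(dt[, .(state, pmin, pmax)])
```

The column names pmin and pmax follow the post; they share names with base R functions, which data.table tolerates, but a less ambiguous name may be safer in your own code.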

For today’s exercise we’ll just need the state, year, month, date, and hpi variables (hpi, hpi12min, hpi12max, and hpa12).

### What are these data?

Remember these data are the Freddie Mac House Price Index for the U.S., 50 states plus D.C., and over 300 metro areas. You can read the technical documentation (pdf) and a FAQ, but the basic explanation, from the FAQ, is:

How can I interpret index values?

Index values represent the value of single-family housing, relative to the base month of the series. To calculate the growth rate between any two months, simply apply the formula: (I2/I1)-1, where I1 is the index value of the first month and I2 is the index value of the second month.
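The growth-rate formula from the FAQ is a one-liner in R. The index values below are hypothetical and chosen only to make the arithmetic easy to follow.

```r
# Growth rate between two months: (I2 / I1) - 1
growth_rate <- function(i1, i2) (i2 / i1) - 1

# Hypothetical index values, not taken from the FMHPI data.
i1 <- 100
i2 <- 125
growth_rate(i1, i2)  # 0.25, i.e. a 25% increase
```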

Now that we have the data in hand we can start to make some visualizations.

### On animation

I’ve been using the animation package to make the gifs. In order for this package to work, you’ll need to have one of ImageMagick (http://imagemagick.org) or GraphicsMagick (http://www.graphicsmagick.org) or LyX (http://www.lyx.org) to run the programs.
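If you are not sure whether one of these tools is installed, a quick check along these lines may help. This sketch only looks for the usual command names on the PATH and assumes a default installation.

```r
# Look for the command-line tools commonly used to build gifs.
# Sys.which() returns an empty string when a command is not found.
# Note: on Windows, "convert" may be an unrelated system utility;
# "magick" is the ImageMagick 7 command and "gm" is GraphicsMagick.
tools <- c("magick", "convert", "gm")
found <- nzchar(Sys.which(tools))
names(found) <- tools
found
```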

# Meditation 1: small multiple line plots

Let’s start by constructing a small multiple line plot, where we plot the FMHPI for each state in a separate small plot, sort of like a sparkline. As we have 50 states, we’ll exclude D.C. and the US from this visualization so that we have a nice number to work with in the panel plot.
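A minimal sketch of a small multiple plot, assuming ggplot2 and using synthetic data for three made-up states. It also assumes the dotted line marks each state's pre-2008 peak and the red dot marks the latest value; treat both as illustrative choices.

```r
library(data.table)
library(ggplot2)

# Synthetic stand-in for the state data: three made-up states.
# The real data has 50 states and monthly FMHPI values.
set.seed(1)
dates <- seq(as.Date("2000-01-01"), as.Date("2016-03-01"), by = "month")
dt <- rbindlist(lapply(c("AA", "BB", "CC"), function(s) {
  data.table(state = s, date = dates,
             hpi = 100 * exp(cumsum(rnorm(length(dates), 0.002, 0.01))))
}))
dt[, year := year(date)]

# Pre-2008 peak by state (dotted line) and latest value (red dot).
dt[, pmax := max(hpi[year < 2008]), by = state]
peaks  <- unique(dt[, .(state, pmax)])
latest <- dt[date == max(date)]

# One small panel per state; ncol = 10 gives a 5 x 10 grid for 50 states.
p <- ggplot(dt, aes(x = date, y = hpi)) +
  geom_line() +
  geom_hline(data = peaks, aes(yintercept = pmax), linetype = "dotted") +
  geom_point(data = latest, color = "red", size = 2) +
  facet_wrap(~ state, ncol = 10) +
  theme_minimal() +
  labs(x = NULL, y = "House price index")
p
```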

Depending on your monitor size, you might need to zoom out to see the figure all at once. However, you can compress the graphic to a fairly small size, and still get the main points:

By comparing the red dot and dotted line to recent history you can see which states are above their pre-recession peaks and at an all-time high.

Adding some animation might help us absorb the huge amount of information in the charts. What we’re going to do is take the same sparkline, and repeat it several times, building up each state through time. The code below constructs this graph. And you’ll see that the code is not much more than the code above.

This code simply loops through the years, creates a version of the plot for each one, and then compiles them into an animated gif. I originally posted this gif to Twitter, which has a 5MB limit on gifs, so I had to reduce the number of images by skipping 12 months at a time. If you have no such space restriction, you can modify the code to loop through each month.
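The loop-and-compile pattern can be sketched as follows, assuming ggplot2 and the animation package, with synthetic data for two made-up states. The helper names plot_until and make_gif and the output file name are hypothetical.

```r
library(data.table)
library(ggplot2)

# Synthetic monthly data for two made-up states (illustration only).
set.seed(2)
dates <- seq(as.Date("2000-12-01"), as.Date("2016-03-01"), by = "month")
dt <- rbindlist(lapply(c("AA", "BB"), function(s) {
  data.table(state = s, date = dates,
             hpi = 100 * exp(cumsum(rnorm(length(dates), 0.002, 0.01))))
}))

# One frame every 12 months keeps the gif small, plus the final month.
# Use by = 1 to get a frame for every month.
frame_dates <- unique(c(dates[seq(12, length(dates), by = 12)], max(dates)))

# Draw each series up to a cutoff date, holding the axes fixed so the
# panels do not rescale from frame to frame.
plot_until <- function(cutoff) {
  ggplot(dt[date <= cutoff], aes(x = date, y = hpi)) +
    geom_line() +
    facet_wrap(~ state) +
    scale_x_date(limits = range(dt$date)) +
    coord_cartesian(ylim = range(dt$hpi)) +
    labs(title = format(cutoff, "%b %Y"), x = NULL, y = "House price index")
}

# Compile the frames into a gif. Requires the animation package and
# ImageMagick or GraphicsMagick; defined here but not called.
make_gif <- function(file = "hpi_states.gif") {
  animation::saveGIF(
    for (i in seq_along(frame_dates)) print(plot_until(frame_dates[i])),
    movie.name = file, interval = 0.2, ani.width = 600, ani.height = 400
  )
}
```

Calling make_gif() writes the gif. The frame spacing is the main lever on file size.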

# Meditation 2: dot plot

The time series sparklines provide a lot of information. Perhaps too much, particularly for an animated gif.

Let’s also construct a dot plot. The idea is to plot a single month’s value of the index for each state as a dot along the horizontal axis, and use the vertical axis to display each state. This way, you can compare the relative position of each state. We’ll also add some additional features.

First the code and plot:

Each dot corresponds to the house price index value for an individual state in March of 2016. I’ve also added a “tail” that is a line segment stretching from the rolling 12-month minimum to the rolling 12-month maximum. When the index is up (down) over the past year, the dot will be at the right (left) end of the line. The length of the line helps us compare relative movements in different states. The log scale is important here so that we’re comparing percentage changes in indices, which are meaningful, and not total changes in index points, which are not.
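For reference, the rolling 12-month minimum, maximum, and percent change could be computed along these lines with data.table. This is a sketch on a made-up series, and it assumes the window covers the current month plus the prior 11.

```r
library(data.table)

# Synthetic monthly index for one made-up state: rises 1 point per month.
dt <- data.table(
  state = "AA",
  date  = seq(as.Date("2014-01-01"), by = "month", length.out = 24),
  hpi   = 100 + 0:23
)
setkey(dt, state, date)

# Rolling 12-month minimum and maximum (current month plus prior 11),
# and the 12-month percent change, all computed within each state.
dt[, hpi12min := frollapply(hpi, 12, min), by = state]
dt[, hpi12max := frollapply(hpi, 12, max), by = state]
dt[, hpa12    := hpi / shift(hpi, 12) - 1, by = state]

# Latest month: the index is rising, so hpi equals hpi12max and the dot
# would sit at the right end of the tail.
tail(dt, 1)
```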

I’ve also added a gradient color scale between red and blue that denotes the percentage change in house prices over the past year. This color is slightly redundant, as the length of the segment almost perfectly correlates with this measure. Still, scanning the chart, the bluest dot and tail mark the fastest-growing state, and the reddest mark the slowest (or most negative).
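A dot plot with these features can be sketched in ggplot2 as follows. The four states and their values are made up, and the styling choices are assumptions rather than a reproduction of the original figure.

```r
library(data.table)
library(ggplot2)

# Made-up values for four hypothetical states in a single month.
dots <- data.table(
  state    = c("AA", "BB", "CC", "DD"),
  hpi      = c(180, 140, 120, 95),
  hpi12min = c(165, 133, 117, 95),
  hpi12max = c(180, 140, 120, 99),
  hpa12    = c(0.09, 0.05, 0.02, -0.04)
)

# Order states by index level so the dots line up from low to high.
dots[, state := factor(state, levels = state[order(hpi)])]

p <- ggplot(dots, aes(y = state, color = hpa12)) +
  # Tail: rolling 12-month minimum to rolling 12-month maximum.
  geom_segment(aes(x = hpi12min, xend = hpi12max, yend = state)) +
  # Dot: current value of the index.
  geom_point(aes(x = hpi), size = 3) +
  # Log scale so equal distances are equal percentage changes.
  scale_x_log10() +
  # Red for the slowest growth, blue for the fastest.
  scale_color_gradient(low = "red", high = "blue", labels = scales::percent) +
  theme_minimal() +
  labs(x = "House price index (log scale)", y = NULL,
       color = "12-month change")
p
```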