Now we’ll need to load and prepare some data. We’ll do this using the data.table package to set up our data.
Let’s take a peek at the data:
Above is the first 13 rows of the state data. The key variable is hpi, which tracks the house price index (normalized so that December 2000=100). We computed differences, and I also added pmin and pmax which compute the minimum of hpi after the year 2007 by state and the maximum of hpi before the year 2008 by state respectively. I happened to have computed these values in Excel and merged them back on our data before I saved it as a text file. But we could pretty easily replicate these calculations in R.
For today’s exercise we’ll just need the state, year, month, date, and hpi variables (hpi, hpi12min, hpi12max, and hpa12).
What are these data?
Remember these data are the Freddie Mac House Price Index for the U.S., 50 states plus D.C. and over 300 metro areas. You can read technical documentation pdf and a FAQ, but the basic explanation-from the FAQ-is:
How can I interpret index values?
Index values represent the value of single-family housing, relative to the last month in the series. To calculate the growth rate between any two months, simply apply the formula: (I2/I1)-1, where I1 is the index value of the first month and I2 is the index value of the second month.
Now that we have the data in hand we can start to make some visualizations.
I’ve been using the animation package to make the gifs. In order for this package to work, you’ll need to have one of ImageMagick (http://imagemagick.org) or GraphicsMagick (http://www.graphicsmagick.org) or LyX (http://www.lyx.org) to run the programs.
Meditation 1: small multiple line plots
Let’s start by constructing a small multiple line plot, where we plot the FMHPI for each state in a separate small plot, sort of like a sparkline. As we have 50 states, we’ll exclude D.C. and the US from this visualization so that we have nice number to work with in the panel plot.
Depending on your monitor size, you might need to zoom out to see the figure all at once. However, you can compress the graphic to a fairly small size, and still get the main points:
By comparing the red dot and dotted line to recent history you can see which states are above their pre-recession peaks and at an all-time high.
Add some animation
Adding some animation might help us absorb the huge amount of information in the charts. What we’re going to do is take the same sparkline, and repeat it several times, building up each state through time. The code below constructs this graph. And you’ll see that the code is not much more than the code above.
This code simply loops through years and creates a version of the plot and then compiles it into an animated gif. I originally posted this gif to Twitter and they have a 5MB limit on gifs so I had to reduce the number of images, skipping 12 months at a time. You can modify the code to loop through each month to create the plot if you have no such space restriction.
Meditation 2: dot plot
The time series sparklines provide a lot of information. Perhaps too much, particularly for an animated gif.
Let’s also construct a dot plot. The idea is to plot a single month’s value of the index for each state as a dot on the horizontal x axis, and use the vertical axis to display each state. This way, you can compare the relative position of each state. We’ll also add some additional features.
First the code and plot:
Each dot corresponds to the house price index value for an individual state in March of 2016. I’ve also added a “tail” that is a line segment stretching from the rolling 12-month minimum to the rolling 12-month maximum. When the index is up (down) over the past year then the dot will be at the right (left) end of the line. The length of the line helps us compare relative movements in different states. The log scale is important here so that we’re comparing percentage changes in indices, which are meaningful, and not total change in index points, which are not meaningful.
I’ve also added a gradient color scale between red and blue that denotes the percentage change in house prices over the past year. This color is slightly redundant as the length of the segment almost perfectly correlates with this measure. Still, by scanning the chart the bluest dot and tail is the fastest growing, and the reddest will be the slowest (or most negative).
This chart is enhanced by adding animations I think. The time sequence of the gif allows us to trace the history of house prices. Watching the line segements increase and decrease in length gives you a sense of the trajectory of house prices for that state. Comparisons acrosss states are also easier in this chart.
There are some features of this chart (the changing meaning of the scales) that from a data science perspective are probably not recommended. Still, I like this chart. I think you really get a feeling of momentum that is so important in housing markets. The shifting colors make decoding the specific percent changes with color difficult, but the importance of colors is really to say which state is fastest and slowest. As a state comes on strong, they shift from red to blue.
That’s enough R for this post. I’ve got other meditations (we still haven’t used the metro data yet) to consider, like this one:
We’ll go over this one, and maybe something else, in my next post.