13 August 2016

OVER THE PAST THREE MONTHS I HAVE MADE several new house price visualizations. In these meditations I’ll consider some recent graphs and provide R code for them. For reference, prior meditations are available at:

# Meditation 1: Median sales price trends

Earlier this week, the National Association of Realtors (NAR) released their quarterly update on metro area median house prices (data here). These data show trends in the median sales price of existing single family homes.

According to the NAR report, the national median sales price of an existing single family home was \$240,700 in the second quarter of 2016, but there are vast differences in house prices across the country. San Jose made news by having a median sales price of over \$1 million. On the other end of the spectrum, the median sales price in Youngstown, Ohio was just over \$85,000.

In order to better understand the distribution of median sales prices, I constructed the following combination chart:

The chart combines a slopegraph showing the trend in house prices from the second quarter of 2015 to the second quarter of 2016 with two histograms of house prices in which the bars are replaced with dots representing each individual metro. By looking at the slopes on the left, you can see how house prices have trended across metros (mostly up), and by looking at the histograms on the right, you can compare how individual metros stack up relative to other markets in the country. The two histograms compare 2015Q2 with 2016Q2 so you can see how metros have moved in the median price distribution over time.

As there is a whole lot of data (180 metros), I use animation to highlight each individual metro one at a time. I sorted the metros based on their place in the 2016 Q2 price distribution, starting from Youngstown and going up to San Jose.

## Code for plot

In order to construct this plot using R we’ll need to combine multiple graphs on a single page. Fortunately, the Cookbook for R has code for this. By using the multiplot function in the link above, we can easily combine two plots into one page.

The data from NAR comes in a spreadsheet, but we’re going to have to change it. From this:

to this:

I tried to manipulate the data in R, but the headers and formatting of the spreadsheet made it seem more trouble than it was worth, so I transformed the data using Excel. See this post for details.

### Slopegraph

The slopegraph is a line plot with two points per metro: the median price in 2015Q2 and the median price in 2016Q2. We’ll arrange the dates on the x axis and display the price on the y axis. We’ll use a logarithmic scale for the y axis, so the slope of each line will approximate the percentage change in the median price.
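To see why a log scale makes slopes comparable, here is a small base R check. It is only a sketch: the starting price and growth rates are made up for illustration and are not NAR data. It compares the difference in log prices with the actual percentage change.

```r
# Illustrative growth rates and starting price (made up, not NAR data)
growth     <- c(0.02, 0.05, 0.10, 0.20)
price_2015 <- 200000
price_2016 <- price_2015 * (1 + growth)

# Vertical distance on a log axis is the difference in log prices
log_diff <- log(price_2016) - log(price_2015)

# Side-by-side comparison of the true change and the log difference
comparison <- data.frame(pct_change = growth, log_diff = round(log_diff, 4))
print(comparison)
```

The approximation is close for small changes and understates larger ones, which is worth keeping in mind when reading the steepest lines.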

We’ll call out the United States median sales price by overlaying a red line on top of gray lines representing each individual metro area. In the animation we’ll loop through each metro and update the histogram.
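A minimal sketch of such a slopegraph with ggplot2 might look like the following. The data frame, its column names (`area`, `quarter`, `price`), and all the prices are assumptions made up for illustration; they are not the NAR figures.

```r
library(ggplot2)

# Made-up prices (not NAR data); "United States" stands in for the national series
prices <- data.frame(
  area    = rep(c("Metro A", "Metro B", "Metro C", "United States"), each = 2),
  quarter = factor(rep(c("2015Q2", "2016Q2"), times = 4)),
  price   = c(90000, 95000, 180000, 190000, 800000, 880000, 230000, 240000)
)
us <- subset(prices, area == "United States")

slope_plot <- ggplot(prices, aes(x = quarter, y = price, group = area)) +
  geom_line(color = "gray70") +            # one gray line per area
  geom_line(data = us, color = "red") +    # national line drawn on top in red
  geom_point(data = us, color = "red") +
  scale_y_log10(labels = scales::dollar) + # log scale for the price axis
  labs(x = NULL, y = "Median sales price (log scale)")
```

Because layers are drawn in the order they are added, putting the red line after the gray lines keeps it visible.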

### Create the histograms

In order to create the histograms we’re going to have to build them ourselves. Our strategy will be to place each metro in a bin corresponding to a range of house prices (say from \$75,000 to \$100,000), and then count up how many metros are in each bin. So far, that’s just like a standard histogram. But as we’re going to draw a dot for each metro area, we have to assign a y axis value for each metro.
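The binning and stacking can be sketched in base R as follows. The metro names, prices, and the \$25,000 bin width are assumptions chosen for illustration.

```r
# Made-up median prices (in dollars) for hypothetical metros
metros <- data.frame(
  metro = paste("Metro", LETTERS[1:8]),
  price = c(82000, 90000, 110000, 120000, 124000, 160000, 240000, 98000)
)

bin_width <- 25000
# Lower edge of each metro's bin, e.g. 75,000 for prices in [75,000, 100,000)
metros$bin <- floor(metros$price / bin_width) * bin_width

# Stack the dots: sort by price, then number the metros within each bin
metros <- metros[order(metros$price), ]
metros$y <- ave(metros$price, metros$bin, FUN = seq_along)

# The top of each stack equals the usual histogram count for that bin
bin_counts <- tapply(metros$y, metros$bin, max)
```

Plotting `y` against `bin` as points (for example with `geom_point()` in ggplot2) then gives one dot per metro, with the height of each stack matching the bar a standard histogram would draw.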

### Combine the plots

Now we’ll use the multiplot function to combine the plots. We’ll also make another histogram so we can compare the distribution of median house sales prices in 2016Q2 to the distribution in 2015Q2.
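For readers who want to see the mechanics, here is a sketch of the same idea using grid viewports directly rather than the Cookbook’s multiplot function. The helper name `draw_side_by_side` and the placeholder data are hypothetical.

```r
library(ggplot2)
library(grid)

# Placeholder data and plots standing in for the slopegraph and histogram
toy <- data.frame(x = c(1, 2, 3), y = c(2, 4, 8))
left_plot  <- ggplot(toy, aes(x, y)) + geom_line()
right_plot <- ggplot(toy, aes(x, y)) + geom_point()

# Hypothetical helper: draw a list of ggplot objects in a single row
draw_side_by_side <- function(plots) {
  grid.newpage()
  pushViewport(viewport(layout = grid.layout(nrow = 1, ncol = length(plots))))
  for (i in seq_along(plots)) {
    # Each plot is printed into its own cell of the layout
    print(plots[[i]], vp = viewport(layout.pos.row = 1, layout.pos.col = i))
  }
  invisible(length(plots))
}

n_drawn <- draw_side_by_side(list(left_plot, right_plot))
```

Adding a third plot to the list would give the slopegraph plus both histograms in one row.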

### Add animation

Now we want to highlight each individual metro area. To do so, we’ll construct an animated gif where we highlight each metro one at a time. The code below generates the animated gif:
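As a minimal sketch of the highlighting step only (not the full gif pipeline), the idea is to build one frame per metro, with that metro drawn in red. The data and the helper name `make_frame` are assumptions for illustration. The frames would still need to be rendered and stitched together, for instance with `saveGIF()` from the animation package, which relies on ImageMagick being installed.

```r
library(ggplot2)

# Made-up binned data for hypothetical metros (not NAR data)
metros <- data.frame(
  metro = c("Metro A", "Metro B", "Metro C", "Metro D"),
  bin   = c(75000, 75000, 100000, 125000),
  y     = c(1, 2, 1, 1),
  price = c(80000, 95000, 110000, 130000)
)
# Animate from the cheapest metro to the most expensive
metros <- metros[order(metros$price), ]

# Hypothetical helper: one frame with metro i in red on top of the gray dots
make_frame <- function(i) {
  ggplot(metros, aes(x = bin, y = y)) +
    geom_point(color = "gray70", size = 3) +
    geom_point(data = metros[i, ], color = "red", size = 4) +
    labs(title = metros$metro[i], x = "Median sales price bin", y = "Count")
}

# One plot object per metro, in animation order
frames <- lapply(seq_len(nrow(metros)), make_frame)
```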

Run it and you get our plot:

# Meditation 2: Changes in the Distribution of House Price Appreciation

The NAR data only goes back to 2015Q2, but how has the metro level distribution of house prices changed over the last 10 years or so? In this section we’ll consider a graph I constructed using the Freddie Mac House Price Index (FMHPI), which is available to the public on Freddie Mac’s webpage and goes back to the 1970s for over 300 metro areas.

The data I’m going to use is an updated version of the file fmhpi2.txt I described in Part 1: data wrangling.

For the animation we’ll also be using the tweenr package, which I’ve written about before. See my earlier post about tweenr for an introduction, and more examples here and here.

## Adding animation

We want to compare how the distribution of annual house price growth has shifted from 2006 to 2016. We’ll compare the annual appreciation in June of each year. We’ll also use tweenr to have the dots smoothly transition between years.
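To illustrate what tweening does, here is a hand-rolled linear interpolation in base R. This is not tweenr’s API, just a sketch of the idea it automates (tweenr also offers easing functions beyond linear). The function name `tween_linear`, the column names, and the appreciation rates are made up.

```r
# Made-up annual appreciation rates (percent) for hypothetical metros
state_2006 <- data.frame(metro = c("Metro A", "Metro B", "Metro C"),
                         hpa = c(12, 5, -1))
state_2016 <- data.frame(metro = c("Metro A", "Metro B", "Metro C"),
                         hpa = c(3, 6, 8))

n_frames <- 5  # number of frames from the first state to the second, inclusive

# Linear interpolation: frame 1 is the 2006 state, frame n is the 2016 state
tween_linear <- function(from, to, n) {
  weights <- seq(0, 1, length.out = n)
  frames <- lapply(seq_along(weights), function(k) {
    data.frame(metro = from$metro,
               hpa   = (1 - weights[k]) * from$hpa + weights[k] * to$hpa,
               frame = k)
  })
  do.call(rbind, frames)
}

tweened <- tween_linear(state_2006, state_2016, n_frames)
```

Drawing one plot per value of `frame` moves each dot in equal steps from its 2006 position to its 2016 position.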

## Adding a time series plot, highlighting individual metros

We can use multiplot again to combine the histogram with a line plot. In this case, we’ll loop through all the metro areas and compare the current year-over-year appreciation to the history of that metro from 2000 through 2016 (June).
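The year-over-year appreciation itself can be computed from a monthly index by comparing each month with the same month a year earlier. The sketch below uses a made-up index for one hypothetical metro, not the FMHPI, and the column names are assumptions.

```r
# Made-up monthly index for one hypothetical metro, growing 0.5% per month
dates <- seq(as.Date("2014-01-01"), by = "month", length.out = 30)
index <- data.frame(date = dates,
                    hpi  = 100 * 1.005 ^ (seq_along(dates) - 1))

# Index value 12 months earlier (missing for the first year)
lag12 <- c(rep(NA, 12), head(index$hpi, -12))

# Year-over-year appreciation as a fraction, e.g. 0.06 for 6 percent
index$hpa <- index$hpi / lag12 - 1

# Keep only the June observations that have a full year of history
june <- index[format(index$date, "%m") == "06" & !is.na(index$hpa), ]
```

With real data the same calculation would be applied within each metro, so the lag never crosses from one metro’s series into the next.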

Code for this plot follows:

Check out the other visual meditations on house prices