2017/06/20

R / dataviz / housing / house prices

LET US FOLLOW UP ON YESTERDAY’S POST with some more analysis of housing affordability.

Per usual, we’ll use R to generate the plots and I’ll share the code below.

Measuring affordability

First, let’s talk a little bit more about what we are seeing in the plots. What are we measuring? Affordability metrics are often based on market level summary statistics. In our case we are looking at various ways to measure housing costs at a market level.

In this analysis we hope that market-level statistics capture important trends. For more on looking at distributions see for example, this post.

In the plot’s we looked at yesterday) and will expand on today, we looked at metro-area estimates of median house prices and median incomes. Using assumptions about mortgage financing, the house prices can be turned into estimates of monthly mortgage payments.

Specifically the mortgage payment is calculated assuming that a household (earning the median household income in 2016) purchases the median priced home in that market financed with a mortgage with a 5% downpayment (so the mortgage amount is 95% of the median price), property tax of 1.25%, property insurance of 0.25% and mortgage insurance of 0.5%. The only assumption that varies across areas is the house purchase price and thus that drives the monthly payment differentials across areas.

Of course, the market-level statistics don’t capture what’s true at an individual level, but the hope is that movements over time and comparisons across space capture important trends.

For our purposes, we are going to take advantage of other’s hard work and use the statistics that have been conveniently provided by the Joint Center for Housing Studies (JCHS) of Harvard University in their annual State of the Nation’s Housing report and their online appendix with a handy Excel table.

Just like yesterday, we’ll proceed assuming you’ve downloaded the Excel file from JCHS (available here .xlsx file).

Now with that in hand, let’s get back to it.

Data prep

Our data prep follow along like yesterday, though at the end we’ll merge our two datasets corresponding to information from the two worksheets we extract.

We’ll have to use readxl (see this post for more on wrangling Excel data with R) to get the data ready to use. It’s going to involve understanding the structure of the Excel file and using some of readxl’s nice features.

The spreadsheet is pretty well organized so using the patterns described in the links above we can pretty easily get the data ready.

We are going to use two of the worksheets. Worksheet W-9 contains estimates of metro area median monthly mortgage payments and worksheet W-12 contains estimates of metro area median house price-to-median household income estimates. After we’ve tidied the data we’ll merge the two tidy data into one data frame.

Let’s go get that data and wrangle it into shape:

library(readxl)
library(tidyverse)

## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr

## Conflicts with tidy packages ----------------------------------------------

## filter(): dplyr, stats
## lag():    dplyr, stats

library(ggbeeswarm)
library(viridis)

## Loading required package: viridisLite

library(scales)

## 
## Attaching package: 'scales'

## The following object is masked from 'package:purrr':
## 
##     discard

## The following object is masked from 'package:readr':
## 
##     col_factor

# Load the mortgage payment data
df<-read_excel(path="data/all_son_2017_tables_current_6_12_17.xlsx",
               sheet="W-9",
               range="A5:AB387")

# gather the data to tidy
df<-df %>% gather(year,pmt,c(2:28))
# rename the metro column from 'Metropolitan Area', to 'metro'
colnames(df)[1]<-"metro"
# convert year to numeric
df$year<-as.numeric(df$year)


# Load the median house price to median income estimates

df2<-read_excel(path="data/all_son_2017_tables_current_6_12_17.xlsx",
               sheet="W-12",
               range="A5:AB387")

# gather the data to tidy
df2<-df2 %>% gather(year,pti,c(2:28))
# rename the metro column from 'Metropolitan Area', to 'metro'
colnames(df2)[1]<-"metro"
# convert year to numeric
df2$year<-as.numeric(df2$year)

# NEW 06/20/2017
# merge the data together
df3<-merge(df,df2,by=c("year","metro"))

Make some plots

Yesterday we looked separately at the distribution of monthly mortgage payments across metro areas and years (in 2016 inflation-adjusted dollars) and the median house price-to-median household income ratio across metro areas and years.

But how do they line up? Do areas with high mortgage payments (driven by higher median house prices) also have higher house price-to-income ratios? Let’s take a look.

ggplot(data=filter(df3,metro !="United States" & year==2016), aes(x=pmt,y=pti,label=metro))+
  geom_point(alpha=0.75,size=3)+
  scale_x_log10(breaks=c(500,1000,1500,2000,3000,6000),labels=scales::dollar)+
  theme_minimal()+
  theme(plot.caption=element_text(hjust=0),
        plot.title=element_text(hjust=0,face="bold",size=24),
        legend.position="bottom",
        legend.key.width = unit(2, "cm"))+
  geom_point(size=3,color="red",data=filter(df3,metro %in% c("Youngstown-Warren-Boardman, OH-PA","San Jose-Sunnyvale-Santa Clara, CA") & year==2016))+
  geom_text(size=3,color="red",hjust=0,data=filter(df3,metro %in% c("Youngstown-Warren-Boardman, OH-PA") & year==2016),aes(y=pti-0.25))+
    geom_text(size=3,color="red",hjust=1,data=filter(df3,metro %in% c("San Jose-Sunnyvale-Santa Clara, CA") & year==2016),aes(y=pti-0.25))+

  labs(y="Median Home Price-to-Median Income Ratio",x="Monthly payment in 2016 $, log scale",title="How affordable is housing?",
       subtitle="Metro Area-Median Home Price-to-Median Income Ratios: 2016\nvs U.S.-Monthly Mortgage Payment on the Median Priced Home: 2016",
       caption="@lenkiefer Each dot represents one metro area.\n\nSources: JCHS tabulations of National Association of Realtors and Moody's Analytics Forecasted Single-Family Seasonally \nAdjusted Quarterly Median Home Price, annualized by DataBuffet; Freddie Mac Annual Primary Mortgage Market Survey.\nJCHS 2017 State of the Nation's Housing, Appendix Tables http://www.jchs.harvard.edu/research/state_nations_housing\n\nNote: Monthly payment assumes a 5% down payment on the median-priced existing single-family home with property taxes of 1.25%, \nproperty insurance of 0.25%, and mortgage insurance of 0.5%. Values are adjusted to 2016 dollars using CPI-U All Items Less Shelter.\nMedian household income in 2016 is forecasted by Moody's Analytics.")

This chart tells an interesting story. The chart is upward sloping, which indicates that metro areas where house prices and thus monthly payments are higher tend to have higher median home price-to-median household income ratios. For example, Youngstown-Warren-Boardman, OH-PA which has a monthly payment of about $500 while San Jose-Sunnyvale-Santa Clara, CA has a monthly payment of over $6,000 dollars. The median home price-to-median household income ratio is less than 2 in Youngstown-Warren-Boardman, OH-PA, while it is 9.5 in San Jose-Sunnyvale-Santa Clara, CA.

Trends over time

How have these statistics varied over time? Let’s plot one of my favorite visualizations for evolving relationships (no not a dual axis graph, a connected scatter plot.

ggplot(data=filter(df3,metro %in% c("Youngstown-Warren-Boardman, OH-PA","San Jose-Sunnyvale-Santa Clara, CA") ), 
       aes(x=pmt,y=pti,color=metro,label=paste(year)))+
  geom_path(size=0.25,alpha=0.5)+
  geom_point(alpha=0.75,size=1)+
  scale_x_log10(breaks=c(500,1000,1500,2000,3000,6000),labels=scales::dollar)+
  theme_minimal()+
  theme(plot.caption=element_text(hjust=0),
        plot.title=element_text(hjust=0,face="bold",size=24),
        legend.position="bottom",
        legend.key.width = unit(2, "cm"))+
  geom_text(size=6, fontface="bold",
            data=filter(df3,year %in% c(1990,2016)&
                          metro %in% 
                          c("Youngstown-Warren-Boardman, OH-PA",
                            "San Jose-Sunnyvale-Santa Clara, CA"))) +

  labs(y="Median Home Price-to-Median Income Ratio",x="Monthly payment in 2016 $, log scale",title="How affordable is housing?",
       subtitle="Metro Area-Median Home Price-to-Median Income Ratios: 1990-2016\nvs U.S.-Monthly Mortgage Payment on the Median Priced Home: 1990-2016",
       caption="@lenkiefer Each dot represents one metro area.\n\nSources: JCHS tabulations of National Association of Realtors and Moody's Analytics Forecasted Single-Family Seasonally \nAdjusted Quarterly Median Home Price, annualized by DataBuffet; Freddie Mac Annual Primary Mortgage Market Survey.\nJCHS 2017 State of the Nation's Housing, Appendix Tables http://www.jchs.harvard.edu/research/state_nations_housing\n\nNote: Monthly payment assumes a 5% down payment on the median-priced existing single-family home with property taxes of 1.25%, \nproperty insurance of 0.25%, and mortgage insurance of 0.5%. Values are adjusted to 2016 dollars using CPI-U All Items Less Shelter.\nMedian household income in 2016 is forecasted by Moody's Analytics.")

These two charts trace the fortunes of the two markets. Youngstown-Warren-Boardman, OH-PA has seen little change in the price-to-income ratio over the years. It has seen a decline in the monthly payment (after adjusting for inflation). San Jose-Sunnyvale-Santa Clara, CA on the other hand has seen a dramatic increase in both the monthly payment and the median home price-to-median income ratio. Despite the declines in mortgage rates, buyers in San Jose-Sunnyvale-Santa Clara, CA are paying more in payments and a larger percentage of their income (if they earn the median household income).

The story might be easier to see in an animated gif:

Follow-up

There’s a lot more interesting data compiled in the JCHS report. If you are interested in U.S. housing market trends, then you definitely should check out the full report (LINK pdf). The JCHS has also prepared some interesting interactive maps see for example here that have a bunch of interesting information.

We can also build some more interesting visualizations with the data. There’s a lot more left to explore. See you next time.

More on housing affordability

Measuring affordability

Data prep

Make some plots

Trends over time

Follow-up

Share!