LET US FOLLOW UP ON YESTERDAY’S POST with some more analysis of housing affordability.

Per usual, we’ll use R to generate the plots and I’ll share the code below.

# Measuring affordability

First, let’s talk a little bit more about what we are seeing in the plots. What are we measuring? Affordability metrics are often based on market level summary statistics. In our case we are looking at various ways to measure housing costs at a market level.

In this analysis we hope that market-level statistics capture important trends. For more on looking at distributions see for example, this post.

In the plot’s we looked at yesterday) and will expand on today, we looked at metro-area estimates of median house prices and median incomes. Using assumptions about mortgage financing, the house prices can be turned into estimates of monthly mortgage payments.

Specifically the mortgage payment is calculated assuming that a household (earning the median household income in 2016) purchases the median priced home in that market financed with a mortgage with a 5% downpayment (so the mortgage amount is 95% of the median price), property tax of 1.25%, property insurance of 0.25% and mortgage insurance of 0.5%. The only assumption that varies across areas is the house purchase price and thus that drives the monthly payment differentials across areas.

Of course, the market-level statistics don’t capture what’s true at an individual level, but the hope is that movements over time and comparisons across space capture important trends.

For our purposes, we are going to take advantage of other’s hard work and use the statistics that have been conveniently provided by the Joint Center for Housing Studies (JCHS) of Harvard University in their annual State of the Nation’s Housing report and their online appendix with a handy Excel table.

Just like yesterday, we’ll proceed assuming you’ve downloaded the Excel file from JCHS (available here .xlsx file).

Now with that in hand, let’s get back to it.

# Data prep

Our data prep follow along like yesterday, though at the end we’ll merge our two datasets corresponding to information from the two worksheets we extract.

We’ll have to use *readxl* (see this post for more on wrangling Excel data with R) to get the data ready to use. It’s going to involve understanding the structure of the Excel file and using some of readxl’s nice features.

The spreadsheet is pretty well organized so using the patterns described in the links above we can pretty easily get the data ready.

We are going to use two of the worksheets. Worksheet W-9 contains estimates of metro area median monthly mortgage payments and worksheet W-12 contains estimates of metro area median house price-to-median household income estimates. After we’ve tidied the data we’ll merge the two tidy data into one data frame.

Let’s go get that data and wrangle it into shape:

# Make some plots

Yesterday we looked separately at the distribution of monthly mortgage payments across metro areas and years (in 2016 inflation-adjusted dollars) and the median house price-to-median household income ratio across metro areas and years.

But how do they line up? Do areas with high mortgage payments (driven by higher median house prices) also have higher house price-to-income ratios? Let’s take a look.

This chart tells an interesting story. The chart is upward sloping, which indicates that metro areas where house prices and thus monthly payments are higher tend to have higher median home price-to-median household income ratios. For example, Youngstown-Warren-Boardman, OH-PA which has a monthly payment of about $500 while San Jose-Sunnyvale-Santa Clara, CA has a monthly payment of over $6,000 dollars. The median home price-to-median household income ratio is less than 2 in Youngstown-Warren-Boardman, OH-PA, while it is 9.5 in San Jose-Sunnyvale-Santa Clara, CA.

## Trends over time

How have these statistics varied over time? Let’s plot one of my favorite visualizations for evolving relationships (no not a dual axis graph, a connected scatter plot.

These two charts trace the fortunes of the two markets. Youngstown-Warren-Boardman, OH-PA has seen little change in the price-to-income ratio over the years. It has seen a decline in the monthly payment (after adjusting for inflation). San Jose-Sunnyvale-Santa Clara, CA on the other hand has seen a dramatic increase in both the monthly payment and the median home price-to-median income ratio. Despite the declines in mortgage rates, buyers in San Jose-Sunnyvale-Santa Clara, CA are paying more in payments and a larger percentage of their income (if they earn the median household income).

The story might be easier to see in an animated gif:

# Follow-up

There’s a lot more interesting data compiled in the JCHS report. If you are interested in U.S. housing market trends, then you definitely should check out the full report (LINK pdf). The JCHS has also prepared some interesting interactive maps see for example here that have a bunch of interesting information.

We can also build some more interesting visualizations with the data. There’s a lot more left to explore. See you next time.