Tour of U.S. metro area house price trends

Let's take a tour of recent house price trends in the United States. We will build an awesome visualization with R and then export it to PowerPoint. From PowerPoint we will make a sweet video.

HEY! HERE IS A VIDEO SHOWING HOUSE PRICE TRENDS around the United States.

Earlier this year we looked at how to get the data and plot it using R.

I made the video using the PowerPoint to .mp4 workflow I outlined here.

Below I’ll review how to build this file.

Get data

We are going to use house price data from the publicly available Freddie Mac House Price Index.

In order to make our map, we’ll want to merge on the latitude and longitude of the principal city for each metro.

We are going to use the us.cities data that comes with the maps library. To get the cbsa locations, we need to merge on the principal city of each metro area to the us.cities data. The us.cities file has the latitude and longitude of many U.S. cities.

The U.S. Census Bureau has convenient files here that will allows us to map U.S. cities to metro areas. We can grab a mapping of principal cities to CBSA and merge to the us.cities data. I’ve also added metro population (in 2015), which will be useful for sorting later.

In summary, we’ll need these two files:

Prep house price data

The house price data come in a spreadsheet with data in two worksheets. We’ll use readxl to munge the data and get it ready to go.

################################################
## Step Load libraries ----
# (just realized four dashes "----"
#  helps with reading files in Rstudio)
#################################################
library(tidyverse)
library(viridis)
library(data.table)
library(readxl)
library(cowplot)
library(ggthemes)
library(scales)
library(maps)
library(albersusa)
data(us.cities) # from the package maps

###############################################################################
#### Load metro data ----
#### note i've renamed the spreadsheet to msas_new17q3.xls to indicate 2017 Q3
###############################################################################
df2<-read_excel("data/msas_new17q3.xls", 
                sheet = "MSA Indices A-L",
                range="B6:HG519" )
# Create  a date
df2$date<-seq.Date(as.Date("1975-01-01"),as.Date("2017-09-01"),by="1 month")

df3<-read_excel("data/msas_new17q3.xls", 
                sheet = "MSA Indices M-Z",
                range="B6:FM519" )
df3$date<-seq.Date(as.Date("1975-01-01"),as.Date("2017-09-01"),by="1 month")

###############################################################################
#### merge worksheets ----
###############################################################################

dm <- left_join(df2,df3,by="date") %>% gather(geo,hpi,-date) %>% mutate(type="metro") %>%
  mutate(state=substr(geo,nchar(geo)-1,nchar(geo))) %>%
  group_by(geo) %>% 
  mutate(hpa12=hpi/lag(hpi,12)-1,
         month=month(date),
         year=year(date)) %>% ungroup() %>% rename(metro=geo) %>% 
  data.table() 

Now we can combine the metro data with the geo data.

################################################
## load cbsa data ----
################################################

cbsa.data <-fread("data/cbsa.city.txt")
cbsa.metro<-cbsa.data[metro.micro=="Metropolitan Statistical Area"]
#create lowercase names
cbsa.metro[,nameL:=tolower(name)]
us.cities<-data.table(us.cities)[,nameL:=tolower(name)]

d<-merge(cbsa.metro,us.cities,by="nameL")
#get rid of duplicates
# see: http://stackoverflow.com/questions/15776064/r-first-observation-by-group-using-data-table-self-join
d<-d[order(-pop)]
d<-d[d[,list(row1 = .I[1]), by = list(cbsa)][,row1]]

################################################
## merge house price and geo data ----
################################################
dm2<-merge(dm,d,by.y="cbsa.name",by.x="metro",all.x=T)

# Subset house price price data to just large metros
df<-subset(dm2,metro==d[order(-metro.pop)]$cbsa.name[1] & year>1999)

# could maybe avoid this with simple features, but 
# i'm just going to roll with this for the maps now
us <- usa_composite()
us_map <- fortify(us, region="name")

And then…

################################################
## Function for subsetting data ----
################################################

myf<-function(i){
  # i indicates ith largest metro (by population) per 2015 Census estimates
  dt<-subset(dm2,metro==d[order(-metro.pop)]$cbsa.name[i] & year>1999)
  dt %>% map_if(is.character, as.factor) %>% as.data.frame -> dt.out
  return(dt.out)
}


################################################
## Function for plotting data ----
################################################

myplot<-function(df){
  g.map<-
    ggplot(df, aes(x = long, y = lat)) +
    borders("state",  colour = "grey70",fill="lightgray",alpha=0.5)+
    theme_void()+
    theme(legend.position="none",
          plot.title=element_text(face="bold",size=18,hjust=0))+
    geom_point(alpha=0.82,color="black",size=3)+
    labs(title="House price trends around the U.S.",
         subtitle=head(df,1)$metro,
         caption="@lenkiefer Source: Freddie Mac House Price Index through September 2017\n")+
    theme(plot.caption=element_text(hjust=0))

  g.line<-
    ggplot(data=df,aes(x=date,y=hpi))+geom_line()+
    scale_y_log10(limits=c(60,260),breaks=c(75,100,125,seq(150,300,50)))+
    theme_minimal()+
    labs(x="",y="",
         title="House Price Index, NSA",
         subtitle="Dec 2000 = 100, log scale")+
    geom_hline(data=tail(df,1),aes(yintercept=hpi),linetype=2)+
    geom_point(data=tail(df,1),color="black",size=2.5,alpha=0.82)
  
  g.bar<-
    ggplot(data=df,aes(x=date,y=hpa12,fill=hpa12))+geom_col()+
    theme_minimal()+
    scale_y_continuous(label=percent,limits=c(-.45,.45),breaks=seq(-.45,.45,.15))+
    scale_fill_viridis(option="B")+
    geom_text(data=filter(df,date==max(df$date)), hjust=1,fontface="bold",
                          aes(y=0.3,label=paste0("Sep 2017:",percent(round(hpa12,3)))))+
    labs(x="",y="",
         title="House Price Appreciation",
         subtitle="year-over-year percent change in index")+
    theme(plot.caption=element_text(hjust=0),
          legend.position="none")
  
  # use nested cowplot::plot_grid to arrand plots
  g<-plot_grid(g.map+theme(legend.position="none",
                           plot.title=element_text(face="bold",size=18,hjust=0)),
               plot_grid(g.line,g.bar),ncol=1)
  return(g)
}

Finally, we can loop through the top 50 metro areas and write one PowerPoint Slide per metro.

################################################
## Load Officr libraries ----
#################################################
library(officer)
library(rvg)

################################################
## Write PowerPoint ----
#################################################
# Load blank.pptx, an empty powerpoint that serves as a template
my_pres<-read_pptx("data/blank.pptx")

# function for adding slides
myp<- function(i){
  my_pres %>% 
    add_slide(layout = "Blank", master = "Office Theme") %>%
    ph_with_vg_at( code=print(myplot(myf(i))) , 1.38, 0.1, 7.25,7.3) ->
    my_pres
}

# use purrr::walk() to write the files
walk(1:50,myp)

# save the .pptx file
my_pres %>%
  print( target = "hpi50.pptx") %>% 
  invisible()

You can download the deck here. The images in the PowerPoint are vector graphics so the file is big. Below I’ve embedded a pdf version of the slidedeck.

Making a movie

Finally, if you want to create a video you can follow the steps outlined here to convert the PowerPoint into an mp4 format.

 Share!