2016/12/04

Introduction

I HAVE BEEN WATCHING SOME VIDEOS of Plotcon 2016. All of the videos I’ve watched are worth watching (check out the playlist), but I was particularly interested in this one from Hadley Wickham:

Among other things Hadley talks about the idea of nesting data frames, models and model results within a data frame. That idea struck me as something that could be quite useful and not at all something that could lead to explosive increase in the size of data frames or unending loops.

We’ll try these ideas out in a simple, stylized forecasting exercise. We’ll use R to explore and model.

The data

I happen to have some data handy that’s perfect for this exercise. It’s the Freddie Mac House Price Index I’ve been using for my series of Visual Meditations on House Prices.

We’re going to use data house prices for the United States, each of the 50 states and the District of Columbia collected in the following file:

state and national called fmhpi2016q3.txt

Important note: Though I’m going to use house prices as an illustrative example, this shouldn’t be interpreted as my recommendation for a reasonable way to model house prices in any way. This is just for fun, and trying out some coding things.

The strategy

Here’s what we’re going to do. We’re going to take our house price data, a dataset with monthly observations on the house price index from January 1975 to September 2016, and construct a simple forecast for house price growth (year-over-year percent change) for each state rolling forward from 1985.

To make things simple we’ll subset the data to just include observations in September of each (the last month available for 2016). That will reduce the number of observations and get rid of overlapping time series observations.

We can apply the techniques outlined in Hadley’s Plotcon talk to organize our results in a single data frame.

The old way

What I would usually do in this situation, is construct a series of loops to iterate over each state and each time period. Something like this (where forecast.function, and stack.data.function are some functions that compute forecasts and organize the output data respectively):

for (i: 1:N.states){                                                    # iterate over states
  for (t: 1:T.months){                                                  # iterate over time periods
    newdata<-filter(data, date<=T & state==i)                           # filter data, state=i, date <= T
    forecast.data<-forecast.function(newdata)                           # forecast function
    output.data<-stack.data.function(output.data,forecast.data)         # organize data function
    }
  }

Nest with map

Now we can avoid the loops by using the map or map2 functions from the purrr package. Instead of loops, we can take a dataframe with a list of states and dates and then use the map2 function to store our model results in the same dataframe. Like so:

output.data<- data %>% mutate(mod=map2(state,date,forecast.function))

Besides being more efficient to write, this will result in a nice structure and we don’t have to worry about index numbers and aligning things if we want to add or subtract rows.

Example using house prices

Let’s load packages, import the data from the text file above, and do some data manipulations:

#load libraries
library(data.table, warn.conflicts = FALSE, quietly=TRUE)
library(ggplot2, warn.conflicts = FALSE, quietly=TRUE)
library(dplyr, warn.conflicts = FALSE, quietly=TRUE)
library(zoo, warn.conflicts = FALSE, quietly=TRUE)
library(ggrepel, warn.conflicts = FALSE, quietly=TRUE)
library(ggthemes, warn.conflicts = FALSE, quietly=TRUE)
library(scales, warn.conflicts = FALSE, quietly=TRUE)
library(tidyr, warn.conflicts = FALSE, quietly=TRUE)
library(zoo,warn.conflicts=F,quietly=T)
library(purrr,warn.conflicts=F,quietly=T)
library(xts,warn.conflicts=F,quietly=T)
library(lubridate,warn.conflicts=F,quietly=T)
library(viridis,warn.conflicts = F,quietly = F) #for the colorz

#load data from text file
d.state <- fread("data/fmhpi2016q3.txt")
#set up date variable
d.state$date<-as.Date(d.state$date, format="%m/%d/%Y")
#get list of states
state.list<-unique(d.state$state)

# construct variable hpa12 which is 12-month percentage change in house price index
# using data.table for convenience of rolling operations across groups
d.state[,hpa12:=c(rep(NA,12),((1+diff(hpi,12)/hpi))^1)-1,by=state]

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 1).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 2).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 3).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 4).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 5).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 6).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 7).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 8).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 9).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 10).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 11).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 12).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 13).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 14).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 15).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 16).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 17).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 18).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 19).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 20).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 21).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 22).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 23).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 24).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 25).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 26).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 27).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 28).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 29).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 30).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 31).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 32).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 33).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 34).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 35).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 36).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 37).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 38).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 39).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 40).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 41).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 42).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 43).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 44).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 45).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 46).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 47).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 48).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 49).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 50).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 51).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 52).
## The last 12 element(s) will be discarded.

## Warning in diff(hpi, 12)/hpi: longer object length is not a multiple of
## shorter object length

## Warning in `[.data.table`(d.state, , `:=`(hpa12, c(rep(NA, 12), ((1 +
## diff(hpi, : RHS 1 is length 513 (greater than the size (501) of group 53).
## The last 12 element(s) will be discarded.

#subset so that year >1975 and month ==9
d.state<-d.state[month==9 & year>1975]

Now that we have our data loaded let’s take a peek at the structure:

library("htmlTable")
# make tables for viewing formatting dates with purr %>% operations
htmlTable(head(d.state %>% map_if(is.Date, as.character,format="%b-%Y") %>% map_if(is.numeric, round,3) %>%as.data.frame() ,10), col.rgroup = c("none", "#F7F7F7"),caption="State House Price Data",
          tfoot="Source: Freddie Mac House Price Index")

	date	state	hpi	year	month	type	hpa12
State House Price Data
1	Sep-1976	AK	41.953	1976	9	State	0.077
2	Sep-1976	AL	38.775	1976	9	State	0.073
3	Sep-1976	AR	41.717	1976	9	State	0.084
4	Sep-1976	AZ	30.269	1976	9	State	0.039
5	Sep-1976	CA	19.974	1976	9	State	0.162
6	Sep-1976	CO	22.027	1976	9	State	0.066
7	Sep-1976	CT	27.238	1976	9	State	0.06
8	Sep-1976	DC	22.244	1976	9	State	0.101
9	Sep-1976	DE	28.837	1976	9	State	0.011
10	Sep-1976	FL	34.051	1976	9	State	0.036
Source: Freddie Mac House Price Index

What we want to do is forecast the house price appreciation (“hpa12”: year-over-year percent change in index) at different points in time.

Working on a single state

Before we get into the nesting business, let’s just do it for a single state, for a single month. Let’s extract the history of house price growth (hereafter HPA) in Virginia from 1976 through 2016 in September of each year:

# just plot data for VA

ggplot(data=d.state[state=="VA",],aes(x=date,y=hpa12,label=paste(percent(hpa12))))+
  geom_line(color=viridis(10)[1],size=1.1)+
  geom_point(data=tail(d.state[state=="VA",],1),color=viridis(10)[8],alpha=0.82,size=3)+
  geom_text(data=tail(d.state[state=="VA",],1),color=viridis(10)[8],alpha=0.82,size=3,hjust=0,vjust=-1)+
  scale_y_continuous(label=percent)+
  coord_cartesian(xlim=c(as.Date("1976-01-01"),as.Date("2017-12-31")))+
  theme_minimal()+labs(x="",y="",title="Annual percentage change in house prices",
                       subtitle="Virginia in September",
                       caption="@lenkiefer Source: Freddie Mac House Price Index")+
  theme(plot.caption=element_text(hjust=0))

House prices in Virginia are growing by just under 3 percent year-over-year in September 2016, up from the lows of 2009, but below the middle part of last decade and also a bit below the long-run average.

Let’s construct a simple times series forecasting model for house prices (see my note above, this is just for illustration) based onthe history of HPA. There are many models we could use, but one of the simplest is an Autoregressive Model.We can code this pretty easily in R, but working with dates can be tricky. I’ve tried a couple things, but xts works for me.

We use a simple autoregressive model fit to up to two lags of HPA.

  df<-d.state[state=="VA",]                       #just get data for VA
  va.ts<-xts(df$hpa12,df$date)                    #create xts time series of hpa12 with data
  va.out<-ar.ols(va.ts,order.max=2)               #construct ar model
  va.out

## 
## Call:
## ar.ols(x = va.ts, order.max = 2)
## 
## Coefficients:
##       1        2  
##  0.9019  -0.2672  
## 
## Intercept: -0.0009794 (0.00611) 
## 
## Order selected 2  sigma^2 estimated as  0.001456

There’s quite a bit of persistence in the HPA series, reflected in the coefficients. We might be concerned about stationarity with this series. We can use some nice functions from Rob Hyndman to check:

source("code/roots.R")  #load AR roots
# see http://robjhyndman.com/hyndsight/arma-roots/ 
# for code
plot.armaroots(arroots(va.out))

Now we can use this simple AR model to forecast house prices. The code below will stack the predictions to the input data. Then we’ll make a simple forecast plot:

  va.p<-predict(va.out, n.ahead = 4)              #forecasts from ar model
  va.p.ts<-xts(va.p$pred,
                 seq(max(df$date)+years(1),
                     max(df$date)+years(4),"1 year"))
  va.ts<-rbind(va.ts,va.p.ts)
  f.data<-data.frame(date=index(va.ts), 
                     hpa12=coredata(va.ts))
    f.data$f<-ifelse(f.data$date>max(df$date),"forecast","actual")

    #plot forecast:
    ggplot(data=f.data,aes(x=date,y=hpa12,label=paste(percent(hpa12)),color=f,linetype=f))+
      geom_line(size=1.1)+
      scale_color_viridis(name="",discrete=T,end=0.75,direction=1)+
      geom_point(data=tail(f.data,1),color=viridis(10)[8],alpha=0.82,size=3)+
      geom_text(data=tail(f.data,1),color=viridis(10)[8],alpha=0.82,size=3,hjust=0,vjust=-1)+
      scale_y_continuous(label=percent)+
      coord_cartesian(xlim=c(as.Date("1976-01-01"),as.Date("2021-12-31")))+
      theme_minimal()+labs(x="",y="",title="Annual percentage change in house prices",
                           subtitle="Virginia in September, dotted line forecast from AR(2) model",
                           caption="@lenkiefer Source: Freddie Mac House Price Index")+
      guides(linetype=F)+
      theme(plot.caption=element_text(hjust=0),
            legend.position="top")

Okay things look reasonable. These forecasts are sort of dumb, they don’t account for inflation or other factors, but based on the history of house prices they aren’t totally outlandish. What’s a little more interesting is to consider if we rolled back the clock, what these simple forecasts would have looked like.

Again, just using Virginia as an example, we can go back for each year since 1985, fit a forecasting model on history up to that point in time, and then project forward a few years. We could do it with a loop, but instead we’ll use Hadley’s approach described in Plotcon and store the results of each estimate as a dataframe nested inside a data frame. This structure will be more compact, and also make plotting with ggplot quite simple, no explicit loops involved.

Build a function

To get this to work we’re going to require a function with two inputs. First, it will take the name of state and second it will take a maximum date. Then it will construct the forecast for that state up to that time (as we did for Virginia above) and stack the forecasts.

fcst.d3<-function(s,dmax) { 
  #subset data based on state s and up to date dmax:
  df<-d.state[state==s & date<=dmax,c("state","date","hpa12"),with=F]
  test.ts<-xts(df$hpa12,df$date)
  test.out<-ar(test.ts,order.max=2)
  test.p<-predict(test.out, n.ahead = 4)
  test.p.ts<-xts(test.p$pred,seq(max(df$date)+years(1),max(df$date)+years(4),"1 year"))
  test.ts<-rbind(tail(test.ts,1),test.p.ts)
  f.data<-data.frame(state=s,date=index(test.ts), hpa12=coredata(test.ts))
  f.data$f<-ifelse(f.data$date>max(df$date),"forecast","actual")
  return(f.data)
}

Let’s use this function by adding a forecast based on data up to September 2005 to our original plot for Virginia:

# Examine forecast output
htmlTable(fcst.d3("VA","2005-09-01") %>% map_if(is.Date, as.character,format="%b-%Y") %>% map_if(is.numeric, round,3) %>%as.data.frame() , col.rgroup = c("none", "#F7F7F7"),caption="Virginia house price forecasts\nfrom simple AR(2) model",
          tfoot="Source: Freddie Mac House Price Index\nAR(2) forecast fit on data through Sep 2005")

	state	date	hpa12	f
Virginia house price forecasts from simple AR(2) model
1	VA	Sep-2005	0.179	actual
2	VA	Sep-2006	0.14	forecast
3	VA	Sep-2007	0.114	forecast
4	VA	Sep-2008	0.097	forecast
5	VA	Sep-2009	0.086	forecast
Source: Freddie Mac House Price Index AR(2) forecast fit on data through Sep 2005

# Create plot

ggplot(data=d.state[state=="VA",],aes(x=date,y=hpa12,label=paste(percent(hpa12))))+
  geom_line(color=viridis(10)[1],size=1.1)+
  geom_line(data=fcst.d3("VA","2005-09-01"), #use forecast function data
            linetype=2,color=viridis(10)[4],size=1.1)+
  geom_point(data=tail(d.state[state=="VA",],1),color=viridis(10)[1],alpha=0.82,size=3)+
  geom_text(data=tail(d.state[state=="VA",],1),color=viridis(10)[1],alpha=0.82,size=3,hjust=0,vjust=-1)+
  scale_y_continuous(label=percent)+
  coord_cartesian(xlim=c(as.Date("1976-01-01"),as.Date("2017-12-31")))+
  theme_minimal()+labs(x="",y="",title="Annual percentage change in house prices",
                       subtitle="Virginia in September, dotted line forecast from AR(2) model fit on data through 2005",
                       caption="@lenkiefer Source: Freddie Mac House Price Index")+
  theme(plot.caption=element_text(hjust=0))

Here we can see that based on the history of hpa, a simple model would have expected some mean reversion back towards a long-run average, as depicted by the tentacle extending from the plot. What would it look like if we added a tentacle for each year? We could do that through a loop, or use the nesting described in Hadley’s talk. Let’s try that:

Nesting

The function below will allow us to next each prediction in our dataframe. For now, we’ll restrict the data just to Virginia, but soon we’ll add in all 50 states plus D.C. Turns out it just takes a couple lines of code!

fcsts.va<- d.state[year(date)>1984 & state=="VA", c("date","state","hpa12"),with=F] %>% 
           mutate(fcst=map2(state,date,fcst.d3))
head(fcsts.va,5)

##         date state      hpa12
## 1 1985-09-01    VA 0.05019297
## 2 1986-09-01    VA 0.06988367
## 3 1987-09-01    VA 0.10303571
## 4 1988-09-01    VA 0.09598973
## 5 1989-09-01    VA 0.07379588
##                                                                                                                                                                                              fcst
## 1 1, 1, 1, 1, 1, 5722, 6087, 6452, 6818, 7183, 0.0501929690046294, 0.0673057256424662, 0.0673057256424662, 0.0673057256424662, 0.0673057256424662, actual, forecast, forecast, forecast, forecast
## 2 1, 1, 1, 1, 1, 6087, 6452, 6818, 7183, 7548, 0.0698836679882848, 0.0675400840375406, 0.0675400840375406, 0.0675400840375406, 0.0675400840375406, actual, forecast, forecast, forecast, forecast
## 3  1, 1, 1, 1, 1, 6452, 6818, 7183, 7548, 7913, 0.103035712535516, 0.0704980530790385, 0.0704980530790385, 0.0704980530790385, 0.0704980530790385, actual, forecast, forecast, forecast, forecast
## 4 1, 1, 1, 1, 1, 6818, 7183, 7548, 7913, 8279, 0.0959897340392564, 0.0724589516144399, 0.0724589516144399, 0.0724589516144399, 0.0724589516144399, actual, forecast, forecast, forecast, forecast
## 5 1, 1, 1, 1, 1, 7183, 7548, 7913, 8279, 8644, 0.0737958791316216, 0.0725544464370957, 0.0725544464370957, 0.0725544464370957, 0.0725544464370957, actual, forecast, forecast, forecast, forecast

Now we have a data frame with a set of data frames (one for each date) corresponding to forecasts beginning at the date and extending 4 years.

Now we can easily construct our tentacle plot. We use the unnest() function to expand the data.

p.data<-unnest(fcsts.va,fcst,.sep=".") #unnest the data for plotting
ggplot()+  
  geom_path(aes(x=fcst.date,y=fcst.hpa12,color=factor(year(date))),
                     data=unnest(fcsts.va,fcst,.sep=".")
            ,linetype=2 )    +
  
      geom_path(aes(x=date,y=hpa12),
                data=d.state[year(date)>1984 & state=="VA", c("date","state","hpa12"),with=F]
                ,size=1.05)+
  theme_minimal()+theme(legend.position="none")+  scale_y_continuous(label=percent)+
  labs(x="",y="",title="Annual percentage change in house prices",
                       subtitle="Virginia in September, dotted lines forecast from AR(2) model fit",
                       caption="@lenkiefer Source: Freddie Mac House Price Index\nAR(2) forecast fit on data up to point where solid line and dotted lines diverge")+
  theme(plot.caption=element_text(hjust=0))+

  coord_cartesian(xlim=c(as.Date("1985-01-01"),max(p.data$fcst.date)),ylim=c(-.1,.2))

We can also roll over each state:

fcsts.st<- d.state[year(date)>1984, c("date","state","hpa12"),with=F] %>% 
           mutate(fcst=map2(state,date,fcst.d3))
           
p.data2<-unnest(fcsts.va,fcst,.sep=".")

ggplot()+  
  geom_path(aes(x=fcst.date,y=fcst.hpa12,color=factor(year(date))),
                     data=unnest(filter(fcsts.st,! state %in% c("DC","US.NSA","US.SA")),fcst,.sep=".")
            ,linetype=2,size=.75 )    +
  
      geom_path(aes(x=date,y=hpa12),
                data=d.state[year(date)>1984 & ! state %in% c("DC","US.NSA","US.SA") , c("date","state","hpa12"),with=F]
                ,size=.85)+
  theme_minimal()+theme(legend.position="none")+  scale_y_continuous(label=percent)+
  labs(x="",y="",title="Annual percentage change in house prices",
                       subtitle="Dotted lines forecast from AR(2) model fit",
                       caption="@lenkiefer Source: Freddie Mac House Price Index\nAR(2) forecast fit on data up to point where solid line and dotted lines diverge")+
  theme(plot.caption=element_text(hjust=0))+
  facet_wrap(~state,ncol=5)

And we can make a gif looping through a few key states:

tentacle gifs

Going forward

This type of approach could be quite useful for a number of other applications. The map functions from purrr have a lot of potential uses I’m looking forward to trying out in the future. So far, I haven’t blown up my computer with an endless loop. If I can keep it that way, I’ll follow up in this space with some other applications.

Nested recursion: Loops within loops within data frames