2017/10/28

economy / forecasting / R / statistics

BACK WE GO INTO THE VASTY DEEP. LAST TIME we introduced the idea of using dynamic model averaging to forecast recessions. I was so excited about the new approach that I didn’t take the time to break down what was going on with it. In this post we’ll look more closely at what’s happening with the dma packaged when we try to forecast recessions.

Per usual we’ll do it with R and I’ll include code so you can follow along.

Background

In this section we’ll take a closer look at what’s going on with dynamic model averaging.

Literature

Still not including a full lit review, but see these paper for background (1 new addition since last time).

See this FEDS Notes on “Which market indicators best forecast recessions”
See this paper Online Prediction Under Model Uncertainty via Dynamic Model Averaging: Application to a Cold Rolling Mill
See this paper Dynamic Logistic Regression and Dynamic Model Averaging for Binary Classification
See this paper Forecasting Inflation Using Dynamic Model Averaging

Dynamic Model Averaging

In this post we’re going to focus on forecasting a binary outcome. In our case, we’re interested in knowing wheter or not the United States is in a recession. Specifically whether the National Bureau of Economic Research (NBER) has declared that a month $t$ falls in a recession. See here for NBER business cycle dates.

We will make our forecast at horizon $h$ which is either 0 or 12 (for now). That is, we want to nowcast whether current conditions indicate a recession and then see if we can forecast 1 year ahead. The case of $h=0$ isn’t trivial, because typically the NBER only declares a recession start after the fact. For example, though the Great Recession was dated to begin in December 2007, the announcement wasn’t made until December 2008. The NBER announcement dates since 1980 are here pdf.

Statistical model(s)

Let’s presume that the probability of the economy being in a recessions $h$ periods from now is given by:

\[ \text Logit({P}\ (y_t=1\ | \ L_{t}=k,x_{t-h})) = x^{(k)'}_{t-h}\theta^{(k)}_{t-h} \ \ \ \ \ (1)\]

where $\theta^{(k)}_{t-h}$ is a (possibly) time varying parameter vector and $x^{(k)'}_{t-h}$ is a vector of input variables. $L_{t}=k$ indicates that at time $t$ the probability is given by model indexed to $k$. We suppose that the are $K$ possible models that could drive $y$. The dimension of $x$ may vary across $k$.

Let’s start by assuming that $K=1$ and $\theta^{(k)}_{t-h}=\boldsymbol{\bar{\theta}^{(k)}}$ a constant. Also, we’ll assume that the parameter vector is a scalar (a single variable with no intercept).

In this case we can estimate our model with a simple logistic regression (like we did last time). Let’s just take a look at how the likelihood of a recession varies for different values of $x$ and different $\boldsymbol{\bar{\theta}}$.

library(tidyverse)
library(boot)      # has useful inv.logit function
mytheta <- c(-2,-1,0,1,2) #Possible values of theta 


df.m<-expand.grid(x=seq(-3,3,.01),mytheta=mytheta) %>% 
  mutate(y=inv.logit(mytheta*x))

ggplot(data=df.m, aes(x=x,y=y,color=factor(mytheta), group=mytheta  ))+
  geom_line(size=1.05)+
  #geom_text(data=filter(df.m,x==max(df.m$x)),size=5,fontface="bold")+
  scale_color_discrete(name=expression(theta))+
  theme_minimal()+theme(legend.position="top")+
  labs(title=expression(paste("Probability Y=1 given different values of x and ",theta)),
       subtitle=expression(paste("Logit(P(Y = 1 | x)) = x'",theta)),
       y="P(Y = 1 | x)")

We have a simple S curve generated by the logit that says if $\theta$ is positive (negative) the likelihood of $y$ is increasing (decreasing) in $x$. We might imagine that the shape of the curve may vary over time, and we might also think that the $x$ variables relevant for determining the shape might also vary.

Let’s imagine that there are two possible $x$’s that might help predict $y$. (In our application we consider payroll employment growth $PAYEMS$ and the slope of the yield curve $SLOPE$.) But for now let’s just call them $x_{1}$ and $x_{2}$. At any point in time, one of two models is relevant for predicting $y$, one involving $x_{1}$ or one with $x_{2}$:

$Logit(P(y_{t}=1|x,L=1))= x_{1t}^{'}\theta_{t}^{(1)}$

and

$Logit(P(y_{t}=1|x,L=2))=x_{2t}^{'}\theta_{t}^{(2)}$.

Notice that I’ve added the $t$ subscript back to the $\theta$s. Parameters might be unstable and drift over time. A simple model of drift would be:

\[ \theta_{t}^{(k)}=\theta_{t-1}^{(k)}+\delta^{(k)}_{t},\ \ \ \delta^{(k)}_{t} \sim N(0,\sigma^{2}) \ \ \ \ \ (2) \]

There are various ways to estimate time varying parameter equations like (2) above. The Dynamic Model Averaging (DMA) approach uses forgetting factors. Check out Raferty et. al (2010) and McCormick et. al (2012) for details. Briefly, the DMA approach takes a standard Kalman filter for Time Varying Parameters but substitutes use a forgetting factor $\lambda \leq1$ into the prediction equation’s variance matrix, effectively imposing a window size of $\frac{1}{1-\lambda}$. (Note this is similar to the rolling regression approach we used last time, but with current data weighted more heavily than older data).

We can estimate both models conditionally (assuming that they are each in effect each period). But then how to combine the estimates?

Both models are plausible and we aren’t sure which one is correct. One approach is to use Bayesian Model Averaging (BMA) or other ensemble methods to combine forecasts. Dynamic model averaging (DMA) is one such method.

DMA maintains the assumption that the model probabilities evolve according to:

\[ P(L_{t}=k | Y^{t-1})= \frac{P(L_{t-1}=k | Y^{t-1})^{\alpha} }{\sum_{j=1}^{K}P(L_{t-1}=j | Y^{t-1})^{\alpha}} \ \ \ \ \ \ (3)\] Equation (3) above is critically important for DMA. Instead ofspecifying a full $K \times K$ transition Matrix P for model probabilities, probabilities are updated using forgetting, parameterized by $\alpha\leq1$. In the special case where $\alpha=1$ and $\lambda=1$ the $\theta$’s are constant DMA becomes BMA.

There are several advantages of DMA. One is that it allows the weights on plausible models to shift over time. Another nice feature is that it is computationally efficient. We don’t have to do complicated and potentially expensive algorithms like Markov Chain Monte Carlo. Instead, we can arrive at answers analytically. That means it is fast and even my puny laptop can run it.

Forecasting recessions

Let’s go back to our example of forecasting recessions. Remember that we went to the Saint Louis Federal Reserve’s FRED database and pulled down four variables needed to construct our two indicators and the recession indicator. We have transformed the variables and start with a data frame df3. Code for constructing this is here.

knitr::kable(rbind(head(df3),tail(df3)))

date	DSPIC96	GS10	GS2	PAYEMS	PERMIT	TB3MS	USREC	SLOPE	CURVE	REC12
1945-01-01	NA	NA	NA	NA	NA	0.38	0	NA	NA	0
1945-02-01	NA	NA	NA	NA	NA	0.38	0	NA	NA	0
1945-03-01	NA	NA	NA	NA	NA	0.38	1	NA	NA	0
1945-04-01	NA	NA	NA	-1.0836098	NA	0.38	1	NA	NA	0
1945-05-01	NA	NA	NA	-1.4318442	NA	0.38	1	NA	NA	0
1945-06-01	NA	NA	NA	-1.5479950	NA	0.38	1	NA	NA	0
2017-04-01	0.9107180	2.30	1.24	0.3359878	-5.5384615	0.80	0	1.50	-0.62	NA
2017-05-01	1.0511115	2.30	1.30	0.2757712	-4.1837572	0.89	0	1.41	-0.59	NA
2017-06-01	0.4748264	2.19	1.34	0.3853987	1.1904762	0.98	0	1.21	-0.49	NA
2017-07-01	0.5713209	2.32	1.37	0.3376019	0.1628664	1.07	0	1.25	-0.65	NA
2017-08-01	-0.0257702	2.21	1.34	0.3536857	8.9041096	1.01	0	1.20	-0.54	NA
2017-09-01	NA	2.20	1.38	0.1871776	-3.9215686	1.03	0	1.17	-0.47	NA

With data in hand we can proceed to estimate our model.

Let’s estimate for $h=0$. First, we need to filter out data where we have missing observations. We’ll start our data in 1956 and require that we have observations on the Recession indicator $USREC$.

# 
df5 <- filter(df3, year(date)>1955 & !is.na(USREC))

# convert data to matrix
xvar <- as.matrix(df5 %>% select(SLOPE,PAYEMS))
yvar <- as.matrix(df5$USREC)
# design for models
mmat<- matrix(c(1,0,
                0,1,
                1,1),3,2,byrow=TRUE)
print(mmat)

##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
## [3,]    1    1

The matrix $mmat$ is important because it tells dma what kind of models we want to fit. The rows correspond to the $K$ models we ask dma to fit (here 3) and the columns correspond to the variables for each model. A 1 (0) indicates that the variable is present (not present) in the model. So model 1 uses only the $SLOPE$, model 2 uses only $PAYEMS$ and model 3 uses both variables. We can fit the model via:

dma.fit0<- logistic.dma(unname(xvar), yvar, mmat, 
                        lambda=0.99, 
                        alpha=0.99,
                       autotune=FALSE, # allow alpha to vary over time?
                       initialsamp=120 # set initial sample to 10 years (used for initial values)
                       )

Following the literature we set lambda = 0.99. Remember our effective window size is $\frac{\lambda}{1-\lambda}$, or 99 months using monthly data and $\lambda=0.99$. If we set lambda = 0.95 the effective window size would shrink to just 19 months. Such a small value of lambda would only respond to very recent changes in the data.

While the dma package is easy to use, the output isn’t exactly tidy. Let’s take a look at the structure with

str(dma.fit0)

## List of 13
##  $ x            : num [1:741, 1:2] 0.49 0.52 0.71 0.58 0.46 0.51 0.8 0.73 0.54 0.44 ...
##  $ y            : num [1:741, 1] 0 0 0 0 0 0 0 0 0 0 ...
##  $ models       : num [1:3, 1:2] 1 0 1 0 1 1
##  $ lambda       : num 0.99
##  $ alpha        : num 0.99
##  $ pmp          : num [1:3, 1:741] 0.333 0.333 0.333 0.333 0.333 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:3] "model_1" "model_2" "model_3"
##   .. ..$ : chr [1:741] "time_1" "time_2" "time_3" "time_4" ...
##  $ alpha.used   : num [1:741] 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 ...
##  $ theta        : num [1:3, 1:741, 1:3] -3.18 -1.88 -2.22 -3.18 -1.88 ...
##  $ vartheta     : num [1:3, 1:741, 1:3] 0.446 0.211 0.835 0.446 0.211 ...
##  $ yhatdma      : Named num [1:741] 0.0276 0.0282 0.037 0.0347 0.0338 ...
##   ..- attr(*, "names")= chr [1:741] "time_1" "time_2" "time_3" "time_4" ...
##  $ yhatmodel    : num [1:3, 1:741] 0.07601 0.00364 0.00318 0.079 0.00298 ...
##  $ fitted.values: Named num [1:741] 0.0276 0.0282 0.037 0.0347 0.0338 ...
##   ..- attr(*, "names")= chr [1:741] "time_1" "time_2" "time_3" "time_4" ...
##  $ residuals    : num [1:741, 1] -0.0276 -0.0282 -0.037 -0.0347 -0.0338 ...
##  - attr(*, "class")= chr "logistic.dma"

Okay, we have lists of lists and arrays. In dma.fit0$theta we have a 3x741x3 array of parameter estimates. If we select the first row we get results for model 1 etc. Let’s extract the parameter estimates for each variable and their variances which are contained in dma.fit0$vartheta.

In order to tidy things up we’ll have to write a couple functions and rearrange things.

# function to get coefficients out
myf<-function(j=1,                                     # model we extract
              indf=dma.fit0,                           # dma output list
              in.date=df5$date,                        # dates 
              varnames=c("INTERCEPT","SLOPE","PAYEMS") # list of variable names
              ){
  out=data.frame(indf$theta[j,,]) # convert array to data frame
  colnames(out)<-varnames         # rename columns
  out$date<-in.date              # dates 
  out$model<-paste0("Model:",j)   # add model label
  return(out)            }
# function to get cofficient variances
myfv<-function(j=1,                                     # model we extract
              indf=dma.fit0,                           # dma output list
              varnames=c("INTERCEPTv","SLOPEv","PAYEMSv") # list of variable names (v inciates variance)

){
  out=data.frame(indf$vartheta[j,,]) # convert array to data frame
  colnames(out)<-varnames            # rename columns
  out$model<-paste0("Model:",j)     # add model label
  return(out)
              }

#use purrr::map to stack up results

data.frame(modeln=1:nrow(mmat)) %>% 
  mutate(theta  = map(modeln,myf),
         thetav = map(modeln,myfv)) %>%
  unnest(theta,thetav) %>% 
  select(model,date,c("INTERCEPT","SLOPE","PAYEMS"),
         c("INTERCEPTv","SLOPEv","PAYEMSv")) -> df.cv

# get coefficients

df.cv %>% select(model,date,c("INTERCEPT","SLOPE","PAYEMS")) %>%
  gather(var,est,-date,-model) -> df.coef

# get variances
df.cv %>% select(model,date,c("INTERCEPTv","SLOPEv","PAYEMSv")) %>%
  gather(var,v,-date,-model) %>%
  mutate(var=substr(var,1,nchar(var)-1)) -> df.var

# merge back together
df.plot<-merge(df.coef,df.var,by=c("model","date","var"))

We did all this business so we can construct a nice panel plot like so:

# plot, but exclude first 15 years
ggplot(data=filter(df.plot,date>min(df.plot$date)+years(15)),
       aes(x=date,y=est,color=var, fill=var))+
  geom_hline(yintercept=0,color="black")+
  geom_ribbon(aes(ymin=est-2*v, ymax=est+2*v),alpha=.1, color=NA)+
  geom_ribbon(aes(ymin=est-1*v, ymax=est+1*v),alpha=.25, color=NA)+
  geom_line(aes(y=est-2*v),linetype=2)+
  geom_line(aes(y=est+2*v),linetype=2)+
  geom_line()+
  guides(color=F,fill=F)+
  theme_minimal()+
  facet_grid(var~model,scales="free_y")+
  labs(x="date",y=expression(paste("Coefficient estimate",theta[t-h])),
       title="Dynamic Logistic Regression (h=0)",
       subtitle=expression(paste("Logit(P(Y = 1 | Model=k,x)) = ",x[t-h],"'",theta[t-h])),
        caption="@lenkiefer Dynamic logistic regression coefficients for three models.\nSolid line coefficient estimates, shaded area +/- 1 & 2 standard errors.\nModel 1: Yield curve slope (10-year minus 3-month U.S. Treasury yield)\nModel 2: Employment change (3 month %)\nModel 3: Yield curve slope and employment change\nModel fit with dma: Tyler H. McCormick, Adrian Raftery and David Madigan (2017).\ndma: Dynamic Model Averaging. R package version 1.3-0. https://CRAN.R-project.org/package=dma")+
  theme(plot.caption=element_text(hjust=0),
        plot.subtitle=element_text(face="italic",size=9),
        plot.title=element_text(face="bold",size=14))

How well do each of these three models fit the data? The DMA routine includes an estimate of the model probabilites contained in dma.fit0$pmp. Let’s plot it:

# plot model probabilities
df.plot2<-data.frame(t(dma.fit0$pmp), date=df5$date)

ggplot(data=gather(df.plot2,Model,p,-date), aes(x=date,y=p,color=Model))+
  geom_line()+theme_minimal()+
  theme(plot.caption=element_text(hjust=0),legend.position="top",
        plot.subtitle=element_text(face="italic",size=9),
        plot.title=element_text(face="bold",size=14))+
    labs(x="date",y="Model Probabilties",
       title="Dynamic Logistic Regression (h=0)",
       subtitle="Model Probabilities",
        caption="@lenkiefer \nModel 1: Yield curve slope (10-year minus 3-month U.S. Treasury yield)\nModel 2: Employment change (3 month %)\nModel 3: Yield curve slope and employment change\nModel fit with dma: Tyler H. McCormick, Adrian Raftery and David Madigan (2017).\ndma: Dynamic Model Averaging. R package version 1.3-0. https://CRAN.R-project.org/package=dma")

In this case, the model selection algorithm clearly favors model_3 that uses both variables.

How about for $h=12$?

df50 <- df5 %>% mutate(yhat0=dma.fit0$yhatdma)
df5 <- filter(df3, year(date)>1955 & !is.na(REC12))

xvar <- as.matrix(df5 %>% select(SLOPE,PAYEMS))
yvar <- as.matrix(df5$REC12)

mmat<- matrix(c(1,0,
                0,1,
                1,1),3,2,byrow=TRUE)

dma.fit12<- logistic.dma(unname(xvar), yvar, mmat, lambda=0.99, alpha=0.99,
                       autotune=TRUE, initialsamp=120)

And stack up the results as before.

#use purrr::map to stack up results
df12 <- filter(df3, year(date)>1955 & !is.na(REC12))
data.frame(modeln=1:nrow(mmat)) %>% 
  mutate(theta  = map(modeln,myf,indf=dma.fit12,in.date=df12$date),
         thetav = map(modeln,myfv,indf=dma.fit12)) %>%
  unnest(theta,thetav) %>% 
  select(model,date,c("INTERCEPT","SLOPE","PAYEMS"),
         c("INTERCEPTv","SLOPEv","PAYEMSv")) -> df.cv12

# get coefficients

df.cv12 %>% select(model,date,c("INTERCEPT","SLOPE","PAYEMS")) %>%
  gather(var,est,-date,-model) -> df.coef12

# get variances
df.cv12 %>% select(model,date,c("INTERCEPTv","SLOPEv","PAYEMSv")) %>%
  gather(var,v,-date,-model) %>%
  mutate(var=substr(var,1,nchar(var)-1)) -> df.var12

# merge back together
df.plot12<-merge(df.coef12,df.var12,by=c("model","date","var"))

# plot, but exclude first 15 years
ggplot(data=filter(df.plot12,date>min(df.plot12$date)+years(15)),
       aes(x=date,y=est,color=var, fill=var))+
  geom_hline(yintercept=0,color="black")+
  geom_ribbon(aes(ymin=est-2*v, ymax=est+2*v),alpha=.1, color=NA)+
  geom_ribbon(aes(ymin=est-1*v, ymax=est+1*v),alpha=.25, color=NA)+
  geom_line(aes(y=est-2*v),linetype=2)+
  geom_line(aes(y=est+2*v),linetype=2)+
  geom_line()+
  guides(color=F,fill=F)+
  theme_minimal()+
  facet_grid(var~model,scales="free_y")+
  labs(x="date",y=expression(paste("Coefficient estimate",theta[t-h])),
       title="Dynamic Logistic Regression (h=12)",
       subtitle=expression(paste("Logit(P(Y = 1 | Model=k,x)) = ",x[t-h],"'",theta[t-h])),
        caption="@lenkiefer Dynamic logistic regression coefficients for three models.\nSolid line coefficient estimates, shaded area +/- 1 & 2 standard errors.\nModel 1: Yield curve slope (10-year minus 3-month U.S. Treasury yield)\nModel 2: Employment change (3 month %)\nModel 3: Yield curve slope and employment change\nModel fit with dma: Tyler H. McCormick, Adrian Raftery and David Madigan (2017).\ndma: Dynamic Model Averaging. R package version 1.3-0. https://CRAN.R-project.org/package=dma")+
  theme(plot.caption=element_text(hjust=0),
        plot.subtitle=element_text(face="italic",size=9),
        plot.title=element_text(face="bold",size=14))

With $h=12$ $PAYEMS$ doesn’t have much predictive power. The $SLOPE$ of the yield curve retains some predictive power even a year ahead.

How do the model weight shift? Let’s take a look.

# plot model probabilities
df.plot12.2<-data.frame(t(dma.fit12$pmp), date=df12$date)

ggplot(data=gather(df.plot12.2,Model,p,-date), aes(x=date,y=p,color=Model))+
  geom_line()+theme_minimal()+
  theme(plot.caption=element_text(hjust=0),legend.position="top",
        plot.subtitle=element_text(face="italic",size=9),
        plot.title=element_text(face="bold",size=14))+
    labs(x="date",y="Model Probabilties",
       title="Dynamic Logistic Regression (h=12)",
       subtitle="Model Probabilities",
        caption="@lenkiefer \nModel 1: Yield curve slope (10-year minus 3-month U.S. Treasury yield)\nModel 2: Employment change (3 month %)\nModel 3: Yield curve slope and employment change\nModel fit with dma: Tyler H. McCormick, Adrian Raftery and David Madigan (2017).\ndma: Dynamic Model Averaging. R package version 1.3-0. https://CRAN.R-project.org/package=dma")

In this case we see that model_1 that only uses $SLOPE$ seems to to the best.

Finally, we can combine the models to come up with predictions for $h=0$ and $h=12$ by taking a weighted average of the models:

Discussion

In this post we reviewed in more detail how to use the dma package to forecast recessions using dynamic logistic regression and dynamic model averaging. This approach allows the parameters of models to adjust over time and have the model probabilities also evolve. This example was relatively simple, but we can easily scale the number of models and number of parameters. In a follow up post, we’ll extend this analysis to include more predictors and look at forecasts over different horizons.

A closer look at forecasting recessions with dynamic model averaging

Back we go into the vasty deep. In this post we look closer at using dynamic model averaging and dynamic logistic regression to forecast recessions.