HERE THE LITERATURE IS VASTY DEEP. In this post we’ll dip our toes, ever so slightly, into the dark waters of macroeconometric forecasting. I’ve been studying some techniques and want to try them out. I’m still at the learning and exploring stage, but let’s do it together.
In this post we’ll conduct an exercise in forecasting U.S. recessions using several approaches. Per usual we’ll do it with R and I’ll include code so you can follow along.
Longer than usual disclaimer
I’m not recommending any of these techniques and don’t guarantee any results if you try them. The exercises here are merely for learning and don’t represent my views (or the views of my employer) about the likelihood of recessions. The results have not gone through peer review, I haven’t fully reviewed the techniques or code, and these results aren’t necessarily suitable for any purpose.
Background
Literature
We’re not going to review the literature in any detail. A large portion of modern macroeconomics studies business cycles, and there’s a huge literature on forecasting techniques. But if you’re interested in background, here are three links directly related to today’s investigation. Later, if things go well, I’ll write up some more thoughts on these papers.
- See this FEDS Notes on “Which market indicators best forecast recessions”
- See this paper Online Prediction Under Model Uncertainty via Dynamic Model Averaging: Application to Cold Rolling Mill
- See this paper Dynamic Logistic Regression and Dynamic Model Averaging for Binary Classification
Data
All the data we’ll need is available through the Federal Reserve Bank of St. Louis’s FRED database. See here for more discussion of using FRED with R.
R packages
We’ll rely on the tidyquant and tibbletime packages for data munging and the dma package for estimation using the dynamic model averaging technique discussed below.
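If you don’t already have them, all of these (plus the other packages loaded in the code below) are on CRAN:
install.packages(c("tidyquant","tibbletime","dma","boot","viridis","cowplot","stargazer"))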
Let’s get to it!
A very simple model
Let’s follow the basic setup outlined in this FEDS note. We want to forecast whether or not the National Bureau of Economic Research (NBER) has declared that a month \(t\) falls in a recession. See here for NBER business cycle dates.
We follow their framework (with a slight modification, swapping a logit for a probit). Let \(Y_t=1\) indicate that month \(t\) falls in a recession.
\[ \text{Logit}(\Pr(Y_t=1 \mid x_{t-h})) = x^{(k)\prime}_{t-h}\theta^{(k)}_{t-h} \ \ \ \ \ (1)\]
where \(\theta^{(k)}_{t-h}\) is a (possibly) time-varying parameter vector and \(x^{(k)}_{t-h}\) is a vector of input variables. Forecasts are made \(h\) periods ahead, and the dimension of \(x\) may vary across models \(k\).
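To make the logit link in equation (1) concrete, here’s a minimal sketch mapping an indicator into a probability. The coefficient values are made up purely for illustration; they are not estimates.
# logit link: Pr(Y=1 | x) = 1/(1 + exp(-x'theta))
theta <- c(-0.5, -1.0)  # hypothetical intercept and slope coefficient (illustration only)
x     <- c(1, 0.25)     # intercept term plus a yield curve slope of 0.25
plogis(sum(x*theta))    # about 0.32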
We could (and will in a future post) consider several indicators to help predict a recession and also consider several different forecast horizons \(h\). In this post we’ll focus on just two indicators and two horizons.
Indicators
We’ll focus on two indicators and see how well they individually or in combination might predict recessions. We’ll use monthly observations on the 3-month percent change in nonfarm payroll employment PAYEMS (adopting the FRED mnemonic) and the slope of the U.S. Treasury yield curve SLOPE, measured by the percentage point difference between the 10-year constant maturity Treasury yield and the 3-month Treasury bill rate.
Within the vasty deep of the macroeconometric forecasting literature, employment growth has been a reliable coincident indicator of recessions, while the yield curve is one of the few variables that seems to have some predictive power for recessions, as we shall see.
Getting data, plotting trends
The variables we’ll need today are conveniently available in FRED under the following mnemonics:
- USREC gives us a monthly indicator for NBER recessions (1 = month is in a recession, 0 = not in a recession)
- PAYEMS measures U.S. nonfarm payroll employment
- GS10 gives us monthly 10-year constant maturity Treasury yields
- TB3MS gives us monthly 3-month Treasury bill rates
We’ll have to get the data and do some transformations, but it’s easy with R and tidyquant.
#####################################################################################
## Step 0: Load Libraries ##
#####################################################################################
library(tidyquant)
library(dma)
library(tibbletime)
library(boot)
library(viridis)
library(cowplot)
#####################################################################################
## Step 1: Get data ##
#####################################################################################
tickers<- c("USREC", # NBER Recession indicator
"GS10", # 10-year constant maturity Treasury yield
"TB3MS", # 3-month Tbill rate
"PAYEMS" # nonfarm payroll employment
)
df <- tq_get(tickers,get="economic.data",from="1945-01-01")
# recession df (for plotting)
# see: https://www.r-bloggers.com/use-geom_rect-to-add-recession-bars-to-your-time-series-plots-rstats-ggplot/
recessions.df = read.table(textConnection(
"Peak, Trough
1957-08-01, 1958-04-01
1960-04-01, 1961-02-01
1969-12-01, 1970-11-01
1973-11-01, 1975-03-01
1980-01-01, 1980-07-01
1981-07-01, 1982-11-01
1990-07-01, 1991-03-01
2001-03-01, 2001-11-01
2007-12-01, 2009-06-01"), sep=',',
colClasses=c('Date', 'Date'), header=TRUE)
#####################################################################################
## Step 2: Prep ##
#####################################################################################
# transformation function: converts a series to growth rates
my_trans <- function(in.data, transform="pctdiff3"){
  switch(transform,
         logdiff  = c(NA, diff(log(in.data))),            # log difference, not used here
         pctdiff3 = 100*Delt(in.data, k=3),               # 3-month percent change (quantmod::Delt)
         logdiff3 = c(rep(NA,3), diff(log(in.data),3)))   # 3-month log difference, not used here
}
vlist<- c("PAYEMS")
df2<- df %>% group_by(symbol) %>%
mutate(x=ifelse(symbol %in% vlist, my_trans(price),price))
df2 %>% select(-price) %>% spread(symbol,x) %>%
mutate(
# SLOPE variable
SLOPE=GS10-TB3MS,
# 12-month Lead for recessions
REC12=lead(USREC,12)) -> df3
# Trim the data
df4 <- filter(df3, year(date)>1955 & year(date)<2016)
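As a quick sanity check on the transformation, Delt(x, k=3) (from quantmod, which tidyquant loads) returns the 3-period arithmetic percent change, with the first three values NA by construction. A toy example with made-up numbers:
x.test <- c(100, 101, 102, 103, 104, 105)
100*Delt(x.test, k=3)  # NA NA NA 3.000 2.970 2.941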
Now that we have our data, let’s take a look at our two indicators.
# employment plot
g.emp<-
ggplot(data=df4, aes(x=date,y=PAYEMS))+
geom_rect(data=recessions.df, inherit.aes=FALSE,
aes(xmin=Peak, xmax=Trough, ymin=-Inf, ymax=+Inf),
fill='lightblue', alpha=0.5)+theme_minimal()+geom_line(color="royalblue")+
labs(x="",y="",title="Employment growth",
subtitle="3-month % change",
caption="@lenkiefer Source: U.S. Bureau of Labor Statistics, Federal Reserve (H.15), shaded area NBER recessions.\nretrieved from FRED, Federal Reserve Bank of St. Louis, October 26, 2017.")+
geom_hline(yintercept=0,color="black")+
theme(plot.caption=element_text(hjust=0),
plot.subtitle=element_text(face="italic",size=9),
plot.title=element_text(face="bold",size=14))
# yield curve plot
g.slope<-
ggplot(data=df4, aes(x=date,y=SLOPE))+
geom_rect(data=recessions.df, inherit.aes=FALSE,
aes(xmin=Peak, xmax=Trough, ymin=-Inf, ymax=+Inf),
fill='lightblue', alpha=0.5)+theme_minimal()+geom_line(color="royalblue")+
labs(x="",y="",title="Yield curve slope",
subtitle="10-year minus 3-month U.S. Treasury rates",
caption="")+
geom_hline(yintercept=0,color="black")+
theme(plot.caption=element_text(hjust=0),
plot.subtitle=element_text(face="italic",size=9),
plot.title=element_text(face="bold",size=14))
# combined plot (uses cowplot::plot_grid)
plot_grid(g.slope,g.emp,ncol=1)
These two plots contain the entirety of our problem. We want to use these lines to predict whether we are presently in a recession (\(h=0\)) or whether we will be in one 12 months from now (\(h=12\)).
Simple logistic regression
Let’s assume the parameter \(\theta\) remains constant and that there is a single composite model consisting of both PAYEMS and SLOPE. We can estimate it with a logistic regression (we’ll use stargazer to format our output).
# forecast contemporaneously (h=0)
glm.h0<- glm(USREC ~ SLOPE + PAYEMS,family=binomial(link='logit'), data=df4)
# forecast 12 months ahead (h=12)
glm.h12<- glm(REC12 ~ SLOPE + PAYEMS,family=binomial(link='logit'), data=df4)
stargazer::stargazer(glm.h0,glm.h12, title="Forecasting Recessions", type="html",
dep.var.caption = "Recession probability",
dep.var.labels = "Forecast horizon",
column.labels = c("contemporaneous (h=0)", "forecast (h=12)"))
Forecasting Recessions (dependent variable: recession probability)

| | (1) contemporaneous (h=0) | (2) forecast (h=12) |
|---|---|---|
| Dependent variable | USREC | REC12 |
| SLOPE | -0.964*** (0.155) | -1.350*** (0.143) |
| PAYEMS | -5.370*** (0.529) | -0.046 (0.231) |
| Constant | -0.072 (0.230) | -0.528*** (0.196) |
| Observations | 720 | 720 |
| Log Likelihood | -127.587 | -217.054 |
| Akaike Inf. Crit. | 261.173 | 440.108 |

Note: *p<0.1; **p<0.05; ***p<0.01
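To interpret the h=12 coefficients in column (2), we can plug values back into equation (1). A rough sketch, holding employment growth at zero and treating the point estimates as fixed:
# implied 12-month-ahead recession probability at different yield curve slopes
b0 <- -0.528; b.slope <- -1.350  # point estimates from column (2) above
plogis(b0 + b.slope*2.0)         # steep curve (slope = 2): roughly 4 percent
plogis(b0 + b.slope*-0.5)        # inverted curve (slope = -0.5): roughly 54 percent
Equivalently, predict(glm.h12, newdata=data.frame(SLOPE=c(2,-0.5), PAYEMS=0), type="response") would use the fitted model object directly.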
tibbletime for rolling regressions
This regression assumes that the parameters are constant over time. But they might change due to regime changes or other sources of parameter instability. One way to deal with instability is to run rolling window regressions. Those are super easy thanks to tibbletime.
Let’s estimate 20-year rolling windows and plot the coefficients over time. Here we’ll look at the models for 12-months ahead. See this tibbletime vignette for details on these steps.
# compute rolling regressions with tibbletime::rollify
rolling_lm <- rollify(.f = function(REC12, SLOPE, PAYEMS) {
  glm(REC12 ~ SLOPE + PAYEMS, family=binomial(link='logit'))
},
window = 240,
unlist = FALSE)
df4.tt <- as_tbl_time(df4, index=date) # convert to tibbletime
df4.tt %>% mutate(roll_lm=rolling_lm(REC12 , SLOPE , PAYEMS)) %>%
filter(!is.na(roll_lm)) %>%
mutate(tidied = purrr::map(roll_lm, broom::tidy)) %>%
unnest(tidied) %>%
select(date, term, estimate, std.error, statistic, p.value) -> df4.reg
# plot coefficients
ggplot(data=df4.reg, aes(x=date,y=estimate,color=term))+
geom_line(size=1.05,color="royalblue")+
theme_minimal()+
facet_wrap(~term)+
labs(y="Coefficient",x="",
title="Model for U.S. recession probabilities\n240-month rolling regressions",
subtitle="12 months ahead",
caption="@lenkiefer Recession probabilities based on a rolling regression.\nSLOPE: Yield curve slope (10-year minus 3-month U.S. Treasury yield)\nPAYEMS Employment change (3 month %)\nModel: logistic regression for USREC12 = PAYEMS + SLOPE \nModel fit with glm & tibbletime: Davis Vaughan and Matt Dancho (2017).\ntibbletime: Time Aware Tibbles. R package version 0.0.2.
https://CRAN.R-project.org/package=tibbletime")+
geom_hline(yintercept=0,color="black")+
theme(plot.caption=element_text(hjust=0),
plot.subtitle=element_text(face="italic",size=9),
plot.title=element_text(face="bold",size=14))
These estimates show quite a large amount of apparent parameter instability. Coefficient signs even switch, and the implied probabilities jump around. Can we do better?
Dynamic model averaging
Oh yes we can.
We’d like to have a model that would allow parameters to vary over time. We’d like to discount the past, but not as abruptly as the rolling regressions do.
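To see the difference, compare the implicit weight each scheme places on an observation that is k months old: a 240-month rolling window weights the most recent 240 months equally and drops everything older, while a forgetting factor (the lambda argument in dma, set to 0.99 below) decays weights geometrically. This is only a sketch of the intuition; dma’s recursion applies the forgetting factor to the parameter covariance rather than literally reweighting the data.
# implicit weight on an observation k months old, under each scheme
data.frame(k = 0:360) %>%
  mutate(rolling    = ifelse(k < 240, 1/240, 0),   # flat inside the window, zero outside
         forgetting = 0.99^k / sum(0.99^k)) %>%    # geometric decay, normalized
  gather(scheme, weight, -k) %>%
  ggplot(aes(x=k, y=weight, color=scheme)) +
  geom_line() + theme_minimal() +
  labs(x="observation age (months)", y="implicit weight")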
Turns out the dynamic model averaging approach of McCormick, Raftery and Madigan (dma) is perfect for this exercise.
I hope to explore this in more detail in future, but let’s load it up and try it out. The package is good, but I don’t like the default plotting, so I’ll do some manipulations to get our results tidy and ready for ggplot2.
# fit binary model (contemporaneous)
df5 <- filter(df3, year(date)>1955 & !is.na(USREC))
# convert data to matrix
xvar <- as.matrix(df5 %>% select(SLOPE,PAYEMS))
yvar <- as.matrix(df5$USREC)
# model design: each row is a candidate model,
# columns flag whether SLOPE and PAYEMS enter (1) or not (0)
mmat<- matrix(c(1,0,   # model 1: SLOPE only
                0,1,   # model 2: PAYEMS only
                1,1),  # model 3: both indicators
              3,2,byrow=TRUE)
# fit model h= 0
dma.fit0<- logistic.dma(unname(xvar), yvar, mmat, lambda=0.99, alpha=0.99,
autotune=TRUE, initialsamp=120)
# store h=0 fitted probabilities
df50 <- df5 %>% mutate(yhat0=dma.fit0$yhatdma)
# repeat for h=12
df5 <- filter(df3, year(date)>1955 & !is.na(REC12))
xvar <- as.matrix(df5 %>% select(SLOPE,PAYEMS))
yvar <- as.matrix(df5$REC12)
mmat<- matrix(c(1,0,
0,1,
1,1),3,2,byrow=TRUE)
dma.fit12<- logistic.dma(unname(xvar), yvar, mmat, lambda=0.99, alpha=0.99,
autotune=TRUE, initialsamp=120)
df5 <- df5 %>% mutate(yhat12=dma.fit12$yhatdma)
# combine results
df6<-full_join(df5,df50 %>% select(date,yhat0), by="date")
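Before plotting the averaged forecasts, note that the fitted object should also carry the posterior model probabilities over time (the pmp component, which I read as a models-by-periods matrix; treat that as an assumption until we dig into the package internals). A sketch of tidying them for ggplot2:
# posterior model probabilities over time (assumes dma.fit12$pmp is 3 x T)
data.frame(date = df5$date,
           SLOPE.only  = dma.fit12$pmp[1,],
           PAYEMS.only = dma.fit12$pmp[2,],
           both        = dma.fit12$pmp[3,]) %>%
  gather(model, prob, -date) %>%
  ggplot(aes(x=date, y=prob, color=model)) +
  geom_line() + theme_minimal() +
  labs(x="", y="posterior model probability")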
Then we can plot results:
g.12<-
ggplot(data=df6, aes(x=date+months(12),y=yhat12))+
geom_rect(data=recessions.df, inherit.aes=FALSE,
aes(xmin=Peak, xmax=Trough, ymin=-Inf, ymax=+Inf),
fill='lightblue', alpha=0.5)+theme_minimal()+
geom_line(color="royalblue",size=1.05)+
scale_y_continuous(labels=scales::percent)+
labs(x="",y="recession probability",
title="Estimated U.S. recession probabilities given by dynamic model averaging",
subtitle="12 months ahead",
caption="@lenkiefer Recession probabilities based on dynamic model averaging of three models\nforecasting recession 0 and 12-months ahead.\nModel 1: Yield curve slope (10-year minus 3-month U.S. Treasury yield)\nModel 2: Employment change (3 month %)\nModel 3: Yield curve slope and employment change\nModel fit with dma: Tyler H. McCormick, Adrian Raftery and David Madigan (2017).\ndma: Dynamic Model Averaging. R package version 1.3-0. https://CRAN.R-project.org/package=dma")+
geom_hline(yintercept=0,color="black")+
theme(plot.caption=element_text(hjust=0),
plot.subtitle=element_text(face="italic",size=9),
plot.title=element_text(face="bold",size=14))
g.0<-
ggplot(data=df6, aes(x=date,y=yhat0))+
geom_rect(data=recessions.df, inherit.aes=FALSE,
aes(xmin=Peak, xmax=Trough, ymin=-Inf, ymax=+Inf),
fill='lightblue', alpha=0.5)+theme_minimal()+
geom_line(color="red",size=1.05)+
scale_y_continuous(labels=scales::percent)+
labs(x="",y="recession probability",
title="Estimated U.S. recession probabilities given by dynamic model averaging",
subtitle="0 months ahead",
caption="@lenkiefer Recession probabilities based on dynamic model averaging of three models\nforecasting recession 0 and 12-months ahead.\nModel 1: Yield curve slope (10-year minus 3-month U.S. Treasury yield)\nModel 2: Employment change (3 month %)\nModel 3: Yield curve slope and employment change\nModel fit with dma: Tyler H. McCormick, Adrian Raftery and David Madigan (2017).\ndma: Dynamic Model Averaging. R package version 1.3-0. https://CRAN.R-project.org/package=dma")+
geom_hline(yintercept=0,color="black")+
theme(plot.caption=element_text(hjust=0),
plot.subtitle=element_text(face="italic",size=9),
plot.title=element_text(face="bold",size=14))
g.rec<-
plot_grid(g.0+labs(caption=""),g.12+labs(title=""),
ncol=1)
g.rec
The plot above shows how well the model predicts recessions contemporaneously (pretty good) and ahead of time (not so well). We might want to add more variables to the model and see how they look. We’ll leave that for a later time.
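One way to quantify “pretty good” versus “not so well” is a proper scoring rule like the Brier score, the mean squared error of a probability forecast (lower is better). A quick sketch; yhatdma includes the initialization period, so treat the numbers as indicative only:
# Brier scores: mean squared error of probability forecasts (lower is better)
mean((df6$yhat0  - df6$USREC)^2, na.rm=TRUE)  # contemporaneous (h=0)
mean((df6$yhat12 - df6$REC12)^2, na.rm=TRUE)  # 12 months ahead (h=12)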
Discussion
There’s a lot more to discuss about this approach. I’m still chewing on it, but as I discover more I’ll post more here. In a follow-up post, we’ll dig into the internals of the dma package and try to see exactly what’s going on. See you next time.