EPOP becomes Fedspeak

2019/07/09

R / textmining / economics / dataviz

If I cannot send Adam Ozimek (at Modeled Behavior ) a Diet Pepsi, then the next best thing might be a chart about epop. epop is the term economists use to describe the employment-to-population ratio, a useful summary statistic about the labor market. Perhaps the summary statistic. Adam (and others) has been talking about epop as a key labor market statistic for years. It seems the Federal Reserve is catching on to the usage of the term epop (though many economists over there have been looking at the statistic for a long while too).

In this post I want to update a post from last year: Text Mining Fedspeak where we used the tidytext package for R to textmine the annual monetary policy report. Updating my script for the July 2019 Monetary Policy Report. I’ve made a few tweaks to my script, so I’ll include them, per usual, hidden in the details tab.

Load Libraries, Wrangle Data

# load libraries ----
suppressPackageStartupMessages({
library(extrafont)
library(ggraph)
library(ggridges)
library(pdftools)
library(tidyverse)
library(tidytext)
library(forcats)
library(reshape2)
library(tidyr)
library(igraph)
library(widyr)
library(lubridate)
library(ggrepel)
library(viridis)}
)

# get data by reading in pdf versions of reports
# list of reports, comments indicate important events around release of report
fed.links=c("https://www.federalreserve.gov/publications/files/20190705_mprfullreport.pdf", # July 2019
            "https://www.federalreserve.gov/monetarypolicy/files/20180713_mprfullreport.pdf",  
            "https://www.federalreserve.gov/monetarypolicy/files/20170707_mprfullreport.pdf",
            "https://www.federalreserve.gov/monetarypolicy/files/20160621_mprfullreport.pdf",            # released in jun 2016, but we'll label it July
            "https://www.federalreserve.gov/monetarypolicy/files/20150715_mprfullreport.pdf",            # July 2015  ( before lift off)
            "https://www.federalreserve.gov/monetarypolicy/files/20140715_mprfullreport.pdf",
            "https://www.federalreserve.gov/monetarypolicy/files/20130717_mprfullreport.pdf",            # July 2013  ( after Taper Tantrum)
            "https://www.federalreserve.gov/monetarypolicy/files/20120717_mprfullreport.pdf",
            "https://www.federalreserve.gov/monetarypolicy/files/20110713_mprfullreport.pdf",            # July 2011  ( early recovery)
            "https://www.federalreserve.gov/monetarypolicy/files/20100721_mprfullreport.pdf",
            "https://www.federalreserve.gov/monetarypolicy/files/20090721_mprfullreport.pdf",            # July 2009  ( end of Great Recession)
            "https://www.federalreserve.gov/monetarypolicy/files/20080715_mprfullreport.pdf",
            "https://www.federalreserve.gov/monetarypolicy/files/20070718_mprfullreport.pdf" ,           # July 2007  ( eve of  Great Recession)
            "https://www.federalreserve.gov/boarddocs/hh/2006/july/fullreport.pdf",
            "https://www.federalreserve.gov/boarddocs/hh/2005/july/fullreport.pdf",                      # July 2005  ( housing boom)
            "https://www.federalreserve.gov/boarddocs/hh/2004/july/fullreport.pdf",
            "https://www.federalreserve.gov/boarddocs/hh/2003/july/FullReport.pdf" ,                     # July 2003  ( deflation fears)
            "https://www.federalreserve.gov/boarddocs/hh/2002/july/FullReport.pdf",
            "https://www.federalreserve.gov/boarddocs/hh/2001/july/FullReport.pdf",                      # July 2001  ( dot come Recession)
            "https://www.federalreserve.gov/boarddocs/hh/2000/July/FullReport.pdf",
            "https://www.federalreserve.gov/boarddocs/hh/1999/July/FullReport.pdf",                      # July 1999  ( eve of dotcom Recession)
            "https://www.federalreserve.gov/boarddocs/hh/1998/july/FullReport.pdf",
            "https://www.federalreserve.gov/boarddocs/hh/1997/july/FullReport.pdf",                       # July 1997 ( irrational exhuberance)
            "https://www.federalreserve.gov/boarddocs/hh/1996/july/FullReport.pdf"
)

df_fed <- 
  data.frame(report=c("Jul2019",paste0("Jul",seq(2018,1996,-1))),stringsAsFactors = FALSE) %>%
  mutate(text= map(fed.links,pdf_text)) %>% unnest(text) %>% 
  group_by(report) %>% mutate(page=row_number()) %>%
  ungroup() %>% mutate(text=strsplit(text,"\r")) %>% unnest(text) %>% mutate(text=gsub("\n","",text)) %>%
  group_by(report) %>% mutate(line=row_number()) %>% ungroup() %>% select(report,line,page,text)

fed_text <- 
  df_fed %>% 
  as_tibble() %>%
  unnest_tokens(word,text)

With the data in hand, we can update our charts.

Let’s first look at sentiment in the reports by year (we’re only looking at the July reports).

Code to construct sentiment, plot data

# custom stop words (includes abbreviations and fragments due to words spanning columns, proper nouns)
custom_stop_words <- 
  bind_rows(data_frame(word = c(tolower(month.abb), "one","two","three","four","five","six",
                                "seven","eight","nine","ten","eleven","twelve","mam","ered",
                                "produc","ing","quar","ters","sug","quar",'fmam',"sug",
                                "cient","thirty","pter","https","reac","cau","suries","prod",
                                "gen","cent","llc","scal",
                                # new 'words' fragments
                                "ty","manufactur","estly","increas","tinued","transporta",
                                "sc","md","struction","cial","manufac","crease","wva","mercial",
                                "ness","commer","al","indus","dis","creases","ported","idential",
                                "er","es","ers","ii","ued","de","mand","ment","moder","contin",
                                "con","tacts", "manu","ments","construc","creased","busi",
                                "mod","tions","mained","ed","va","nc","tive","ly",
                                "charlottesville","vermont","oregon","antic","condi",
                                "antici","pres","facturing","tial","pro","confi","activi","als",
                                # end new words
                                "pp","lp",
                                "pants","ter","ening","ances","www.federalreserve.gov",
                                "tion","fig","ure","figure","src"), 
                       lexicon = c("custom")), 
            stop_words)

## Warning: `data_frame()` is deprecated, use `tibble()`.
## This warning is displayed once per session.

# make chart 1 

fed_sentiment2 <-
  fed_text %>%
  anti_join(custom_stop_words) %>%
  inner_join(get_sentiments("bing")) %>%
  count(report, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)

## Joining, by = "word"
## Joining, by = "word"

g1 <-
ggplot(fed_sentiment2,  aes(factor(1996:2019), sentiment/(negative+positive), fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  scale_fill_distiller(palette="RdBu",direction=1)+
  #scale_fill_viridis_c(option="C")+
  theme_ridges(font_family="Roboto",font_size=10)+
  labs(x="report for July of each year",y="Sentiment (>0 positive, <0 negtaive)",
       title="Sentiment of Federal Reserve Monetary Policy Report: 1996-2019",
       subtitle="customized bing lexicon\nsentiment = (#positive-#negative)/(#positive+#negative)",
       caption="@lenkiefer Source: Federal Reserve Board Monetary Policy Reports")

g1+theme(axis.text=element_text(size=rel(0.75)))

We can use the term frequency and inverse document frequency (tf-idf) analysis to gauge what terms are important in each report.

We can modify my earlier code following this Julia Silge blogpost to order the bars within each facet. Nifty!

code for tf-idf plot

fed_textb <- 
  fed_text %>%
  anti_join(custom_stop_words, by="word") %>%
  mutate(word = gsub("[^A-Za-z ]","",word)) %>%  # keep only letters (drop numbers and special symbols)
  filter(word != "") %>%
  count(report,word,sort=TRUE) %>%
  bind_tf_idf(word, report, n) %>%
  arrange(desc(tf_idf))

g2 <- 
  fed_textb %>% 
  anti_join(custom_stop_words, by="word") %>%
  mutate(word = factor(word, levels = rev(unique(word)))) %>% 
  group_by(report) %>%
  mutate(id=row_number()) %>%
  filter(id<11) %>%
  ungroup() %>%
  mutate(wordf = as.factor(word),
         wordf= reorder_within(wordf, tf_idf, report)) %>%
  ggplot(aes(wordf, tf_idf, fill = report)) +
  geom_col(show.legend = FALSE) +
  labs(x = NULL, y = "tf-idf") +
  facet_wrap(~report,scales="free", ncol=5)+
  scale_x_reordered() +
  coord_flip()+
  theme_ridges(font_family="Roboto", font_size=10)+
  theme(axis.text.x=element_blank())+
  labs(x="",y ="tf-idf",
       title="Highest tf-idf words in each July Federal Reserve Monetary Policy Report: 1996-2019",
       subtitle="Top 10 terms by tf-idf statistic: term frequncy and inverse document frequency",
       caption="@lenkiefer Source: Federal Reserve Board Monetary Policy Reports\nNote: omits stop words, date abbreviations and numbers.")

g2+theme(axis.text=element_text(size=rel(0.75)))+scale_fill_viridis_d(option="C",end=0.85)

There in July 2019 we see “epop”. Not only does epop feature in the 2019 report, but it’s the first time the term shows up (in the July reports). If you go to the abbreviatons page of the July 2018 report, no epop. Indeed, prior to the 2019 report, epop never showed up.

# Note that we pick up the use of epop in the figure headings
fed_text %>%
  filter(word=="epop") %>%
  count(report)

## # A tibble: 1 x 2
##   report      n
##   <chr>   <int>
## 1 Jul2019    14

Here’s a movie version of my own epop chart, via Twitter:

🎥 1/2 pic.twitter.com/7Xk17EKamZ
— 📈 Len Kiefer 📊 (@lenkiefer) July 5, 2019

R code, modified to track epop, for the charts in the tweets above was shared in my post from earlier this year: Animated Labor Force Participation Chart.

EPOP becomes Fedspeak

epop enters the Federal Reserve lexicon

Share!