2017/09/27

R / data wrangling / slides / Office

IN THIS POST I WANT TO GO THROUGH SOME EXAMPLES of using the purrr package for R.

Now there are already some great examples of how to use purrr. If you’re brand new to purrr (like I was not long ago) probably start with Jenny Bryan’s purrr tutorial, then see R for Data Science and also this presentation from rstudioconf (pdf). You can also check out this curated collection via Mara Averick (on Twitter: dataandme) for a bunch of examples. I especially liked this post on how to use purrr to fit a bunch of models and compare them. And you could also check out my post from last year where I looked at collecting time series forecasts.

Those are all great resources, though some are a bit abstract. I found it helpful to review the tutorials and then try to do some things. Then if you go back to the resources they’ll make even more sense.

After I jumped in and did a few things I felt like I was getting the hang of using purrr. But then I saw this excellent purrr cheatsheet and I realized there were a bunch of functions I hadn’t been using. Let’s try to use them.

The task

I have the purrrfect (sorry first and last pun) task. We’re going to build another PowerPoint! Bet you didn’t see that coming (unless you got spoiled on social media), but yes indeed. If you missed my earlier post we used officer to build a PowerPoint presentation out of an old blog post.

In that example, we already had a bunch of images handy but now we’re going to have to make them. Enter purrr.

Let’s build a chartbook

Let’s imagine that our task is to make a PowerPoint chartbook with a bunch of slides summarizing U.S. macroeconomic conditions. In particular, we want to summarize how the labor market is doing, what’s going on with housing markets and what’s been happening with mortgage rates (pretend you don’t already follow me on Twitter).

How are we going to do this? In the bad old days (maybe just a few minutes ago) we would fire up good old Excel and start gathering the data. Then we’d make a bunch of charts and stick them into our PowerPoint. Yuck! Who wants to live that way? Not you, my friends.

Obviously, we can use ggplot2 to make awesome charts in R. And if we’re talking U.S. macroeconomic data we can probably get it from FRED, the Federal Reserve Economic Data service run by the Federal Reserve Bank of St. Louis. And if we’ve been reading here we’d know how to build a simple plot with FRED and R.

And if we read last week’s post we’d know how to build and edit a PowerPoint with R. But how to put it all together? That’s what this post is all about.

Getting your PowerPoint to purrr

To do this we’ll need to break it into three steps. In the first step, we’ll go get a bunch of macroeconomic data. FRED will make our lives easy (for demonstration purposes only, actual results may vary) by putting all the data we want in one place. But in general we could extend this step to include all the awful spreadsheets where you’ve stuck important data.

In the second step we’ll build our awesome graphs. That will proceed as usual.

In the third step we’ll actually build the PowerPoint deck.

Get data

In the real world, this is by far the most painful step. But here we’ll use the steps I outline in my tidyquant post to get a bunch of data.

A note on FRED codes. I initially had a hard time finding the FRED mnemonics (how FRED labels the variables). Fortunately, I found an awesome resource to help us. This resource is so awesome I’m going to say very little about it today and just link to it. I’ll have more to say about it (perhaps much more) later.

But for now, let me just say that I got my list by modifying the list provided by Michael McCracken and Serena Ng in FRED-MD: A Monthly Database for Macroeconomic Research. Note that I didn’t actually use their database here; I just used the appendix to get my list jumpstarted. Then I augmented it here and there with a few extra series.

What I would normally do is keep my list of mnemonics in a separate file and load them in. But here I’ll just type them in so you can see exactly what I’m doing without loading a text file. In the code below we’ll set up our FRED tickers.

#####################################################################################
## Step 0: Load Libraries ##
#####################################################################################
library(tidyverse)
library(tidyquant)

#####################################################################################
## Step 1: Prepare for data  ##
#####################################################################################

# Get a list of mnemonics
# This could be a text file if we wanted a long list
# Note, these data are monthly frequency (except MORTGAGE30US, which FRED publishes weekly)
tickers <- c('UNRATE',
             'PAYEMS',
             'HOUST',
             'TB3MS',
             'TB6MS',
             'GS1',
             'GS2',
             'GS5',
             'GS10',
             'CPIAUCSL',
             'CIVPART',
             'LNS11300060',
             'LNS11300001',
             'LNS11300002',
             'MORTGAGE30US',
             'HPIPONM226S',
             'HSN1F')

# Next, list human readable variable names
myvars <- c('Civilian Unemployment Rate',
            'All Employees: Total nonfarm',
            'Housing Starts: Total New Privately Owned',
            '3-Month Treasury Bill:',
            '6-Month Treasury Bill:',
            '1-Year Treasury Rate',
            '2-Year Treasury Rate',
            '5-Year Treasury Rate',
            '10-Year Treasury Rate',
            'CPI : All Items',
            'Civilian Labor Force Participation Rate',
            'Civilian Labor Force Participation Rate: 25 to 54 years',
            'Civilian Labor Force Participation Rate: Men',
            'Civilian Labor Force Participation Rate: Women',
            '30-Year Fixed Rate Mortgage Average in the United States',
            'Purchase Only House Price Index for the United States',
            'New Home Sales')

mytransform<-c('none',
             'diff',    # difference
             'none',
             'none',
             'none',
             'none',
             'none',
             'none',
             'none',
             'pdiff12',  # 12-month percent difference
             'none',
             'none',
             'none',
             'none',
             'none',
             'pdiff12',  # 12-month percent difference
             'none')

# Units each variable is measured in (after the transform)
myunit <- c('%',
            'Monthly Change in Thousands',
            'Thousands, SAAR',
            '%',
            '%',
            '%',
            '%',
            '%',
            '%',
            '12-month % change',
            '%',
            '%',
            '%',
            '%',
            '%',
            '12-month % change',
            'Thousands, SAAR'
)


# Create a lookup dataset (keep the columns as character, not factor)
mylookup<-data.frame(symbol=tickers,var=myvars,trans=mytransform,unit=myunit,
                     stringsAsFactors=FALSE)
# Take a look:
knitr::kable(mylookup)

symbol	var	trans	unit
UNRATE	Civilian Unemployment Rate	none	%
PAYEMS	All Employees: Total nonfarm	diff	Monthly Change in Thousands
HOUST	Housing Starts: Total New Privately Owned	none	Thousands, SAAR
TB3MS	3-Month Treasury Bill:	none	%
TB6MS	6-Month Treasury Bill:	none	%
GS1	1-Year Treasury Rate	none	%
GS2	2-Year Treasury Rate	none	%
GS5	5-Year Treasury Rate	none	%
GS10	10-Year Treasury Rate	none	%
CPIAUCSL	CPI : All Items	pdiff12	12-month % change
CIVPART	Civilian Labor Force Participation Rate	none	%
LNS11300060	Civilian Labor Force Participation Rate: 25 to 54 years	none	%
LNS11300001	Civilian Labor Force Participation Rate: Men	none	%
LNS11300002	Civilian Labor Force Participation Rate: Women	none	%
MORTGAGE30US	30-Year Fixed Rate Mortgage Average in the United States	none	%
HPIPONM226S	Purchase Only House Price Index for the United States	pdiff12	12-month % change
HSN1F	New Home Sales	none	Thousands, SAAR

We can see that we have 17 variables. I have also included a variable mytransform (stored in the trans column of the lookup table) that indicates the type of transformation, if any, we want to impose on our data. For example, we want to look at the monthly change in nonfarm payroll employment (PAYEMS), so we set its transformation to diff.
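With four parallel vectors it is easy to drop or misplace an entry, so here is a minimal sketch of an alternative: writing the lookup table row by row with tibble::tribble(). The name mylookup2 and the three-row subset are just for illustration; this is not what the post's code does.

```r
library(tibble)

# Hypothetical row-wise version of the lookup table (three rows only).
# Each row keeps a symbol together with its label, transform and unit.
mylookup2 <- tribble(
  ~symbol,    ~var,                           ~trans,    ~unit,
  "UNRATE",   "Civilian Unemployment Rate",   "none",    "%",
  "PAYEMS",   "All Employees: Total nonfarm", "diff",    "Monthly Change in Thousands",
  "CPIAUCSL", "CPI : All Items",              "pdiff12", "12-month % change"
)

# Quick sanity check: every transform is one of the three we plan to handle
stopifnot(all(mylookup2$trans %in% c("none", "diff", "pdiff12")))
```

Because every row is typed out as a unit, a missing unit or transform shows up as an error when the table is built rather than as a silently shifted column.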

#####################################################################################
## Step 2: Pull data  ##
#####################################################################################

tickers %>% tq_get(get="economic.data", 
                   from="2000-01-01"   # we start from January 2000
                   ) -> df

df<-merge(df,mylookup,by="symbol") %>% 
  rename(value=price) %>%
  mutate_if(is.factor, as.character) %>%  # convert any factors to character
  as_tibble()                             # as_tibble() replaces the deprecated as.tibble()

Now that we have collected our data let’s go plot it.
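If the merge step feels abstract, here is a small sketch of what it does, using made-up numbers and base R only. The names toy, toy_lookup and toy_merged are hypothetical and the values are not real FRED data.

```r
# Made-up long-format data shaped like the tq_get() result (symbol, date, price)
toy <- data.frame(
  symbol = c("UNRATE", "UNRATE", "PAYEMS"),
  date   = as.Date(c("2017-07-01", "2017-08-01", "2017-08-01")),
  price  = c(4.3, 4.4, 146000),
  stringsAsFactors = FALSE
)

# A two-row lookup table holding the metadata for each symbol
toy_lookup <- data.frame(
  symbol = c("UNRATE", "PAYEMS"),
  unit   = c("%", "Monthly Change in Thousands"),
  stringsAsFactors = FALSE
)

# Attach the metadata to every observation, then rename price to value
toy_merged <- merge(toy, toy_lookup, by = "symbol")
names(toy_merged)[names(toy_merged) == "price"] <- "value"
```

Every observation now carries its own unit label, which is what lets the plotting function build a title from the data it is handed.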

Plot data

What we are going to do is create a simple line plot for each series in our data and save it as its own individual image file (.png here). Then in step 3 we’ll paste those images into our PowerPoint.
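One practical detail: depending on your ggplot2 version, saving to a folder that does not exist may fail, so it can help to create the folder first. A minimal sketch, assuming a folder under tempdir() so it runs anywhere (the post itself saves into a folder named img in the working directory):

```r
# Hypothetical output location for this sketch; the post uses "img"
out_dir <- file.path(tempdir(), "img")

# Create the folder only if it is not already there
if (!dir.exists(out_dir)) dir.create(out_dir, recursive = TRUE)

# One file path per series, named after its FRED symbol
paths <- file.path(out_dir, paste0(c("UNRATE", "PAYEMS"), ".png"))
```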

Plot function

We’ll make a function that takes a variable mnemonic, filters our data to just that variable and makes a line plot.

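# Make a function to handle transformations
mytransf<- function(x,trans="none"){
  switch(as.character(trans),  # as.character() guards against factor input
         none=x,
         diff=c(NA,diff(x)),   # first difference, padded with NA to keep the length
         pdiff12= 100*as.numeric(quantmod::Delt(x,k=12)) # 12-month % change via quantmod's Delt()
  )
}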

# make a function to plot a series and optionally save it as img/<symbol>.png
# (the img directory must already exist in the working directory)
myplot <- function(in.var="UNRATE", in.trans="none", save="N"){
  dfp<-filter(df,symbol==in.var)
  g<-
    ggplot(data=dfp, aes(x=date,y=mytransf(value,in.trans)))+
    geom_line(color="royalblue")+
    labs(title=paste0(head(dfp,1)$var," (",head(dfp,1)$unit,")"),
         caption="@lenkiefer Source: St. Louis Federal Reserve Economic Database (FRED)\nTHIS IS A TEST ONLY",
         x="",y="")+
    theme_minimal()+
    theme(plot.caption=element_text(hjust=0))
  if (save=="N") {print(g)}  # draw the plot on screen
  if (save=="Y") {ggsave(filename=paste0("img/",in.var,".png"), plot=g,  # save this plot explicitly
                         width=8,height=6)}
}
myplot()
myplot()
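
To make the two transformations concrete, here is a small sketch on a made-up series, in base R only. The 12-month percent change is written out by hand here; I am assuming it matches what 100*quantmod::Delt(x, k=12) computes.

```r
# Toy series: 14 made-up monthly values (not real data)
x <- c(100, 101, 103, 102, 104, 105, 107, 108, 110, 111, 113, 114, 110, 112)

# "diff": month-over-month change, padded with NA so the length is unchanged
d1 <- c(NA, diff(x))

# "pdiff12": percent change from the value 12 months earlier
lag12 <- c(rep(NA, 12), head(x, -12))
p12 <- 100 * (x / lag12 - 1)
```

The first 12 entries of p12 are NA because there is no value a year earlier to compare against, which is why a series with this transform starts a year late on its chart.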

Save images

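Here we can use a purrr function I hadn’t heard about: walk(). walk() is just like map() except we call it for its side effects, like saving an image, rather than for a return value. The myplot() function we just wrote allows us to save images, and since it needs two inputs per series (the symbol and its transformation) we use the two-input version, walk2(). To use it we simply run: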

walk2(as.character(mylookup$symbol), as.character(mylookup$trans), myplot, save="Y")

Then, for each row in mylookup we’ll save an image taking our input variable symbol transformed according to our trans variable.
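
If you want to see what walk2() is doing without writing any files, here is a minimal sketch. fake_plot() is a hypothetical stand-in for myplot() that just records how it was called.

```r
library(purrr)

# Stand-in for myplot(): logs each call instead of saving an image
calls <- character(0)
fake_plot <- function(sym, trans, save = "N") {
  calls <<- c(calls, paste(sym, trans, save))
}

# walk2() steps through both vectors in parallel and passes save = "Y" to every call
out <- walk2(c("UNRATE", "PAYEMS"), c("none", "diff"), fake_plot, save = "Y")
```

After this runs, calls holds one entry per series, and out is simply the first input handed back invisibly. That is the difference from map2(), which would collect whatever the function returns.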

Make our awesome chartbook

Now we can apply purrr along with officer and create our awesome PowerPoint deck. We’ll need another function, this one to write slides.

First as last time we’ll load in our template and get started.

library(officer)
library(magrittr)
# Set a footer
myftr<-"@lenkiefer Purrrty PowerPoint"
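# Load template
# (ph_with_text() belongs to the officer release available when this was written;
#  later officer releases replaced it with ph_with())
my_pres<-read_pptx("data/blank.pptx") %>% 
  # Add a slide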
  add_slide(layout="Title Slide", master="Office Theme") %>%
  # Add some text to the title (ctrTitle)
  ph_with_text(type = "ctrTitle", str = "Our totally awesome chartbook") %>% 
  # Add some text to the subtitle (subTitle)
  ph_with_text(type = "subTitle", str = "A purrrfect example") %>% 
  ph_with_text(type = "ftr", str = myftr )

Then we’ll need to add a function to write slides.

slidef<- function(var){
  # <<- updates the my_pres object defined above, so each call appends one slide
  my_pres <<- my_pres %>%
    add_slide(layout = "Title and Content", master = "Office Theme") %>%
    ph_with_text(type = "ftr", str = myftr ) %>%
    ph_with_img(type = "body", index = 1, 
                src =paste0("img/",var,".png" # go get our images
                            ))
}

Now that we have our slide addition function we can use walk() to call this function and append our slides:

walk(mylookup$symbol,slidef)  

# This will add to my_pres variable as side effect
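
If modifying my_pres as a side effect feels uncomfortable, another purrr function from the cheatsheet, reduce(), can thread the deck through each call instead. Here is a minimal sketch where the deck is just a character vector; add_fake_slide() is a hypothetical stand-in for a function that takes a deck and a symbol and returns the deck with one more slide.

```r
library(purrr)

# Stand-in for a slide writer: takes the deck so far, returns it with one more entry
add_fake_slide <- function(deck, var) {
  c(deck, paste0("img/", var, ".png"))
}

# reduce() feeds the result of each call into the next, starting from .init
deck <- reduce(c("UNRATE", "PAYEMS", "HOUST"), add_fake_slide, .init = "title")
```

With officer, the same pattern would presumably start from the title-slide deck as .init and use a slide function that returns the modified presentation. I have not run that variant here.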

Finally, we write the deck to a file:

my_pres %>% print( target = "Purrrty.pptx") %>% 
  invisible()

You can download the deck here. Below I’ve embedded a pdf version of the slidedeck.

Next steps

There’s a lot more to do here. Obviously on the PowerPoint front we can do a lot more. We could add more metadata about our charts to create different kinds of charts for example. There’s more we could do with the purrr functions.

But I think we can all agree that this type of application is exactly what the creators of purrr, including Hadley Wickham, had in mind when they created this awesome library.

Also, there’s still quite a bit to do about how things look. We haven’t made things pretty yet. And we still haven’t talked about strategy for how to actually use these tools. Now that we have a base we can look to those topics in future posts.

PURRRty PowerPoint with R