In this post I want to go through some examples of using the purrr package for R.
Now there are already some great examples of how to use purrr. If you’re brand new to purrr (like I was not long ago), probably start with Jenny Bryan’s purrr tutorial, then see R for Data Science and also this presentation from rstudioconf (pdf). You can also check out this curated collection via Mara Averick (on Twitter: dataandme) for a bunch of examples. I especially liked this post on how to use purrr to fit a bunch of models and compare them. And you could also check out my post from last year where I looked at collecting time series forecasts.
Those are all great resources, though some are a bit abstract. I found it helpful to review the tutorials and then try to do some things. Then if you go back to the resources they’ll make even more sense.
After I jumped in and did a few things I felt like I was getting the hang of using purrr. But then I saw this excellent purrr cheatsheet and I realized there were a bunch of functions I hadn’t been using. Let’s try to use them.
The task
I have the purrrfect (sorry first and last pun) task. We’re going to build another PowerPoint! Bet you didn’t see that coming (unless you got spoiled on social media), but yes indeed. If you missed my earlier post we used officer to build a PowerPoint presentation out of an old blog post.
In that example, we already had a bunch of images handy but now we’re going to have to make them. Enter purrr.
Let’s build a chartbook
Let’s imagine that our task is to make a PowerPoint chartbook with a bunch of slides summarizing U.S. macroeconomic conditions. In particular, we want to summarize how the labor market is doing, what’s going on with housing markets and what’s been happening with mortgage rates (pretend you don’t already follow me on Twitter).
How are we going to do this? In the bad old days (maybe just a few minutes ago) we would fire up good old Excel and maybe start gathering the data. Then we’d make a bunch of charts and stick them into our PowerPoint. Yuck! Who wants to live that way? Not you, my friends.
Obviously, we can use ggplot2 to make awesome charts in R. And if we’re talking U.S. macroeconomic data we can probably get it from the St. Louis Fed’s Federal Reserve Economic Data (FRED) database. And if we’ve been reading here we’d know how to build a simple plot with FRED and R.
And if we read last week’s post we’d know how to build and edit a PowerPoint with R. But how to put it all together? That’s what this post is all about.
Getting your PowerPoint to purrr
In order to do this we’ll need to break it into three stages. In the first stage, we’ll go get a bunch of macroeconomic data. FRED will make our lives easy (for demonstration purposes only, actual results may vary) by putting all the data we want in one place. But in general we could extend this step to include all the awful spreadsheets where you’ve stuck important data.
In the second step we’ll build our awesome graphs. That will proceed as usual.
In the third step we’ll actually build the PowerPoint deck.
Get data
In the real world, this is by far the most painful step. But here we’ll use the steps I outline in my tidyquant post to get a bunch of data.
A note on FRED codes. I initially had a hard time finding the FRED mnemonics (how FRED labels the variables). Fortunately, I found an awesome resource to help us. This resource is so awesome I’m going to say very little about it today and just link to it. I’ll have more to say about it (perhaps much more) later.
But for now, let me just say that I got my list by modifying the list provided by Michael McCracken and Serena Ng in the FRED-MD: A Monthly Database for Macroeconomic Research. Note that I didn’t actually use their database here, I just used the appendix to get my list jumpstarted. Then I augmented it here and there with a few extra series.
What I would normally do is keep my list of mnemonics in a separate file and load them in. But here I’ll just type them in so you can see exactly what I’m doing without loading a text file. In the code below we’ll set up our FRED tickers.
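For what it’s worth, here is a minimal sketch of the "separate file" approach. The file here is a temporary one I create on the spot, so the path and contents are purely illustrative assumptions; in practice you would point `readLines()` at wherever you keep your own list.

```r
# Hypothetical file standing in for a saved list of FRED mnemonics
ticker_file <- tempfile(fileext = ".txt")
writeLines(c("UNRATE", "PAYEMS", "", "HOUST "), ticker_file)

# Read one mnemonic per line, trim stray whitespace, and drop blank lines
tickers_from_file <- trimws(readLines(ticker_file))
tickers_from_file <- tickers_from_file[nzchar(tickers_from_file)]

print(tickers_from_file)
```

The resulting character vector can be used exactly like the hand-typed `tickers` vector below.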
#####################################################################################
## Step 0: Load Libraries ##
#####################################################################################
library(tidyverse)
library(tidyquant)
#####################################################################################
## Step 1: Prepare for data ##
#####################################################################################
# Get a list of mnemonics
# This could be a text file if we wanted a long list
# Note: most of these series are monthly; MORTGAGE30US is weekly
tickers <- c('UNRATE',
'PAYEMS',
'HOUST',
'TB3MS',
'TB6MS',
'GS1',
'GS2',
'GS5',
'GS10',
'CPIAUCSL',
'CIVPART',
'LNS11300060',
'LNS11300001',
'LNS11300002',
'MORTGAGE30US',
'HPIPONM226S',
'HSN1F')
# Next, list human readable variable names
myvars <- c('Civilian Unemployment Rate',
'All Employees: Total nonfarm',
'Housing Starts: Total New Privately Owned',
'3-Month Treasury Bill:',
'6-Month Treasury Bill:',
'1-Year Treasury Rate',
'2-Year Treasury Rate',
'5-Year Treasury Rate',
'10-Year Treasury Rate',
'CPI : All Items',
'Civilian Labor Force Participation Rate',
'Civilian Labor Force Participation Rate: 25 to 54 years',
'Civilian Labor Force Participation Rate: Men',
'Civilian Labor Force Participation Rate: Women',
'30-Year Fixed Rate Mortgage Average in the United States',
'Purchase Only House Price Index for the United States',
'New Home Sales')
mytransform<-c('none',
'diff', # difference
'none',
'none',
'none',
'none',
'none',
'none',
'none',
'pdiff12', # 12-month percent difference
'none',
'none',
'none',
'none',
'none',
'pdiff12', # 12-month percent difference
'none')
# Units variables (after transform) are measured in
myunit <- c('%',
'Monthly Change in Thousands',
'Thousands, SAAR',
'%',
'%',
'%',
'%',
'%',
'%',
'12-month % change',
'%',
'%',
'%',
'%',
'%',
'12-month % change',
'Thousands, SAAR'
)
# Create a lookup dataset (keep text columns as character rather than factor)
mylookup<-data.frame(symbol=tickers,var=myvars,trans=mytransform,unit=myunit,
                     stringsAsFactors=FALSE)
# Take a look:
knitr::kable(mylookup)
symbol | var | trans | unit |
---|---|---|---|
UNRATE | Civilian Unemployment Rate | none | % |
PAYEMS | All Employees: Total nonfarm | diff | Monthly Change in Thousands |
HOUST | Housing Starts: Total New Privately Owned | none | Thousands, SAAR |
TB3MS | 3-Month Treasury Bill: | none | % |
TB6MS | 6-Month Treasury Bill: | none | % |
GS1 | 1-Year Treasury Rate | none | % |
GS2 | 2-Year Treasury Rate | none | % |
GS5 | 5-Year Treasury Rate | none | % |
GS10 | 10-Year Treasury Rate | none | % |
CPIAUCSL | CPI : All Items | pdiff12 | 12-month % change |
CIVPART | Civilian Labor Force Participation Rate | none | % |
LNS11300060 | Civilian Labor Force Participation Rate: 25 to 54 years | none | % |
LNS11300001 | Civilian Labor Force Participation Rate: Men | none | % |
LNS11300002 | Civilian Labor Force Participation Rate: Women | none | % |
MORTGAGE30US | 30-Year Fixed Rate Mortgage Average in the United States | none | % |
HPIPONM226S | Purchase Only House Price Index for the United States | pdiff12 | 12-month % change |
HSN1F | New Home Sales | none | Thousands, SAAR |
We can see that we have 17 variables. I have also included a variable mytransform that indicates the type of transformation (if any) we want to impose on our data. For example, we want to look at the monthly change in nonfarm payroll employment (PAYEMS), so we set its mytransform entry to diff.
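To make the two transformations concrete, here is a small base R sketch on a made-up series (the numbers are invented, not FRED data). The 12-month percent change is written out by hand as a stand-in for the `quantmod::Delt()` call used later in the post, so treat it as my reading of that transformation rather than the package's own code.

```r
# Toy monthly series: 14 made-up index values (not real FRED data)
x <- c(100, 101, 103, 102, 105, 107, 108, 110, 111, 113, 114, 116, 118, 120)

# 'diff': month-over-month change, padded with NA so the length matches x
d1 <- c(NA, diff(x))

# 'pdiff12': percent change versus the value 12 months earlier
k <- 12
n <- length(x)
p12 <- c(rep(NA, k), 100 * (x[(k + 1):n] / x[1:(n - k)] - 1))

print(d1)
print(p12)
```

Both results have the same length as the input, which is what lets us drop them straight into a plot against the original dates.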
#####################################################################################
## Step 2: Pull data ##
#####################################################################################
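df <- tq_get(tickers,
             get="economic.data",
             from="2000-01-01") # we start from January 2000
# Attach the lookup columns to each observation and rename price to value
df<-merge(df,mylookup,by="symbol") %>%
  rename(value=price) %>%
  map_if(is.factor, as.character) %>% # convert factors to character (returns a list)
  as_tibble() # back to a data frame; as.tibble() is deprecated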
Now that we have collected our data let’s go plot it.
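Before we do, here is a tiny sketch of what that merge step is doing, using a made-up two-series data frame in place of the real download. The toy values and the `toy`/`toy_lookup` names are assumptions for illustration only.

```r
# Toy stand-in for the downloaded data: long format with symbol, date, price
toy <- data.frame(
  symbol = rep(c("UNRATE", "PAYEMS"), each = 2),
  date   = as.Date(rep(c("2000-01-01", "2000-02-01"), times = 2)),
  price  = c(4.0, 4.1, 131000, 131100),  # made-up values
  stringsAsFactors = FALSE
)

# Toy lookup table: one row of metadata per symbol
toy_lookup <- data.frame(
  symbol = c("UNRATE", "PAYEMS"),
  var    = c("Civilian Unemployment Rate", "All Employees: Total nonfarm"),
  trans  = c("none", "diff"),
  stringsAsFactors = FALSE
)

# Every observation picks up the metadata for its symbol
toy_df <- merge(toy, toy_lookup, by = "symbol")
names(toy_df)[names(toy_df) == "price"] <- "value"

print(toy_df)
```

After the merge each row carries its own label and transformation, so a plotting function only needs the symbol to find everything else.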
Plot data
What we are going to do is create a simple line plot for each series in our data and save it as its own individual image file (.png here). Then in step 3 we’ll paste those images into our PowerPoint.
Plot function
We’ll make a function that takes a variable mnemonic, filters our data to just that variable and makes a line plot.
# Make a function to handle transformations
mytransf<- function(x,trans="none"){
switch(trans,
none=x,
diff=c(NA,diff(x)),
pdiff12= 100*quantmod::Delt(x,k=12) # use Quantmod Delt() function
)
}
# make a function to plot a series and optionally save it (here in the img/ directory)
myplot <- function(in.var="UNRATE", in.trans="none", save="N"){
  dfp<-filter(df,symbol==in.var)
  g<-
    ggplot(data=dfp, aes(x=date,y=mytransf(value,in.trans)))+
    geom_line(color="royalblue")+
    labs(title=paste0(head(dfp,1)$var," (",head(dfp,1)$unit,")"),
         caption="@lenkiefer Source: St. Louis Federal Reserve Economic Database (FRED)\nTHIS IS A TEST ONLY",
         x="",y="")+
    theme_minimal()+
    theme(plot.caption=element_text(hjust=0))
  if (save=="N") {print(g)} # plot image
  # pass the plot explicitly; this assumes the img/ directory already exists
  if (save=="Y") {ggsave(filename=paste0("img/",in.var,".png"), plot=g,
                         width=8,height=6)}
}
myplot()
Save images
Here we can use a new purrr function I hadn’t heard about: walk(). walk() is just like map() except we use it for the side effects, like saving an image. Since we have two inputs per chart (the symbol and its transformation) we use the two-input version, walk2(). The myplot() function we just wrote allows us to save images. To use it we simply run:
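# walk2() is called only for its side effect (saving files), so no mutate() is needed
walk2(as.character(mylookup$symbol), as.character(mylookup$trans), myplot, save="Y")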
Then, for each row in mylookup we’ll save an image, taking our input variable symbol transformed according to our trans variable.
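If the walk-versus-map distinction feels abstract, this little sketch might help. It swaps the plotting function for a hypothetical helper, `save_note()`, that just writes a text file into a temporary folder, so you can run it without any data. The helper and folder names are mine, not part of purrr.

```r
library(purrr)

# Scratch folder for the demo files
out_dir <- file.path(tempdir(), "walk2_demo")
dir.create(out_dir, showWarnings = FALSE)

# Hypothetical helper: write a small text file named after `name`
save_note <- function(name, text, dir) {
  writeLines(text, file.path(dir, paste0(name, ".txt")))
}

names_in <- c("UNRATE", "PAYEMS")
notes_in <- c("none", "diff")

# walk2() calls save_note() once per (name, note) pair;
# extra arguments such as dir are passed along to every call
returned <- walk2(names_in, notes_in, save_note, dir = out_dir)

# The files are the side effect; the return value is just the first input
print(list.files(out_dir))
print(returned)
```

Because walk2() hands back its first input invisibly, it is safe to use in the middle of a pipe or simply on its own line.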
Make our awesome chartbook
Now we can apply purrr along with officer and create our awesome PowerPoint deck. We’ll need another function, this one to write slides.
First, as last time, we’ll load in our template and get started.
library(officer)
library(magrittr)
# Set a footer
myftr<-"@lenkiefer Purrrty PowerPoint"
# Note: later officer releases replace ph_with_text()/ph_with_img() with ph_with()
# Load template
my_pres<-read_pptx("data/blank.pptx") %>%
  # Add a slide
  add_slide(layout="Title Slide", master="Office Theme") %>%
  # Add some text to the title (ctrTitle)
  ph_with_text(type = "ctrTitle", str = "Our totally awesome chartbook") %>%
  # Add some text to the subtitle (subTitle)
  ph_with_text(type = "subTitle", str = "A purrrfect example") %>%
  ph_with_text(type = "ftr", str = myftr )
Then we’ll need to add a function to write slides.
# Note: the preso argument is unused; slidef() works on the global my_pres
slidef<- function(var, preso=my_pres){
  my_pres %>%
    add_slide(layout = "Title and Content", master = "Office Theme") %>%
    ph_with_text(type = "ftr", str = myftr ) %>%
    ph_with_img(type = "body", index = 1,
                src = paste0("img/",var,".png")) -> my_pres # go get our images
}
Now that we have our slide addition function we can use walk() to call this function and append our slides:
walk(mylookup$symbol,slidef)
# This will add to my_pres variable as side effect
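The approach above leans on a side effect. If you would rather pass the deck along explicitly, one alternative (not what I did here) is purrr's `reduce()`, which feeds the result of each step into the next. In this sketch a plain list stands in for the presentation and `add_chart_slide()` is a hypothetical helper, so it shows the pattern only, not actual officer calls.

```r
library(purrr)

# A plain list stands in for the presentation object (illustration only)
empty_deck <- list(slides = character(0))

# Hypothetical helper: return a new deck with one more "slide" appended
add_chart_slide <- function(deck, symbol) {
  deck$slides <- c(deck$slides, paste0("img/", symbol, ".png"))
  deck
}

symbols <- c("UNRATE", "PAYEMS", "HOUST")

# reduce() starts from .init and calls add_chart_slide(deck, symbol) for each symbol
deck <- reduce(symbols, add_chart_slide, .init = empty_deck)

print(deck$slides)
```

The starting object is left untouched and the finished deck comes back as the return value, which can make it easier to see where each slide came from.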
Finally, we write the deck to a file:
my_pres %>% print( target = "Purrrty.pptx") %>%
invisible()
You can download the deck here. Below I’ve embedded a pdf version of the slidedeck.
Next steps
There’s a lot more to do here. Obviously on the PowerPoint front we can do a lot more. We could add more metadata about our charts to create different kinds of charts for example. There’s more we could do with the purrr functions.
But I think we can all agree that this type of application is exactly what the creators of purrr, including Hadley Wickham, had in mind when they created this awesome library.
Also, there’s still quite a bit to do about how things look. We haven’t made things pretty yet. And we still haven’t talked about strategy for how to actually use these tools. Now that we have a base we can look to those topics in future posts.