2017/10/10

data wrangling / dataviz / R

WHILE I WAS TRYING TO MAKE TIME FOR TIBBLETIME yesterday I got distracted and made this plot:

hey! here's a #dataviz: tile plot of U.S. state unemployment pic.twitter.com/vH8fSu63ve
— 📈 Len Kiefer 📊 (@lenkiefer) October 10, 2017

In this post, let’s go over how to make this plot with R. And we’re going to make it quick.

Setup

In order to create a plot like this we’ll need several packages, including the tidyverse, geofacet and the tidyquant package. Be sure to check out this post on geofacets and this post on using tidyquant.

We’ll get our data from Saint Louis Federal Reserve’s FRED database. We can access these data easily from the tidyquant package.

Do you even FRED bro?

We’re also going to take advantage of the way FRED mnemonics are set up to extract a bunch of data all at once.

When FRED has state level data they often set up their mnemonics following a pattern like {STATE}{MNEMONIC}, with STATE the two-letter state abbreviation and the MNEMONIC the variable mnemonic. For example, the unemployment rate’s MNEMONIC is UR, so if we wanted California’s unemployment rate we would look up CAUR.

Now, if we could get a list of state abbreviations we could easily extract a bunch of state statistics using tidyquant. Fortunately, it’s easy to find one. There happens to be a list in the geofacet library. In the geofacet library there’s a data.frame called us_state_grid3 that has a variable called code with the state codes.

#####################################################################################
## Step 1: Load Libraries ##
#####################################################################################
library(tidyverse)
library(tidyquant)
library(tibbletime)
library(geofacet)
library(viridis)
library(scales)

#####################################################################################
## Step 2: go get data ##
#####################################################################################
df <- tq_get(paste0(us_state_grid3$code,      # get state abbreviations
                    "UR"),                    # append UR
             get="economic.data",             # use FRED
             from="1990-01-01")               # go from 1990 forward

df %>% mutate(state=substr(symbol,1,2)        # create a state variable
  ) -> df

#####################################################################################
## Step 3: make awesome plot ##
#####################################################################################
  ggplot(data=df, 
         aes(x=year(date),y=factor(month(date)),
           fill=price))+
  geom_tile()+
  scale_fill_viridis(option="D",name="Unemployment Rate (%) ")+
  scale_y_discrete(breaks=c(1,12),labels=c("Jan","Dec"))+
  scale_x_continuous(breaks=c(1991,2017))+
  facet_geo(~state, grid="us_state_grid3")+
  labs(x="",y="",title="Unemployment Rate by State",
       caption=paste0("@lenkiefer Source: U.S. Bureau of Labor Statistics,",
                      " Unemployment rate by state [XX]UR,\nretrieved from FRED, ",
                      "Federal Reserve Bank of St. Louis;",
                      " https://fred.stlouisfed.org/series/[XX]UR, October 10, 2017.",
                      "\nwhere [XX] stands for state abbreviation (e.g. California==CA)"))+
  theme(plot.title=element_text(face="bold",size=18),
        legend.position="top",
        plot.subtitle=element_text(face="italic",size=14),
        plot.caption=element_text(face="italic",size=9),
        axis.text=element_text(size=6))

Cool!

What else can we do? Let’s make a plot of employment growth. The mnemonic for nonfarm employment in NA. Now you know what to do:

#####################################################################################
## Get employment data ##
#####################################################################################
df <- tq_get(paste0(us_state_grid3$code,"NA"),get="economic.data",from="1990-01-01")

df %>% 
  mutate(state=substr(symbol,1,2)) %>% 
  group_by(state) %>%
  mutate(dy=Delt(price,k=12)) %>%    # Use quantmod::Delt to compute k (12) period percent change
  ungroup() %>% 
  as_tbl_time(index=date) ->         # Turn into tibbletime object so we can filter
  df


#####################################################################################
## Make awesome plot ##
#####################################################################################
g1<-
  ggplot(data=#time_filter(df,1991~2017),   
          filter(df, year(date)>1990, year(date)<2018),
         # tibbletime::time_filter() will save me many keystrokes
         aes(
           x=year(date),
           y=factor(month(date)),
           fill=dy
         ))+
  geom_tile()+
  scale_fill_viridis(option="C",name="12-month percent change in employment ",labels=percent)+
  scale_y_discrete(breaks=c(1,12),labels=c("Jan","Dec"))+
  scale_x_continuous(breaks=c(1991,2017))+
  facet_geo(~state, grid="us_state_grid3")+
  labs(x="",y="",title="Employment growth by state",
       subtitle="12-month percent change in nonfarm employment",
       caption="@lenkiefer Source: U.S. Bureau of Labor Statistics, All Employees: Total Nonfarm by state [XX]NA,\nretrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/[XX]NA, October 10, 2017.\nwhere [XX] stands for state abbreviation (e.g. California==CA)")+
  theme(plot.title=element_text(face="bold",size=18),
        legend.position="top",
        plot.subtitle=element_text(face="italic",size=14),
        plot.caption=element_text(face="italic",size=9),
        axis.text=element_text(size=6))

g1

We can also plot a more traditional time series. Let’s plot permits for 1-unit properties by state. The data are noisy, so let’s use tibbletime to compute a 12 month rolling average.

# The function to use at each step is `mean`.
# The window size is 12 (months)
rolling_mean <- rollify(mean, window = 12)

#####################################################################################
## Get permits data (St Lousi FED seasonally adjustes the data for us) ##
#####################################################################################
df2 <- tq_get(paste0(us_state_grid3$code,"BP1FHSA"),get="economic.data",from="1990-01-01")

df2 %>% 
  mutate(state=substr(symbol,1,2)) %>% 
  group_by(state) %>%
  mutate(dy=rolling_mean(price)) %>% 
  ungroup() %>% 
  as_tbl_time(index=date) -> 
  df2

#####################################################################################
## Make awesome plot ##
#####################################################################################
g2<-
  ggplot(data=df2, aes(x=date,y=dy))+
  geom_line()+
  facet_geo(~state, grid="us_state_grid3",scales="free")+
  labs(x="",y="",title="New Private Housing Units Authorized by Building Permits: 1-Unit Structures",
       subtitle="12-month moving average, different y axis scale for each state",
       caption="@lenkiefer Source: Federal Reserve Bank of St. Louis, New Private Housing Units Authorized by Building Permits: 1-Unit Structures by state [XX]NA,\nretrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/[XX]BP1FHSA, October 10, 2017.\nwhere [XX] stands for state abbreviation (e.g. California==CA)")+
  theme(plot.title=element_text(face="bold",size=14),
        legend.position="top",
        panel.background=element_blank(),
        panel.grid.major=element_line(color="lightgray"),
        plot.subtitle=element_text(face="italic",size=12),
        plot.caption=element_text(face="italic",size=9),
        axis.text=element_text(size=6))+
  scale_x_date(breaks=c(min(df2$date),max(df2$date)),date_labels="%Y")
g2

How can this work for you?

FRED has a lot of different data series available by state. So you can easily modify this code to extract those data and plot.

Then if we had a list of variables we might be able to use purrr::map to do some interesting things. I have such a thing in mind for my tidyPowerPoint workflow I’m building.

#####################################################################################
## A function might be useful ##
#####################################################################################
myplot <- function( v="STHPI",
                    yourtitle="House price index by state", 
                    yoursub="Index 1980:Q1=100, Not Seasonally Adjusted",
                    yourcaption="@lenkiefer Source U.S. Federal Housing Finance Agency, All-Transactions House Price Index by state [XXSTHPI],\nretrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/XXSTHPI, October 10, 2017.\nwhere [XX] stands for state abbreviation (e.g. California==CA)")
  {
  #####################################################################################
  ## Get requested data ##
  #####################################################################################
  df <- tq_get(paste0(us_state_grid3$code,v),get="economic.data",from="1990-01-01")
  df %>% 
    mutate(state=substr(symbol,1,2)) %>% 
    group_by(state) %>%
    mutate(dy=Delt(price,k=12)) %>% 
    ungroup() %>% 
    as_tbl_time(index=date) -> 
    df
  
  g1<-
    ggplot(data=#time_filter(df,1991~2017),
           filter(df, year(date)>1990, year(date)<2018),
           # these time variables could also be function arguments
           aes(
             x=date,
             y=price
           ))+
    geom_line()+
    facet_geo(~state, grid="us_state_grid3")+
    labs(x="",y="",title=yourtitle,
         subtitle=yoursub,
         caption=yourcaption)+
    theme(plot.title=element_text(face="bold",size=18),
          legend.position="top",
          plot.subtitle=element_text(face="italic",size=14),
          plot.caption=element_text(face="italic",size=9))+
  geom_line()+
  scale_x_date(breaks=c(min(df$date),max(df$date)),date_labels="%Y")
  g1
}

myplot()

Could this work for you? Because I know I’m going to use it.

quick geofacet plots

Setup

Do you even FRED bro?

How can this work for you?

Share!