21 May 2017

LET’S TAKE A LOOK AT RECENT TRENDS IN CONSUMER PRICES AND HOUSEHOLD DEBT. Along the way we’ll refresh some visualizations of consumer prices (see here) and household debt (see here) we made last year, as well as think up some new ones.

As usual we’ll use R to generate the plots and I’ll share the code below. But before we get into the details of constructing the charts, let’s just look at two plots to help set the stage.

First, let’s examine trends in U.S. consumer prices since the year 2000. We’ll get into the data details later, but these data are from the U.S. Bureau of Labor Statistics BLS.gov. The Consumer Price Index (CPI) LINK tracks “changes in the prices paid by urban consumers for a representative basket of goods and services”.

The BLS provides very detailed measures (it’s 5 o’clock somewhere so feel free to read) down to individual price categories. They also aggregate into various aggregate categories.

Let’s look at a few of them:

plot of chunk 05-21-2017-plot-1

These data are normalized so that January 2000 is equal to 100. The trend lines compare how the seasonally-adjusted price indices have evolved since then. In case you didn’t notice, I’ve made the education line thicker to call attention to it. From January 2000 to April 2017 (latest data available) the CPI for all items was up 44 percent (144 in the chart) vs 129 percent (229 in the chart) for the education index.

There are important niceties see again the BLS about how the index is constructed and what it really measures. The important point is education costs are up a lot more than overall prices since 2000.

On a possibly related note, consider this chart showing household debt trends using data from the New York Federal Reserve Bank/Equifax’s consumer credit panel LINK.

plot of chunk 05-21-2017-plot-2

Got it? Price up, debt up. Let’s start building some more visualizations of these data and explore these trends in more detail.

Step 1: get the data

The data for these plots are conveniently made available by the BLS and the New York Fed.

A quick note on data manipulation. I have been cycling between using a data.table based approach for data wrangling and a dplyr/tidyverse based approach. Both seem to work. For the CPI data I’ll mostly use data.table and for the household debt data I’ll mostly use dplyr.

Consumer price data

First, we’ll follow my CPI post to get the Consumer Price data.

###############################################################################
#### Load libraries
###############################################################################
library(data.table,quietly=T,warn.conflicts=F)
library(tidyverse,quietly=T,warn.conflicts=F)

###############################################################################
#### Read in data from BLS
#### See: for readme https://download.bls.gov/pub/time.series/cu/cu.txt
###############################################################################

# get data
cpi1<-fread('http://download.bls.gov/pub/time.series/cu/cu.data.2.Summaries')

# Get item names
cpi.item<-fread("http://download.bls.gov/pub/time.series/cu/cu.item",
                header=FALSE,col.names=c("item.code","item.name","display.level",
                                         "selectable","sort.sequence","blank"))
# get series info
cpi.series<-fread("http://download.bls.gov/pub/time.series/cu/cu.series",
                  col.names=c("series_id","area_code","item_code","seasonal",
                              "periodicity_code","base_code","base_period",
                            "footnote_codes","begin_year","begin_period",
                            "end_year","end_period","blank"),header=F)
# merge
cpi2<-merge(cpi.item,cpi.series,by.x="item.code",by.y="item_code")

setkeyv(cpi1,"series_id")            
setkeyv(cpi2,"series_id")
cpi3<-cpi2[cpi1]
# create date
cpi3<-cpi3[,month:=as.numeric(substr(cpi3$period,2,3))]
cpi3$date<- as.Date(ISOdate(cpi3$year,cpi3$month,1) )

# subset data
cpi4<-cpi3[area_code=="0000" & seasonal=="S" & item.code!= "SAA1" & item.code !="SAA2"]
# get January 2000 data 
bdata<-cpi4[year==2000 & month==1,]
bdata<-dplyr::rename(bdata, value00=value)
bdata<-bdata[, c('value00','series_id'), with = FALSE]
cpi5<-merge(cpi4,bdata,by="series_id")

# reindex data so that 2000 = 100
cpi5<-cpi5[,cpi00:=100*value/value00]
cpi6<-cpi5[year>1999]
xlim<-c(min(cpi6$date),max(cpi6$date))
dd<-unique(cpi6$date)


# Originally, had Education combined with communication (as BLS reports it)
# let's just get education
# Series id: CUSR0000SAE1

  cpi.ed<-fread("http://download.bls.gov/pub/time.series/cu/cu.data.17.USEducationAndCommunication")
  cpi.ed[,month:=as.numeric(substr(cpi.ed$period,2,3))]
  cpi.ed$date<- as.Date(ISOdate(cpi.ed$year,cpi.ed$month,1) )
  cpi.ed$item.name<-"Education"
  cpi.ed<-cpi.ed[series_id=="CUSR0000SAE1"]
  bdata.ed<-cpi.ed[year==2000 & month==1,]
  bdata.ed<-dplyr::rename(bdata.ed, value00=value)
  bdata.ed<-bdata.ed[, c('value00','series_id'), with = FALSE]
  cpi.ed<-merge(cpi.ed,bdata.ed,by="series_id")
  cpi.ed[,cpi00:=100*value/value00]

Household debt data

The household debt data is available in an Excel spreadsheet. We’ll have to use readxl (see this post for more on wrangling Excel data with R). Let’s start by assuming that you have downloaded the latest credit panel summary data (available here .xlsx file) and saved it as HHD_C_Report_2017Q1.xlsx in a data directory.

Now in order to get this data useable, we have to use readxl to reach into the files and extract the useful information. I had originally been doing some manual manipulation in Excel, but ewww gross, let’s try to do it all with R.

It’s going to involve understanding the structure of the Excel file and using some of readxl’s nice features.

This spreadsheet is-sigh-well, set up like spreadsheets are wont to be. The data we want is in the “Page 3 Data” sheet, organized with variables in rows and data in columns. The row titles we want are not contiguous with the data (there’s a gap of a couple columns). We’ll have to import the data and try to beat it into tidy submission so we can actually work with it.

# We'll use the tidyverse here
library(readxl,quietly=T,warn.conflicts=F)
library(tidyr,quietly=T,warn.conflicts=F)
library(dplyr,quietly=T,warn.conflicts=F)
library(purrr,quietly=T,warn.conflicts=F)

# data is range D4: BH10, we'll get the data values from row 3
# the row titles are in column A, so we'll get it and then delete blanks

df<-read_excel("data/HHD_C_Report_2017Q1.xlsx", 
               sheet = "Page 3 Data",  # name of sheet
               range="A3:BH10" )        # range where data lives

## we've got blank columns in positions 2 & 3 (drop them)
## then we'll gather the data and rename X__1 as "type"

df2<-df %>% select(-X__2,-X__3) %>% gather(datec,debt,c(2:58)) %>% rename(type=X__1)

#now create a date variable using the datec string:

df2<- df2 %>% 
  mutate(year=2000+as.numeric(substr(datec,1,2)),
         month=as.numeric(substr(datec,5,5))*3) %>%
  # create date variable
  mutate(date=as.Date(ISOdate(year,month,1))) %>% 
  # drop unneeede variables
  select(type,date,debt)

Hey, not so bad. Let’s make a table looking at the last quarter of data (2017 Q1).

# Make a table for 2017 Q1:

htmlTable(
  df2 %>% filter(date=="2017-03-01") %>%
    arrange(-debt) %>%
    map_if(is.numeric,scales::dollar) %>% as.data.frame(),
  col.rgroup = c("none", "#F7F7F7"),
  caption="Aggregate Household Debt ($ Trillions, 2017 Q1)",
  tfoot="Source: FRBNY Consumer Credit Panel/Equifax")
Aggregate Household Debt ($ Trillions, 2017 Q1)
type date debt
1 Total 2017-03-01 $12.72
2 Mortgage 2017-03-01 $8.63
3 Student Loan 2017-03-01 $1.34
4 Auto Loan 2017-03-01 $1.17
5 Credit Card 2017-03-01 $0.76
6 HE Revolving 2017-03-01 $0.46
7 Other 2017-03-01 $0.37
Source: FRBNY Consumer Credit Panel/Equifax

Step 2: make some awesome charts

Now that we’ve got our data, we can start making some awesome charts. Or maybe I should say continue making awesome charts, because we never really stop do we?

Let’s get to it.

Venerable line charts

We’ll get crazy I promise but let’s start simple with the venerable line chart. The code below will recreate the plots we started with.

CPI line chart

library(ggrepel,quietly=T,warn.conflicts=F) # for the labels (avoid overplotting)


# dd is a list of dates, we'll set i = max date for labels
xlim<-c(min(cpi6$date),max(cpi6$date))
dd<-unique(cpi6$date)
i<- length(dd)

ggplot(data=cpi6[ 
  #drop some items for clarity
  ! (item.name %in% c("Other goods and services","Services",
                      "Education and communication"))],
  aes(x=date,y=cpi00,color=item.name))+
  geom_line()+
  theme_minimal()+   theme(legend.justification=c(0,0), legend.position="none")+
  scale_y_log10(limits=c(90,250),breaks=c(90,100,120,140,160,180,200,250))+
  geom_line(data=cpi.ed[ year>1999],size=1.5,color="red")+
  scale_x_date(labels= date_format("%b-%Y"),
               limits = as.Date(c('2000-01-01','2018-12-31')))+
  geom_text_repel(
    data = cpi6[date==dd[i] & 
                  ! (item.name %in% c("Other goods and services","Services",
                                      "Education and communication"))],
    aes(label = item.name), size = 3.5, nudge_x = 1) +
  geom_text(
    data = cpi.ed[date==dd[i]],color="red",
    aes(label = item.name),
    size = 3.5,nudge_x = 1,nudge_y=.01) +
  labs(x="", y="Consumer Price Index (log scale, Jan 2000=100, SA)",
       title="Consumer Prices",
       subtitle="by major category",
       caption="@lenkiefer Source: U.S. Bureau of Labor Statistics")+
  theme(plot.title=element_text(size=18),
        plot.caption=element_text(hjust=0,vjust=1),
        plot.margin=unit(c(0.25,0.25,0.25,0.25),"cm"))

plot of chunk 05-21-2017-plot-1-display

Household debt line chart

ggplot(data=filter(df2,type!="Total" & type !="Mortgage"),
       aes(x=date,y=debt,color=type,label=type))+
  geom_line(size=1.2)+theme_minimal()+
  geom_text(data=tail(filter(df2,type!="Total" & type !="Mortgage"),5),
            nudge_x=10,nudge_y=.025)+
  coord_cartesian(xlim=c(as.Date("2003-01-01"),as.Date("2017-12-31")), y=c(0,1.5))+
  theme(plot.title=element_text(size=14),
        plot.caption=element_text(hjust=0,vjust=1),
        legend.position="none")+
  labs(title="Household debt",subtitle="Trillions of dollars",x="",y="",
       caption="@lenkiefer Source: FRBNY Consumer Credit Panel/Equifax")

plot of chunk 05-21-2017-plot-2-display

Inflation line chart

The CPI line chart compares price changes over a long horizon. Let’s focus on more recent changes. The code below generates a plot of monthly price inflation by category.

# stack education with full data
cpi7<-rbind(cpi6,cpi.ed,fill=T)

# construct monthly inflation metrics
cpi7 <- cpi7[order(series_id,date),]
cpi7<-cpi7[, cpi.inf:= value/shift(value,1,fill=NA)-1, by=series_id]

# plot data
ggplot(data=cpi7[ 
  #drop some items for clarity
  ! (item.name %in% c("Other goods and services","Services",
                      "Education and communication"))],
  aes(x=date,y=cpi.inf,color=item.name))+
  geom_line()+
  geom_hline(yintercept=0,linetype=2,color="darkgray")+
  scale_y_continuous(label=scales::percent)+
  theme_minimal()+   theme(legend.justification=c(0,0), legend.position="none")+
  scale_x_date(labels= date_format("%Y"),
               limits = as.Date(c('2000-01-01','2017-12-31')))+
  geom_text_repel(
    data = cpi7[date==dd[i] & 
                  ! (item.name %in% c("Other goods and services","Services",
                                      "Education and communication"))],
    aes(label = item.name), size = 3.5, nudge_x = 1) +
  labs(x="", y="Consumer Price Inflation (Monthly % change, SA)",
       title="Consumer Price Inflation (m/m %)",
       subtitle="by major category",
       caption="@lenkiefer Source: U.S. Bureau of Labor Statistics")+
  theme(plot.title=element_text(size=18),
        plot.caption=element_text(hjust=0,vjust=1),
        plot.margin=unit(c(0.25,0.25,0.25,0.25),"cm"))

plot of chunk 05-21-2017-plot-3-inflation

Wow, that’s pretty ugly, transportation is volatile and it’s hard to see what’s going on in the plot. Let’s use faceting to make a small multiple (also, let’s drop volatile transportation/commodities and recreation so we have 6 categories). Let’s also zoom in on trends since 2010.

# plot data
ggplot(data=cpi7[ 
  #drop some items for clarity
  ! (item.name %in% c("Other goods and services","Services",
                      "Education and communication", "Transportation",
                      "Commodities","Recreation"))],
  aes(x=date,y=cpi.inf,color=item.name))+
  geom_line()+
  geom_hline(yintercept=0,linetype=2,color="darkgray")+
  scale_y_continuous(label=scales::percent)+
  theme_minimal()+   theme(legend.justification=c(0,0), legend.position="none")+
  scale_x_date(labels= date_format("%Y"),
               limits = as.Date(c('2010-01-01','2017-12-31')))+
  facet_wrap(~item.name)+
  labs(x="", y="Consumer Price Inflation (monthly % change, SA)",
       title="Consumer Price Inflation (m/m %)",
       subtitle="by major category",
       caption="@lenkiefer Source: U.S. Bureau of Labor Statistics")+
  theme(plot.title=element_text(size=18),
        plot.caption=element_text(hjust=0,vjust=1),
        plot.margin=unit(c(0.25,0.25,0.25,0.25),"cm"))

plot of chunk 05-21-2017-plot-3-inflation-v2

Getting better. Still quite a bit of noise in some of the monthly measures. Let’s try looking at year-over-year percent changes to smooth out some of that noise.

# get 12 month inflation
cpi7<-cpi7[, cpi.inf12:= value/shift(value,12,fill=NA)-1, by=series_id]

# plot data
ggplot(data=cpi7[ 
  #drop some items for clarity
  ! (item.name %in% c("Other goods and services","Services",
                      "Education and communication", "Transportation",
                      "Commodities","Recreation"))],
  aes(x=date,y=cpi.inf12,color=item.name))+
  geom_line()+
  geom_hline(yintercept=0,linetype=2,color="darkgray")+
  scale_y_continuous(label=scales::percent)+
  theme_minimal()+   theme(legend.justification=c(0,0), legend.position="none")+
  scale_x_date(labels= date_format("%Y"),
               limits = as.Date(c('2010-01-01','2017-12-31')))+
  facet_wrap(~item.name)+
  labs(x="", y="Consumer Price Inflation (12-month % change, SA)",
       title="Consumer Price Inflation (y/y %)",
       subtitle="by major category",
       caption="@lenkiefer Source: U.S. Bureau of Labor Statistics")+
  theme(plot.title=element_text(size=18),
        plot.caption=element_text(hjust=0,vjust=1),
        plot.margin=unit(c(0.25,0.25,0.25,0.25),"cm"))

plot of chunk 05-21-2017-plot-3-inflation-v3

Now we have a clearer picture of recent trends. Overall inflation has been rising over the past year or so (thanks largely to recovering energy prices). Both Medical Care and Education are rising at a level well above general inflation, but seem to be trending down. Housing, which measures rents and owner’s equivalent rent not house prices (see for example question 2 here), is on the upswing.

Horizon charts

Let’s build some horizon charts (see my horizon chart post for everything you want, everything you need) to analyze inflation trends. Making horizon charts was only possible due to code from @timelyportfolio, lots of cool stuff over at his page.

# subset data
df.cpi<-cpi7[ year(date)>2009 &  ! 
                (item.name %in% c("Other goods and services",
                                  "Services","Education and communication",
                                  "Transportation","Recreation")), c("date","item.name","cpi.inf12"),
  with=F]
############# magic! ##############################################################

library(htmltools)
library(DT)
library(d3horizonR)


myf3<-function (s="Ohio"){
  d.out<- filter(df.cpi,item.name==s)$cpi.inf12
  return(d.out)
}
dt3 <- data.frame(item.name=unique(df.cpi$item.name))

dt4 <- dt3 %>%
  mutate(x = lapply(item.name, myf3 ) )  %>%
  mutate(x = lapply(x, function(dat) {
    d3horizon_chr(
      list(dat),
      options = d3horizonOptions(height=20),
      width = 400
    )
  }) )

m<-
datatable(
  dt4,
    caption = 'Annual growth in consumer prices',
  escape = FALSE,
  colnames=c("Item","12-month % change\n in prices"),
  options = list(
    columnDefs = list(list(width="400px", targets = 2:2)),
    fnDrawCallback = htmlwidgets::JS()
  )
) %>%
  tagList(htmlwidgets::getDependency("d3horizon", "d3horizonR")) %>%
  browsable()

# save m
library(htmlwidgets)
saveWidget(m, file="m.html")

These widgets are cool, but I’m not doing much with them here. We’ll do more later.

Lollipops

Lollipop charts have been under fire recently, but I still like them. See here for a more spirited defense of the chart type from Andy Cotgreave.

We’ll just go ahead and make one. If you don’t like them, the nice thing about ggplot2 is it’s pretty easy to just swap in a bar.

Let’s make one.

library(viridis) # for the colors!
ggplot(data=cpi7[date==dd[i] & item.name != "Education and communication",],
       aes(xend=100,x=cpi00,
           yend=reorder(item.name,cpi00),y=reorder(item.name,cpi00),
           color=cpi00,
           label=paste(" ",item.name," ")))+
    geom_segment(size=0.25,linetype=2)+geom_point(size=3)+
  scale_x_log10(limits=c(75,250),breaks=seq(100,250,25))+
  theme_minimal()+
  geom_text(aes(hjust=ifelse(cpi00>100,0,1)))+
  scale_color_viridis(name="Price Index",option="C",end=0.85)+
  labs(y="", x="Consumer Price Index (log scale, Jan 2000=100, SA)",
       title="Consumer Prices",
       subtitle=paste("by major category through",as.character(dd[i],format="%b-%Y")),
       caption="@lenkiefer Source: U.S. Bureau of Labor Statistics")+
  theme(plot.title=element_text(size=18),
        legend.position="none",
        axis.text.y=element_blank(),
        panel.grid.major.y=element_blank(),
        panel.grid.minor.y=element_blank(),
        plot.caption=element_text(hjust=0))

plot of chunk 05-21-2017-plot-lolly-1

This plot compares how much prices have risen relative to 2000. Essentially, this is just the endpoint of the line plot.

Let’s move over to the household debt data and look at state debt trends. In the same data file (different) worksheet we got from the New York Fed we can get a time series of per capita debt levels for the U.S. and some large states.

# data is range A3: BH10, we'll get the date values from row 3
# the row titles are in column A

df.state<-read_excel("data/HHD_C_Report_2017Q1.xlsx", 
               sheet = "Page 20 Data",  # name of sheet
               range="A3:BF15" )        # range where data lives

## we've got blank columns in positions 2 & 3 (drop them)
## then we'll gather the data and rename X__1 as "state"

df.state2<-df.state %>% gather(datec,pcdebt,c(2:58)) %>% rename(state=X__1)

#now create a date variable using the datec string:

df.state2<- df.state2 %>% 
  mutate(year=2000+as.numeric(substr(datec,1,2)),
         month=as.numeric(substr(datec,5,5))*3) %>%
  # create date variable
  mutate(date=as.Date(ISOdate(year,month,1))) %>% 
  # drop unneeede variables
  select(state,date,pcdebt)

#plot debt balance per capita

ggplot(data=df.state2,aes(x=date,y=pcdebt,color=state,
                          label=round(pcdebt,0)))+geom_line(size=1.2)+theme_minimal()+
  geom_text_repel(data=filter(df.state2,date=="2017-03-01"),
                  nudge_x=10,nudge_y=.025)+
    facet_wrap(~state,ncol=3)  + 
  scale_y_log10(limits=c(20,90),breaks=seq(25,90,15))+
    theme(plot.title=element_text(size=14))+theme(plot.caption=element_text(hjust=0,vjust=1,margin=margin(t=10)))+
    theme(plot.margin=unit(c(0.25,0.25,0.25,0.25),"cm"))+
      coord_cartesian(xlim=c(as.Date("2003-01-01"),as.Date("2017-12-31")))+
    theme(legend.position="none")+
    labs(title="Per capita household debt",
         subtitle="Thousands of dollars, (Based on the population with a credit report) ",
         x="",y="per capita debt (Ths $, log scale)",caption="@lenkiefer Source: FRBNY Consumer Credit Panel/Equifax")

plot of chunk 05-21-2017-data-3-state

Debt levels are starting to rise, but remain (on a per capita basis) below pre-Great Recession levels.

Let’s look at consumer delinquency trends by state. These data are in another worksheet in the same spreadsheet. Our mastery of a couple readxl tricks is paying off bigtime!

# data is range A3: BH10, we'll get the date values from row 3
# the row titles are in column A

df.statedq<-read_excel("data/HHD_C_Report_2017Q1.xlsx", 
               sheet = "Page 23 Data",  # name of sheet
               range="A3:BF15" )        # range where data lives

## we've got blank columns in positions 2 & 3 (drop them)
## then we'll gather the data and rename X__1 as "state"

df.state3<-df.statedq %>% gather(datec,dqdebt,c(2:58)) %>% rename(state=X__1)

#now create a date variable using the datec string:

df.state3<- df.state3 %>% 
  mutate(year=2000+as.numeric(substr(datec,1,2)),
         month=as.numeric(substr(datec,5,5))*3) %>%
  # create date variable
  mutate(date=as.Date(ISOdate(year,month,1))) %>% 
  # drop unneeede variables
  select(state,date,dqdebt)

#plot delinquency rates

ggplot(data=df.state3,aes(x=date,y=dqdebt,color=state,
                          label=round(dqdebt,1)))+
  geom_line(size=1.2)+theme_minimal()+
  geom_text_repel(data=filter(df.state3,date=="2017-03-01"),
                  nudge_x=10,nudge_y=.025)+
    facet_wrap(~state,ncol=3)  + 
     theme(plot.title=element_text(size=14),
           plot.caption=element_text(hjust=0))+
  coord_cartesian(xlim=c(as.Date("2003-01-01"),as.Date("2017-12-31")))+
  theme(legend.position="none")+
labs(title="Percent of Balance 90+ Days Late by State",
     subtitle="Based on the population with a credit report",
     x="",y="Percent of Balance 90+ Days Late",
     caption="@lenkiefer Source: FRBNY Consumer Credit Panel/Equifax")

plot of chunk 05-21-2017-data-4-state

After rising very high during the Great Recession and its aftermath, delinquency rates are coming down. Let’s try to rework this line chart as a dot chart.

State delinquency dot chart

First, we’ll compute the max and min delinquency rate for each state, and then plot the 2017Q1 value as a dot with markers to indicate the min/max. We’ll also add the U.S. state="ALL"" in our data as another reference. I’ve done these calculations before, usually the data.table way, but we’ll do it the dplyr way here.

df.state3b<-df.state3 %>% 
  # group by states
  group_by(state) %>%  
  # compute state specific min/max
  mutate(dq.max=max(dqdebt),
         dq.min=min(dqdebt)) %>% 
  # ungroup
  ungroup() %>%
  # compute US average, 
  group_by(date) %>%
  mutate(dq.US=dqdebt[state=="ALL"]) %>%
  ungroup() %>%
  # compute max and min
  mutate(dq.USmax=max(dqdebt[state=="ALL"]),
         dq.USmin=min(dqdebt[state=="ALL"])
         )

ggplot(data=filter(df.state3b,date=="2017-03-01"),
       aes(x=dqdebt,y=state,
           color=(dqdebt-dq.min)/(dq.max-dq.min),
           label=state))+
  geom_point(size=3)+
  geom_segment(aes(xend=dq.min,x=dq.max,yend=state),size=2,alpha=0.5)+
  scale_color_viridis(option="D",direction=-1,
                      name="Delinquency rate relative\nto state min/max\n1=max,0=min max",
                      end=0.95,limits=c(0,1))+
    theme_minimal()+
  theme(legend.position="top",
        plot.title=element_text(size=14),
        plot.caption=element_text(hjust=0),
        legend.key.width=unit(3,"line")) +
  geom_vline(aes(xintercept=dq.US),linetype=2,color="black")+
  labs(title="Percent of Balance 90+ Days Late by State: 2017Q1",
     subtitle="Based on the population with a credit report",
     y="",x="Percent of Balance 90+ Days Late",
     caption="@lenkiefer Source: FRBNY Consumer Credit Panel/Equifax\nlines extend from 2003Q1-2017Q1 (state specific) min to max, dotted line U.S. average")

plot of chunk 05-21-2017-data-5-state

Improving the plot with reordering

Ooh, I learned something else I should share. The data below are ordered according alphabetically, which isn’t the best way. Let’s reorder the data based on the current value. We can do it easily thanks to the forcats fct_reorder function .

ggplot(data=filter(df.state3b,date=="2017-03-01"),
       aes(x=dqdebt,
           # use fct_reorder, we want the US (state=="ALL") at the bottom,
           # so use ifelse() statement
           y=forcats::fct_reorder(state,ifelse(state=="ALL",-100,dqdebt)),
           color=(dqdebt-dq.min)/(dq.max-dq.min),
           label=state))+
  geom_point(size=3)+
  geom_segment(aes(xend=dq.min,x=dq.max,yend=state),size=2,alpha=0.5)+
  scale_color_viridis(option="D",direction=-1,
                      name="Delinquency rate relative\nto state min/max\n1=max,0=min max",
                      end=0.95,limits=c(0,1))+
    theme_minimal()+
  theme(legend.position="top",
        plot.title=element_text(size=14),
        plot.caption=element_text(hjust=0),
        legend.key.width=unit(3,"line")) +
  geom_vline(aes(xintercept=dq.US),linetype=2,color="black")+
  labs(title="Percent of Balance 90+ Days Late by State: 2017Q1",
     subtitle="Based on the population with a credit report",
     y="",x="Percent of Balance 90+ Days Late",
     caption="@lenkiefer Source: FRBNY Consumer Credit Panel/Equifax\nlines extend from 2003Q1-2017Q1 (state specific) min to max, dotted line U.S. average")

plot of chunk 05-21-2017-data-5-state-factor-reorder

In this plot we can compare more easily than with the small multiple lines. Notice that the range of variation is quite different depending on which state you are looking at. Nevada (NV) had delinquency rates reach over 20 percent. Ohio (OH) on the other hand, never saw delinquency rates over 10 percent. We can also see that while the level of delinquency has declined, it’s not at a minimum for any of the states tracked.

Movie versions

These data lend themself to animations.

Here’s some animated gifs of some of the plots:

Wrapping up

Time to wrap up for today. We refreshed some old charts and introduced a couple new ones. We did some data wrangling with data.table() and dplyr. We used readxl to reach into Excel files and get our data. And we tried out forcats to reorder labels in our chart. All this enabled us wrangle our data and get it ready to use ggplot2 to visualize trends in consumer prices and household debt.

Hope you found this useful. How could these code snippets and data visualizations work for you?