14 September 2016

IN MY PREVIOUS POST we looked at the Job Openings and Labor Turnover Survey (JOLTS) data and plotted a Beveridge Curve. In this post I want to add some more code that allows us to plot Beveridge Curves by industry.

For more on the analysis of industry-specific Beveridge Curves, see this paper published in the June 2012 Monthly Labor Review that decomposes shifts in the Beveridge Curve and looks at it by industry. Analyzing data through March 2012, the authors found that Construction alone shifted the total market Beveridge Curve by a full percentage point (see Table 4 in the paper).

The data

While all of the data is available from the Bureau of Labor Statistics (BLS) website, it took some digging through the available files to find the right series. With the code below, we can grab the right data from BLS and produce our plots.

We’re going to pull our data from two sources: JOLTS and the Current Population Survey (CPS). You can access flat files from BLS here for JOLTS and here for CPS. Unfortunately the industry codes in the two sets of flat files are not identical, but I hunted down (what I think are) the proper codes and combined them. You can read more here.

The relevant passages are quoted below:

JOLTS

The Job Openings and Labor Turnover Survey (JOLTS) publishes industry estimates based on the 2012 North American Industry Classification System (NAICS). NAICS-based estimates are available for December 2000 to the present.

CPS

The Current Population Survey currently uses the 2010 Census occupational classification and, beginning with data for January 2014, the 2012 Census industry classification. These classifications were derived from the 2010 Standard Occupational Classification (SOC) and the 2012 North American Industry Classification System (NAICS), respectively, to meet the special classification needs of demographic household surveys. The Census classifications use the same basic structure as the SOC and NAICS, but are generally less detailed.

A crosswalk between the codes is available here. For example, construction has the Census code 0770 while the corresponding NAICS code is 23.

I’m going to rely on the fact that the industry names are identical in the BLS data and merge on the industry names rather than use the crosswalk. You can confirm this works by comparing this table for JOLTS to this table for the unemployment rate from CPS.

Setup files

The code below merges the industry names and codes for JOLTS to (the same) industry name and (different) codes for CPS.

# load libraries
library(ggplot2)
library(animation)
library(stringr)
library(data.table)
library(tweenr)
library(purrr)
library(dplyr)
library(viridis)

#get CPS codes
# These are the major sectors in BLS table 14: http://www.bls.gov/news.release/empsit.t14.htm

#I saved these in a .txt file called indcodeslu.txt

#indy_code	indy_text
# 0000	All Industries	
# 0169	Agriculture, forestry, fishing, and hunting	
# 0368	Nonagriculture industries	
# 0369	Mining, quarrying, and oil and gas extraction	
# 0569	Utilities	
# 0770	Construction	
# 1068	Nondurable goods manufacturing	
# 2467	Manufacturing	
# 2468	Durable goods manufacturing	
# 4067	Wholesale and retail trade	
# 4068	Wholesale trade	
# 4669	Retail trade	
# 6068	Transportation and utilities	
# 6069	Transportation and warehousing	
# 6468	Information	
# 6469	Publishing, except Internet	
# 6569	Motion pictures and sound recording industries	
# 6670	Radio and television broadcasting and cable subscriptions programming	
# 6679	Telecommunications	
# 6769	Other information services	
# 6867	Financial activities	
# 6868	Finance and insurance	
# 6869	Finance	
# 6990	Insurance carriers and related activities	
# 7069	Real estate and rental and leasing	
# 7268	Professional and business services	
# 7269	Professional and technical services	
# 7569	Management, administrative, and waste services	
# 7858	Education and health services	
# 7859	Educational services	
# 7968	Health care and social assistance	
# 8558	Leisure and hospitality	
# 8559	Arts, entertainment, and recreation	
# 8658	Accommodation and food services	
# 8659	Accommodation	
# 8679	Food services and drinking places	
# 8767	Other services	
# 8768	Other services, except private households	
# 9290	Other services, private households

#read file:
my.indy<-fread("data/indcodeslu.txt")

#Get JOLTS codes 
#saved in a file  indcodesjolts.txt

# industry_code	industry_text	display_level	selectable	sort_sequence	blank
# 000000	Total nonfarm	0	T	1	
# 100000	Total private	1	T	2	
# 110099	Mining and logging	2	T	3	
# 230000	Construction	2	T	4	
# 300000	Manufacturing	2	T	5	
# 320000	Durable goods manufacturing	3	T	6	
# 340000	Nondurable goods manufacturing	3	T	7	
# 400000	Trade, transportation, and utilities	2	T	8	
# 420000	Wholesale trade	3	T	9	
# 440000	Retail trade	3	T	10	
# 480099	Transportation, warehousing, and utilities	3	T	11	
# 510000	Information	2	T	12	
# 510099	Financial activities	2	T	13	
# 520000	Finance and insurance	3	T	14	
# 530000	Real estate and rental and leasing	3	T	15	
# 540099	Professional and business services	2	T	16	
# 600000	Education and health services	2	T	17	
# 610000	Educational services	3	T	18	
# 620000	Health care and social assistance	3	T	19	
# 700000	Leisure and hospitality	2	T	20	
# 710000	Arts, entertainment, and recreation	3	T	21	
# 720000	Accommodation and food services	3	T	22	
# 810000	Other services	2	T	23	
# 900000	Government	1	T	24	
# 910000	Federal	2	T	25	
# 920000	State and local	2	T	26	
# 923000	State and local government education	3	T	27	
# 929000	State and local government, excluding education	3	T	28	

my.indy2<-fread("data/indcodesjolts.txt")

#merge together industry names
my.indy3<-merge(my.indy2,my.indy,by.x="industry_text",by.y="indy_text")
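
Because this merge is an inner join on the industry name, any industry whose label differs between the two files gets dropped without a warning. Here is a minimal sketch of how one might check for that, using two small stand-in lookup tables. The names jolts.lu and cps.lu are hypothetical (with the real files you would use my.indy2 and my.indy), and the codes are stored as text here purely for illustration.

```r
# Toy lookup tables standing in for indcodesjolts.txt and indcodeslu.txt
jolts.lu <- data.frame(
  industry_code = c("230000", "300000", "110099"),
  industry_text = c("Construction", "Manufacturing", "Mining and logging"),
  stringsAsFactors = FALSE
)
cps.lu <- data.frame(
  indy_code = c("0770", "2467", "0369"),
  indy_text = c("Construction", "Manufacturing",
                "Mining, quarrying, and oil and gas extraction"),
  stringsAsFactors = FALSE
)

# Inner join on the industry name, same idea as the merge above
matched <- merge(jolts.lu, cps.lu, by.x = "industry_text", by.y = "indy_text")

# Names that appear in only one file are the ones the join drops
jolts.only <- setdiff(jolts.lu$industry_text, cps.lu$indy_text)
cps.only   <- setdiff(cps.lu$indy_text, jolts.lu$industry_text)

print(matched)
print(jolts.only)
print(cps.only)
```

Running the two setdiff() calls on the real lookup tables lists the sectors that will not appear in the plots, which is worth knowing before reading anything into a missing panel.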

Prepare data

Now that we have the codes we can read the data from the BLS and use the industry codes to merge the data.
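
The BLS flat files split metadata (one row per series) from observations (one row per series and period), so the general pattern is: filter the series table, keep only the matching observations, then merge the industry code back on. A toy sketch of that pattern in base R follows. The series IDs starting with TOY, their titles, and the values 100 and 7.0 are made up for illustration; only LNU04000000 and the column names come from the real files.

```r
# Toy series metadata: one row per series
series <- data.frame(
  series_id    = c("LNU04000000", "TOY0001", "TOY0002"),
  series_title = c("Unemployment Rate",
                   "Unemployment Rate - Construction (toy)",
                   "Employment Level - Construction (toy)"),
  indy_code    = c(0L, 770L, 770L),
  stringsAsFactors = FALSE
)

# Toy observations: one row per series and period
obs <- data.frame(
  series_id = c("LNU04000000", "TOY0001", "TOY0002", "TOY0001"),
  year      = c(2016L, 2016L, 2016L, 1999L),
  period    = c("M07", "M07", "M07", "M07"),
  value     = c(5.1, 4.5, 100, 7.0),
  stringsAsFactors = FALSE
)

# 1. pick the series we want from the metadata
keep <- series[grepl("Unemployment Rate", series$series_title), ]

# 2. keep only observations for those series, from 2000 onward
obs2 <- obs[obs$year > 1999 & obs$series_id %in% keep$series_id, ]

# 3. attach the industry code and give the value column a meaningful name
obs2 <- merge(obs2, keep[, c("series_id", "indy_code")],
              by = "series_id", all.x = TRUE)
names(obs2)[names(obs2) == "value"] <- "ur"

print(obs2)
```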

# read in unemployment rates
ln.series<-fread("http://download.bls.gov/pub/time.series/ln/ln.series")
ln.data<-fread("http://download.bls.gov/pub/time.series/ln/ln.data.1.AllData")
ln.indy<-fread("http://download.bls.gov/pub/time.series/ln/ln.indy")

# find series named unemployment rate for our industries (filtered on age, sex, and periodicity codes),
# plus the aggregate unemployment rate series, LNU04000000:
my.series<-ln.series[(grepl("Unemployment Rate",series_title) & indy_code !=0 & indy_code %in% my.indy3$indy_code
                     & ages_code==0 & periodicity_code=="M" &  sexs_code==0) | series_id=="LNU04000000",]

ln.data2<-ln.data[year>1999 & series_id %in% my.series$series_id,]
ln.data2<-merge(ln.data2,my.series[,list(series_id,indy_code)],by="series_id",all.x=T)
ln.data2 <-merge(ln.data2,my.indy3[,list(indy_code,industry_text),],by="indy_code",all.x=T)
ln.data2<-dplyr::rename(ln.data2,ur=value)

#get jolts data

jolts.dt<-fread("http://download.bls.gov/pub/time.series/jt/jt.data.1.AllItems")
jolts.series<-fread("http://download.bls.gov/pub/time.series/jt/jt.series")
jolts.ind<-fread("http://download.bls.gov/pub/time.series/jt/jt.industry",
                 col.names=c("industry_code","industry_text",	"display_level",	"selectable","sort_sequence","blank"))
jolts.element<-fread("http://download.bls.gov/pub/time.series/jt/jt.dataelement",
                     col.names=c("dataelement_code","dataelement_text","display_level","selectable","sort_sequence","blank"                     ))

#we want job openings: dataelement=JO, not seasonally adjusted, rates (ratelevel_code=R) and U.S. (region_code=00)
# we also want the aggregate series, whose id is JTU00000000JOR (I found it manually)

my.series<-jolts.series[( industry_code %in% my.indy3$industry_code & 
                            dataelement_code=="JO" &
                            seasonal=="U"  & ratelevel_code=="R" & region_code=="00") | series_id=="JTU00000000JOR", ]

jolts.dt2<-jolts.dt[series_id %in% my.series$series_id,]
jolts.dt2<-merge(jolts.dt2,my.series[,list(series_id,industry_code),],by="series_id")
jolts.dt2 <-merge(jolts.dt2,my.indy3[,list(industry_code,industry_text),],by="industry_code",all.x=T)
jolts.dt2<-dplyr::rename(jolts.dt2,jo=value)


# merge data:
all.dt<-merge(ln.data2[,list(year,period,ur,industry_text,indy_code)],
              jolts.dt2[,list(year,period,jo,industry_text,industry_code)],
              by=c("year","period","industry_text"))

# tidy up: numeric rates, label the aggregate series, build a date column
all.dt$ur<-as.numeric(all.dt$ur)
all.dt[industry_code==0,industry_text:="All Industries"]
all.dt[,month:=as.numeric(substr(period,2,3))]
all.dt[,date:= as.Date(ISOdate(year,month,1))]

#We want to distinguish between recession and expansions using NBER recession dates

#Turning Point Date	Peak or Trough	Announcement Date with Link
#June 2009	Trough	September 20, 2010
#December 2007	Peak	December 1, 2008
#November 2001	Trough	July 17, 2003
#March 2001	Peak	November 26, 2001

#create recessions (peak month through trough month; dates are the first of each month)

all.dt[, recession:="Expansion"]
all.dt[ (date>="2001-03-01" & date<="2001-11-01") |
        (date>="2007-12-01" & date<="2009-06-01"),
        recession:="Recession" ]
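
To see what the date and recession steps do, here is a small sketch in base R on a handful of made-up rows. It assumes the recession runs from the NBER peak month through the trough month, inclusive (March 2001 to November 2001 and December 2007 to June 2009), which is one common convention rather than the only one. It also drops period codes above M12 first, on the understanding that BLS uses M13 for annual averages, which have no calendar month.

```r
# Toy rows in the BLS flat-file layout: a year plus a period code such as "M07"
toy <- data.frame(
  year   = c(2001, 2001, 2007, 2008, 2009, 2016, 2016),
  period = c("M02", "M03", "M11", "M01", "M07", "M07", "M13"),
  stringsAsFactors = FALSE
)

# "M07" -> 7; drop anything that is not a calendar month (e.g. "M13")
toy$month <- as.numeric(substr(toy$period, 2, 3))
toy <- toy[toy$month <= 12, ]

# first day of each month as a Date
toy$date <- as.Date(ISOdate(toy$year, toy$month, 1))

# flag recession months: peak month through trough month, inclusive (an assumption)
in.2001 <- toy$date >= as.Date("2001-03-01") & toy$date <= as.Date("2001-11-01")
in.2008 <- toy$date >= as.Date("2007-12-01") & toy$date <= as.Date("2009-06-01")
toy$recession <- "Expansion"
toy$recession[in.2001 | in.2008] <- "Recession"

print(toy[, c("date", "recession")])
```

Comparing Date to Date, rather than Date to a character string, keeps the boundaries explicit and makes it easier to see which months fall inside each window.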

Check Data

Now we should have data in a format that we can use.

library(htmlTable) #make a table
htmlTable(all.dt[ date=="2016-07-01", list(year,period,industry_text,indy_code,industry_code,ur,jo)], col.rgroup = c("none", "#F7F7F7"),caption="Merged Table",
           header = c("Year ","Period ","industry","Industry\ncode (CPS)","Industry\ncode (JOLTS)","Unemployment\nRate ","Job Openings\nRate"),
          tfoot="Source: U.S. Bureau of Labor Statistics\ndata are not seasonally adjusted")
Merged Table

    Year  Period  Industry                             Industry code (CPS)  Industry code (JOLTS)  Unemployment Rate  Job Openings Rate
 1  2016  M07     All Industries                       0                    0                      5.1                4.1
 2  2016  M07     Accommodation and food services      8658                 720000                 6.2                4.8
 3  2016  M07     Arts, entertainment, and recreation  8559                 710000                 5.2                3.7
 4  2016  M07     Construction                         770                  230000                 4.5                3
 5  2016  M07     Durable goods manufacturing          2468                 320000                 4.6                2.9
 6  2016  M07     Education and health services        7858                 600000                 3.7                4.8
 7  2016  M07     Educational services                 7859                 610000                 5.7                3.5
 8  2016  M07     Finance and insurance                6868                 520000                 2.3                3.9
 9  2016  M07     Financial activities                 6867                 510099                 2.4                3.8
10  2016  M07     Health care and social assistance    7968                 620000                 3.2                5
11  2016  M07     Information                          6468                 510000                 5.7                2.7
12  2016  M07     Leisure and hospitality              8558                 700000                 6                  4.7
13  2016  M07     Manufacturing                        2467                 300000                 4.3                3
14  2016  M07     Nondurable goods manufacturing       1068                 340000                 3.6                3.2
15  2016  M07     Other services                       8767                 810000                 4.4                3.5
16  2016  M07     Professional and business services   7268                 540099                 4.7                5.9
17  2016  M07     Real estate and rental and leasing   7069                 530000                 2.7                3.3
18  2016  M07     Retail trade                         4669                 440000                 5.5                4.2
19  2016  M07     Wholesale trade                      4068                 420000                 3.9                2.7

Source: U.S. Bureau of Labor Statistics
data are not seasonally adjusted
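
One thing to notice in the table: the CPS code for Construction shows up as 770 rather than 0770. That suggests the codes were read in as integers, which drops leading zeros. This is harmless for the merges, since both sides are read the same way, but if you want the codes to display as they appear in the BLS files, a sketch like the following pads them back out. It assumes four-digit CPS codes and six-digit JOLTS codes, as in the lookup files above.

```r
# Codes as they come out of an integer column (values taken from the table above)
cps.code   <- c(0L, 770L, 8658L)
jolts.code <- c(0L, 230000L, 720000L)

# Zero-pad to a fixed width for display only; keep the integer columns for merging
cps.label   <- sprintf("%04d", cps.code)
jolts.label <- sprintf("%06d", jolts.code)

print(data.frame(cps.label, jolts.label))
```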

Create Panel Plot

Now we can create a panel plot.

# turn industry text into factors so we can get All industries in first position
all.dt$ind.textf<-factor(all.dt$industry_text,levels=unique(all.dt$industry_text))

ggplot(data=all.dt,aes(x=ur,y=jo,color=recession))+geom_point(alpha=0.25)+facet_wrap(~ind.textf,ncol=4)+
  theme_minimal()+
  theme(plot.caption=element_text(hjust=0,size=7))+
  #put the legend down at the bottom right
  theme(legend.position = c(0.85, 0.075) )+
  #Let's circle the last available point:
  geom_point(data=all.dt[ date=="2016-07-01"],shape=21,size=3,color=viridis(3)[1])+
  geom_text(data=all.dt[  date=="2016-07-01"],size=2,color=viridis(3)[1],label="July, 2016\n\n")+
   theme(strip.text.x = element_text(size = 7))+
  scale_color_manual(name="Recession or\nExpansion",values=c(viridis(5)[4],magma(5)[3]))+
  labs(x="Unemployment Rate (%, NSA)",y="Job Openings Rate (%, NSA)",
       title="The Beveridge Curve",
       subtitle="job openings rate vs unemployment rate (NSA)",
       caption="@lenkiefer Source: U.S. Bureau of Labor Statistics, Current Population Survey and Job Openings and Labor Turnover Survey")

[Figure: panel plot of the Beveridge Curve by industry]
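
A note on the factor trick used for the panel order: facet_wrap() lays out panels in the order of the factor levels, and by default factor() sorts levels alphabetically. Passing levels=unique(...) keeps the order in which the names first appear in the data instead. A small sketch with a made-up vector of industry names shows the difference, along with relevel() as an alternative that moves a single level to the front.

```r
# Toy vector of industry names, with "All Industries" appearing first
industry <- c("All Industries", "Construction",
              "Accommodation and food services",
              "Construction", "All Industries")

# Default: levels are sorted alphabetically
f.default <- factor(industry)

# Order of first appearance, as in the plotting code above
f.appear <- factor(industry, levels = unique(industry))

# Alternative: keep alphabetical order but move one level to the front
f.relevel <- relevel(f.default, ref = "All Industries")

print(levels(f.default))
print(levels(f.appear))
print(levels(f.relevel))
```

The order-of-appearance approach depends on how the data happen to be sorted, so relevel() may be the safer choice if the row order could change.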

Add animation

If we wish, we can add animation through the following code.

See my earlier post about tweenr for an introduction to tweenr, and more examples here and here.

#make a function to prepare data for tweening
myf<-function(m){
  DT<-copy(all.dt)
  DT<-DT[industry_text==m,]
  # convert character columns to factors and return a plain data.frame
  DT %>% map_if(is.character, as.factor) %>% as.data.frame()}

ind.list<-unique(all.dt$industry_text) # get list of industries

#the animation will loop through each industry and return to All Industries (ind.list[1]).
my.list<-lapply(c(ind.list,ind.list[1]),myf)  

tf <- tween_states(my.list, tweenlength= 2, statelength=3, ease=rep('cubic-in-out',17),nframes=300)
tf<-data.table(tf)  
N<-max(tf$.frame)

oopt = ani.options(interval = 0.2)
saveGIF({for (i in 1:N) {
  g<-
    ggplot(data=tf[.frame==i,],aes(x=ur,y=jo,color=recession,group=recession))+geom_point(alpha=0.65)+facet_wrap(~industry_text)+
    theme_minimal()+
    theme(plot.caption=element_text(hjust=0,size=9))+
    geom_point(data=tf[.frame==i & date=="2016-07-01"],shape=21,size=3,color=viridis(3)[1])+
    geom_text(data=tf[.frame==i & date=="2016-07-01"],color=viridis(3)[1],label="July, 2016\n\n")+
    coord_cartesian(ylim=c(0,8),xlim=c(0,28))+
    theme(legend.position="bottom")+
    scale_color_manual(name="Recession or Expansion",values=c(viridis(5)[4],magma(5)[3]))+
    labs(x="Unemployment Rate (%, NSA)",y="Job Openings Rate (%, NSA)",
         title="The Beveridge Curve",
         subtitle="job openings rate vs unemployment rate (NSA)",
         caption="@lenkiefer Source: U.S. Bureau of Labor Statistics, Current Population Survey and Job Openings and Labor Turnover Survey")
  
  print(g)
  ani.pause()
  print(i) }
},movie.name="bv tween3.gif",ani.width = 650, ani.height = 500)

animated beveridge curve
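
If you are curious what tweening does under the hood, the basic idea is to generate intermediate frames by interpolating each point between one state and the next. The sketch below does this with plain linear interpolation in base R. It is not what tweenr does exactly (the code above asks for 'cubic-in-out' easing, which speeds up and slows down the transition), and the two states and their values are made up. It also pairs row i of one state with row i of the next, which is an assumption of this sketch.

```r
# Two "states": the same five points for two hypothetical industries
state.a <- data.frame(ur = c(4, 5, 6, 8, 10), jo = c(3.0, 2.8, 2.5, 2.0, 1.5))
state.b <- data.frame(ur = c(3, 4, 5, 6, 7),  jo = c(5.0, 4.5, 4.0, 3.5, 3.0))

n.frames <- 5  # frames in the transition, including both end states

frames <- lapply(seq_len(n.frames), function(i) {
  w <- (i - 1) / (n.frames - 1)  # weight on state.b: 0 at the start, 1 at the end
  data.frame(
    ur = (1 - w) * state.a$ur + w * state.b$ur,
    jo = (1 - w) * state.a$jo + w * state.b$jo,
    .frame = i
  )
})

# Stack the frames into one long table, indexed by .frame
tf.toy <- do.call(rbind, frames)

print(head(tf.toy, 10))
```

Each value of .frame can then be plotted in turn, just as the saveGIF() loop above does with the output of tween_states().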