2016/09/14

dataviz / economy / R / animation / data wrangling

IN MY PREVIOUS POST we looked at the Job Openings and Labor Turnover Survey (JOLTS) data and plotted a Beveridge Curve. In this post I want to add some more code that allows us to plot Beveridge Curves by industry.

For more on the analysis of industry-specific Beveridge Curves, see this paper published in the June 2012 Monthly Labor Review that decomposes shifts in the Beveridge Curve and looks at it by industry. Analyzing data through March 2012, the authors found that Construction alone shifted the total market Beveridge Curve by a full percentage point (see Table 4 in the paper).

The data

While the data is all available from the Bureau of Labor Statistics (BLS) webpage it required some looking through the available files to find the right series. In the handy code I post below, we can grab the right data from BLS and produce our plots.

We’re going to pull our data from two sources, the JOLTS data, and the Current Population Survey (CPS). You can access flat files from BLS here for JOLTS and here for CPS. Unfortunately the industry codes available in the flat files are not identical, but I hunted down (what I think are) the proper codes and combined them. You can read more here.

The relevant passages are quoted below

JOLTS

The Job Openings and Labor Turnover Survey (JOLTS) publishes industry estimates based on the 2012 North American Industry Classification System (NAICS). NAICS-based estimates are available for December 2000 to the present

CPS

The Current Population Survey currently uses the 2010 Census occupational classification and, beginning with data for January 2014, the 2012 Census industry classification. These classifications were derived from the 2010 Standard Occupational Classification (SOC) and the 2012 North American Industry Classification System (NAICS), respectively, to meet the special classification needs of demographic household surveys. The Census classifications use the same basic structure as the SOC and NAICS, but are generally less detailed.

A crosswalk between the codes are available here. For example, construction has the Census code 0770 while the corresponding NAICS code is 23.

I’m going to rely on the fact that the industry names are identical in the BLS data and merge on the industry names rather than use the crosswalk. You can confirm this works by comparing this table for JOLTS to this table for the unemployment rate from CPS.

Setup files

The code below merges the industry names and codes for JOLTS to (the same) industry name and (different) codes for CPS.

{% highlight r

load libraries

library(ggplot2) library(animation) library(stringr) library(data.table) library(tweenr) library(purrr) library(dplyr) library(viridis)

#get CPS codes

These are the major sectors in BLS table 14: http://www.bls.gov/news.release/empsit.t14.htm

#i saved these in a .txt file called incodelu.txt

#indy_code indy_text

0000 All Industries

0169 Agriculture, forestry, fishing, and hunting

0368 Nonagriculture industries

0369 Mining, quarrying, and oil and gas extraction

0569 Utilities

0770 Construction

1068 Nondurable goods manufacturing

2467 Manufacturing

2468 Durable goods manufacturing

4067 Wholesale and retail trade

4068 Wholesale trade

4669 Retail trade

6068 Transportation and utilities

6069 Transportation and warehousing

6468 Information

6469 Publishing, except Internet

6569 Motion pictures and sound recording industries

6670 Radio and television broadcasting and cable subscriptions programming

6679 Telecommunications

6769 Other information services

6867 Financial activities

6868 Finance and insurance

6869 Finance

7069 Real estate and rental and leasing

7268 Professional and business services

7269 Professional and technical services

7569 Management, administrative, and waste services

7858 Education and health services

7859 Educational services

8558 Leisure and hospitality

8559 Arts, entertainment, and recreation

8658 Accommodation and food services

8659 Accommodation

8679 Food services and drinking places

8767 Other services

8768 Other services, except private households

9290 Other services, private households

#read file: my.indy<-fread(“data/indcodeslu.txt”)

#Get JOLTS codes #saved in a file indcodesjolts.txt

industry_code industry_text display_level selectable sort_sequence blank

000000 Total nonfarm 0 T 1

100000 Total private 1 T 2

110099 Mining and logging 2 T 3

230000 Construction 2 T 4

300000 Manufacturing 2 T 5

320000 Durable goods manufacturing 3 T 6

340000 Nondurable goods manufacturing 3 T 7

400000 Trade, transportation, and utilities 2 T 8

420000 Wholesale trade 3 T 9

440000 Retail trade 3 T 10

480099 Transportation, warehousing, and utilities 3 T 11

510000 Information 2 T 12

510099 Financial activities 2 T 13

520000 Finance and insurance 3 T 14

530000 Real estate and rental and leasing 3 T 15

540099 Professional and business services 2 T 16

600000 Education and health services 2 T 17

610000 Educational services 3 T 18

700000 Leisure and hospitality 2 T 20

710000 Arts, entertainment, and recreation 3 T 21

720000 Accommodation and food services 3 T 22

810000 Other services 2 T 23

900000 Government 1 T 24

910000 Federal 2 T 25

920000 State and local 2 T 26

923000 State and local government education 3 T 27

929000 State and local government, excluding education 3 T 28

my.indy2<-fread(“data/indcodesjolts.txt”)

#merge together industry names my.indy3<-merge(my.indy2,my.indy,by.x=“industry_text”,by.y=“indy_text”) {% endhighlight

Prepare data

Now that we have the codes we can read the data from the BLS and use the industry codes to merge the data.

{% highlight r

read in unemployment rated

ln.series<-fread(“http://download.bls.gov/pub/time.series/ln/ln.series") ln.data<-fread(“http://download.bls.gov/pub/time.series/ln/ln.data.1.AllData") ln.indy<-fread(“http://download.bls.gov/pub/time.series/ln/ln.indy")

find series named unemployment rate:

my.series<-ln.series[grepl(“Unemployment Rate”,series_title) & indy_code !=0,] my.series<-ln.series[(grepl(“Unemployment Rate”,series_title) & indy_code !=0 & indy_code %in% my.indy3$indy_code & ages_code==0 & periodicity_code==“M” & sexs_code==0) | series_id==“LNU04000000”,]

ln.data2<-ln.data[year>1999 & series_id %in% my.series$series_id,] ln.data2<-merge(ln.data2,my.series[,list(series_id,indy_code)],by=“series_id”,all.x=T) ln.data2 <-merge(ln.data2,my.indy3[,list(indy_code,industry_text),],by=“indy_code”,all.x=T) ln.data2<-dplyr::rename(ln.data2,ur=value)

#get jolts data

jolts.dt<-fread(“http://download.bls.gov/pub/time.series/jt/jt.data.1.AllItems") jolts.series<-fread(“http://download.bls.gov/pub/time.series/jt/jt.series") jolts.ind<-fread(“http://download.bls.gov/pub/time.series/jt/jt.industry", col.names=c(“industry_code”,“industry_text”, “display_level”, “selectable”,“sort_sequence”,“blank”)) jolts.element<-fread(“http://download.bls.gov/pub/time.series/jt/jt.dataelement", col.names=c(“dataelement_code”,“dataelement_text”,“display_level”,“selectable”,“sort_sequence”,“blank” ))

#we want job openeings: dataelement=JO, not seasonally adjusted, rates (ratelevel_code=R) and U.S. (region_code=00)

we also want the aggregate series, whos id is JTU00000000JOR (I found it manually)

my.series<-jolts.series[( industry_code %in% my.indy3$industry_code & dataelement_code==“JO” & seasonal==“U” & ratelevel_code==“R” & region_code==“00”) | series_id==“JTU00000000JOR”, ]

jolts.dt2<-jolts.dt[series_id %in% my.series$series_id,] jolts.dt2<-merge(jolts.dt2,my.series[,list(series_id,industry_code),],by=“series_id”) jolts.dt2 <-merge(jolts.dt2,my.indy3[,list(industry_code,industry_text),],by=“industry_code”,all.x=T) jolts.dt2<-dplyr::rename(jolts.dt2,jo=value)

all.dt<-merge(ln.data2[,list(year,period,ur,industry_text,indy_code)], jolts.dt2[,list(year,period,jo,industry_text,industry_code)], by=c(“year”,“period”,“industry_text”))

merge data:

all.dt$ur<-as.numeric(all.dt$ur) all.dt[industry_code==0,industry_text:=“All Industries”] all.dt[,month:=as.numeric(substr(period,2,3))] all.dt[,date:= as.Date(ISOdate(year,month,1))]

#We want to distinguish between recession and expansions using NBER recession dates

#Turning Point Date Peak or Trough Announcement Date with Link #June 2009 Trough September 20, 2010 #December 2007 Peak December 1, 2008 #March 2001 Peak November 26, 2001

#create recessions

all.dt[, recession:=“Expansion”] all.dt[ (date>“2001-02-28” & date<=“2001-12-01”) | (date>“2007-10-31” & date<=“2009-06-30”), recession:=“Recession” ] {% endhighlight

Check Data

Now we should have data in a format that we can use.

{% highlight r library(htmlTable) #make a table htmlTable(all.dt[ date==“2016-07-01”, list(year,period,industry_text,indy_code,industry_code,ur,jo)], col.rgroup = c(“none”, “#F7F7F7”),caption=“Merged Table”, header = c(“Year “,“Period “,“industry”,“Industry\ncode (CPS)”,“Industry\ncode (JOLTS)”,“Unemployment\nRate “,“Job Openings\nRate”), tfoot=“Source: U.S. Bureau of Labor Statistics\ndata are not seasonally adjusted”) {% endhighlight

Create Panel Plot

Now we can create a panel plot.

{% highlight r

turn industry text into factors so we can get All industries in first position

all.dt$ind.textf<-factor(all.dt$industry_text,levels=unique(all.dt$industry_text))

ggplot(data=all.dt,aes(x=ur,y=jo,color=recession))+geom_point(alpha=0.25)+facet_wrap(~ind.textf,ncol=4)+ theme_minimal()+ theme(plot.caption=element_text(hjust=0,size=7))+ #put the legend down at the bottom right theme(legend.position = c(0.85, 0.075) )+ #Let’s circle the last available point: geom_point(data=all.dt[ date==“2016-07-01”],shape=21,size=3,color=viridis(3)[1])+ geom_text(data=all.dt[ date==“2016-07-01”],size=2,color=viridis(3)[1],label=“July, 2016\n\n”)+ theme(strip.text.x = element_text(size = 7))+ scale_color_manual(name=“Recession or\nExpansion”,values=c(viridis(5)[4],magma(5)[3]))+ labs(x=“Unemployment Rate (%, NSA)",y=“Job Openings Rate (%, NSA)”, title=“The Beveridge Curve”, subtitle=“job openings rate vs unemployment rate (NSA)”, caption="@lenkiefer Source: U.S. Bureau of Labor Statistics, Current Population Survey and Job Openings and Labor Turnover Survey”) {% endhighlight

plot of chunk fig-bv-grahp1

Add animation

If we wish, we can add animation through the following code.

See my earlier post about tweenr for an introduction to tweenr, and more examples here and here.

{% highlight r #make a function to prepare data for tweening myf<-function(m){ DT<-copy(all.dt) DT<-DT[industry_text==m,] DT %>% map_if(is.character, as.factor) %>% as.data.frame() ->DT as.data.frame(DT)}

ind.list<-unique(all.dt$industry_text) # get list of industries

#the animation will loop through each industry and return to All Industries (ind.list[1]). my.list<-lapply(c(ind.list,ind.list[1]),myf)

tf <- tween_states(my.list, tweenlength= 2, statelength=3, ease=rep(‘cubic-in-out’,17),nframes=300) tf<-data.table(tf)
N<-max(tf$.frame)

oopt = ani.options(interval = 0.2) saveGIF({for (i in 1:N) { g<- ggplot(data=tf[.frame==i,],aes(x=ur,y=jo,color=recession,group=recession))+geom_point(alpha=0.65)+facet_wrap(~industry_text)+ theme_minimal()+ theme(plot.caption=element_text(hjust=0,size=9))+ geom_point(data=tf[.frame==i & date==“2016-07-01”],shape=21,size=3,color=viridis(3)[1])+ geom_text(data=tf[.frame==i & date==“2016-07-01”],color=viridis(3)[1],label=“July, 2016\n\n”)+ coord_cartesian(ylim=c(0,8),xlim=c(0,28))+ theme(legend.position=“bottom”)+ scale_color_manual(name=“Recession or Expansion”,values=c(viridis(5)[4],magma(5)[3]))+ labs(x=“Unemployment Rate (%, NSA)",y=“Job Openings Rate (%, NSA)”, title=“The Beveridge Curve”, subtitle=“job openings rate vs unemployment rate (NSA)”, caption="@lenkiefer Source: U.S. Bureau of Labor Statistics, Current Population Survey and Job Openings and Labor Turnover Survey”)

print(g) ani.pause() print(i) } },movie.name=“bv tween3.gif”,ani.width = 650, ani.height = 500) {% endhighlight

Industry-specific Beveridge Curves

The data

JOLTS

CPS

Setup files

load libraries

These are the major sectors in BLS table 14: http://www.bls.gov/news.release/empsit.t14.htm

0000 All Industries

0169 Agriculture, forestry, fishing, and hunting

0368 Nonagriculture industries

0369 Mining, quarrying, and oil and gas extraction

0569 Utilities

0770 Construction

1068 Nondurable goods manufacturing

2467 Manufacturing

2468 Durable goods manufacturing

4067 Wholesale and retail trade

4068 Wholesale trade

4669 Retail trade

6068 Transportation and utilities

6069 Transportation and warehousing

6468 Information

6469 Publishing, except Internet

6569 Motion pictures and sound recording industries

6670 Radio and television broadcasting and cable subscriptions programming

6679 Telecommunications

6769 Other information services

6867 Financial activities

6868 Finance and insurance

6869 Finance

6990 Insurance carriers and related activities

7069 Real estate and rental and leasing

7268 Professional and business services

7269 Professional and technical services

7569 Management, administrative, and waste services

7858 Education and health services

7859 Educational services

7968 Health care and social assistance

8558 Leisure and hospitality

8559 Arts, entertainment, and recreation

8658 Accommodation and food services

8659 Accommodation

8679 Food services and drinking places

8767 Other services

8768 Other services, except private households

9290 Other services, private households

industry_code industry_text display_level selectable sort_sequence blank

000000 Total nonfarm 0 T 1

100000 Total private 1 T 2

110099 Mining and logging 2 T 3

230000 Construction 2 T 4

300000 Manufacturing 2 T 5

320000 Durable goods manufacturing 3 T 6

340000 Nondurable goods manufacturing 3 T 7

400000 Trade, transportation, and utilities 2 T 8

420000 Wholesale trade 3 T 9

440000 Retail trade 3 T 10

480099 Transportation, warehousing, and utilities 3 T 11

510000 Information 2 T 12

510099 Financial activities 2 T 13

520000 Finance and insurance 3 T 14

530000 Real estate and rental and leasing 3 T 15

540099 Professional and business services 2 T 16

600000 Education and health services 2 T 17

610000 Educational services 3 T 18

620000 Health care and social assistance 3 T 19

700000 Leisure and hospitality 2 T 20

710000 Arts, entertainment, and recreation 3 T 21

720000 Accommodation and food services 3 T 22

810000 Other services 2 T 23

900000 Government 1 T 24

910000 Federal 2 T 25

920000 State and local 2 T 26

923000 State and local government education 3 T 27

929000 State and local government, excluding education 3 T 28

Prepare data

read in unemployment rated

find series named unemployment rate:

we also want the aggregate series, whos id is JTU00000000JOR (I found it manually)

merge data: