Data Wrangling

Beige-ian Statistics

Let’s pick up where we left off yesterday and do some more exploration with text mining. Like yesterday, we’ll use the tidytext package for R, and we’ll lean heavily on Julia Silge and David Robinson’s Text Mining with R. We’ll turn again to the Federal Reserve for our text data, but today we’ll explore the Beige Book, which gathers anecdotal information on current economic conditions across the Federal Reserve Districts.

Text Mining Fedspeak

Text mining is an exciting topic, and there is tremendous potential to gain insights from textual analysis. See, for example, Gentzkow, Kelly and Taddy’s Text as Data. While text mining may be quite advanced in other fields, in finance and economics the application of these techniques is still in its infancy. To take advantage of text as data, economists and financial analysts need tools. Fortunately, there is a great resource: Text Mining with R by Julia Silge (blog and on Twitter @juliasilge) and David Robinson (blog and on Twitter @drob).

Getting animated about new home sales

Indications are that U.S. housing market activity moderated in the middle part of 2018. Seasonally adjusted estimates of both new and existing home sales declined in June relative to May, and house price growth has also moderated recently. Some folks have gotten animated about the recent trends; I’m more sanguine about the recent data. Certainly a slowdown in housing market activity would be cause for concern.

Exploring housing data with R and IPUMS USA

In this post I want to share some observations on housing in the United States from 1980 to 2016, share some R code for data wrangling, and tri (no, that’s not a typo, just a pun) out a visualization technique. Let’s get to it. I’ve been carrying on a running conversation with folks on Twitter regarding the U.S. housing market and its future. Much of that depends on the evolution of demographic forces.

Plotting house price and income trends

In this post we will create some plots of house prices and incomes for the United States and individual states. We will also try out the bea.R package to get data from the U.S. Bureau of Economic Analysis. Per usual, we’ll do it with R, and I’ll include code so you can follow along. We’re going to use two sources of data. First, we’ll get the FHFA house price index, and then we’ll get per capita income estimates from the United States Bureau of Economic Analysis (BEA).

What's up? VSUP, that's what's up.

IN THIS POST WE SHALL EXPLORE VALUE-SUPPRESSING UNCERTAINTY PALETTES. One of my favorite new sites is xenographics, which gives examples of and links to “weird, but (sometimes) useful charts”. The examples xenographics gives are undoubtedly interesting and might help inspire you if you’re looking for something new. One new (to me) graphic was something called Value-Suppressing Uncertainty Palettes (VSUP). See this research paper (pdf). VSUPs “allocate larger ranges of a visual channel when uncertainty is low, and smaller ranges when uncertainty is high”.
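To make the quoted idea concrete, here is a simplified sketch of a VSUP-style "tree" quantization. The posts use R; this sketch is in plain Python so it stays self-contained. It assumes value and uncertainty have already been rescaled to [0, 1] and uses four uncertainty levels. The function name vsup_bin and the exact halving rule are my illustrative assumptions. This is a simplification of the layout in the research paper, not its exact algorithm.

```python
def vsup_bin(value: float, uncertainty: float, levels: int = 4) -> tuple[int, int, float]:
    """Map a (value, uncertainty) pair, both rescaled to [0, 1], to
    (uncertainty_level, value_bin, bin_center).

    Simplified tree quantization (an assumption, not the paper's exact code):
    the most certain level splits the value axis into 2**(levels - 1) bins,
    and every step up in uncertainty halves the number of value bins, so at
    the most uncertain level all values share a single bin (one color).
    """
    if not (0.0 <= value <= 1.0 and 0.0 <= uncertainty <= 1.0):
        raise ValueError("value and uncertainty must lie in [0, 1]")
    # Level 0 is the most certain; uncertainty == 1.0 is clamped into the top level.
    level = min(int(uncertainty * levels), levels - 1)
    n_bins = 2 ** (levels - 1 - level)
    # value == 1.0 is clamped into the last bin.
    bin_idx = min(int(value * n_bins), n_bins - 1)
    center = (bin_idx + 0.5) / n_bins
    return level, bin_idx, center


# Distinct value bins available at each uncertainty level (levels=4).
bins_per_level = [2 ** (4 - 1 - k) for k in range(4)]
```

With four levels, the most certain row gets 8 value bins and each less certain row gets half as many (8, 4, 2, 1). Each bin would then get its own color, so more of the color channel goes to the values we are sure about.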

Expanding Expansions, Contracting Recessions

IN THIS POST I WANT TO SHARE A GRAPH looking at the length of economic expansions and recessions in the United States over time. Earlier today, Andrew Chamberlain (on Twitter) observed that at the end of this month the current economic expansion in the U.S. would be the second longest in history. Let’s explore. In the United States, the National Bureau of Economic Research (NBER) dates expansions and recessions. See for example http://www.
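Expansion lengths are just month counts between NBER turning points. Here is a minimal sketch in Python (the posts themselves use R). It assumes the trough and peak months below, taken from NBER's business cycle chronology; double-check them against the NBER table before relying on them. The helper months_between is hypothetical.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Number of calendar months from start to end (days are ignored).

    NBER measures an expansion from its trough month to its peak month,
    so March 1991 -> March 2001 counts as 120 months.
    """
    return (end.year - start.year) * 12 + (end.month - start.month)


# (trough, peak) months as listed in NBER's chronology; day=1 is a placeholder.
expansions = {
    "1961-1969": (date(1961, 2, 1), date(1969, 12, 1)),
    "1991-2001": (date(1991, 3, 1), date(2001, 3, 1)),
}

lengths = {name: months_between(trough, peak) for name, (trough, peak) in expansions.items()}
# Longest first.
ranked = sorted(lengths.items(), key=lambda kv: kv[1], reverse=True)
```

An ongoing expansion can be measured the same way by passing its trough and an as-of month in place of a peak.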

quick geofacet plots

WHILE I WAS TRYING TO MAKE TIME FOR TIBBLETIME yesterday, I got distracted and made this plot, which I shared on Twitter: “hey! here's a #dataviz: tile plot of U.S. state unemployment” (Leonard Kiefer, @lenkiefer, October 10, 2017). In this post, let’s go over how to make this plot with R, and we’re going to make it quick. To create a plot like this we’ll need several packages, including the tidyverse, geofacet and tidyquant packages.

What time is it? Time for tibbletime!

WHAT TIME IS IT? TIME FOR TIBBLETIME! In this post, I’m going to take the tibbletime package out for a spin. It turns out this package is quite useful for things I tend to do. We’ll use the tibbletime package to write some R code to extend our ongoing analysis of trends in the U.S. mortgage market (see here for example). Davis Vaughan (on Twitter), one of the authors of the tibbletime package, suggested I take a look:

Analyzing mortgage data with R

TIME FOR ANOTHER DATA WRANGLING AND VISUALIZATION EXTRAVAGANZA. This time we are going to work hard to turn some big data into little data. That is, we’re going to aggregate several million loan-level records into useful summary graphics that tell us about the U.S. mortgage market in 2016. I’ve been working on a lot of different ways to visualize trends in the mortgage market (see here and here for examples).
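The big-data-to-little-data step is a group-and-summarize. In R that would typically be dplyr's group_by() followed by summarise(). The sketch below shows the same idea in plain Python. The field names state and loan_amount, and the summary statistics chosen, are illustrative assumptions. They are not the actual HMDA schema or the post's code.

```python
from collections import defaultdict
from statistics import median


def summarize_loans(records):
    """Collapse loan-level records into one summary row per state.

    `records` is any iterable of dicts with the hypothetical keys
    "state" and "loan_amount" (not the real HMDA column names).
    """
    amounts = defaultdict(list)
    for rec in records:
        amounts[rec["state"]].append(rec["loan_amount"])
    # One output row per group, sorted by state for stable output.
    return {
        state: {
            "n_loans": len(vals),
            "median_amount": median(vals),
            "total_amount": sum(vals),
        }
        for state, vals in sorted(amounts.items())
    }
```

The output has one row per state no matter how many loans go in. Keeping every amount (needed for the median) means memory still grows with the input; counts and totals alone could be accumulated in a single pass.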

PURRRty PowerPoint with R

IN THIS POST I WANT TO GO THROUGH SOME EXAMPLES of using the purrr package for R. There are already some great examples of how to use purrr. If you’re brand new to purrr (like I was not long ago), you should probably start with Jenny Bryan’s purrr tutorial, then see R for Data Science and also this presentation from rstudio::conf (pdf). You can also check out this curated collection via Mara Averick (on Twitter: @dataandme) for a bunch of examples.

A (Tidyquant)um of solace

LET’S WRANGLE SOME HOUSING DATA. We’ll try something different with how posts are organized. In the past I have generally mixed data wrangling, R code and graphs all in one post. Now I’m going to break it up. Posts like yesterday’s will just show some data and discuss it. Then, if the data wrangling or code is complicated enough, I’ll follow up with another post with details. You’ll be able to find all my posts on data wrangling under the data wrangling tag and R code under the R tag.

Plotting recent house price trends with R

IT IS TIME FOR AN UPDATE ON HOUSE PRICE TRENDS AROUND THE UNITED STATES. I have been experimenting with some new visualizations and updating some old favorites. Let’s collect them here. This post will be an extension of my Visual Meditations on House Prices series from last year. Check out those posts for additional visualizations. We’ll use the recently updated Freddie Mac House Price Index (link to source) data and use R to create some plots.

Gather round and spread the word: Wrangling global house price data

IN THIS POST I WANT TO SHARE SOME R data wrangling strategies and use them to prepare an update to some global house price plots I shared last year. In last year’s post I did some data manipulation by hand and mouse in Excel before getting into R. In this post I’m going to use the newly updated readxl package to do the data manipulations entirely in R. If you follow along, you should be able to use this code to recreate my graphs.
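The title's gather and spread refer to tidyr's reshaping verbs. gather() goes from wide to long, and spread() goes from long to wide; newer tidyr versions call these pivot_longer() and pivot_wider(). As a language-neutral sketch of what they do, here are minimal Python equivalents. The function signatures and the column names in the example are hypothetical.

```python
def gather(rows, id_col, key="key", value="value"):
    """Wide -> long, in the spirit of tidyr::gather(): each non-id column
    becomes a (key, value) pair on its own row."""
    out = []
    for row in rows:
        for col, val in row.items():
            if col != id_col:
                out.append({id_col: row[id_col], key: col, value: val})
    return out


def spread(rows, id_col, key="key", value="value"):
    """Long -> wide, in the spirit of tidyr::spread(): one row per id,
    with one column per distinct key."""
    wide = {}
    for row in rows:
        wide.setdefault(row[id_col], {id_col: row[id_col]})[row[key]] = row[value]
    return list(wide.values())
```

For example, a spreadsheet-style table with a date column and one column per country gathers into (date, country, value) rows. That long shape is convenient for faceted plots in ggplot2, and spread() turns it back into the original table.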

Plotting house price trends with FRED and R

IN THIS POST I AM going to share some useful code to create custom plots using Federal Reserve Economic Data (FRED) from the Federal Reserve Bank of St. Louis. While the FRED page has some nice chart customization options, I’m going to import the data into R with the quantmod package and draw the plots myself. I find myself doing these types of things often enough that I thought you might find these bits o’ code useful.

Distribution of mortgage loan amounts in 2015

IN THIS POST I WANT TO CREATE some data visualizations with R using the recently released Home Mortgage Disclosure Act (HMDA) data. For this post I’m going to return to the 2015 HMDA data that you can get from the Consumer Financial Protection Bureau (CFPB) webpage and that I discussed earlier. Check out my prior post for more discussion of how we build these data visualizations. R code for the graphs is posted below.

Industry-specific Beveridge Curves

IN MY PREVIOUS POST we looked at the Job Openings and Labor Turnover Survey (JOLTS) data and plotted a Beveridge Curve. In this post I want to add some more code that allows us to plot Beveridge Curves by industry. For more on the analysis of industry-specific Beveridge Curves, see this paper published in the June 2012 Monthly Labor Review that decomposes shifts in the Beveridge Curve and looks at it by industry.

JOLTS! Job openings and labor turnover trends

IN THIS POST WE’LL LOOK AT recent job openings and hires data from the Bureau of Labor Statistics Job Openings and Labor Turnover Survey (JOLTS). R code for selected graphs is posted below. Let’s start by looking at aggregate national trends for the total nonfarm sector. The plot below compares hires, job openings and separations (the sum of quits, layoffs and discharges, and other separations) over time.
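The separations total described above is a simple sum. Here is a sketch in Python with hypothetical field names (quits, layoffs, other). It assumes all three components are in the same units; JOLTS levels are usually published in thousands.

```python
def total_separations(quits, layoffs_discharges, other_separations):
    """Total separations = quits + layoffs and discharges + other separations."""
    return quits + layoffs_discharges + other_separations


def add_separations(months):
    """Return copies of monthly records with a computed 'separations' field.

    Each record is a dict with the hypothetical keys 'quits', 'layoffs'
    and 'other'; all three must be in the same units.
    """
    return [
        {**m, "separations": total_separations(m["quits"], m["layoffs"], m["other"])}
        for m in months
    ]
```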

What we spend: Consumer Expenditures in 2015

EARLIER THIS WEEK THE U.S. BUREAU OF LABOR STATISTICS released data on consumer expenditures in 2015. In this post I want to examine these data and make a few visualizations. R code for graphs posted below

Data swarms: Your firearms are useless against them!

AUGUST IS ALMOST OVER, and it’s nearly back-to-school season. And that means one thing. No, not that we’re about to get a chance to watch the #1 NCAA football program of all time dominate the gridiron (though that’s awesome too). No, it’s data release season! A data swarm is on its way. From the American Community Survey to the American Housing Survey to the annual Home Mortgage Disclosure Act data, many statistical data releases come out in September and October.

Consumer Credit Trends

TODAY THE FEDERAL RESERVE BANK OF NEW YORK released its Quarterly Report on Household Debt and Credit. These data come from the Center for Microeconomic Data and are based on credit records from Equifax. R code for the graphs is posted at the bottom of the page. One of the key statistics tracked in the report (full data can be found here) is household debt balances, which are broken out by loan type: