data wrangling

dcasting a data table

Been a while since I blogged here. Where does time go? On Twitter, I realized it’s just about time for spooky plots: U.S. existing home sales hit a seasonally adjusted annual rate of 6 million in August 2020, first time at 6 million since 2006 pic.twitter.com/4ZKrO2d0zN — 📈 Len Kiefer 📊 (@lenkiefer) September 22, 2020 Maybe too early? In this post, I want to share a simple R code pattern that’s been useful for me.

Tornado Charts

Earlier this week, I made a boss chart: https://t.co/6wf40jtqHI pic.twitter.com/xlv3Uzpiv0 — 📈 Len Kiefer 📊 (@lenkiefer) May 12, 2020 While listening to Chart Chat I heard Jeffrey Shaffer, Steve Wexler, Amanda Makulec, and Andy Cotgreave discuss tornado charts. I decided it might be a good idea to make one. Because I’m not sure I can trust with the awesome power inherent in these charts I won’t post R code here.

Real House Price Trends

Let us take a look at house price trends in the United States and across states and metro areas. Earlier this week I tweeted out a few charts on housing market trends. In most of the middle part of the country over the past 44 years there has been little growth in real (inflation-adjusted) house prices. In coastal states, a very different story. pic.twitter.com/PLbiNftha3 — 📈 Len Kiefer 📊 (@lenkiefer) July 10, 2019 In this post we’ll analyze real house prices since 1975, and per usual use R to wrangle data and make plots.

Beige-ian Statistics

Let’s pick up where we left off yesterday and do some more exploration with text mining. Like yesterday we’ll use the tidytext package for R. And we’ll lean heavily on Julie Silge and David Robinson’s Text Mining with R. Data We’ll turn again to the Federal Reserve for our text data. But today we’ll explore the Beige Book, which gathers anecdotal information on current economic conditions across the Federal Reserve Districts.

Text Mining Fedspeak

Textmining is an exciting topic. There is tremendous potential to gain insights from textual analysis. See for example Gentzko, Kelly and Taddy’s Text as Data. While text mining may be quite advanced in other fields, in finance and economics the application of these techniques is still in its infancy. In order to take advantage of text as data, economists and financial analysts need tools to help them. Fortunately, there is a great resource: Text Mining with R by Julia Silge (blog and on Twitter atjuliasilge) and David Robinson (blog and on Twitter atdrob).

Getting animated about new home sales

Indications are that U.S. housing market activity in the middle part of 2018 has moderated. Home sales estimates for both new home sales and existing home sales declined on a seasonally adjusted basis in June relative to May. House price growth has also moderated recently. Some folks have gotten animated about the recent trends. I’m more sanguine about the recent data. Certainly a slowdown in housing market activity would be cause for concern.

Exploring housing data with R and IPUMS USA

In this post I want to share some observations on housing in the United States from 1980 to 2016, share some R code for data wrangling, and tri (no that’s not a typo, just a pun) out a visualization techniques. Let’s get to it. I’ve been carrying a running conversation with folks on Twitter regarding the U.S. housing market and its future. Much of that depends on the evolution of demographic forces.

Plotting house price and income trends

In this post we will create some plots of house prices and incomes for the United States and individual states. We will also try out the bea.R package to get data from the U.S. Bureau of Economic Analysis. We’ll end up with something like this: Per usual we’ll do it with R and I’ll include code so you can follow along. Data We’re going to use two sources of data.

What's up? VSUP, that's what's up.

IN THIS POST WE SHALL EXPLORE VALUE-SUPRESSING UNCERTAINTY PALETTES. One of my favorite new sites is xenographics that gives examples of and links to “weird, but (sometimes) useful charts”. The examples xenographics gives are undoubtedly interesting and might help inspire you if you’re looking for something new. One new (to me) graphic was something called Value-Suppressing Uncertainty Palettes (VSUP). See this research paper (pdf). VSUPs “allocate larger ranges of a visual channel when uncertainty is low, and smaller ranges when uncertainty is high”.

Expanding Expansions, Contracting Recessions

IN THIS POST I WANT TO SHARE A GRAPH looking at the length of economic expansions and recessions in the United State over time. Earlier today, Andrew Chamberlain (on Twitter), observed that at the end of this month the current economic expansion in the U.S. would be the second longest in history. Let’s explore. In the United States, the National Bureau of Economic Research (NBER) dates expansions and recessions. See for example http://www.

quick geofacet plots

WHILE I WAS TRYING TO MAKE TIME FOR TIBBLETIME yesterday I got distracted and made this plot: hey! here's a #dataviz: tile plot of U.S. state unemployment pic.twitter.com/vH8fSu63ve — 📈 Len Kiefer 📊 (@lenkiefer) October 10, 2017 In this post, let’s go over how to make this plot with R. And we’re going to make it quick. Setup In order to create a plot like this we’ll need several packages, including the tidyverse, geofacet and the tidyquant package.

What time is it? Time for tibbletime!

WHAT TIME IS IT? TIME FOR TIBBLETIME! In this post, I’m going to take the tibbletime package out for a spin. Turns out this package is quite useful for things I tend to do. We’ll use the tibbletime package to write some R code to extend our ongoing analysis of trends in the U.S. mortgage market (see here for example). Davis Vaughan (on Twitter) one of the authors of the tibbletime package suggested I take a look:

Analyzing mortgage data with R

TIME FOR ANOTHER DATA WRANGLING AND VISUALIZATION EXTRAVAGANZA. This time we are going to work hard to turn some big data into little data. That is, we’re going to work hard to aggregate several million loan level records into useful summary graphics to tell us about the U.S. mortgage market in 2016. I’ve been working on a lot of different ways to visualize trends in the mortgage market (see here and here for examples).

PURRRty PowerPoint with R

IN THIS POST I WANT TO GO THROUGH SOME EXAMPLES of using the purrr package for R. Now there are already some great examples of how to use purrr. If you’re brand new to purrr (like I was not long ago) probably start with Jenny Bryan’s Purrr tutorial then see R for Data Science and also this presentation from rstudioconf (pdf). You can also check out this curated collection via Mara Averick (on Twitter: dataandme ) for a bunch of examples.

A (Tidyquant)um of solace

LET’S WRANGLE SOME HOUSING DATA. We’ll try something different with how posts are organized. In the past I have generally mixed data wrangling, R code and graphs all in one post. Now I’m going to break it up. Posts like yesterday will just show some data and discuss it. Then, if the data wrangling or code is complicated enough I’ll follow up with another post with details. You’ll be able to find all my posts on data wrangling, under the data wrangling tag and R code under the R tag.

Distribution of mortgage loan amounts in 2015

IN THIS POST I WANT TO CREATE some data visualizations with R using the recently released Home Mortgage Disclosure Act (HMDA) data. For this post I’m going to return to the 2015 HMDA that you can get from the Consumer Financial Protection Bureau (CFPB) webpage and I discussed earlier. Check out my prior post for more discussion of how we build these data visualizations. R code for graphs posted below

Industry-specific Beveridge Curves

IN MY PREVIOUS POST we looked at the Job Openings and Labor Turnover Survey (JOLTS) data and plotted a Beveridge Curve. In this post I want to add some more code that allows us to plot Beveridge Curves by industry. For more on the analysis of industry-specific Beveridge Curves, see this paper published in the June 2012 Monthly Labor Review that decomposes shifts in the Beveridge Curve and looks at it by industry.

JOLTS! Job openings and labor turnover trends

IN THIS POST WE’LL LOOK AT recent job openings and hires data from the Bureau of Labor Statistics Job Openings and Labor Turnover Survey (JOLTS). R code for selected graphs posted below Job openings and labor turnover Total nonfarm trends Let’s start by looking at aggregate national trends for total nonfarm sector. The plot below compares hires, job openings and separations (the sum of quits, layoffs and discharges, and other separations) over time.

What we spend: Consumer Expenditures in 2015

EARLIER THIS WEEK THE U.S. BUREAU OF LABOR STATISTICS released data on consumer expenditures in 2015. In this post I want to examine these data and make a few visualizations. R code for graphs posted below One area I pay close attention to is housing. Housing is the largest single category of expenditure, averaging about 1/3 of total consumer expenditures. The BLS breaks the data out by tenure, so we can see how expenditures vary by owners versus renters.

Data swarms: Your firearms are useless against them!

AUGUST IS ALMOST OVER, and it’s nearly back to school season. And that means one thing. No, not that we’re about to get a chance to watch the #1 NCAA football program of all time dominate the gridiron (though that’s awesome too). No, it’s data release season! A data swarm is on its way. From American Community Survey to the American Housing Survey to the annual Home Mortgage Disclosure Act Data many statistical data releases come out in September and October.

Consumer Credit Trends

TODAY the NEW YORK FEDERAL RESERVE BANK released its Quarterly Report on Household Debt and Credit. These data come from the Center for Microeconomic Data based on credit records from Equifax. R code for the graphs are posted at bottom of page Trends in household debt balances One of the key statistics tracked in the report (full data can be found here) is household debt balances. They break debt balances out by loan type: