I have recently been messing around with the new ggfx package.
using #rstats ggfx::with_bloom and ggridges::geom_density left with ggfx, right without pic.twitter.com/L8yknjAJVw
— Len Kiefer (@lenkiefer) March 4, 2021
Most of my applications (see below for a gallery) probably have not followed good dataviz guidelines. But I think I have found a good example.
We can use the ggfx::with_blend function to layer a recession indicator with a time series and color code the lines.
Earlier today I tweeted out a chart of house prices using my inari color theme.
House price growth lifting off pic.twitter.com/TvNJcOZrTF
— Len Kiefer (@lenkiefer) January 26, 2021
Below is the R code to generate the plot.
# load libraries
library(tidyquant)
library(tidyverse)
# list of FRED Tickers
tickers <- c("LXXRSA", "SPCS20RSA", "LVXRSA", "SEXRSA",
             "SFXRSA", "NYXRSA", "BOXRSA", "SDXRSA", "CHXRSA",
             "DNXRSA", "PHXRNSA", "DAXRNSA", "WDXRSA",
             "ATXRNSA", "MIXRNSA", "POXRSA", "MNXRSA", "DEXRNSA", "TPXRSA", "CRXRSA", "CEXRSA")
# list of city names
cities <- c("Los Angeles", "20-city", "Las Vegas", "Seattle",
            "San Francisco", "New York", "Boston", "San Diego", "Chicago",
            "Denver", "Phoenix", "Dallas", "Washington DC",
            "Atlanta", "Miami", "Portland", "Minneapolis", "Detroit", "Tampa", "Charlotte", "Cleveland")
df.
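The ticker and city vectors above line up position by position; the original code continues from here but is cut off in this excerpt. As a hedged sketch of one way to use them (not necessarily what the original does next), we can pair each ticker with its city in a small lookup table. The `ticker_lookup` table and its `seasonally_adjusted` column are hypothetical, and the flag rests on an assumed FRED naming convention: tickers ending in `RNSA` are not seasonally adjusted, those ending in `RSA` are.

```r
# Hypothetical lookup table pairing each FRED ticker with its city name
tickers <- c("LXXRSA", "SPCS20RSA", "LVXRSA", "SEXRSA",
             "SFXRSA", "NYXRSA", "BOXRSA", "SDXRSA", "CHXRSA",
             "DNXRSA", "PHXRNSA", "DAXRNSA", "WDXRSA",
             "ATXRNSA", "MIXRNSA", "POXRSA", "MNXRSA", "DEXRNSA", "TPXRSA", "CRXRSA", "CEXRSA")
cities <- c("Los Angeles", "20-city", "Las Vegas", "Seattle",
            "San Francisco", "New York", "Boston", "San Diego", "Chicago",
            "Denver", "Phoenix", "Dallas", "Washington DC",
            "Atlanta", "Miami", "Portland", "Minneapolis", "Detroit", "Tampa", "Charlotte", "Cleveland")

# The two vectors must line up one-to-one
stopifnot(length(tickers) == length(cities))

ticker_lookup <- data.frame(symbol = tickers, city = cities, stringsAsFactors = FALSE)

# Assumed naming convention: a trailing "NSA" marks a not seasonally adjusted series
ticker_lookup$seasonally_adjusted <- !grepl("NSA$", ticker_lookup$symbol)

# Show the series that are not seasonally adjusted
ticker_lookup[!ticker_lookup$seasonally_adjusted, ]
```

If the data are pulled with tidyquant::tq_get(), which returns a `symbol` column when given several tickers, a table keyed on `symbol` can be merged back onto the data with merge() or dplyr::left_join(). Flagging the not seasonally adjusted series matters if you compare month-to-month changes across cities.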
Seasonally adjusted greetings to you and yours. For you I have an animated chart, a variation on our rate cloud with a wintry theme.
R code below.
We'll grab mortgage rate data, make a few new variables, and then plot the chart using ggridges::geom_density_ridges2. Using the raincloud option for the position argument in ggridges::geom_density_ridges2 places the individual data points below the density plots. Using various shades of white, we can turn the rain cloud into a snow cloud.
VISUAL MEDITATIONS are the analysis of repeated graphs of the same data with variations on a graphical theme. When you alter the mapping of data to aesthetics, interesting patterns sometimes emerge. I find it a useful practice. I made a series of these a few years ago with different charts. The chart images have been lost to past blog migrations, but the code should still work.
In this post, I want to consider several alternative ways to visualize house prices.
Recently I've been putting my y-axis labels on the right for some time series.
I think this idea has been rattling around in my head since it was suggested on Twitter by Maarten Lambrechts:
A y-axis on the left is almost always the default, while the most recent and usually most relevant data are on the right. Then shouldn't a y-axis on the right be the default (when there are no data labels), to improve legibility?
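In ggplot2, moving the y-axis to the right is a one-argument change: the `position` argument of `scale_y_continuous()`. The sketch below uses a made-up monthly series (`df` and its columns are hypothetical) purely to show where the argument goes.

```r
library(ggplot2)

# Hypothetical monthly series, for illustration only
df <- data.frame(
  date  = seq(as.Date("2015-01-01"), by = "month", length.out = 72),
  value = 100 + cumsum(rep(0.5, 72))
)

p <- ggplot(df, aes(x = date, y = value)) +
  geom_line() +
  # Draw the y-axis (ticks and labels) on the right, next to the latest data
  scale_y_continuous(position = "right") +
  labs(x = NULL, y = NULL)

# print(p) draws the chart
```

If you want the axis on both sides instead, `scale_y_continuous(sec.axis = dup_axis())` duplicates it.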
Yesterday I shared with you observations on the economy, which form the core of many of my recent economic outlook talks. In that article I used some charts with alternative formatting. No, not spooky, but a blue theme, kind of like those alternate road uniforms some sportsball teams wear.
Here, I will share with you the R code for these delicious plots.
Setup
First we'll need to set up our chart theme, tweak some ggplot2 defaults and load some libraries.
Today the FHFA released its house price index for August 2020. Per the report, house prices in August 2020 increased 1.5 percent over the prior month (19.6% at an annualized rate). Over the last 12 months U.S. house prices have increased 8%, and over the last 3 months they have risen over 15% on an annualized basis. That is an acceleration of over 7.5 percentage points, the largest turnaround in house price growth since the inflection point in 2009.
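The monthly and annualized figures are linked by compounding. A minimal sketch of the arithmetic in base R; the helper `annualize()` is a hypothetical name, not something from the FHFA release or any package.

```r
# Annualize a growth rate observed over `months` months by compounding:
# (1 + g)^(12 / months) - 1
annualize <- function(growth, months) {
  (1 + growth)^(12 / months) - 1
}

# 1.5 percent month-over-month growth, as in the August 2020 release
round(100 * annualize(0.015, months = 1), 1)   # 19.6 (percent, annualized)

# Going the other way: a 15 percent annualized pace over 3 months
# corresponds to a cumulative 3-month change of about 3.6 percent
round(100 * ((1 + 0.15)^(3 / 12) - 1), 1)      # 3.6
```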
Earlier today I tweeted out an update of our skyline mortgage rate chart. Gray Kimbrough (follow him on Twitter to get all your Millennial Myths busted) pointed out that my chart style was close to the famous cover art of Joy Division's 1979 album Unknown Pleasures.
That was exactly right, because I arrived at the skyline mortgage rate chart by tweaking my original application of ggridges::geom_density_ridges (formerly known as Joyplot).
Let's update our earlier analysis to examine the Federal Reserve's Beige Book.
Following my earlier post, we can construct a sentiment measure for each report. It turns out that after turning sharply negative in spring, the October 2020 report returned to positive territory.
The sentiment index looks at all words and, after adjusting for economics terms (like "gross", which is neutral in "gross domestic product"), scores them for sentiment. We could just count up the number of times we see words like "strong" vs. words like "weak".
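The simple counting approach can be sketched in a few lines of base R. The two-word lexicons and the `net_sentiment()` helper below are hypothetical stand-ins for a full sentiment lexicon, not the index used in the post; the point is to show why excluding an economics term such as "gross" changes the score.

```r
# Net sentiment = (positive - negative) / (positive + negative), in [-1, 1]
net_sentiment <- function(text, positive, negative, exclude = character(0)) {
  # Lower-case the text and split on anything that is not a letter
  words <- tolower(unlist(strsplit(text, "[^A-Za-z]+")))
  # Drop empty strings and any terms we have decided not to score
  words <- words[nzchar(words) & !(words %in% exclude)]
  n_pos <- sum(words %in% positive)
  n_neg <- sum(words %in% negative)
  (n_pos - n_neg) / max(n_pos + n_neg, 1)
}

report <- "Activity was strong in most districts, though manufacturing was weak. Gross receipts were solid."

# Hypothetical mini-lexicons; "gross" is scored negative, as some general-purpose lexicons do
positive <- c("strong", "solid")
negative <- c("weak", "gross")

net_sentiment(report, positive, negative)                      # 0: "gross" offsets a positive word
net_sentiment(report, positive, negative, exclude = "gross")   # 1/3 after the economics adjustment
```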
About a year ago I made a chart and Bob Rudis dubbed it a skyline chart. Here's an updated version I made today.
The chart shows a histogram of U.S. weekly average 30-year mortgage rates by year, broken into 1/8 percentage point buckets. We see that through this past week, 30-year mortgage rates had spent eight weeks around 2.875 percent.
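The bucketing step is easy to sketch: multiply by 8, round, and divide by 8. This uses made-up rates for a single year (the `rates` vector is hypothetical) and rounds to the nearest eighth; the original chart may define its bucket edges differently. R's round() sends exact halves to the even number, which only matters for a rate sitting exactly on a bucket boundary.

```r
# Hypothetical weekly 30-year fixed mortgage rates (percent) for one year
rates <- c(2.86, 2.88, 2.90, 2.87, 2.81, 2.72)

# Snap each rate to the nearest 1/8 percentage point
bucket <- round(rates * 8) / 8

# Number of weeks spent in each bucket: the bar heights in a skyline-style histogram
weeks_per_bucket <- table(bucket)
weeks_per_bucket
```

For several years of data, `table(year, bucket)` gives one row of bar heights per year.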
R code
The R script below will generate it (using my darklyplot package's theme_dark2 function).
I've been thinking about distributional forecasts. In particular, I've been considering Quantile Autoregressions (QAR) as defined in Koenker and Xiao (2006). In the exercise here I'll borrow from some handy lecture notes at this link (pdf).
This is all speculative, but I think this might be a useful way to think about the asymmetry in likely outcomes given the uncertainty inherent in today's economic forecasts.
Setup
Let's define the QAR(1) model for quantile \(Q(\tau)\),
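For reference, the QAR(1) specification in Koenker and Xiao (2006) lets the conditional \(\tau\)-th quantile of \(y_t\) depend linearly on its own lag, with coefficients that may vary with \(\tau\):

\[
Q_{y_t}(\tau \mid y_{t-1}) = \theta_0(\tau) + \theta_1(\tau)\, y_{t-1}, \qquad 0 < \tau < 1 .
\]

Equivalently, \(y_t = \theta_0(U_t) + \theta_1(U_t)\, y_{t-1}\) with \(U_t\) drawn independently from a standard uniform distribution (subject to the right-hand side being increasing in \(U_t\)). When \(\theta_1(\tau)\) changes with \(\tau\), the gap between upper and lower conditional quantiles depends on \(y_{t-1}\), which is one way the model can produce asymmetric forecast distributions.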
Been a while since I blogged here. Where does time go? On Twitter, I realized it's just about time for spooky plots:
U.S. existing home sales hit a seasonally adjusted annual rate of 6 million in August 2020, first time at 6 million since 2006 pic.twitter.com/4ZKrO2d0zN
— Len Kiefer (@lenkiefer) September 22, 2020
Maybe too early?
In this post, I want to share a simple R code pattern that's been useful for me.
Yesterday I announced that I wrote a simple R package darklyplot. This is a vignette I have built to help explain ways you can use the package.
The goal of darklyplot is to create simple time series plots with a dark background. The minimum and maximum values are highlighted and color-coded, along with the y-axis and x-axis labels. This vignette walks through basic usage and explores some of the package options.
Today I try my hand at building an R package called darklyplot. This package is a little extension of ggplot2 to create a dark themed time series plot.
This package lets you create simple dark-theme time series plots. It extends ggplot2 and relies on mdthemes to make color-coded axis labels. The axes use ggthemes::geom_rangeframe to create Tufte-like range frames. Through parameters, the user can alter the colors, include shading under the line, and also add a single reference line.
Over on Twitter Grant McDermott shares a neat ggplot2 trick:
A shortcut I like to use is calling multiple geoms in an lapply() call, since this automatically generates a list. Works well for investigating plotting variations, e.g.
ggplot(diamonds, aes(carat)) +
  lapply(c(50,200), function(b) geom_histogram(bins=b, alpha=0.3))
https://t.co/hf0vtvDkbk pic.twitter.com/jmmqlyJEKo
— Grant McDermott (@grant_mcdermott) June 22, 2020
I applied this trick to create a gradient fill for a chart.
Looks kind of like Kool-Aid.
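One way to apply the trick to a gradient fill, sketched with a made-up series: because lapply() returns a list of layers and ggplot2 adds every element of a list, we can stack several translucent areas, each scaled a bit closer to the line. The overlapping alphas make the fill darkest near the baseline and lightest near the line. The data frame `df`, the `shades` vector, and the fill color are illustrative assumptions, not the code behind the original chart.

```r
library(ggplot2)

# Hypothetical positive-valued series for illustration
df <- data.frame(x = 1:100, y = 2 + sin(seq(0, 4 * pi, length.out = 100)))

# Ten scaling factors; each produces one translucent area layer
shades <- seq(0.1, 1, by = 0.1)

p <- ggplot(df, aes(x = x)) +
  # lapply() returns a list of layers; ggplot2 adds each element of the list
  lapply(shades, function(s) geom_area(aes(y = y * s), fill = "purple", alpha = 0.1)) +
  geom_line(aes(y = y))
```

Each `aes(y = y * s)` captures its own value of `s` from the anonymous function call, so every layer gets a different height.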
Coronavirus Recession
Over on LinkedIn I posted a summary of recent economic talks I have been giving: The Coronavirus Recession. Read the whole thing for analysis and lots of charts, but I leave off with three key questions:
Recession was here, but is it already gone?
Housing market indicators have rebounded, but will the recovery be sustained?
Aftereffects of the shutdown and a possible second wave of the pandemic remain risks to the outlook. How big are these risks?
Yesterday I completed the elusive presentation quadfecta. I did a talk on Zoom, Teams, WebEx and Skype. These communication apps are great, but after a few hours of maintaining "resting Zoom face" (you want to look interested as the camera is always rolling), I felt a bit exhausted.
But it was totally worth it. The highlight for me was being able to join Jeffrey Shaffer, Steve Wexler, Amanda Makulec, and Andy Cotgreave for Chart Chat.
A couple years ago I posted R code for a remix of a remix of a US state unemployment rate chart.
Post Working on a workout. Some of the images were lost in a blog transition. We'll update below.
Here's an updated version:
And another remix focusing just on April 2020 (latest data).
R code
######################
## Load Libraries ##
######################
library(data.table)
library(quantmod)
library(tidyverse)
library(geofacet)
# Download data (big file)
ur.
Earlier this week, I made a boss chart:
https://t.co/6wf40jtqHI pic.twitter.com/xlv3Uzpiv0
— Len Kiefer (@lenkiefer) May 12, 2020
While listening to Chart Chat I heard Jeffrey Shaffer, Steve Wexler, Amanda Makulec, and Andy Cotgreave discuss tornado charts. I decided it might be a good idea to make one. Because I'm not sure I can trust you with the awesome power inherent in these charts, I won't post R code here.
This week the Freddie Mac Primary Mortgage Market Survey reported a record low for the U.S. weekly average 30-year mortgage rate.
First some charts, then below I post R code.
R code
Load libraries
library(fredr)
library(tidyverse)
library(patchwork)
library(cowplot)
library(gganimate)
library(lubridate) # updated
You'll need a custom color scale (see below for code).
R code to wrangle data
data preparation code
source(paste0(mydir,"len_color_scales.R")) #custom color scale code copied below
fredr_set_key("YOURKEY")
df <- fredr(series_id = "MORTGAGE30US",
observation_start = as.
Today the U.S. Bureau of Labor Statistics released its monthly employment situation summary for March 2020. While many were expecting the U.S. labor market to show some weakness as the U.S. economy shuts down to battle COVID-19, the magnitude of the contraction surprised many. Because the reference week for the employment report was March 8th through March 14th, before the nationwide shutdown took full effect, many were expecting a relatively mild report.
Earlier today I tweeted out a chart of the U.S. Labor Department's estimate of initial jobless claims (link to pdf report).
weekly jobless claims, a 30σ event pic.twitter.com/LEO7s5TXsH
— Len Kiefer (@lenkiefer) March 26, 2020
Below I share R code to generate a chart like the one above.
We can get data from the St. Louis Fed's Federal Reserve Economic Data (FRED).
Then it's easy to make an animation.
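A "30σ event" means the latest reading sits about 30 standard deviations from the mean of the earlier data. A minimal sketch of that calculation, with hypothetical numbers standing in for the weekly claims series (on FRED, seasonally adjusted initial claims are series ICSA, but which series and sample window the original chart used is an assumption here). The `sigma_event()` helper is a hypothetical name.

```r
# Standard deviations between the latest observation and the mean of all prior observations
sigma_event <- function(x) {
  history <- head(x, -1)  # every observation except the latest
  latest  <- tail(x, 1)
  (latest - mean(history)) / sd(history)
}

# Hypothetical weekly initial claims (thousands): a stable history, then a huge spike
claims <- c(rep(c(200, 220, 210, 230), 25), 3300)

sigma_event(claims)
```

Whether to measure the baseline over the full history or a recent window is a judgment call; a baseline that includes more volatile periods gives a smaller multiple.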
On Friday a colleague showed me an interesting chart, a map of maps. I believe the original was made in Tableau, but I decided to spin one up in R. I tweeted out the picture:
A map of maps, showing the correlation between state house price growth rates
You see pretty strong spatial correlation, with some interesting exceptions. Florida correlated with AZ, NV pic.twitter.com/9hzwZLkb41
— Len Kiefer (@lenkiefer) March 6, 2020
In this post I will supply the R code to make one.
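The core computation behind a map of maps is a correlation matrix of growth rates, reshaped to long form so each state gets its own panel colored by its correlation with every other state. The sketch below builds that long table from made-up growth rates for four states (the series, column names, and the sin/cos construction are all hypothetical); the mapping step itself, for example with geofacet, is left out.

```r
# Hypothetical annual house price growth rates (percent) for four states, 20 years
t <- 1:20
common <- 5 * sin(t / 3)                 # shared regional cycle
growth <- data.frame(
  FL = common + 0.5 * cos(t),
  AZ = common - 0.5 * cos(2 * t),
  NV = common + 0.5 * sin(2 * t),
  OH = 2 * cos(t / 2)                    # a state driven by a different cycle
)

# Pairwise correlations between the states' growth rates
cm <- cor(growth)

# Long format: one row per (state, other state) pair, ready for faceting by state
cor_long <- as.data.frame(as.table(cm), stringsAsFactors = FALSE)
names(cor_long) <- c("state", "other_state", "correlation")
head(cor_long)
```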
I have been thinking about how the recent volatility could impact the economy. If travel and tourism contract due to fears of a pandemic, the impact will differ across markets around the United States. One way to think about this is to compute the location quotient: the share of employment in an area that is in the leisure and hospitality industry, relative to that industry's share of employment nationally.
Consider the graphic below:
This map shows areas (states and core based statistical areas) color-coded by their location quotient.
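The location quotient compares an industry's share of local employment with its share nationally; values above 1 mean the area is more concentrated in that industry than the country as a whole. A minimal sketch with hypothetical employment counts (the function name and numbers are illustrative only):

```r
# Location quotient: local industry share of employment divided by the national share
location_quotient <- function(local_industry, local_total, national_industry, national_total) {
  (local_industry / local_total) / (national_industry / national_total)
}

# Hypothetical counts, in thousands of jobs
lq <- location_quotient(local_industry = 300, local_total = 1500,
                        national_industry = 16000, national_total = 150000)
lq  # 1.875: leisure and hospitality is about 1.9 times as concentrated here as nationally
```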
I've been thinking about how different macroeconomic shocks might affect the U.S. housing market. Given recent volatility it is hard to know how to size risks. But it could be a useful exercise to think through how certain typical shocks might impact the housing market.
Rather than take on a full structural approach, I just want to extend the reduced form VAR analysis we did in a post from last year.
As an economist and all-around friend of strictly positive numbers I often use the log function. The natural logarithm of course, need I specify it? Apparently in certain spreadsheet software you do.
In this note I just want to write down a couple of observations about how to generate mean or median forecasts of a variable \(y\) when the model is fit in \(\log(y)\). Of course, I am going to borrow heavily from Rob Hyndman's blog, where he covers this.
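The key result, assuming the errors on the log scale are normal: if \(\log(y)\) has forecast mean \(\mu\) and variance \(\sigma^2\), then \(e^{\mu}\) is the forecast median of \(y\), while the forecast mean is \(e^{\mu + \sigma^2/2}\). A quick base R check by simulation (the values of `mu` and `sigma2` are arbitrary, and `back_transform()` is a hypothetical helper):

```r
# Median and mean of y when log(y) ~ Normal(mu, sigma2)
back_transform <- function(mu, sigma2) {
  c(median = exp(mu), mean = exp(mu + sigma2 / 2))
}

mu <- 1
sigma2 <- 0.25
bt <- back_transform(mu, sigma2)

# Simulate y directly and compare
set.seed(42)
y <- exp(rnorm(1e6, mean = mu, sd = sqrt(sigma2)))
rbind(formula = bt, simulated = c(median(y), mean(y)))
```

Simply exponentiating the log-scale forecast therefore gives the median, which understates the mean whenever \(\sigma^2 > 0\).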
Economist Play-in Round
Bracket madness is about to descend on us. Before we get to March Madness we'll have to suffer through a different kind of madness: the Neoliberal Shill Bracket. The Neoliberal project has succumbed to inflation and expanded the field, so this year features a play-in round.
In this post we analyze the Economist Play-in:
Economist Play-in (8)
---@mioana @imbernomics @stanveuger @jodiecongirl @cblatts @jonathaneyer @R_Thaler @florianederer pic.
Recently I have been running R from my Android phone. There are some apps on the Google Play Store that seem to let you emulate R, or connect to a remote version. Instead of doing that, I have been running R directly off my phone using the terminal.
Rocking now
Writing, running #rstats scripts from the terminal with Emacs, pulling data from Fred, making chartz,
All from my oh so very smart phone https://t.
In a blog post ("the dual y-axis chart just say no"), Tim Duy asks analysts to give up dual y-axis charts as a New Year's resolution. As with many resolutions, I predict most will fail at this challenge. I also predict few will take it up. Dual y-axis charts are super popular, especially in finance and economics.
As you all know, I care a lot about data visualization. And I have been fighting a losing battle against dual y-axis charts for about a decade.
I shared a chart recently on Twitter that got some attention:
static version pic.twitter.com/vtD54nXGio
— Len Kiefer (@lenkiefer) November 14, 2019
But not just any attention (though I do appreciate all your likes and retweets). This was special. Robert Allison [at]RobertAllison__ at SAS replicated the chart with SAS software and wrote a blog post about it: These mortgage rates look shady to me. I worked on a lot of SAS stuff early in my days at Freddie Mac, and Robert's SAS graph examples were a resource I often used.
Mortgage interest rates have moved about a percentage point lower from where they were a year ago. The housing market seems to have responded favorably.
On my way into D.C. the other day to do some business, I joined a Twitter exchange originally between [at]Graykimbrough and Adam Ozimek, [at]ModeledBehavior about the effects of Federal Reserve interest rate policy on the housing market.
Seems unlikely housing market was slowed by trade war.
This post is for me and future me, though if you get something out of it, that's great too. Here I will jot down some notes on something I've been thinking about.
Because reasons, I have been interested in Vector Error Correction Models (VECM). I've been thinking of the case where you estimate an error correction model and have external forecasts available for one of the variables. How can you easily construct the conditional forecasts for the VECM in R?
Today I tweeted something that turned out to be pretty popular:
US consumer prices pic.twitter.com/LxIxvEnGFe
— Len Kiefer (@lenkiefer) October 10, 2019
This was an update to a visualization we have talked about here. In this post I want to provide some updated R code to make this visualization, taking advantage of the new gganimate API. I'll also share some code for mortgage rate plots.
The code for the inflation plot is pretty simple.
I'm hearing that folks have been invited to speak at the upcoming RStudio conference. Congratulations to the folks who got accepted this year.
I am not sure if I'm going to go to the conference this year, but I recommend you consider it if you love R. I spoke there last year, giving an E-Poster. It was a lot of fun. The best part was getting a chance to meet other R enthusiasts.
I've been thinking about smoothing time series data. I tweeted out a bite-size bit of code.
To fit it into a tweet, I had to squeeze things down a bit.
Slightly more verbose, and using fredr to get data from FRED via their API. You'll need an API key from FRED. These data happen to be for New Private Housing Units Authorized by Building Permits - in Structures with 1 Unit.
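For readers without a FRED API key, here is a hedged, self-contained sketch of two common smoothers applied to a simulated monthly series standing in for the permits data (the series and window length are assumptions, and this is not the code from the tweet). Calling stats::filter() explicitly matters when the tidyverse is loaded, because dplyr::filter() masks it.

```r
# Simulated monthly series standing in for single-family permits (thousands, annual rate)
set.seed(7)
permits <- 800 + 50 * sin(seq(0, 6 * pi, length.out = 120)) + rnorm(120, sd = 25)

# 12-month moving average; sides = 2 roughly centers the window
# (with an even window it leans half a month forward), leaving NAs at both ends
ma12 <- stats::filter(permits, rep(1 / 12, 12), sides = 2)

# Locally weighted regression (lowess); f sets the share of points used in each local fit
lo <- lowess(seq_along(permits), permits, f = 0.2)

smoothed <- data.frame(month = seq_along(permits), raw = permits,
                       ma12 = as.numeric(ma12), lowess = lo$y)
head(smoothed, 8)
```

The moving average loses observations at the ends of the sample, while lowess produces a value for every month, which is handy when the latest data are what you care about.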
Early last Friday morning I was sitting in Palm Springs International Airport waiting to catch a flight back to Virginia. I had traveled out west to speak at the 2019 NAGLREP Conference. This Friday happened to be jobs Friday, when the U.S. Bureau of Labor Statistics releases the employment situation. Jobs Fridays are busy on Twitter. Everybody seems eager to offer a perspective on what the latest jobs numbers mean for the U.
Earlier today I tweeted out some yield curve charts. I won't go into great detail on the why, but here I will share some R code to make the charts.
My Thread:
hope you all are ready for some crazy yield curve charts, cuz you're about to get some crazy yield curve charts
— Len Kiefer (@lenkiefer) August 7, 2019
We can get Treasury yield curve data from the U.
I have been exploring some visualizations for housing seasonality. In recent days I've tried out various ways of using tile plots to display seasonal patterns in home sales and other related data. In this post I want to share some of the R code I used to wrangle data and generate those plots.
You can see some for example, in this thread (and others):
though outside there's blistering heat
Let us take a look at house price trends in the United States and across states and metro areas.
Earlier this week I tweeted out a few charts on housing market trends.
In most of the middle part of the country over the past 44 years there has been little growth in real (inflation-adjusted) house prices.
In coastal states, a very different story. pic.twitter.com/PLbiNftha3
— Len Kiefer (@lenkiefer) July 10, 2019
In this post we'll analyze real house prices since 1975 and, per usual, use R to wrangle data and make plots.
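Real (inflation-adjusted) house prices come from dividing a nominal house price index by a measure of the overall price level and rebasing. A minimal sketch with made-up index values; the choice of deflator (for example a CPI series) and base period are assumptions, and `real_index()` is a hypothetical helper.

```r
# Deflate a nominal index and rebase so the real index equals 100 in the base period
real_index <- function(nominal, deflator, base = 1) {
  real <- nominal / deflator
  100 * real / real[base]
}

nominal_hpi <- c(100, 150, 250, 400)  # hypothetical nominal house price index
cpi         <- c(100, 140, 200, 320)  # hypothetical consumer price index

real_index(nominal_hpi, cpi)  # about 100, 107.1, 125, 125
```

In this toy example nominal prices quadruple, but the real index rises only 25 percent and is flat between the last two periods.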
If I cannot send Adam Ozimek (at Modeled Behavior ) a Diet Pepsi, then the next best thing might be a chart about epop. epop is the term economists use to describe the employment-to-population ratio, a useful summary statistic about the labor market. Perhaps the summary statistic. Adam (and others) has been talking about epop as a key labor market statistic for years. It seems the Federal Reserve is catching on to the usage of the term epop (though many economists over there have been looking at the statistic for a long while too).
Last week I posted a long thread comparing trends in various housing market indicators over on Twitter:
Assuming we aren't in recession right now, the current expansion will tie the 1990s expansion for longest in U.S. history. Let's take a look at how housing markets have behaved in this expansion relative to earlier ones
a thread... pic.twitter.com/ovaiq3lsA8
— Len Kiefer (@lenkiefer) June 12, 2019
I followed it up with an article on LinkedIn with some more commentary The U.
Recently the U.S. Census Bureau released updated population estimates through 2018 for the United States, states, counties, and metropolitan statistical areas (MSA). Press release
I tweeted out the following chart comparing house prices and state population dynamics.
demographics are an important driver of #housing market trends. here's a comparison of growth in state population and nominal house prices since the year 2000
left to right: more people
bottom to top: higher home prices pic.
The current economic expansion is set to enter its tenth year this summer. Assuming we make it to June, this will become the longest U.S. economic expansion in recorded history stretching back to the 19th century. But how is the housing market doing? After a decade of recovery housing market activity still has room for improvement, but trends in 2018 were negative. Home sales, housing construction and house price growth all declined in 2018.
Earlier this month I attended the National Association for Business Economics (NABE) annual policy conference in Washington D.C. LINK. One of the keynote speeches was by Alan Greenspan. During his remarks, Greenspan mentioned that while economic forecasting was hard, demographic projections were the surest thing in an uncertain business.
Demographics, of course, are not easy, but it's much easier to guess what the population of 30-year-olds will be in 5 years than it is to predict the unemployment rate or GDP in 5 years.
The U.S. housing market stalled out a bit in 2018 and we aren't building enough homes to match demand. See my recent speech for details on what's going on. Abbreviated version: in 2018 mortgage interest rates slowed housing activity, but demographic forces support housing demand and should provide a lift in years to come. Together with a recent moderation in mortgage rates, there's reason to be optimistic about housing market activity in 2019.
Here's some R code for an animated chart of the U.S. prime working age (25-54) labor force participation rate. I tweeted it out last Friday:
Labor force participation rate #dataviz made with #rstats #gganimate pic.twitter.com/uSICoLjbIf
— Len Kiefer (@lenkiefer) February 1, 2019
We can go to the U.S. Bureau of Labor Statistics (BLS) webpage (https://www.bls.gov/) and get these data. For more details see my post Charting Jobs Friday with R.
Because reasons I've been interested in picking up some Python. But I like the RStudio IDE, so it sure would be nice if I could just run Python from R. Fortunately, that's possible using the reticulate package. Let's give it a try.
Our strategy will be to use R to do the data wrangling and then pass the data to Python to make a plot. Is this a good idea?
I really like R, but I love the R community. Since I started using R intensively in the past couple of years, I've constantly been awed and inspired by all the amazing things that people are doing with R. The spirit of the open source community and people's willingness to share their thoughts and code is fantastic.
Many times in this space we've remixed different data visualizations with R, often relying on awesome new packages that others have developed.
At the start of the year, the R package gganimate hit CRAN. See this announcement blog post with some examples.
In this space, I've shared several posts on animation (see tags). But I haven't been using gganimate. Instead, I took a more direct approach, building the animations via loops and trying to tween directly if I wanted a smooth animation. This level of control is nice, but frankly the defaults in gganimate work better than many of my attempts to hand-craft it.
It's the time of the year when everybody is dusting off their crystal balls and peering into the future. There's even still time to send out your "Winter is Coming" newsletter.
Let's take a step back and look at how forecasts of U.S. macro variables have evolved. Is forecasting still hard?
Last year we looked at historical forecasts of economic conditions in the post forecasting is hard. Let's update it.
My recent economic and housing market talks (see for example here) have been titled: "Will the U.S. housing market get back on track in 2019?" My general conclusion has been cautiously optimistic. There is enough strength in the broader economy and enough of a tailwind from demographic forces to push the U.S. housing market to modest growth next year.
I still think that's true, but as I have said in my talks, risks are weighted to the downside.
U.S. housing markets have slowed down in 2018. Housing construction, which is still running well below both historical averages and what the U.S. currently needs to meet rising demand, has stalled out this year.
The current level of housing construction is close to the level we've seen in recession periods. And the historical comparison stretching back decades is comparing a nation with significantly fewer households. Total U.S. households in 1970, for example, were about half (63 million) of what they were in 2017 (126 million) (FRED chart).
COME AND MAKE A MAJESTIC MORTGAGE RATE PLOT WITH ME.
We'll use R to plot a few visualizations of mortgage rates.
I recently gave a number of talks about the economic outlook and housing. One point I like to make is that mortgage rates are low. I've shown this through a variety of visualizations. But one of my favorites looks like this:
Let's make it.
Data
We'll plot mortgage rates using the Freddie Mac Primary Mortgage Market Survey.
HEY! HERE IS A VIDEO SHOWING HOUSE PRICE TRENDS around the United States.
Earlier this year we looked at how to get the data and plot it using R.
I made the video using the PowerPoint to .mp4 workflow I outlined here.
Below I'll review how to build this file.
Get data
We are going to use house price data from the publicly available Freddie Mac House Price Index.
WE ARE LATE FOR HALLOWEEN, but let's get out our broom and purrr as we tidy some statistical results.
Today I had occasion to be reminded of competing risks and a handy statistical result on competing risks from A.P. Basu and J.K. Ghosh published in the Journal of Multivariate Analysis in 1978. The paper Identifiability of the multinormal and other distributions under competing risks model showed an analytical result on the distribution of a variable Z which is the minimum of two Gaussian (Normal) random variables.
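For the simplest case, two independent normal random variables, the distribution of the minimum follows directly: \(Z \le z\) unless both variables exceed \(z\), so \(P(Z \le z) = 1 - [1 - \Phi((z-\mu_1)/\sigma_1)]\,[1 - \Phi((z-\mu_2)/\sigma_2)]\). The Basu and Ghosh paper treats a more general setting, so the sketch below (with arbitrary parameter values and a hypothetical helper `pmin_normal()`) only checks this independent case by simulation.

```r
# CDF of Z = min(X, Y) for independent X ~ N(mu1, s1^2) and Y ~ N(mu2, s2^2)
pmin_normal <- function(z, mu1, s1, mu2, s2) {
  # 1 minus the probability that both X and Y exceed z
  1 - pnorm(z, mu1, s1, lower.tail = FALSE) * pnorm(z, mu2, s2, lower.tail = FALSE)
}

set.seed(123)
x <- rnorm(1e5, mean = 0,   sd = 1)
y <- rnorm(1e5, mean = 0.5, sd = 2)
z <- pmin(x, y)

c(analytic  = pmin_normal(0, mu1 = 0, s1 = 1, mu2 = 0.5, s2 = 2),
  simulated = mean(z <= 0))
```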
BACK WE GO INTO THE VASTY DEEP. LAST TIME we introduced the idea of using dynamic model averaging to forecast recessions. I was so excited about the new approach that I didn't take the time to break down what was going on with it. In this post we'll look more closely at what's happening with the dma package when we try to forecast recessions.
Per usual we'll do it with R and I'll include code so you can follow along.
HERE THE LITERATURE IS VASTY DEEP. In this post we'll dip our toes, ever so slightly, into the dark waters of macroeconometric forecasting. I've been studying some techniques and want to try them out. I'm still at the learning and exploring stage, but let's do it together.
In this post we'll conduct an exercise in forecasting U.S. recessions using several approaches. Per usual we'll do it with R and I'll include code so you can follow along.
IN THIS POST I WANT TO SHARE A METHOD FOR MAKING SMOOTH POWERPOINT ANIMATIONS USING R.
Also see other posts in this series:
Crafting a PowerPoint Presentation with R LINK
PURRRTY PowerPoint with R LINK
PURRRtier PowerPoint with R LINK
Motivation
Why would you want to do this? We've covered how to make an animated gif with R and that works pretty well. But there are a couple of advantages with this approach.
WE ARE ON OUR WAY TOWARDS BUILDING a tidy PowerPoint workflow. In this post I want to build on my earlier posts (see here for an introduction and here for a more sophisticated approach) for building a PowerPoint presentation with R and try to make it even purrrtier.
I saw that somebody shared my posts on reddit and I thought I would take a look at the comments. Folks on the internet are known for kindness and offering helpful advice right?
MORTGAGE RATES ARE LOW IN THE UNITED STATES. How low? Let's take a look.
We'll use R to plot a few visualizations of mortgage rates. We'll also try out some of the nice features in the tibbletime package that help when working with time series data. For more on using tibbletime see this post and this one on making animated plots.
Since I was already called out for alleged chartcrimes, I'm going to go ahead and let loose here.
IN THIS POST I WANT TO SHARE SOME CODE TO CREATE AN ANIMATED CHART of housing starts.
Per usual we'll use R and we'll also use the tibbletime package.
Running the code below will generate:
(see this post for more on animations with the R package tweenr)
#####################################################################################
## Load libraries ##
#####################################################################################
library(tidyquant)
library(tibbletime)
library(tweenr)
#####################################################################################
## Get housing starts data ##
#####################################################################################
df <- tq_get("HOUST1F",get="economic.data",from="1959-01-01")
#####################################################################################
## function for rolling windows ##
#####################################################################################
mys <- function(win=12){
rolling_mean <- rollify(mean, window = win) #function creates rolling average based on win
df %>%mutate(dy=rolling_mean(price),
w=as.
IN THIS POST I WANT TO REVISIT FLEXDASHBOARDS. Back in January we made several Flexdashboards with R to display economic data. See my guide to building a flexdashboard for some examples.
In this post, I want to use the tidyquant package to wrap some of the plots we made earlier into a flexdashboard.
I'll have more to say about this in the near future, but I just wanted to make a simple flexdashboard (partially to remind myself how to do it).
WHAT TIME IS IT? TIME FOR TIBBLETIME! In this post, I'm going to take the tibbletime package out for a spin. Turns out this package is quite useful for things I tend to do.
We'll use the tibbletime package to write some R code to extend our ongoing analysis of trends in the U.S. mortgage market (see here for example).
Davis Vaughan (on Twitter) one of the authors of the tibbletime package suggested I take a look:
TIME FOR ANOTHER DATA WRANGLING AND VISUALIZATION EXTRAVAGANZA. This time we are going to work hard to turn some big data into little data. That is, we're going to work hard to aggregate several million loan-level records into useful summary graphics to tell us about the U.S. mortgage market in 2016.
I've been working on a lot of different ways to visualize trends in the mortgage market (see here and here for examples).
IN THIS POST I WANT TO GO THROUGH SOME EXAMPLES of using the purrr package for R.
Now there are already some great examples of how to use purrr. If you're brand new to purrr (like I was not long ago), probably start with Jenny Bryan's purrr tutorial, then see R for Data Science and also this presentation from rstudio::conf (pdf). You can also check out this curated collection via Mara Averick (on Twitter: dataandme) for a bunch of examples.
IN THIS POST I WANT TO SHARE SOME R CODE to create charts of U.S. housing starts we studied last week.
Get data
We'll use tidyquant (see e.g. this post for more) to go get our data from the St. Louis Fed's Federal Reserve Economic Data (FRED). We'll also use cowplot to arrange multiple ggplot2 graphs on one page.
Let's load libraries and grab the data.
#####################################################################################
## Step 0: Load Libraries ##
#####################################################################################
library(tidyquant)
library(tidyverse)
library(cowplot)
library(lubridate)
library(scales)
library(ggridges) # replaces ggjoy
#####################################################################################
## Step 1: Prepare for data ##
#####################################################################################
tickers=data.
LOOK, I DON'T HAVE ANYTHING BAD TO SAY about PowerPoint. Others have said it (see for example Tufte and Harvard Business Review). It's a tool and a fact of life for many of us.
I am interested in making better PowerPoints. In this post weāll use some R tools to generate a PowerPoint deck.
OfficeR
The package officer allows you to access and manipulate "Microsoft Word" and "Microsoft PowerPoint" documents from R.
LET'S WRANGLE SOME HOUSING DATA.
We'll try something different with how posts are organized. In the past I have generally mixed data wrangling, R code and graphs all in one post. Now I'm going to break it up. Posts like yesterday's will just show some data and discuss it. Then, if the data wrangling or code is complicated enough, I'll follow up with another post with details.
You'll be able to find all my posts on data wrangling under the data wrangling tag, and R code under the R tag.
YOU HAVE SPOKEN and we will go with ticks out, at least 54% of the time.
In a graph, should axis ticks face in or out?
— Leonard Kiefer (@lenkiefer) February 5, 2017
To celebrate, let’s make an animated GIF where the axis expands over time. We’ll use the data from our mortgage rate post.
Let’s start by loading the data and plotting a static graph (with ticks out, of course).
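A minimal base-R sketch of the expanding axis, assuming a simulated weekly series stands in for the mortgage rate data (the series and the `frame_limits` name are hypothetical). Each frame sets `xlim` to the range of dates revealed so far, so the axis widens as the animation runs, and a negative `tcl` keeps the ticks pointing out.

```r
# Simulated weekly series (hypothetical stand-in for the mortgage rate data)
set.seed(1)
dates <- seq(as.Date("2016-01-07"), by = "week", length.out = 20)
rate  <- 3.6 + cumsum(rnorm(20, mean = 0, sd = 0.05))

# x-axis limits for each frame: they widen as more weeks are revealed
frame_limits <- lapply(2:length(dates), function(i) range(dates[1:i]))

pdf(NULL)          # null device for this sketch; write one png() per frame for a real GIF
par(tcl = -0.5)    # negative tcl draws tick marks outward (R's default)
for (k in seq_along(frame_limits)) {
  i <- k + 1       # frame k shows the first i weeks
  plot(dates[1:i], rate[1:i], type = "l",
       xlim = frame_limits[[k]], ylim = range(rate),
       xlab = "", ylab = "Rate (%)")
}
invisible(dev.off())
```

Holding `ylim` fixed while only `xlim` changes keeps the vertical scale stable, so the eye reads the motion as the time axis stretching rather than the data jumping.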
Charts and graphs exploring house price trends
Introduction
HOUSE PRICES HAVE NOW RECOVERED BACK TO THEIR PRE-RECESSION PEAK, at least according to some indices. The Freddie Mac House Price Index, for example, surpassed its pre-2008 peak in the latest release, with data through September 2016. In this post I’ll explore trends in house prices and different ways of showing how far house prices have come and, in some cases, how far they still have to go.
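One simple way to show "how far prices still have to go" is to measure each period against the highest value reached so far. A sketch with made-up index values (not Freddie Mac data), assuming the pre-recession peak falls in period 4 of the toy series:

```r
# Hypothetical house price index values (illustrative only)
hpi <- c(100, 120, 140, 150, 135, 125, 122, 130, 142, 152)

# Highest value reached so far at each point in time
running_peak <- cummax(hpi)

# Percent below the prior peak: 0 at a new high, negative while still recovering
pct_from_peak <- 100 * (hpi / running_peak - 1)

# First period after the pre-recession peak (period 4 here) when prices regained it
pre_peak <- max(hpi[1:4])
recovered_at <- which(hpi >= pre_peak & seq_along(hpi) > 4)[1]
```

Plotting `pct_from_peak` for many metros on one chart makes it easy to see which markets have fully recovered (back at zero) and which are still below their old highs.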
IN THIS POST I WANT TO CREATE some data visualizations with R using the recently released Home Mortgage Disclosure Act (HMDA) data.
For this post I’m going to return to the 2015 HMDA data, which you can get from the Consumer Financial Protection Bureau (CFPB) webpage and which I discussed earlier.
Check out my prior post for more discussion of how we build these data visualizations.
R code for graphs posted below
IT WAS A BUSY WEEK FOR ECONOMIC AND HOUSING DATA. Below are some data visualizations I made tracking key trends in economic and housing market data.
Homeowner equity increases to $12.7 trillion in the second quarter of 2016
With house prices rising by nearly 6 percent on a year-over-year basis, homeowners are building back equity. According to the Federal Reserve’s Flow of Funds, owners' equity in real estate was $12.
IN MY PREVIOUS POST we looked at the Job Openings and Labor Turnover Survey (JOLTS) data and plotted a Beveridge Curve. In this post I want to add some more code that allows us to plot Beveridge Curves by industry.
For more on the analysis of industry-specific Beveridge Curves, see this paper published in the June 2012 Monthly Labor Review that decomposes shifts in the Beveridge Curve and looks at it by industry.
IN THIS POST WE’LL LOOK AT recent job openings and hires data from the Bureau of Labor Statistics Job Openings and Labor Turnover Survey (JOLTS).
R code for selected graphs posted below
Job openings and labor turnover
Total nonfarm trends
Let’s start by looking at aggregate national trends for the total nonfarm sector. The plot below compares hires, job openings and separations (the sum of quits, layoffs and discharges, and other separations) over time.
EARLIER THIS WEEK THE U.S. BUREAU OF LABOR STATISTICS released data on consumer expenditures in 2015. In this post I want to examine these data and make a few visualizations.
R code for graphs posted below
One area I pay close attention to is housing. Housing is the largest single category of expenditure, averaging about 1/3 of total consumer expenditures. The BLS breaks the data out by tenure, so we can see how expenditures vary by owners versus renters.
IN THIS POST WE’LL REVIEW some recent economic and housing market trends.
R code for graphs posted below
Low mortgage rates
Mortgage rates remain low, with the 30-year fixed mortgage averaging 3.43 percent for the week of 8/25. That’s nine consecutive weeks with rates under 3.5 percent, every week since the Brexit vote.
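Counting a streak like "nine consecutive weeks under 3.5 percent" is a job for run-length encoding. A sketch with illustrative weekly values (not the actual survey series), using base R's `rle()` on the below-threshold indicator:

```r
# Hypothetical weekly 30-year fixed rates, oldest first (illustrative values)
rates <- c(3.66, 3.60, 3.57, 3.48, 3.41, 3.42, 3.45, 3.44, 3.46, 3.43, 3.44, 3.43)

# Run-length encode the "below 3.5 percent" indicator
runs <- rle(rates < 3.5)

# Length of the most recent run, counted only if that run is below the threshold
n_runs <- length(runs$lengths)
current_streak <- if (runs$values[n_runs]) runs$lengths[n_runs] else 0L
```

With these values the last run is nine `TRUE`s, so `current_streak` is 9.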
One way I like to look at rates is to compare the weekly rates by week of year (e.g. first week of 2016 compared to first week of 2015).
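A minimal sketch of the week-of-year comparison, assuming a simulated weekly series in place of the actual mortgage rate data. `by_week` puts weeks in rows and years in columns, so week 1 of 2016 lines up with week 1 of 2015; the week numbering follows the same rule as `lubridate::week()`, done here in base R.

```r
# Simulated weekly rates for 2015-2016 (hypothetical, not actual survey data)
dates <- seq(as.Date("2015-01-01"), as.Date("2016-12-29"), by = "week")
set.seed(42)
rates <- round(3.8 + cumsum(rnorm(length(dates), mean = 0, sd = 0.04)), 2)

df <- data.frame(
  date = dates,
  year = as.integer(format(dates, "%Y")),
  # Week of year: complete 7-day periods since Jan 1, plus one
  # (the same counting rule lubridate::week() uses)
  week = as.POSIXlt(dates)$yday %/% 7 + 1,
  rate = rates
)

# Weeks in rows, years in columns
by_week <- tapply(df$rate, list(df$week, df$year), mean)

# Change versus the same week a year earlier, in percentage points
change <- by_week[, "2016"] - by_week[, "2015"]
```

A year can contain a 53rd week under this rule, so expect an `NA` in the year that lacks one.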
WE ARE GOING TO EXAMINE THE DISTRIBUTION OF US POPULATION and make an animated gif combining a map and a kernel density estimate of the distribution of county population densities. Density of densities, or density squared.
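The "density of densities" idea can be sketched with base R's `density()`. The county data below are simulated (hypothetical populations and land areas, not the 1790-2010 estimates), and taking `log10` of people per square mile is an assumption made because county densities are heavily skewed.

```r
set.seed(7)
# Simulated county data (hypothetical): population and land area in square miles
n <- 500
area <- rlnorm(n, meanlog = 6.5, sdlog = 0.5)
counties <- data.frame(
  year = rep(c(1900, 2010), each = n),
  pop  = c(rlnorm(n, meanlog = 9.5, sdlog = 1.0),
           rlnorm(n, meanlog = 10.0, sdlog = 1.4)),
  area = rep(area, times = 2)
)

# Population density: people per square mile
counties$density <- counties$pop / counties$area

# A kernel density estimate of log10(density) for each year: the "density of densities"
kde <- lapply(split(counties$density, counties$year),
              function(x) density(log10(x)))
```

Each element of `kde` can become one animation frame, for example `plot(kde[["2010"]])` followed by `lines(kde[["1900"]], lty = 2)` to compare two years.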
We are going to use the same US County Population Estimates 1790-2010 we used in my previous post.
We’ll end up with this:
How do we do it?
Code
First, we’ll load the data and do some manipulations.
SOMETIMES YOU ACTUALLY LEARN SOMETHING from social media. Today on Twitter I happened across this Tweet via @kyle_e_walker:
Seems somebody posted estimates of the U.S. population by county (defined by 2010 county definitions) going back to 1790. This is a perfect dataset to practice my mapping with R.
The data are conveniently available via the University of Minnesota. The data come in a nice spreadsheet that we can easily import into R and manipulate.
IN THIS POST I WANT TO DOCUMENT some R code I’ve recently been working on combining maps and distribution plots. As I discussed earlier lots of interesting data will be released in the fall and I want to be ready for it.
Some of these snippets can be recycled when the new data is available.
Maps
One area of data visualization with R I haven’t explored much is mapping. Partly that’s because I’ve had other tools to use, but usually it’s because I’m in a hurry.
AUGUST IS ALMOST OVER, and it’s nearly back-to-school season. And that means one thing. No, not that we’re about to get a chance to watch the #1 NCAA football program of all time dominate the gridiron (though that’s awesome too). No, it’s data release season! A data swarm is on its way.
From the American Community Survey to the American Housing Survey to the annual Home Mortgage Disclosure Act data, many statistical data releases come out in September and October.
OVER THE PAST THREE MONTHS I HAVE MADE several new house price visualizations. In these meditations I’ll consider some recent graphs and provide R code for them. For reference, prior meditations are available at:
Part 1: data wrangling
Part 2: sparklines and dots (animated)
Part 3: bubbles and bounce
Part 4: graph gallery
Meditation 1: Median sales price trends
Earlier this week, the National Association of Realtors (NAR) released their quarterly update on metro area median house prices (data here).
TODAY the NEW YORK FEDERAL RESERVE BANK released its Quarterly Report on Household Debt and Credit. These data come from the Center for Microeconomic Data based on credit records from Equifax.
R code for the graphs is posted at the bottom of the page
Trends in household debt balances
One of the key statistics tracked in the report (full data can be found here) is household debt balances. They break debt balances out by loan type:
IN THIS POST WE’RE GOING to make an animated version of the famous Federal Reserve dot plot comparing the dots in March 2016 to June 2016. As far as celebrity charts go, the dot plot has to be up there. Check out how much “dot plot” has grown in search popularity since the Fed introduced their dot plot in 2012:
Ultimately, it will look something like this:
The dot plot is a special chart that shows the distribution of expectations of the Federal Open Market Committee (FOMC) for the federal funds rate.
A dataviz remix
I came across some interesting analysis from Josh Lehner at the Oregon Office of Economic Analysis about the “Housing Trilemma”.
The Housing Trilemma, captured in the chart below, is based on the following claim:
Every city wants to have a strong local economy, high quality of life and housing affordability for its residents. Unfortunately these three dimensions represent the Housing Trilemma. A city can achieve success on two but not all three at the same time.
IN THIS POST I WANT TO PROVIDE some additional examples of using tweenr and gganimate to create nice smooth transitions in an animated GIF. In this post we’ll look at an animated waterfall chart.
For this data I’m going to use the National Income and Product Accounts from the U.S. Bureau of Economic Analysis (BEA). Specifically, we’ll be looking at the contributions to growth in Real Gross Domestic Product, which you can find here.
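The core of a waterfall chart is computing where each bar starts and ends. A sketch with made-up contributions (not BEA figures): each component floats from the running total left by the previous one, and a closing bar shows the total from zero.

```r
# Hypothetical contributions to real GDP growth, in percentage points (not BEA data)
contrib <- data.frame(
  component = c("Consumption", "Investment", "Net exports", "Government"),
  value     = c(2.0, -0.3, 0.1, 0.2)
)

# Each bar floats from where the previous one ended
contrib$end   <- cumsum(contrib$value)
contrib$start <- c(0, head(contrib$end, -1))

# Closing bar: total growth drawn from zero
total <- data.frame(component = "Real GDP growth", value = sum(contrib$value),
                    end = sum(contrib$value), start = 0)
waterfall <- rbind(contrib, total)
waterfall$id <- seq_len(nrow(waterfall))   # x position for each bar
```

With ggplot2 the bars can then be drawn with `geom_rect(aes(xmin = id - 0.45, xmax = id + 0.45, ymin = start, ymax = end))`; for an animation, the `start` and `end` columns are what you would tween between quarters.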
RECENTLY I HAVE MADE several animated GIFs, primarily using the animation package. These gifs usually work pretty well, coming out something like this (code here).
Unfortunately, these gifs tend to come out rather choppy. I tried to get around that by using variable length sequences to alter the timing of each frame. My primitive approach involves loops and business like this:
for (i in c(seq(1,365,6), seq(378,716,13), seq(1145,1574,26),
            seq(1587,2003,13), seq(2016,2185,6), 2196)) {
  # draw and save frame i (loop body not shown in this excerpt)
}
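The variable-length sequence can be built from a small table of segments instead of a long hand-written `c(seq(...), ...)`. A sketch using a hypothetical helper, `frame_schedule()`, fed with the same segment values as the loop above: a small step means more frames per stretch of data, so the animation moves more slowly there.

```r
# Hypothetical helper: build a frame index from (from, to, by) segments.
# Small `by` = more frames per stretch of data = slower motion there.
frame_schedule <- function(segments, last = NULL) {
  idx <- unlist(lapply(segments, function(s) seq(s[1], s[2], by = s[3])))
  c(idx, last)
}

frames <- frame_schedule(
  list(c(1, 365, 6), c(378, 716, 13), c(1145, 1574, 26),
       c(1587, 2003, 13), c(2016, 2185, 6)),
  last = 2196
)
```

The result is the same 168-frame index as the hand-written vector, and retiming a stretch only means editing one segment.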
EARLIER THIS WEEK THE U.S. CENSUS BUREAU released data on population and housing units for counties across the U.S. in 2015. These data reveal important trends in population growth, and help shed light on recent house price trends.
Housing unit growth
One key factor driving housing market dynamics is the expansion of housing supply (or lack thereof). The updated estimates from Census allow us to see which areas have added the most housing units and how that relates to population and house price trends.
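A sketch of relating housing unit growth to population growth, using hypothetical counties and numbers (not Census estimates). Added residents per added housing unit is one rough gauge of whether supply is keeping pace.

```r
# Hypothetical county-level estimates (illustrative numbers, not Census data)
cty <- data.frame(
  county  = c("A", "B", "C"),
  pop2010 = c(100000, 250000, 80000),
  pop2015 = c(110000, 255000, 79000),
  hu2010  = c(40000, 100000, 35000),
  hu2015  = c(41000, 104000, 35200)
)

# Percent growth, 2010 to 2015
cty$pop_growth <- 100 * (cty$pop2015 / cty$pop2010 - 1)
cty$hu_growth  <- 100 * (cty$hu2015  / cty$hu2010  - 1)

# Added residents per added housing unit: a rough gauge of supply vs. demand
cty$people_per_new_unit <- (cty$pop2015 - cty$pop2010) / (cty$hu2015 - cty$hu2010)
```

County A adds 10 people per new unit, while county B adds 1.25; merging in a house price index would let you check whether places like A also show faster price growth.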
Another mortgage rates animated gif
IN THE PAST I’ve told you how I made my mortgage rates gif. In this post I’ll make an extension that uses stop-motion techniques to reverse course. We’ll end up with this:
For reference, here’s the standard gif I share each Thursday after mortgage rates come out:
Stop motion animation
When focusing on week-to-week movements in rates, it’s easy to lose longer-term perspective. Rates in the week of May 12, 2016 were the lowest in three years.
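One way to get a reverse-course effect, assuming each animation frame is indexed by week, is to build the frame order explicitly: play forward to the latest week, hold there, then step back through the history. The number of weeks and the hold length below are hypothetical.

```r
n_weeks <- 20   # hypothetical number of weekly observations in the series
forward <- seq_len(n_weeks)

# Play forward to the latest week, hold there for a few frames, then run back
hold   <- rep(n_weeks, 5)
frames <- c(forward, hold, rev(forward))
```

Looping `for (i in frames)` and drawing week `i` in each pass gives the forward, pause, and rewind sequence without changing the plotting code itself.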
We’re going to make this chart (and talk about it)
Wait, what is this?
Let’s pause the animation and look at the last frame:
This plot shows the distribution of metro area unemployment. These data are available here.
Each dot represents a metro area with its unemployment rate depicted on the x axis. The data are bucketed into 0.25 percentage point buckets and stacked when more than one metro falls within that range.
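The bucketing and stacking can be sketched in base R. The rates below are made up (not the BLS metro data linked above): `floor()` snaps each rate to the start of its 0.25 percentage point bucket, and `ave()` with `seq_along` numbers the metros within each bucket to give their stack height.

```r
# Hypothetical metro unemployment rates (percent); not the BLS data linked above
rates <- c(3.1, 3.2, 3.3, 3.9, 4.0, 4.05, 4.1, 5.6)

metro <- data.frame(rate = rates)
# Snap each rate down to the start of its 0.25 percentage point bucket
metro$bucket <- floor(metro$rate / 0.25) * 0.25
# Stack position: 1 for the first metro in a bucket, 2 for the second, ...
metro$stack <- ave(metro$rate, metro$bucket, FUN = seq_along)
```

Plotting `stack` against `bucket` as points (for example with `geom_point()` in ggplot2) produces the stacked dot distribution.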