Let’s pick up where we left off yesterday and do some more exploration with text mining. Like yesterday, we’ll use the tidytext package for R, and we’ll lean heavily on Julia Silge and David Robinson’s Text Mining with R. Data: We’ll turn again to the Federal Reserve for our text data. But today we’ll explore the Beige Book, which gathers anecdotal information on current economic conditions across the Federal Reserve Districts.
Text mining is an exciting topic. There is tremendous potential to gain insights from textual analysis. See, for example, Gentzkow, Kelly and Taddy’s Text as Data. While text mining may be quite advanced in other fields, in finance and economics the application of these techniques is still in its infancy. In order to take advantage of text as data, economists and financial analysts need tools to help them. Fortunately, there is a great resource: Text Mining with R by Julia Silge (blog and on Twitter @juliasilge) and David Robinson (blog and on Twitter @drob).
Indications are that U.S. housing market activity in the middle part of 2018 has moderated. Estimates of both new and existing home sales declined on a seasonally adjusted basis in June relative to May. House price growth has also moderated recently. Some folks have gotten animated about the recent trends. I’m more sanguine about the recent data. Certainly a slowdown in housing market activity would be cause for concern.
I saw today, via rOpenSci, a blog post about a new package for making animated GIFs with R called gifski, now available on CRAN. Let’s adapt the code we shared last week to use the gifski package. See that post for additional details. If we run the R code below we’ll generate this animated plot: This plot shows the evolution of house prices in two states, California (CA) and Texas (TX), versus the United States (USA).
On Twitter, Claus Wilke asks: “Dear Lazyweb: Is there an accepted name for a plot showing a two-variable time series as a path in the x-y plane? #dataviz @Elijah_Meeks @albertocairo @lenkiefer @sharoz @dataandme pic.twitter.com/N8Edmf8qii” (Claus Wilke, @ClausWilke, July 21, 2018). I call them connected scatterplots, and we’ve made a few here. See for example this post. But we can intensify things and make a plot like this: “hey @ClausWilke why stop at a 2-d connected scatterplot* when you could go to 3-d
The Linear Probability Model (LPM) might be bad, but is it all bad? Let’s look at some conditions where the LPM might not be so bad. We’ll also look at some simple adjustments that might improve its performance, and compare it to some common alternatives. Setup: Throughout most of this post, we’re going to consider a world where the LPM is the true model. That is:
I try not to use too much jargon (jargon monoxide can be deadly) on this blog. But I’ve got a bit of a technical term I’ve been using to describe U.S. residential construction: super-low. To be sure, housing construction has been grinding higher, but it’s been taking a while for activity to get back close to historical averages. Once you account for the larger population, which all else equal needs more housing units, the level of construction is quite low.
I think a lot about predicting/forecasting binary outcomes. Will the economy head into a recession next year? What’s the likelihood of a loan defaulting over the next few years? Will my followers on social media abandon me if I tweet about my lunch? One often-maligned, but seemingly irresistible, approach to modeling binary outcomes is the Linear Probability Model (LPM). As has been known since before I was born, the Linear Probability Model has some issues.
I am headed out west, to California, to talk housing at the Western Secondary Market Conference. After my talk they might post my slides online somewhere. If they do I’ll link to them, but for now you can get a preview in this Twitter thread. Like many western states, California is facing an imbalance between housing supply and housing demand. Strong economic growth has bolstered demand, but supply has not kept up.
I decided to switch over my blog theme. The Ghostwriter theme I used was nice, but it didn’t have a blog archive. As the number of posts grows, a blog archive is easier to search. We still have tags you can search. I’ve adopted the Hugo Blackburn theme. This is the same theme used over at the Simply Statistics blog. If you drop by that blog, check out this essay by Roger Peng with some perspective on the evolution of R.