Text mining is an exciting topic. There is tremendous potential to gain insights from textual analysis. See, for example, Gentzkow, Kelly and Taddy’s Text as Data. While text mining may be quite advanced in other fields, in finance and economics the application of these techniques is still in its infancy.
To take advantage of text as data, economists and financial analysts need tools. Fortunately, there is a great resource: Text Mining with R by Julia Silge (blog and on Twitter @juliasilge) and David Robinson (blog and on Twitter @drob).
Indications are that U.S. housing market activity moderated in the middle part of 2018. Estimates of both new and existing home sales declined on a seasonally adjusted basis in June relative to May. House price growth has also moderated recently. Some folks have gotten animated about the recent trends.
I’m more sanguine about the recent data. Certainly a slowdown in housing market activity would be cause for concern.
I saw today, via rOpenSci, a blog post about gifski, a new package for making animated GIFs with R that is now available on CRAN.
Let’s adapt the code we shared last week to use the gifski package. See that post for additional details.
If we run the R code below we’ll generate this animated plot:
This plot shows the evolution of house prices in two states, California (CA) and Texas (TX) versus the United States (USA).
On Twitter Claus Wilke asks:
Dear Lazyweb: Is there an accepted name for a plot showing a two-variable time series as a path in the x-y plane? #dataviz @Elijah_Meeks @albertocairo @lenkiefer @sharoz @dataandme pic.twitter.com/N8Edmf8qii
— Claus Wilke (@ClausWilke) July 21, 2018
I call them connected scatterplots, and we’ve made a few here. See for example this post.
But we can intensify things and make a plot like this:
hey @ClausWilke why stop at a 2-d connected scatterplot* when you could go to 3-d
The Linear Probability Model (LPM) might be bad, but is it all bad? Let’s look at some conditions where the LPM might not be so bad. We’ll also look at some simple adjustments that might improve its performance, and compare it to some common alternatives.
Setup
Throughout most of this post, we’re going to consider a world where the LPM is the true model.
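To make "a world where the LPM is the true model" concrete, here is a minimal sketch, written in Python rather than R. It assumes a single regressor x drawn uniformly on [0, 1] and a success probability of alpha + beta * x. The parameter values, sample size, and function names are hypothetical choices for illustration and are not taken from the original post.

```python
import random

def ols_fit(x, y):
    """Return (intercept, slope) from a one-regressor least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def simulate_lpm(n, alpha, beta, seed):
    """Draw (x, y) with x ~ Uniform(0, 1) and P(y = 1 | x) = alpha + beta * x."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    y = [1 if rng.random() < alpha + beta * xi else 0 for xi in x]
    return x, y

# Hypothetical parameters: the true probability stays between 0.2 and 0.7.
ALPHA, BETA = 0.2, 0.5
x, y = simulate_lpm(n=20000, alpha=ALPHA, beta=BETA, seed=42)
intercept, slope = ols_fit(x, y)
print(f"true intercept {ALPHA}, estimated {intercept:.3f}")
print(f"true slope {BETA}, estimated {slope:.3f}")
```

Because the conditional probability really is linear in x here, the least-squares estimates should land close to the true intercept and slope in a sample of this size.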
I try not to use too much jargon (jargon monoxide can be deadly) on this blog. But I’ve got a bit of a technical term I’ve been using to describe U.S. residential construction: super-low.
To be sure, housing construction has been grinding higher, but it’s been taking a while for activity to get back close to historical averages. Once you account for the larger population, which all else equal needs more housing units, the level of construction is quite low.
I think a lot about predicting/forecasting binary outcomes. Will the economy head into a recession next year? What’s the likelihood of a loan defaulting over the next few years? Will my followers on social media abandon me if I tweet about my lunch?
One often maligned, but seemingly irresistible, approach to modeling binary outcomes is the Linear Probability Model (LPM). As has been known since before I was born, the Linear Probability Model has some issues.
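One commonly cited issue is that fitted values from an LPM are not constrained to lie between 0 and 1. The sketch below, in Python rather than R, illustrates this under stated assumptions: the data come from a logistic model with a normally distributed regressor, and a straight line is fit by least squares. The data-generating process, parameter values, and function names are hypothetical and are not from the original post.

```python
import math
import random

def ols_fit(x, y):
    """Return (intercept, slope) from a one-regressor least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def simulate_logistic(n, scale, seed):
    """Draw (x, y) with x ~ Normal(0, 2) and P(y = 1 | x) = 1 / (1 + exp(-scale * x))."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 2.0) for _ in range(n)]
    y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-scale * xi)) else 0 for xi in x]
    return x, y

# Hypothetical setup: a fairly steep logistic curve over a wide range of x.
x, y = simulate_logistic(n=5000, scale=2.0, seed=7)
intercept, slope = ols_fit(x, y)

# Fitted "probabilities" from the straight line, and how many are impossible.
fitted = [intercept + slope * xi for xi in x]
n_outside = sum(1 for p in fitted if p < 0.0 or p > 1.0)
share_outside = n_outside / len(fitted)
print(f"fitted values outside [0, 1]: {n_outside} of {len(fitted)}")
```

A straight line keeps rising or falling as x moves into the tails, so some fitted values fall below 0 or above 1 even though every true probability lies strictly inside that interval.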
I am headed out west, to California, to talk housing at the Western Secondary Market Conference. After my talk they might post my slides online somewhere. If they do I’ll link to them, but for now you can get a preview in this Twitter thread.
Like many western states, California is facing an imbalance between housing supply and housing demand. Strong economic growth has bolstered demand, but supply has not kept up.
I decided to switch over my blog theme. The Ghostwriter theme I used was nice, but it didn’t have a blog archive. As the number of posts grows, a blog archive is easier to search. We still have tags you can search.
I’ve adopted the Hugo Blackburn theme. This is the same theme used over at the Simply Statistics blog. If you drop by that blog, check out this essay by Roger Peng with some perspective on the evolution of R.
Let’s compare two charts: “your chart”, a chart that might come virtually unedited from spreadsheet software, versus the chart your boss told you not to worry about:
Your chart is perfectly serviceable and for a quick exploration might be perfectly fine. However, why routinely generate such charts if you have the ability to make something a bit more dynamic? Being able to produce more interesting charts might not be necessary, but it also probably doesn’t hurt.