BACK IN JANUARY WE LOOKED AT HOUSING microdata from the American Community Survey Public Microdata that we collected from IPUMS. Let’s pick back up and look at these data some more. Glad you could join us.
Be sure to check out my earlier post for more discussion of the underlying data. Here we’ll pick up where we left off and make some more graphs using R.
Just a quick reminder (read the earlier post for all the details), we have a dataset that includes household level observations for the 20 largest metro areas in the United States for 2010 and 2015 (latest data available).
Below we load the data and check out its structure:
Load data
library(data.table)
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## between(): dplyr, data.table
## filter(): dplyr, stats
## first(): dplyr, data.table
## lag(): dplyr, stats
## last(): dplyr, data.table
## transpose(): purrr, data.table
library(ggbeeswarm)
library(viridis)
## Loading required package: viridisLite
load("data/acs.RData")
str(dt.x)
## Classes 'data.table' and 'data.frame': 540325 obs. of 6 variables:
## $ year :Class 'labelled' atomic [1:540325] 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
## .. ..- attr(*, "label")= chr "census year"
## .. ..- attr(*, "format.stata")= chr "%8.0g"
## .. ..- attr(*, "labels")= Named num [1:30] 1850 1860 1870 1880 1900 1910 1920 1930 1940 1950 ...
## .. .. ..- attr(*, "names")= chr [1:30] "1850" "1860" "1870" "1880" ...
## $ cbsa.name: chr "Atlanta-Sandy Springs-Roswell, GA Metro Area" "Atlanta-Sandy Springs-Roswell, GA Metro Area" "Atlanta-Sandy Springs-Roswell, GA Metro Area" "Atlanta-Sandy Springs-Roswell, GA Metro Area" ...
## $ met2013 :Class 'labelled' atomic [1:540325] 12060 12060 12060 12060 12060 ...
## .. ..- attr(*, "label")= chr "metropolitan area, 2013 omb delineations"
## .. ..- attr(*, "format.stata")= chr "%12.0g"
## .. ..- attr(*, "labels")= Named num [1:295] 0 10420 10580 10740 10780 ...
## .. .. ..- attr(*, "names")= chr [1:295] "not in identifiable area" "akron, oh" "albany-schenectady-troy, ny" "albuquerque, nm" ...
## $ hhincome : atomic 17910 128000 110700 31000 70750 ...
## ..- attr(*, "label")= chr "total household income"
## ..- attr(*, "format.stata")= chr "%12.0g"
## $ valueh : atomic 85000 150000 230000 78000 120000 300000 100000 70000 135000 150000 ...
## ..- attr(*, "label")= chr "house value"
## ..- attr(*, "format.stata")= chr "%12.0g"
## $ hhwt : atomic 74 82 87 58 73 279 83 103 113 80 ...
## ..- attr(*, "label")= chr "household weight"
## ..- attr(*, "format.stata")= chr "%8.0g"
## - attr(*, "sorted")= chr "met2013"
## - attr(*, ".internal.selfref")=<externalptr>
I’ve cleaned the data up a bit to only include household level observations for owner households for the largest 20 metro areas. I’ve saved a data.table() called dt.x
using the data described in the earlier post.
In the prior post we filtered to only the top 12 metro areas. But I’ve prepared data that filters to the top 20. If you are following along from the earlier post, just replace dt.x<-dt2[cbsa.name %in% pop.list[order(-pop)]$cbsa.name[1:12] & pernum==1]
with dt.x<-dt2[cbsa.name %in% pop.list[order(-pop)]$cbsa.name[1:20] & pernum==1]
and everything else should follow.
As before, let’s randomly sample 2,000 observations from these 20 large metro areas using the household weights.
# First draw a random sample of 2,000 observations from each year/metro combination
dt.samp<-dt.x[,.SD[sample(.N,min(.N,2000),prob=hhwt)],by = c("year","cbsa.name") ]
#Get our list:
metro.list<-dt.samp[,list(obs=.N),by=c("year","met2013","cbsa.name")]
# check observations:
# obs is number of observations
htmlTable::htmlTable(metro.list,col.rgroup = c("none", "#F7F7F7"))
year | met2013 | cbsa.name | obs | |
---|---|---|---|---|
1 | 2010 | 12060 | Atlanta-Sandy Springs-Roswell, GA Metro Area | 2000 |
2 | 2015 | 12060 | Atlanta-Sandy Springs-Roswell, GA Metro Area | 2000 |
3 | 2010 | 14460 | Boston-Cambridge-Newton, MA-NH Metro Area | 2000 |
4 | 2015 | 14460 | Boston-Cambridge-Newton, MA-NH Metro Area | 2000 |
5 | 2010 | 16980 | Chicago-Naperville-Elgin, IL-IN-WI Metro Area | 2000 |
6 | 2015 | 16980 | Chicago-Naperville-Elgin, IL-IN-WI Metro Area | 2000 |
7 | 2010 | 19100 | Dallas-Fort Worth-Arlington, TX Metro Area | 2000 |
8 | 2015 | 19100 | Dallas-Fort Worth-Arlington, TX Metro Area | 2000 |
9 | 2010 | 19740 | Denver-Aurora-Lakewood, CO Metro Area | 2000 |
10 | 2015 | 19740 | Denver-Aurora-Lakewood, CO Metro Area | 2000 |
11 | 2010 | 19820 | Detroit-Warren-Dearborn, MI Metro Area | 2000 |
12 | 2015 | 19820 | Detroit-Warren-Dearborn, MI Metro Area | 2000 |
13 | 2010 | 26420 | Houston-The Woodlands-Sugar Land, TX Metro Area | 2000 |
14 | 2015 | 26420 | Houston-The Woodlands-Sugar Land, TX Metro Area | 2000 |
15 | 2010 | 31080 | Los Angeles-Long Beach-Anaheim, CA Metro Area | 2000 |
16 | 2015 | 31080 | Los Angeles-Long Beach-Anaheim, CA Metro Area | 2000 |
17 | 2010 | 33100 | Miami-Fort Lauderdale-West Palm Beach, FL Metro Area | 2000 |
18 | 2015 | 33100 | Miami-Fort Lauderdale-West Palm Beach, FL Metro Area | 2000 |
19 | 2010 | 33460 | Minneapolis-St. Paul-Bloomington, MN-WI Metro Area | 2000 |
20 | 2015 | 33460 | Minneapolis-St. Paul-Bloomington, MN-WI Metro Area | 2000 |
21 | 2010 | 35620 | New York-Newark-Jersey City, NY-NJ-PA Metro Area | 2000 |
22 | 2015 | 35620 | New York-Newark-Jersey City, NY-NJ-PA Metro Area | 2000 |
23 | 2010 | 37980 | Philadelphia-Camden-Wilmington, PA-NJ-DE-MD Metro Area | 2000 |
24 | 2015 | 37980 | Philadelphia-Camden-Wilmington, PA-NJ-DE-MD Metro Area | 2000 |
25 | 2010 | 38060 | Phoenix-Mesa-Scottsdale, AZ Metro Area | 2000 |
26 | 2015 | 38060 | Phoenix-Mesa-Scottsdale, AZ Metro Area | 2000 |
27 | 2010 | 40140 | Riverside-San Bernardino-Ontario, CA Metro Area | 2000 |
28 | 2015 | 40140 | Riverside-San Bernardino-Ontario, CA Metro Area | 2000 |
29 | 2010 | 41180 | St. Louis, MO-IL Metro Area | 2000 |
30 | 2015 | 41180 | St. Louis, MO-IL Metro Area | 2000 |
31 | 2010 | 41740 | San Diego-Carlsbad, CA Metro Area | 2000 |
32 | 2015 | 41740 | San Diego-Carlsbad, CA Metro Area | 2000 |
33 | 2010 | 41860 | San Francisco-Oakland-Hayward, CA Metro Area | 2000 |
34 | 2015 | 41860 | San Francisco-Oakland-Hayward, CA Metro Area | 2000 |
35 | 2010 | 42660 | Seattle-Tacoma-Bellevue, WA Metro Area | 2000 |
36 | 2015 | 42660 | Seattle-Tacoma-Bellevue, WA Metro Area | 2000 |
37 | 2010 | 45300 | Tampa-St. Petersburg-Clearwater, FL Metro Area | 2000 |
38 | 2015 | 45300 | Tampa-St. Petersburg-Clearwater, FL Metro Area | 2000 |
39 | 2010 | 47900 | Washington-Arlington-Alexandria, DC-VA-MD-WV Metro Area | 2000 |
40 | 2015 | 47900 | Washington-Arlington-Alexandria, DC-VA-MD-WV Metro Area | 2000 |
As before, we can use a beeswarm plot to plot the distribution of house values by metro area.
#make a beewarm plot:
ggplot(data=dt.samp,
aes(y=factor(year),
x=valueh,color=log(valueh)))+
geom_quasirandom(alpha=0.5,size=0.5)+
theme_minimal()+
scale_color_viridis(name="House Value\n$,log scale\n",discrete=F,option="D",end=0.95,direction=-1,
limits=c(log(10000),log(1.4e6)),
breaks=c(log(10000),log(100000),log(1e6)),
labels=c("$10k","$100k","$1,000k") ) +
scale_x_log10(limits=c(10000,1.4e6),breaks=c(10000,100000,1000000),
labels=c("$10k","$100k","$1,000k") )+
labs(y="",x="House Value ($, log scale)",
caption="@lenkiefer Source: Census 1-year American Community Survey (2010 & 2015),\nIPUMS-USA, University of Minnesota, www.ipums.org.",
title="House value distribution by Metro")+
theme(axis.text.x = element_text(size=6),
strip.text.x = element_text(size = 5),
plot.caption=element_text(hjust=0,vjust=1,margin=margin(t=10)),
plot.title=element_text(size=14),
legend.position = "top" )+
facet_wrap(~cbsa.name,ncol=4)+theme()
## Warning: Transformation introduced infinite values in continuous x-axis
## Warning: Removed 76 rows containing missing values (position_quasirandom).
## Warning: Removed 98 rows containing missing values (position_quasirandom).
## Warning: Removed 69 rows containing missing values (position_quasirandom).
## Warning: Removed 100 rows containing missing values (position_quasirandom).
## Warning: Removed 53 rows containing missing values (position_quasirandom).
## Warning: Removed 93 rows containing missing values (position_quasirandom).
## Warning: Removed 112 rows containing missing values (position_quasirandom).
## Warning: Removed 224 rows containing missing values (position_quasirandom).
## Warning: Removed 109 rows containing missing values (position_quasirandom).
## Warning: Removed 77 rows containing missing values (position_quasirandom).
## Warning: Removed 150 rows containing missing values (position_quasirandom).
## Warning: Removed 75 rows containing missing values (position_quasirandom).
## Warning: Removed 98 rows containing missing values (position_quasirandom).
## Warning: Removed 80 rows containing missing values (position_quasirandom).
## Warning: Removed 165 rows containing missing values (position_quasirandom).
## Warning: Removed 359 rows containing missing values (position_quasirandom).
## Warning: Removed 93 rows containing missing values (position_quasirandom).
## Warning: Removed 69 rows containing missing values (position_quasirandom).
## Warning: Removed 97 rows containing missing values (position_quasirandom).
## Warning: Removed 94 rows containing missing values (position_quasirandom).
## Warning: Removed 7 rows containing missing values (geom_point).
We see quite a bit of variation across metro areas. But how does the distribution of house values compare to the distribution of incomes? Let’s look at Washington D.C. and plot estimates of homeowner household income versus the value of those homeowner’s homes. Note, we’ve already subsetted on homeowner households, so this metric is different from a more commonly house value to income ratio that uses all households. We are excluding renters from this analysis
ggplot(data=dt.x[year==2015 & met2013==47900,], aes(valueh,weight=hhwt))+
geom_density(alpha=0.5,aes(fill="House Value"))+
geom_density(aes(hhincome,fill="Household Income"),alpha=0.5)+
scale_color_manual(name="",values=c("black","black"))+
scale_fill_viridis(discrete=T,end=0.85,name="")+
scale_x_log10(label=scales::dollar,limits=c(10000,2e6))+
theme_minimal()+labs(x="Income and Home Values",y="",
title="House value and homeowner income distribution in Washington D.C. in 2015",
subtitle="kernel density estimates using household weights",
caption="@lenkiefer Source: Census 1-year American Community Survey (2015),\nIPUMS-USA, University of Minnesota, www.ipums.org.")+
theme(plot.title=element_text(size=14),legend.position="top",
axis.text.y=element_blank(),
plot.caption=element_text(hjust=0,vjust=1,margin=margin(t=10)))+
facet_wrap(~cbsa.name,scales="free_y")
## Warning in self$trans$transform(x): NaNs produced
## Warning: Transformation introduced infinite values in continuous x-axis
## Warning: Removed 393 rows containing non-finite values (stat_density).
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning: Removed 261 rows containing non-finite values (stat_density).
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
And we can construct a small multiple:
ggplot(data=dt.x[year==2015 & met2013>=4900,], aes(valueh,weight=hhwt))+
geom_density(alpha=0.5,aes(fill="House Value"))+
geom_density(aes(hhincome,fill="Household Income"),alpha=0.5)+
scale_color_manual(name="",values=c("black","black"))+
scale_fill_viridis(discrete=T,end=0.85,name="")+
scale_x_log10(label=scales::dollar,limits=c(10000,3e6))+
theme_minimal()+labs(x="Income and Home Values",y="",
title="House value and homeowner income distribution by Metro in 2015",
subtitle="kernel density estimates using household weights",
caption="@lenkiefer Source: Census 1-year American Community Survey (2015),\nIPUMS-USA, University of Minnesota, www.ipums.org.")+
theme(plot.title=element_text(size=14),legend.position="top",
axis.text.y=element_blank(),
strip.text.x = element_text(size = 5),
plot.caption=element_text(hjust=0,vjust=1,margin=margin(t=10)))+
facet_wrap(~cbsa.name,scales="free_y",ncol=4)
## Warning in self$trans$transform(x): NaNs produced
## Warning: Transformation introduced infinite values in continuous x-axis
## Warning: Removed 4313 rows containing non-finite values (stat_density).
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning: Removed 7994 rows containing non-finite values (stat_density).
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
But how does the income of each homeowner compare to the value of their home? Let’s take each homeowner in our sample and construct the ratio of their house value to their income.
dt.x<-dt.x[,pti:=valueh/hhincome] #price to income ratio
Now we can plot the distribution of this ratio for Washington D.C.:
ggplot(data=dt.x[year==2015 & met2013==47900,], aes(pti,weight=hhwt))+
geom_density(alpha=0.5,aes(fill="House value to income ratio"))+
#geom_density(aes(hhincome,fill="Household Income"),alpha=0.5)+
scale_color_manual(name="",values=c("black","black"))+
scale_fill_viridis(discrete=T,end=0.85,name="")+
scale_x_continuous(limits=c(0,20),breaks=seq(0,20,2))+
theme_minimal()+labs(x="House vaue to income ratio",y="",
title="Ratio of house value to homeowner income distribution in Washington D.C. in 2015",
subtitle="kernel density estimates using household weights",
caption="@lenkiefer Source: Census 1-year American Community Survey (2015),\nIPUMS-USA, University of Minnesota, www.ipums.org.")+
theme(plot.title=element_text(size=14),legend.position="none",
axis.text.y=element_blank(),
plot.caption=element_text(hjust=0,vjust=1,margin=margin(t=10)))+
facet_wrap(~cbsa.name,scales="free_y")
## Warning: Removed 600 rows containing non-finite values (stat_density).
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
The average ratio is about 4 in Washington D.C., but it varies a lot by metro area. Let’s create another small multiple.
ggplot(data=dt.x[year==2015 & met2013>=4900,], aes(pti,weight=hhwt))+
geom_density(alpha=0.5,aes(fill="House value to income ratio"))+
#geom_density(aes(hhincome,fill="Household Income"),alpha=0.5)+
scale_color_manual(name="",values=c("black","black"))+
scale_fill_viridis(discrete=T,end=0.85,name="")+
scale_x_continuous(limits=c(0,20),breaks=seq(0,20,2))+
theme_minimal()+labs(x="House vaue to income ratio",y="",
title="Ratio of house value to homeowner income distribution by metro in 2015",
subtitle="kernel density estimates using household weights",
caption="@lenkiefer Source: Census 1-year American Community Survey (2015),\nIPUMS-USA, University of Minnesota, www.ipums.org.")+
theme(plot.title=element_text(size=14),legend.position="none",
axis.text.y=element_blank(),
strip.text.x = element_text(size = 5),
plot.caption=element_text(hjust=0,vjust=1,margin=margin(t=10)))+
facet_wrap(~cbsa.name,scales="free_y",ncol=4)
## Warning: Removed 14571 rows containing non-finite values (stat_density).
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
## Warning in density.default(x, weights = w, bw = bw, adjust = adjust, kernel
## = kernel, : sum(weights) != 1 -- will not get true density
Let’s compute the weighted average ratio (excluding any values less than 0 or greater than 20) by metro area in 2010 and 2015:
dt.wsum<-dt.x[pti>0 & pti<20,list(pti.wm=weighted.mean(pti,hhwt,na.rm=T)),by=c("year","cbsa.name")]
ggplot(data=dt.wsum[year==2015,],
aes(x=pti.wm,y=reorder(cbsa.name,-pti.wm),
label=paste(" ",round(pti.wm,1),cbsa.name),
color=factor(year)))+
geom_point(size=3)+theme_minimal()+
geom_text(hjust=0)+
scale_x_continuous(limits=c(2.5,9),breaks=seq(2,8,1))+
scale_color_viridis(name="Year",discrete=T,end=0.85)+
theme(plot.title=element_text(size=14),legend.position="none",
axis.text.y=element_blank(),
plot.caption=element_text(hjust=0,vjust=1,margin=margin(t=10)))+
labs(x="Weighted average homeowner value to household income ratio",
y="Metro",
title="Ratio of house value to homeowner income distribution by metro in 2015",
caption="@lenkiefer Source: Census 1-year American Community Survey (2015),\nIPUMS-USA, University of Minnesota, www.ipums.org.")
This plot makes clear that there is a pretty wide variation in the average house value to household income ratio. We can look at the full distributions through an animated gif (using our usual tweenr approach).
library(tweenr)
library(animation)
library(tidyverse)
library(tidyr)
# Function for use with tweenr
myf<-function (m, yy=2015){
d.out<-copy(dt.samp)[met2013==m]
d.out %>% map_if(is.character, as.factor) %>% as.data.frame -> d.out
return(d.out)
}
# get list of metros from our summay data: metro.list
# Circle back to Atlanta (met2013==12060)
my.list2<-lapply(c(unique(metro.list$met2013),12060),myf)
#use tweenr to interploate
tf <- tween_states(my.list2,tweenlength= 3,
statelength=2, ease=rep('cubic-in-out',2),nframes=200)
tf<-data.table(tf) #convert output into data table
#Animate plot
oopt = ani.options(interval = 0.2)
saveGIF({for (i in 1:max(tf$.frame)) { #loop over frames
g<-
ggplot(data=tf[.frame==i & pti<= 20 & hhincome>10000 & valueh > 5000 ],
aes(y=factor(year),
x=pti,color=pti))+
geom_quasirandom(alpha=0.5,size=0.75)+
theme_minimal()+
scale_color_viridis(name="House value to\nhousehold income\nratio",
discrete=F,option="D",
limits=c(0,20),
end=0.95,direction=-1 #,
) +
#scale_x_log10(limits=c(10000,1.4e6),breaks=c(10000,100000,1000000),
# labels=c("$10k","$100k","$1,000k") )+
scale_x_continuous(limits=c(0,20),breaks=seq(0,20,2.5))+
theme(plot.title=element_text(size=14))+
theme(plot.caption=element_text(hjust=0,vjust=1,margin=margin(t=10)))+
theme(plot.margin=unit(c(0.25,0.25,0.25,0.25),"cm"))+
theme(legend.position = "top")+
labs(y="",x="House value to household income ratio",
caption="@lenkiefer Source: Census 1-year American Community Survey (2010 & 2015),\nIPUMS-USA, University of Minnesota, www.ipums.org.\nTo avoid extreme overplotting, 2000 observations sampled at random (using weights),\nonly includes homeowner households with > $10,000 income and an estimated house value > $5,000 & cases where ratio <= 20",
title="House value to income ratio distribution by Metro",
subtitle=head(tf[.frame==i,],1)$cbsa.name)+
theme(axis.text.x = element_text(size=8))+ #coord_flip()+
#facet_wrap(~year)+
theme(strip.text.x = element_text(size = 6))
print(g)
ani.pause()
print(i)}
},movie.name="tween acs value 04 16 2017 v2.gif",ani.width = 750, ani.height = 400)
Run it and you get: