Maybe the Linear Probability Model isn't all bad

The Linear Probability Model (LPM) might be bad, but is it all bad? Let's look at some conditions where the LPM might not be so bad. We'll also look at some simple adjustments that might improve the performance of the LPM. We'll also compare the LPM to some common alternatives.

Setup

Throughout most of this post, we’re going to consider a world where the LPM is the true model. That is:

\[ \begin{cases} Pr(y_i =1 | x_i^T\beta>1) = 1, \\ Pr(y_i=1|x_i^T\beta \in[0,1]) = x_i^T\beta, \\ Pr(y_i =1 | x_i^T\beta<0)=0 \end{cases}\] To keep things simple, we’ll start with a single regressor and an intercept so that \(x_i^T=[1,x_{1,i}]\). Furthermore, we’ll suppose that our regressor has a standard normal distribution: \(x_{1,i} \sim N(0,1)\).
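One quantity worth computing up front is how often this setup actually truncates. With the parameter values used throughout the post (\(\beta_0 = \beta_x = 0.4\)), the index lands in \([0,1]\) whenever \(x \in [-1, 1.5]\). A quick check (the object names here are illustrative, not part of the post’s code):

```r
# Probability the index beta_0 + beta_x * x lands in [0,1] when x ~ N(0,1)
beta_0  <- 0.4
beta_x1 <- 0.4
p_inside <- pnorm((1 - beta_0)/beta_x1) - pnorm((0 - beta_0)/beta_x1)
p_inside   # about 0.77, so roughly a quarter of draws hit the truncated regions
```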

With that, we’re ready to go. Throughout we’ll use R to simulate and draw pictures. Let’s start with a simple one describing our setup.

First load required libraries.

# load libraries ----
suppressPackageStartupMessages({
library(tidyverse)  # for pictures
library(mfx)        # for marginal effects
library(extrafont)  # for fonts
library(ggridges)   # for theme_ridges
library(broom)      # for model cleanup
library(tidyr)      # keep things tidy
})

Draw a picture

# set up simulation  ----
set.seed(20180721)  # seed for rng 
beta_0  <- 0.4      # intercept
beta_x1 <- 0.4      # coefficient on regressor
Nsim    <- 1000     # observations for simulations

# helper function to truncate values at 0,1
myf <- function(x){
  case_when(
    x < 0 ~ 0,
    x > 1 ~ 1,
    TRUE  ~ x)
}
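As an aside, the same truncation can be written with base R’s pmin()/pmax(), avoiding the dplyr dependency (myf2 is just an illustrative name):

```r
# equivalent base-R version of the truncation helper
myf2 <- function(x) pmin(pmax(x, 0), 1)
myf2(c(-0.5, 0.3, 1.7))   # 0.0 0.3 1.0
```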

# draw picture

df.sim <- data.frame(x=seq(-2,2,.1)) %>%
  mutate( py=myf(beta_0+beta_x1*x))

ggplot(data=df.sim, aes(x=x,y=py))+
  geom_line(color="#27408b",size=1.1)+
  theme_ridges(font_family="Roboto")+
  scale_x_continuous(breaks=c(-1,0,1.5))+
  labs(x="x",y="Pr(Y=1|x)",
       title="Linear Probability Model",
       subtitle=bquote(paste("Pr(Y=1|X) = ",beta[0]+beta[x],"x, ",beta[0]," = 0.4, ", beta[x], " = 0.4")))

Bias and inconsistency in the LPM

As detailed in Horrace and Oaxaca (2006), ordinary least squares (OLS) estimates of the LPM are biased and inconsistent. Last time we replicated their results. Let’s look a little closer.

We’ll draw a bunch of samples and compare the simple Ordinary Least Squares estimator for the LPM to a trimmed estimator. For the trimmed estimator, we’ll first run OLS and then re-run it using only the observations where \(x_i^T\hat{\beta} \in (0,1)\). Let’s do that for several possible values for \(\beta_x\), leaving the intercept fixed (\(\beta_0=0.4\)).
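In symbols: writing \(\hat{\beta}_{OLS}\) for the first-stage estimate, the trimmed estimator re-runs least squares on the subsample with fitted values in the unit interval (the code keeps the boundary points):

\[ \hat{\beta}_{trim} = \underset{\beta}{\arg\min} \sum_{i \,:\, x_i^T\hat{\beta}_{OLS} \in [0,1]} (y_i - x_i^T\beta)^2 \]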

Let’s draw samples of various sizes and let \(\beta_x\) range from -1 to 1.

reg.df <- 
  expand.grid(N=c(25,100,1000),       # samples of size 25, 100, 1000
              beta_hat=seq(-1,1,.1),  # true beta_x ranges from -1 to 1
              trials=1:100)           # we will estimate each model 100 times

We’ll also need to construct helper functions to estimate the OLS and the trimmed model.

  myreg <- function(N=10,
                    beta_x=0.4,
                    beta_0=0.4)
  {
    X <- rnorm(N)
    z <- beta_0+beta_x*X
    Z0 <- myf(z) 
    y <-  rbinom(N, size=1, prob=Z0)
    df.reg <- data.frame(y=y,x=X)
    broom::tidy(lm(data=df.reg, formula= y~x))
  }
  
  myreg.trim <- function(N=10, beta_x=0.4, beta_0=0.4){
    X <- rnorm(N)
    z <- beta_0+beta_x*X
    Z0 <- myf(z) 
    y <-  rbinom(N, size=1, prob=Z0)
    df.reg <- data.frame(y=y,x=X)
    #first stage
    out.lm <- lm(data=df.reg, formula= y~x)
    # trimmed estimator
    N.trim <- sum(predict(out.lm)>=0 & predict(out.lm)<=1 )
    df.reg.trim <- df.reg[(predict(out.lm)>=0 & predict(out.lm)<=1 ),]
    out <-     broom::tidy(lm(data=df.reg.trim, formula= y~x))
    out$N.trim <- N.trim  # count number of observations kept
    return(out)
  }

Now with things set up we can run our simulations.

  reg.out <- reg.df  %>% 
  mutate(lm=map2(N,beta_hat,myreg)) %>% 
  mutate(lm.trim=map2(N,beta_hat,myreg.trim)) %>% 
  unnest(lm,lm.trim)   

Take a look at what we have:

str(reg.out)
## 'data.frame':    12600 obs. of  14 variables:
##  $ N         : num  25 25 100 100 1000 1000 25 25 100 100 ...
##  $ beta_hat  : num  -1 -1 -1 -1 -1 -1 -0.9 -0.9 -0.9 -0.9 ...
##  $ trials    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ term      : chr  "(Intercept)" "x" "(Intercept)" "x" ...
##  $ estimate  : num  0.439 -0.586 0.484 -0.376 0.464 ...
##  $ std.error : num  0.0589 0.0814 0.0314 0.0301 0.0104 ...
##  $ statistic : num  7.46 -7.19 15.41 -12.5 44.46 ...
##  $ p.value   : num  1.40e-07 2.52e-07 6.28e-28 5.27e-22 6.28e-239 ...
##  $ term1     : chr  "(Intercept)" "x" "(Intercept)" "x" ...
##  $ estimate1 : num  0.488 -0.634 0.39 -0.596 0.427 ...
##  $ std.error1: num  0.086 0.1468 0.0346 0.0492 0.011 ...
##  $ statistic1: num  5.67 -4.32 11.27 -12.11 38.82 ...
##  $ p.value1  : num  3.45e-05 5.33e-04 4.88e-18 1.33e-19 2.77e-186 ...
##  $ N.trim    : int  18 18 80 80 803 803 19 19 83 83 ...
(head(reg.out))
##      N beta_hat trials        term   estimate  std.error  statistic
## 1   25       -1      1 (Intercept)  0.4390683 0.05886360   7.459079
## 2   25       -1      1           x -0.5859318 0.08143720  -7.194892
## 3  100       -1      1 (Intercept)  0.4835500 0.03138585  15.406626
## 4  100       -1      1           x -0.3763385 0.03010846 -12.499429
## 5 1000       -1      1 (Intercept)  0.4642779 0.01044219  44.461753
## 6 1000       -1      1           x -0.3700670 0.01032677 -35.835717
##         p.value       term1  estimate1 std.error1 statistic1      p.value1
## 1  1.395294e-07 (Intercept)  0.4877908 0.08595734   5.674801  3.449042e-05
## 2  2.517679e-07           x -0.6337525 0.14684597  -4.315763  5.328698e-04
## 3  6.284317e-28 (Intercept)  0.3896831 0.03457798  11.269689  4.875296e-18
## 4  5.269242e-22           x -0.5961138 0.04921946 -12.111342  1.332243e-19
## 5 6.281201e-239 (Intercept)  0.4270474 0.01099933  38.824876 2.774408e-186
## 6 1.881766e-181           x -0.5965633 0.01676596 -35.581812 4.587302e-167
##   N.trim
## 1     18
## 2     18
## 3     80
## 4     80
## 5    803
## 6    803

We’ve stacked up estimates across 100 trials for OLS (estimate) and trimmed OLS (estimate1). Let’s summarize:

  reg.tab<-
    reg.out %>% group_by(N,beta_hat,term) %>% 
    summarize(ols.mean=mean(estimate),
              trimmed.ols.mean=mean(estimate1),
              ols.median= median(estimate),
              ols.trim.median = median(estimate1),
              N.trim=mean(N.trim))

(reg.tab %>% filter(term=="x") %>% mutate_if(is.numeric,round,3))
## # A tibble: 63 x 8
## # Groups:   N, beta_hat [63]
##        N beta_hat term  ols.mean trimmed.ols.mean ols.median
##    <dbl>    <dbl> <chr>    <dbl>            <dbl>      <dbl>
##  1    25   -1     x       -0.398           -0.618     -0.388
##  2    25   -0.9   x       -0.388           -0.59      -0.376
##  3    25   -0.8   x       -0.389           -0.571     -0.377
##  4    25   -0.7   x       -0.385           -0.555     -0.38 
##  5    25   -0.6   x       -0.363           -0.517     -0.363
##  6    25   -0.5   x       -0.349           -0.478     -0.348
##  7    25   -0.400 x       -0.306           -0.392     -0.309
##  8    25   -0.300 x       -0.277           -0.331     -0.286
##  9    25   -0.200 x       -0.195           -0.208     -0.205
## 10    25   -0.100 x       -0.1             -0.08      -0.11 
## # ... with 53 more rows, and 2 more variables: ols.trim.median <dbl>,
## #   N.trim <dbl>

A few graphs might help us understand what we have:

g1<-
  ggplot(data=filter(reg.tab,term=="x"), 
         aes(x=beta_hat,y=abs(ols.median-beta_hat),
             color=factor(N),
             shape=factor(N)))+
  geom_path()+
  geom_point()+
  theme_ridges(font_family="Roboto")+
  scale_color_manual(values=c("#27408b",rgb(103,180,75, maxColorValue = 256),"#f37735"),name="Sample size ")+
  scale_shape_manual(name="Sample size ",values=c(16,21,17))+
  theme(legend.position="top")+
  facet_wrap(~term,ncol=1)+
  labs(x=bquote(beta[x]),y=bquote(paste("absolute error: abs(",hat(beta[ols])," - ",beta[x],")")),
       title="Mean Absolute Error: Estimating LPM with OLS (n=100 trials)",
       subtitle=bquote(paste("Pr(Y=1|X) = ",beta[0]+beta[x],"x, ",beta[0]," = 0.4")))
g1

g2<-
  ggplot(data=filter(reg.tab,term=="x"), 
         aes(x=beta_hat,y=abs(ols.trim.median -beta_hat),
             color=factor(N),group=N,shape=factor(N)))+
  geom_path()+
  geom_point()  +
  theme_ridges(font_family="Roboto")+
  scale_color_manual(values=c("#27408b",rgb(103,180,75, maxColorValue = 256),"#f37735"),name="Sample size ")+
  scale_shape_manual(name="Sample size ",values=c(16,21,17))+
  theme(legend.position="top")+
  facet_wrap(~term,ncol=1)+
  labs(x=bquote(beta[x]),y=bquote(paste("absolute error: abs(",hat(beta[ols.trim])," - ",beta[x],")")),
       title="Mean Absolute Error: Estimating LPM with trimmed OLS (n=100 trials)",
       subtitle=bquote(paste("Pr(Y=1|X) = ",beta[0]+beta[x],"x, ",beta[0]," = 0.4")))
g2

These plots show that as \(\beta_x\) gets larger in absolute value, the mean absolute error increases for both OLS and trimmed OLS. Trimmed OLS does better by a rather large margin:

# organize data, keep only medians
  pdata<-
  filter(reg.tab,term=="x") %>%     
  dplyr::select(N,beta_hat,ols.median,ols.trim.median) %>% 
  group_by(N,beta_hat) %>% 
  gather(type,est,-N,-beta_hat) %>% 
  ungroup() %>%
  # compute probabilities of observing xbeta <0 (pi_hat), >1 (rho_hat), in (0,1) (gamma_hat)
  mutate(rho_hat = 1-pnorm(1-beta_0,0,abs(beta_hat)),
         pi_hat = pnorm(-beta_0,0,abs(beta_hat)),
         gamma_hat = 1-rho_hat-pi_hat)
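These truncation probabilities have closed forms because \(x \sim N(0,1)\) makes the index \(\beta_0 + \beta_x x\) distributed \(N(\beta_0, \beta_x^2)\):

\[ \pi = Pr(x^T\beta < 0) = \Phi\left(\frac{-\beta_0}{|\beta_x|}\right), \quad \rho = Pr(x^T\beta > 1) = 1 - \Phi\left(\frac{1-\beta_0}{|\beta_x|}\right), \quad \gamma = 1 - \pi - \rho \]

which is exactly what the pnorm calls above compute.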
  

  ggplot(data=pdata, aes(x=beta_hat, y=abs(beta_hat-est), color=type, shape=type))+ 
    geom_point()  +
    facet_wrap(~forcats::fct_reorder(paste0("N = ",N),N),ncol=1)+
    geom_path(alpha=0.75)+
    theme_ridges(font_family="Roboto")+
    scale_color_manual(values=c("#27408b",rgb(103,180,75, maxColorValue = 256),"#f37735"),labels=c("OLS","Trimmed OLS"))+
    scale_shape_manual(labels=c("OLS","Trimmed OLS"),values=c(16,21))+
    theme(legend.position="top")+
    labs(x=bquote(beta[x]),y=bquote(paste("absolute error: abs(",hat(beta[x])," - ",beta[x],")")),
       title="Mean Absolute Error: Estimating LPM with OLS (n=100 trials)",
       subtitle=bquote(paste("Pr(Y=1|X) = ",beta[0]+beta[x],"x, ",beta[0]," = 0.4")))

Another way of looking at these results is to compare the mean absolute error to the probability of observing \(x^T\beta \in (0,1)\), which we call \(\gamma\) (borrowing notation from Horrace and Oaxaca).

  ggplot(data=filter(pdata,beta_hat>0),  #restrict to beta_hat > 0 (symmetric, so beta_hat<0 gives similar results)
         aes(x=gamma_hat, y=abs(beta_hat-est)/abs(beta_hat), color=type))+ 
  geom_point()  +
  facet_wrap(~forcats::fct_reorder(paste0("Sample size = ",N),N))+
  geom_path(alpha=0.75)+
   theme_ridges(font_family="Roboto")+
    scale_color_manual(values=c("#27408b",rgb(103,180,75, maxColorValue = 256),"#f37735"),labels=c("OLS","Trimmed OLS"))+
    scale_shape_manual(labels=c("OLS","Trimmed OLS"),values=c(16,21))+
    theme(legend.position="top")+
    labs(x=bquote(paste(gamma, "= Pr(",x^T, beta," in (0,1))")),y=bquote(paste("absolute error: abs(",hat(beta[x])," - ",beta[x],")")),
       title="Mean Absolute Error: Estimating LPM with (trimmed) OLS (n=100 trials)",
       subtitle=bquote(paste("Pr(Y=1|X) = ",beta[0]+beta[x],"x, ",beta[0]," = 0.4, ",beta[x]," > 0, X ~ N(0,1)")))

We can see that the trimmed estimator does much better than regular OLS, especially as the likelihood of getting a value outside of the (0,1) range increases.

Alternative estimators

When outcomes are binary, researchers will often reach for a probit or logit model.

What if we applied these approaches when the true model was an LPM? In reality we don’t know which model our data came from, but let’s just roll with this for a while.

We can easily fit a probit or logit model using glm.

# Simulate under LPM ----
  
# draw some Xs
  X <- rnorm(Nsim)  
  z <- beta_0+beta_x1*X
  Z0 <- myf(z) 
  y <-  rbinom(Nsim, size=1, prob=Z0)
  df.reg <- data.frame(y=y,x=X)

  # Fit Probit and Logit
  out.probit <- glm(data=df.reg, formula= y~x, family = binomial(link = "probit"))
  out.logit <- glm(data=df.reg, formula= y~x, family = binomial(link ="logit"))
  
  my.probit <- function(in.x=0.5, in.model=out.probit){
    predict(in.model, newdata=list(x=in.x),type="response")
  }
  
  my.logit<- function(in.x=0.5, in.model=out.logit){
    predict(in.model, newdata=list(x=in.x),type="response")
  }
  
  

  df.test <- 
    data.frame(x=seq(-2,2.5,.1)) %>%   # grid chosen so mean(x) = 0.25, where Pr(Y=1|x) = 0.5
    mutate(lpm=beta_0+beta_x1*x, 
           probit=unlist(map(x,my.probit)),
           logit=unlist(map(x,my.logit))) 
  
  ggplot(data=df.test, aes(x=x,y=probit, color="probit"))+
    geom_point()+
    geom_path()+
    geom_path(aes(color="logit",y=logit),linetype=3)+
    geom_point(aes(color="logit",y=logit),shape=21)+
    geom_path(linetype=2, alpha=0.75, aes(y=myf(lpm),color=" LPM"),size=1.1)+
    scale_x_continuous(breaks=c(-1,0,0.25,1.5))+
    theme_ridges(font_family="Roboto")+
    scale_color_manual(name="",values=c("#27408b",rgb(103,180,75, maxColorValue = 256),"#f37735"))+
    scale_shape_manual(name="",values=c(16,21))+
    theme(legend.position="top")+
    labs(x="X",y="Pr(Y=1 |X) ",
       title=paste0("Linear Probability Model\nestimated response using logit and probit (N=",Nsim,")"),
       subtitle=bquote(paste("Pr(Y=1|X) = ",beta[0]+beta[x],"x, ",beta[0]," = 0.4, ", beta[x], " = 0.4")))

The logit and probit fit a classic S-curve and do a reasonable job of picking up the shape of the “true” LPM, though the slopes of the S-curves tend to be a little steeper around the midpoint (X = 0.25 in our example).

We can see this using the mfx package to compute marginal effects.

# Logit marginal effects (at mean of x)
logitmfx(formula = y~x, data = df.reg, atmean = TRUE)
## Call:
## logitmfx(formula = y ~ x, data = df.reg, atmean = TRUE)
## 
## Marginal Effects:
##      dF/dx Std. Err.      z     P>|z|    
## x 0.475369  0.029605 16.057 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Probit marginal effects (at mean of x)

probitmfx(formula = y~x, data = df.reg, atmean = TRUE)
## Call:
## probitmfx(formula = y ~ x, data = df.reg, atmean = TRUE)
## 
## Marginal Effects:
##      dF/dx Std. Err.      z     P>|z|    
## x 0.464814  0.026505 17.537 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The estimated marginal effects at the mean are larger than the true marginal effect of 0.4. Do the models get it right on average? Let’s compute the average partial effects by setting atmean = FALSE.

# Logit average marginal effects
logitmfx(formula = y~x, data = df.reg, atmean = FALSE)
## Call:
## logitmfx(formula = y ~ x, data = df.reg, atmean = FALSE)
## 
## Marginal Effects:
##     dF/dx Std. Err.      z     P>|z|    
## x 0.30175   0.03145 9.5949 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Probit average marginal effects

probitmfx(formula = y~x, data = df.reg, atmean = FALSE)
## Call:
## probitmfx(formula = y ~ x, data = df.reg, atmean = FALSE)
## 
## Marginal Effects:
##       dF/dx Std. Err.      z     P>|z|    
## x 0.3057193 0.0073981 41.324 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

These tables show numerically what the chart above shows visually: the logit and probit are steeper in the middle (around X = 0.25) but on average flatter than the “true” LPM.
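The average effect actually lines up with what the true model implies. Under the LPM, the slope of \(Pr(Y=1|x)\) is \(\beta_x\) wherever the index is strictly inside \((0,1)\) and zero where it is truncated, so the population average partial effect is \(\beta_x \gamma\). A quick back-of-the-envelope check (illustrative names, not the post’s code):

```r
# Population average partial effect under the true LPM:
# slope is beta_x inside the band, 0 outside, so APE = beta_x * gamma
beta_0  <- 0.4
beta_x1 <- 0.4
gamma   <- pnorm((1 - beta_0)/beta_x1) - pnorm((0 - beta_0)/beta_x1)
ape     <- beta_x1 * gamma   # about 0.31, close to the mfx averages above
```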

Turn it around

How about if we turn it around? Suppose the probit were the true model, but we estimated an LPM. Let’s see what we get.

We’ll use the estimated coefficients from the probit above to simulate data.

beta_probit <- tidy(out.probit)$estimate

ystar <- beta_probit[1] + beta_probit[2]*X   # latent index

y.probit <- ifelse(ystar - rnorm(Nsim, 0, 1) > 0, 1, 0)

df.sim3 <- data.frame(x=X,y=y.probit)
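As an aside on this simulation: the latent-variable draw is equivalent to a direct Bernoulli draw with probability \(\Phi(y^*)\), since \(Pr(y^* - e > 0) = Pr(e < y^*) = \Phi(y^*)\) for \(e \sim N(0,1)\). A quick sketch at a fixed index value (not part of the post’s code):

```r
# Check the latent-variable / Bernoulli equivalence at a fixed index value
set.seed(2)
ystar_fixed <- 0.4
share <- mean(ifelse(ystar_fixed - rnorm(1e5) > 0, 1, 0))
# share should land within simulation error of pnorm(0.4), roughly 0.655
```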

# OLS
(lpm.sim3 <- lm(data=df.sim3, formula=y~x))
## 
## Call:
## lm(formula = y ~ x, data = df.sim3)
## 
## Coefficients:
## (Intercept)            x  
##      0.4253       0.2968
# Trimmed OLS
df.sim3.trim <- df.sim3[(predict(lpm.sim3)>=0 & predict(lpm.sim3 )<=1 ),]

(lpm.sim3.trim <-lm(data=df.sim3.trim, formula= y~x))
## 
## Call:
## lm(formula = y ~ x, data = df.sim3.trim)
## 
## Coefficients:
## (Intercept)            x  
##      0.4124       0.3593
  my.lpm<- function(in.x=0.5, in.model=lpm.sim3){
    predict(in.model, newdata=list(x=in.x),type="response")
  }

  
  df.test3 <- 
    data.frame(x=seq(-2,2.5,.1)) %>%   # grid chosen so mean(x) = 0.25, where Pr(Y=1|x) = 0.5
    mutate(lpm=unlist(map(x,my.lpm)),
           lpm.trim=unlist(map(x, my.lpm, in.model=lpm.sim3.trim)), 
           probit=unlist(map(x,my.probit))
    )
  
  ggplot(data=df.test3, aes(x=x,y=probit, color="Probit",shape="Probit"))+
    geom_point()+
    geom_path()+
    geom_path(linetype=3, alpha=0.75, aes(y=myf(lpm.trim),color=" LPM Trimmed"),size=1.1)+
    geom_point(aes(y=myf(lpm.trim),color=" LPM Trimmed", shape=" LPM Trimmed"))+
    geom_path(linetype=2, alpha=0.75, aes(y=myf(lpm),color=" LPM"),size=1.1)+
    geom_point(aes(y=myf(lpm),color=" LPM", shape=" LPM"))+
    theme_ridges(font_family="Roboto")+
    scale_color_manual(name="",values=c("#27408b",rgb(103,180,75, maxColorValue = 256),"#f37735"))+
    scale_shape_manual(name="",values=c(16,21,17))+
    theme(legend.position="top")+
    labs(x="X",y="Pr(Y=1 |X) ",
       title=paste0("Probit Model\nestimated response using LPM (N=",Nsim,")"),
       subtitle=bquote(paste("Pr(Y=1|X) = ",Phi(beta[0]+beta[x]*X),", ",beta[0]," = ",  .(round(beta_probit[1],3)) , ", ", beta[x]," = ", .(round(beta_probit[2],3)))))

So the LPM looks like it’s “in the ballpark” of the “true” model, though now the LPM estimates too flat a slope near the mean and too steep a slope on average. The trimmed OLS estimator, however, looks terrible.

Let’s add something to the plot to help see what’s going on:

  ggplot(data=df.test3, aes(x=x,y=probit, color="Probit",shape="Probit"))+
    geom_point()+
    geom_path()+
    geom_path(linetype=3, alpha=0.75, aes(y=myf(lpm.trim),color=" LPM Trimmed"),size=1.1)+
    geom_point(aes(y=myf(lpm.trim),color=" LPM Trimmed", shape=" LPM Trimmed"))+
    geom_path(linetype=2, alpha=0.75, aes(y=myf(lpm),color=" LPM"),size=1.1)+
    geom_point(aes(y=myf(lpm),color=" LPM", shape=" LPM"))+
    geom_rug(data=df.sim3,color="#27408b",inherit.aes=F, aes(x=x),sides="b",size=0.5,alpha=0.5)+
    geom_rug(data=df.sim3.trim,color=rgb(103,180,75, maxColorValue = 256),inherit.aes=F, aes(x=x),sides="t",size=0.5,alpha=0.5)+
    theme_ridges(font_family="Roboto")+
    scale_color_manual(name="",values=c("#27408b",rgb(103,180,75, maxColorValue = 256),"#f37735"))+
    scale_shape_manual(name="",values=c(16,21,17))+
    theme(legend.position="top")+
    labs(x="X",y="Pr(Y=1 |X) ",
       title=paste0("Probit Model\nestimated response using LPM (N=",Nsim,")"),
       subtitle=bquote(paste("Pr(Y=1|X) = ",Phi(beta[0]+beta[x]*X),", ",beta[0]," = ",  .(round(beta_probit[1],3)) , ", ", beta[x]," = ", .(round(beta_probit[2],3)))),
       caption="Strip at top shows Xs used by trimmed estimator\nStrip at bottom shows full sample")

Let’s convert the discrete Y values into something continuous by bucketing the Xs and computing averages.

  df.s <-
  df.sim3 %>% 
    mutate(xb=cut(x,seq(-3,3,.25))) %>%
    group_by(xb) %>% 
    summarize(x=mean(x),
              ybar=mean(y)) %>%
    mutate(
      lpm=unlist(map(x,my.lpm)),
      lpm.trim=unlist(map(x, my.lpm, in.model=lpm.sim3.trim)),
      y.probit=unlist(map(x,my.probit)))

  
  ggplot(data=df.s %>% arrange(x), aes(x=x,y=y.probit, color="Probit",shape="Probit"))+
    geom_point()+
    geom_path()+
    geom_path(linetype=2, alpha=0.95, aes(y=ybar,color="Sample Average"))+
    geom_point(alpha=0.95, aes(y=ybar,color="Sample Average", shape="Sample Average"))+
    geom_path(linetype=3, alpha=0.75, aes(y=myf(lpm.trim),color=" LPM Trimmed"),size=1.1)+
    geom_point(aes(y=myf(lpm.trim),color=" LPM Trimmed", shape=" LPM Trimmed"))+
    geom_path(linetype=2, alpha=0.75, aes(y=myf(lpm),color=" LPM"),size=1.1)+
    geom_point(aes(y=myf(lpm),color=" LPM", shape=" LPM"))+
        theme_ridges(font_family="Roboto")+
    scale_color_manual(name="",values=c("#27408b",rgb(103,180,75, maxColorValue = 256),"#f37735","black"))+
    scale_shape_manual(name="",values=c(16,21,17,5))+
    theme(legend.position="top")+
    labs(x="X",y="Pr(Y=1 |X) ",
       title=paste0("Probit Model\nestimated response using LPM (N=",Nsim,")"),
       subtitle=bquote(paste("Pr(Y=1|X) = ",Phi(beta[0]+beta[x]*X),", ",beta[0]," = ",  .(round(beta_probit[1],3)) , ", ", beta[x]," = ", .(round(beta_probit[2],3)))),
       caption="Sample averages computed using X's binned by 0.25 increments")

Summary

We saw that when the true model is an LPM, trimming can significantly improve the OLS estimator by reducing bias. When the true model is not an LPM but something else, the benefits of trimming aren’t as apparent. In our case, when the true model was a probit, the trimmed OLS estimator didn’t do so well.

How might these approaches work when faced with actual data? We’ll take a look next time.

References

Earlier Post: “How bad is a Linear Probability Model?” LINK

Amemiya, T. (1977). Some theorems in the linear probability model. International Economic Review, 645-650. JSTOR

Horrace, W. C., & Oaxaca, R. L. (2006). Results on the bias and inconsistency of ordinary least squares for the linear probability model. Economics Letters, 90(3), 321-327. pdf