Chapter 3 Session II - Maximum Likelihood Estimation (MLE)

3.1 Introduction

In this second session of the microeconometrics tutorial we are going to implement Maximum Likelihood Estimation in R. The essential steps are:

  1. Understand the intuition behind Maximum likelihood estimation. We awill replicate a Poisson regression table using MLE. We will both write our own custom function and a built-in one.

  2. Propose a model and derive its likelihood function. This part is not going to be very deep in the explanation of the model, derivation and assumptions. The aim of this sessions are on the estimation (computation) and not in the model per se. I will however refer the sources if you want to have a deeper look at the model. Given that the last session we did something on health economics, this time we change topic and will focus on labour economics. As anecdote, when I first saw this I was very impressed! I hope you are impressed too after today session!

  3. Use the Current Population Survey (CPS) and understand how to handle and manage this particular dataset.

    + Import the data
    + Use the specified columns
    + Clean data
  4. Estimate the structural parameters of the proposed model (both for the estimates and the standard errors, obtaind via the delta method).

3.2 Maximum likelihood estimator

For the theory please refer to the slides of the course (class 2). Here we just provide a brief definition and the intuition to the method application. What is Maximum likelihood estimation (MLE)? MLE is an estimation method in which we obtain the parameters of our model under an assumed statistical model and the available data, such that our sample is the most probable.

  • Given a statistical model (ie, an economic model with suitable stochastic features), select the parameters that make the observed data most probable. In this way, we are doing inference in the population that generated our data and the DGP behind. We can formulate any model and we will obtain a result; the only restriction for the formulation is that it has probability 0.

  • Even if it is intuitive, rely on the assumptions (model, statistical model, DGP), but not in the validity.

  • Validity of the models?

“A model is a deliberate abstraction from reality. One model can be better than another, on one or several dimensions, but none are correct. They help us focus on the small set of phenomena in which we are interested, and/or have data regarding. When correctly developed and explained, it should be clear what set of phenomena are being excluded from consideration, and, at the end of the analysis, it is desirable to say how the omission of other relevant phenomena could have affected the results attained.

An advantage of a structured approach to empirical analysis is that it should be immediately clear what factors have been considered exogenous and which endogenous, the functional form assumptions made, etc." (C. Flinn, lecture notes)

To understand how MLE works we will use two examples today: a Poisson regression and a structural estimation.

3.2.1 Poisson regression

In this section we are going to replicate a paper by Daniel Treisman published in AER 2016. It applies Poisson regression to the number of millioners in a set of countries, conditional on some country characteristics (Treisman 2016).

  • What does the paper says?

The main idea of the paper is provide a robust model to predict the number of rich in a given country, given an economic environment specific to said country.

In comparing data and predictions the Author finds that, regardless of the model specification, Russia reports the highest number of anomalies (underpredictions).

  • Why a Poisson regression?

In Treisman’s paper the dependent variable — the number of billionaires \(y_i\) in country \(i\) — is modelled as a function of GDP per capita, population size, and years membership in GATT and WTO. He also presents 4 alternative specifications.

“… since the dependent variable is a count, Poisson rather than OLS regression is appropriate.” Estimation

You can acces the data and the paper in the provided links. Download and unzip the information. The file also contains a companion STATA code to reproduce the tables in the paper. Unzip the information and load the dataset in R using the haven library (Wickham and Miller 2019):

data_mil <- read_dta("YOUR PATH GOES HERE")

To reproduce table 1 from the paper we need to filter the information for 2008, as it is the year considered in the main analisys.

data_mil  <- data_mil[data_mil$year == 2008,]

Our goal is to estimate a Poisson regression model and there are built-in functions to do these kind of estimations using a one-line command like glm(..., family = "poisson"). Our goal instead is to use Maximum Likelihood estimation to reproduce such parameters and understand how this works. In order to have a benchmark for comparison let’s see how the output of the first proposed model looks like:

summary(glm(formula = numbil0 ~ lngdppc + lnpop + gattwto08, family = poisson(link='log'), data = data_mil))
## Call:
## glm(formula = numbil0 ~ lngdppc + lnpop + gattwto08, family = poisson(link = "log"), 
##     data = data_mil)
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -8.7615  -0.7585  -0.3775  -0.1010   9.0587  
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -29.049536   0.638193 -45.518  < 2e-16 ***
## lngdppc       1.083856   0.035064  30.911  < 2e-16 ***
## lnpop         1.171362   0.024156  48.491  < 2e-16 ***
## gattwto08     0.005968   0.001908   3.127  0.00176 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Dispersion parameter for poisson family taken to be 1)
##     Null deviance: 5942.23  on 196  degrees of freedom
## Residual deviance:  669.95  on 193  degrees of freedom
##   (354 observations deleted due to missingness)
## AIC: 885.08
## Number of Fisher Scoring iterations: 5

Let’s start the construction of the maximum likelihood introducing the Poisson regression model. The starting assumptions are: Poisson regression is a generalized linear model form of regression analysis used to model count data \((1)\), in which the dependent variable has a Poisson distribution \((2)\), and the logarithm of the expected value can be modeled as a linear combination \((3)\).

  1. Count data: notice that all the numbers from a Poisson distribution are integers.
  2. Poisson pdf is defined as: \[f(k,\lambda)=Pr(X=k)=\frac{\lambda^k e^{-\lambda}}{k!}\] Keep in mind that the mean and variance of the Posson distribution are equal to the constant \(\lambda\).
  3. The mean can be expressed as a linear combination of the parameters. \[ \mu = E[y] = E[e^{y}] = E(y| \boldsymbol{X})=e^{\theta^{\prime} \boldsymbol{X}}=e^{\theta_0 + \theta_1 x_{i1}+ ... + \theta_k x_{ik} }\] Combining condition \((2)\) and \((3)\), we obtain the probability density function:

\[f(y_i,\lambda)=f(y_i,\mu)=\frac{\mu^y_i e^{-\mu}}{y_i!}\]

The joint probability density function is hence equal to:

\[f(\boldsymbol{Y}|\boldsymbol{X},\boldsymbol{\theta})=f(y_1,\mu) f(y_2,\mu)...f(y_n,\mu) = \prod_{i=1}^n f(y_i,\mu|\boldsymbol{X},\boldsymbol{\theta})\]

Our likelihood function instead takes as given the data (vector \(\boldsymbol{Y}\) and matrix \(\boldsymbol{X}\)) and calculates the likelihood given a parameter \(\theta\).

\[\mathcal{L}(\boldsymbol{\theta} | \boldsymbol{Y},\boldsymbol{X}) = \prod_{i=1}^n f(\boldsymbol{\theta}|\boldsymbol{X},\boldsymbol{Y}) = \prod_{i=1}^n \frac{\mu_i^{y_i} e^{-\mu_i}}{y_i!}\]

Finding the \(\boldsymbol{\hat{\theta}}\) that maximizes the likelihood function therefore boils down to estimate the model parameters. Before this step we monotically transform the obective function to feed the algorithm the log-likelihood instead of the function itself. Why? Think about derivatives of sums vs. derivatives of products. \[\boldsymbol{\hat{\theta}} = \max_{\boldsymbol{\theta}} log( \mathcal{L}(\boldsymbol{\theta} | \boldsymbol{Y},\boldsymbol{X})) = \min _{\boldsymbol{\theta}} - log( \mathcal{L}(\boldsymbol{\theta} | \boldsymbol{Y},\boldsymbol{X})) \] Taking the logarithm then we have:

\[\begin{equation} \begin{aligned} \log( \mathcal{L}(\boldsymbol{\theta} | \boldsymbol{Y},\boldsymbol{X})) &= \log \left( \prod_{i=1}^N \frac{\mu_i^y e^{-\mu}}{y_i!} \right) \\ &= \sum_{i=1}^N \log \left( \frac{\mu_i^{y_i} e^{-\mu_i}}{y_i!} \right)\\ &= \sum_{i=1}^N y_i \log \left( \mu_i \right) - \sum_{i=1}^N \left(\mu_i\right) -\sum_{i=1}^N \log \left(y_i! \right)\\ \end{aligned} \end{equation}\]


\[ \mu = e^{\theta^{\prime} \boldsymbol{X}}=e^{\theta_0 + \theta_1 x_{i1}+ ... + \theta_k x_{ik} }\].

Absent a closed-form solution, we are going to use the R optimizer to maximize the log-likelihood function above. The optim function will serve this purpose. According to the help vignette ?optim, we need:

  • A vector of initial values to start the search from.

  • A well specified function to minimize. It must take as input the vector of parameters and should return a scalar.

  • Method of optimization (?).

  • Optional: a hessian (boolean). We will use this to parse out the standard errors around the estimated parameters, so it will be useful later on.

In the last installment we introduced the basics of custom functions in R. In this tutorial we just recall as good practice that we are going to differenciate the inputs of the function: the parameters are the inputs that are going to change in the optimization process, while the data for example will be a static input. Another static input is the formula, a type of R object that will allow to extract the relevant information such as variables name, data columns, and instructs the algorithm on the relationship between variables (dependent or independent, interactions, dependance, etc).

We start by writing the function. We call the function ‘LLcalc’, and define the inputs in parentheses. One useful function that we use in the first line is model.matrix(). This function takes a formula and extract from the whole dataset the related matrix of observations including the vector of ones of the intercept, dummies, and interaction terms.

We are going to compute the value of \(\mu\) and then the value of the individual contribution to the log-likelihood. Then, we just sum and flip sign, as the optimizer minimizes by default. There is one technical detail that we are addressing using the package Rmpfr, which allow to store big numbers in 128 bits (as the factorail of 400) (Maechler 2019).

LLcalc <- function(theta, formula, data){
        ### Calculate the log-likelihood of a Poisson regression,
        ### given a vector of parameters (1),
        ### a relationship ()formula (2)
        ### and a dataset (type: dataframe)

        rhs <- model.matrix(formula, data = data)
        colnames(rhs)[1] <- "Intercept"
        Y <- as.matrix(data[rownames(rhs),toString(formula[[2]])])
        # Expected values \mu
        mu <- exp(rhs %*% theta)
        # Value of the log likelihood
        LL <- Y * log(mu)-mu-as.numeric(log(gamma(as(Y+1,"mpfr"))))

        return(-sum(LL, na.rm = T))


To check whether the function works properly we feed it the data, a random \(\theta\), and the formula of the first column of table I in the paper.

theta_test <- rnorm(4)
formula_1 <- as.formula(numbil0 ~ lngdppc + lnpop + gattwto08)
LLcalc(theta = theta_test, formula = formula_1, data = data_mil)
## [1] 26030.13

Now that we know is working, it is time to set up the optimizer. As discussed before we need the function, some starting parameters, the data and the formula, the static inputs of our function.

theta_0 <- c(-30,1,1,0)
dTbp_beta <- optim(par = theta_0, fn = LLcalc, data = data_mil, formula = formula_1, method = "Nelder-Mead", hessian = TRUE)
## [1] -29.052   1.084   1.171   0.006

To carry out the estimation we need to compute the standard errors. As a by-product, we also have the Hessian, this is useful in couple with the theory from the main class:

  • The negative of the hessian (provided as a linearization around the optimum of the LL maximization) is equal to the Fisher information matrix.

\[[\mathcal{I}(\theta)]_{i,j}= \mathbf{E}\left[\left(\frac{\partial}{\partial \theta_i} \log f(X|\theta)\right)\left(\frac{\partial}{\partial \theta_j} \log f(X|\theta)\right)|\theta\right]\approx\left[\frac{\partial^2}{\partial \theta_i \partial \theta_j} \log f(X|\theta)|\theta\right]\]

  • The Fisher information matrix, when inverted, is equal to the variance covariance matrix. (Formally, Cramer-Rao state that the inverse is the lower bound of the variance if the estimator is unbiased.)

  • The variance covariance matrix has in the diagonal the variane for each parameter. Taking square root of it gives the standard errors.

  • since we minimize we do not have to flip sign, as the values of the LL are calculated over the negative likelihood function.

Considering these notions we obtain the following standard errors that are equal to the previous regression.

fisher_info <- dTbp_beta$hessian
vcov <- solve(fisher_info)
se <- sqrt(diag(vcov))
## [1] 0.638233253 0.035068731 0.024155062 0.001907374

To complete the exercise we are going to reproduce the whole table 1 from the paper. We need to set up all of the formulas (models) estimated. The paper provides the robust errors, so we calculate them and format with stargazer (Hlavac 2018) output.

formula_2 <- as.formula(numbil0 ~  lngdppc + lnpop +  gattwto08 +  lnmcap08 +  rintr +  topint08)
formula_3 <- as.formula(numbil0 ~  lngdppc + lnpop +  gattwto08 +  lnmcap08 +  rintr +  topint08 +  nrrents +  roflaw)
formula_4 <- as.formula(numbil0 ~  lngdppc + lnpop +  gattwto08 +  lnmcap08 +  rintr +  topint08 +  nrrents +  roflaw +  fullprivproc)

r1 <- glm(formula = formula_1, family = poisson(link='log'), data = data_mil)
r2 <- glm(formula = formula_2, family = poisson(link='log'), data = data_mil)
r3 <- glm(formula = formula_3, family = poisson(link='log'), data = data_mil)
r4 <- glm(formula = formula_4, family = poisson(link='log'), data = data_mil)

se_rob <- list(sqrt(diag(sandwich::vcovHC.default(r1,type = "HC0"))),
               sqrt(diag(sandwich::vcovHC.default(r2,type = "HC0"))),
               sqrt(diag(sandwich::vcovHC.default(r3,type = "HC0"))),
               sqrt(diag(sandwich::vcovHC.default(r4,type = "HC0"))))
stargazer::stargazer(r1,r2,r3,r4, title = "Table 1 - Poisson regression",
          type=ifelse(knitr::is_latex_output(),"latex","html"), se = se_rob, out = "./images/RB_T_I.tex")
(#tab:)Table 1 - Poisson regression
Dependent variable:
(1) (2) (3) (4)
lngdppc 1.084*** 0.717*** 0.737*** 0.963***
(0.138) (0.244) (0.233) (0.243)
lnpop 1.171*** 0.806*** 0.929*** 1.153***
(0.097) (0.213) (0.195) (0.293)
gattwto08 0.006 0.007 0.004 0.0003
(0.007) (0.006) (0.006) (0.004)
lnmcap08 0.399** 0.286* 0.114
(0.172) (0.167) (0.237)
rintr -0.010 -0.009 -0.007
(0.010) (0.010) (0.009)
topint08 -0.051*** -0.058*** -0.060***
(0.011) (0.012) (0.015)
nrrents -0.005 0.013
(0.010) (0.013)
roflaw 0.203 0.342
(0.372) (0.283)
fullprivproc -0.002*
Constant -29.050*** -19.444*** -20.858*** -25.951***
(2.578) (4.820) (4.255) (6.240)
Observations 197 131 131 113
Log Likelihood -438.540 -259.731 -256.024 -179.661
Akaike Inf. Crit. 885.079 533.461 530.049 379.322
Note: p<0.1; p<0.05; p<0.01

3.3 Example II - Structural estimation

In this section the aim is to estimate the parameters from the likelihood function of a given model and be able to calculate it in the statistical software (in this case, R). We are going to estimate the structural parameters of a very simple search model (Flinn and Heckman 1982), following Flinn and Heckman 1982. This tutorial is not devoted to understanding such labor model, so we are going to describe the model and its assumptions; from that point we are going to derive the ML function. Then we are going to feed that function to the computer as in the previous case and maximize it to find the parameters of the model. The model:

Agents are well behaved and maximize the income they receive. When they are unemployed, they face a searching cost \(c\). Upon paying such cost, offers from a Poisson process will arrive. The arrival offer rate is denoted by \(\lambda\), and its probability by unit of time is also \(\lambda\). When they meet a wage is proposed from a log-normal distribution and the individuals can refuse to form the match or ‘seal the deal’. We also assume that the length of the employment spells follow a exponential distribution and that there is a constant risk of loosing the job with period probability \(\eta\). Time is discounted at a rate \(\rho\).

Such model is characterized by two Bellman equations. The first equation is the value of being employed [eq. (3.2) of the paper]:

\[ V_e(w) = \frac{1}{\rho}[w + (1-\eta)V_e + \eta V_u]\]

Reorganizing terms boils down to:

\[V_e(w) = \frac{1}{\rho + \eta}[w + \eta V_u ]\]

The second equation is the value of being unemployed:

\[V_u = \frac{1}{\rho}[c+(1-\lambda)V_u+ \lambda + \mathbf{E} \max \lbrace V_e, V_u \rbrace]\]

Which, after rearranging, is equal to:

\[\rho V_u = -c + \frac{\lambda}{\rho + \eta} \int_{\rho V_u}(w-\rho V_u)f(w)\]

These two equations describe the whole behaviour of the economy under the assumptions of the model. Now let’s list all the assumptions that we have until now, since they are going to be usefull for the construction of the likelihood function:

  • The minimum accepted wage is equal to \(w^* = \rho V_u\). In the paper the minimum accepted wage is not estimated, but instead is taken from the data.

  • Arrival rate of offers: \(\lambda\).

  • Termination rate: \(\eta\)

  • Distribution of employment duration: \(f[t_e]= \eta e^{(-\eta t_e)}\); the average duration is then \(\mathbf{E}[t_e]= \eta^{-1}\).

  • Rate of leaving the unemployed is equal to: \(\lambda (1- F(w^*))\). We are going to call this \(h_u\)

  • Distribution of unemployment duration is exponential \(f[t_u]= h_u e^{-h_u t_u)}\). The expected value in unemployment then is then equal to \((\lambda (1- F(w^*)))^{-1}=h_u^{-1}\)

  • The distribution of accepted wages is equal to: \(\frac{f(w)}{1 - F(w^*)}\). The term below is to adjust it to a distribution since a proper distribution must integrate to 1 and our is left censored.

  • The distribution of employments spells is right-censored. Given that it is negative exponential it coincides with the population.

Considering all this information, we can proceed to calculate the probability to sample an employed individual out of the population as weel as the probability to sample an unemployed individual. Let’s start with the latter:

\[P(U)=\frac{\mathbf{E}[t_u]}{\mathbf{E}[t_u]+\mathbf{E}[t_e]}=\frac{h_u^{-1}}{h_u^{-1}+\eta^{-1}}=\frac{\eta}{h_u+\eta}\] Now let’s derive the probability of sampling an employed individual:


Usually, data on the duration of unemployment \(t_u^o\) and the wages \(w^o\) are observed from census or surveys, hence we can calculate such parameters because now we can define the likelihood:

\[\begin{equation} \begin{aligned} \mathcal{L}&=\prod_U [P(U) \times f(t_u^o)]\times \prod_E [P(E) \times f(w^o)]=\\ &= \prod_U\left[ P(U) \times \lambda (1- F(w^*)) e^{- \lambda (1- F(w^*)) t_u} \right] \times \prod_E \left[P(E) \times \frac{f(w|\mu, \sigma)}{1 - F(w^*)} \right]=\\ &= \prod_U\left[\frac{\eta}{h_u+\eta} \times \lambda (1- F(w^*)) e^{- \lambda (1- F(w^*)) t_u} \right] \times \prod_E \left[\frac{h_u}{h_u+\eta} \times \frac{f(w|\mu, \sigma)}{1 - F(w^*)} \right]=\\ &= \prod_U\left[\frac{\eta}{\lambda (1- F(w^*))+\eta} \times \lambda (1- F(w^*)) e^{- \lambda (1- F(w^*)) t_u} \right] \times\\ &\prod_E \left[\frac{\lambda (1- F(w^*))}{\lambda (1- F(w^*))+\eta} \times \frac{f(w|\mu, \sigma)}{1 - F(w^*)} \right]\\ \end{aligned} \end{equation}\]

After taking logs and rearranging we obtain the following log-likelihood:

\[\log \mathcal{L} = N \log(h_u) - h_u \sum t_u + N_u \log(\eta) + \sum f(w|\mu,\sigma) - N_e \log(1-F(w^*)) - N_e \log(h_u + \eta)\]

We just need to feed this function to the optimizer with some data and we can obtain the parameters of the model. Before carrying out the liklihood estimation, we have a look at the data and try to parse out some useful extra information. The Data

To estimate the model we just need two vectors of data: duration of unemployment and hourly wages. To make a real example we are going to use the Current population survey (CSP), but most of the household surveys contain these kind of data (eg, colombian GEIH).

First we go to the page of the CPS, and learn about the nature of the data. We can also download the historical monthly data from the NBER webpage. For this exercise we are going to use the January 2019 data, which can be obtained following this link. Download and unzip the dataset in your working directory. Take also a moment to check the required variables, looking the documentation for the data. The open the dataset using the functionality of the reader package (Wickham, Hester, and Francois 2017).

data <- read_csv("jan19pub.dat", col_names = FALSE, trim_ws = FALSE)

As you see, the data only contain 1 vector of text – in jargon, observations are in long format. Each row is a long string of text. If you took the time to read the documentation you will see that you can find next to each variable the description and the location of the specific information in such long string. To extract information about unemployment duration and employment wages we need the following parts:


After identifying this information we can filter the data. To do so we create a database for the employed and a database for the unemployed, since the identification requires different information and variables. Let’s begin with the employed: we create a ‘data.frame’ containing the relevant colums and we extract the information by the position in the string.

employed <- data.frame(matrix(NA, nrow = nrow(data), ncol = 8))

colnames(employed) <- c(  "HRMIS",

employed$HRMIS          <- apply(data,1, function(x) as.numeric(substr(toString(x),63,64)))
employed$PEERNPER       <- apply(data,1, function(x) as.numeric(substr(toString(x),502,503)))
employed$PEERNHRO       <- apply(data,1, function(x) as.numeric(substr(toString(x),525,526)))
employed$PRERNWA        <- apply(data,1, function(x) as.numeric(substr(toString(x),527,534)))
employed$PRERNHLY       <- apply(data,1, function(x) as.numeric(substr(toString(x),520,523)))
employed$PTWK           <- apply(data,1, function(x) as.numeric(substr(toString(x),535,535)))
employed$PEHRUSL1       <- apply(data,1, function(x) as.numeric(substr(toString(x),218,219)))
employed$GESTFIPS       <- apply(data,1, function(x) substr(toString(x),93,94))

Reading the documentation we identify that the invalid cases are coded \(0\) or negative values \(-1,-2,\dots\). We reclassify this coded infromation as NA, a not available or missing value. We are going to keep the observations that are from the outgoing rotations (have earnings information). After that we are going to filter the valid hourly wages and convert the weekkly wages to hours using the number of hours. We are going to keep only the people that have this information. We are also going to take the right number of decimal for the variables that specify it. We are going to keep only the observations that are above the legal federal minimum wage of \(7.5\) USD, and we are going to trim the data at the percentile 99.9%.

employed[employed<=0] <- NA

employed <- employed[which(employed$HRMIS %in% c(4,8)),]
employed <- employed[which(employed$PEERNPER > 0),]
employed <- employed[-which(employed$PRERNHLY == 9999),]
employed$PRERNHLY <- employed$PRERNHLY/100
employed <- employed[-which(employed$PTWK == 1),]
employed$PRERNWA <- employed$PRERNWA/100

employed$wages <- ifelse(employed$PRERNHLY > 0 & !$PRERNHLY),
                         ifelse(!$PRERNWA) & !$PEHRUSL1),

employed <- employed[which(!$wages)),]
employed <- employed[which(employed$wages >= 7.25),]
employed <- employed[which(employed$wages <= quantile(employed$wages, 0.999)),]

tokeep_e <- c("GESTFIPS", "wages")
employed <- employed[,tokeep_e]
employed$duration_U <- 0

Now we apply the same selection to the unemployed: We collect the infromation in each variable following the position. We take the people for which the labor employment status is unemployment and that have information on the duration. Then we convert the information to monthly data.

unemployed <- data.frame(matrix(NA, nrow = nrow(data), ncol = 3))
colnames(unemployed) <- c("PEMLR",

unemployed$PEMLR          <- apply(data,1, function(x) as.numeric(substr(toString(x),180,181)))
unemployed$RUNEDUR        <- apply(data,1, function(x) as.numeric(substr(toString(x),407,409)))
unemployed$GESTFIPS       <- apply(data,1, function(x) substr(toString(x),93,94))

unemployed <- unemployed[which(unemployed$PEMLR %in% c(3,4)),]
unemployed <- unemployed[which(unemployed$RUNEDUR > 0),]
unemployed$duration_U <- unemployed$RUNEDUR/4.333
unemployed <- unemployed[,c("GESTFIPS","duration_U")]
unemployed$wages <- 0
unemployed <- unemployed[,c("GESTFIPS","wages", "duration_U")]

As a final step, we merge the two dataset:

data <- rbind(employed,unemployed)

We are going to select and calculate the MLE using only one state, as minimum wages laws vary locally. For this exercise we are going to use the information on Nevada (\(GESTFIPS = 32\))

data_sub <- data[which(data$GESTFIPS=="32"),] Estimation

In order to estimate the log likelihood we are going to code two auxiliary funcions that appear all the time in the procedure. The first one is the log-normal density function:

\[LN \left(x; \mu, \sigma\right) = \phi \left( \frac{\log(x) - \mu}{\sigma} \right)(\frac{1}{x \sigma})\] Where \(\phi\) is the probability density function of the \(N(0,1)\) distribution.

lognorm <- function(x,mu,sigma){
        res <- dnorm((log(x)-mu)/(sigma))*(1/(sigma*x))

The second function is the survival function, which is the probability to be over the minimum accepted wage for a given distribution. We define the survival as \((1- F(w^*))\). The cumulative distribution function of the log-normal is equal to:

\[\Phi \left( \frac{\log(x) - \mu}{\sigma} \right)\] Where \(\Phi\) is the cumulative distribution function of the standard normal distribution.

surv <- function(val,mu,sigma){
        res <- 1 - pnorm((log(val)-mu)/(sigma))

There are other built-in functions already developed that are suitable for this kind of problems. One xample is the mle2 function from the ‘bbmle’ package (Bolker and Team 2017). This is useful since the function deals with standard errors and provides other information that might be useful. There are some difference between the optim() method already covered and the mle2() function. This latter function requires:

  • Function to calculate negative log-likelihood

  • Starting values for the optimizer

  • The optimizer used

  • The data

Now we are going to code the log-likelihood function we recovered using the same procedure as in the previous example:

First we are going to code each of the parameters and the data as inputs in the function. Then we are going to code the likelihood using the definition derived earlier.

LLfnct_mle <- function(alambda,aeta,amu,asigma, data){

        lambda = exp(alambda)
        eta    = exp(aeta)
        mu     = amu
        sigma  = exp(asigma)

        w_star <- min(data[which(data$wages > 0),]$wages)

        h_u <- lambda * surv(w_star, mu, sigma)
        n <- nrow(data)
        n_u <-  nrow(data[which(data$duration_U > 0),])
        n_e <- (n-n_u)

        LL <-   n * log(h_u) -
                n_e * log(surv(w_star, mu, sigma)) -
                h_u * sum(data$duration_U) +
                n_u * log(eta) +
                sum(log(lognorm(data[which(data$wages>0),]$wages,mu,sigma))) -
                n_e * log(eta + h_u)

        #print(cbind(c("lambda = ","eta = ","mu = ","sigma = "),c(lambda,eta,mu,sigma)))


After that we just have to set up the maximizer and feed it the function:

m0 <- mle2(LLfnct_mle,
           start = list(alambda =-1,
                        aeta = -1,
                        amu = 3,
                        asigma = -1),
           data = list(data = data_sub),
           optimizer = "nlminb")

##    alambda       aeta        amu     asigma 
## -0.4541099 -1.8175680  2.7146674 -0.4361533

All the coeficients need to be transformed to recover and present the results. Before that we are going to calculate the standard errors using the delta method and the information from the hessian. At the end what we are doing is just rescaling the errors.

lambda <- exp(m0@coef["alambda"])
eta <- exp(m0@coef["aeta"])
mu <- m0@coef["amu"]
sigma <- exp(m0@coef["asigma"])

fisher_info <- m0@details$hessian
vcov_mle <- solve(fisher_info)
prop_sigma<-sqrt(c(lambda^2,eta^2,mu^2,sigma^2) * diag(vcov_mle))
names(prop_sigma) <- c("lambda","eta","mu","sigma")
stargazer(cbind("parameter" = names(prop_sigma), "theta" =c(lambda,eta,mu,sigma), prop_sigma),
          title = "Model parameters MLE - Nevada (Jan 2019)",
          type=ifelse(knitr::is_latex_output(),"latex","html"), out = "./images/SE_Nevada.tex")
(#tab:)Model parameters MLE - Nevada (Jan 2019)
parameter theta prop_sigma
alambda lambda 0.635012931441372 0.0900528154150777
aeta eta 0.16242027479781 0.0379917018868457
amu mu 2.71466743713899 0.270516421768113
asigma sigma 0.646518617780906 0.0678137201168286

3.4 References

This material was possible thanks to the slides of David MARGOLIS (in PSE resources), the course of C. FLinn at CCA 2017, and QuantEcon MLE.

This book is in Open Review. I want your feedback to make the book better for you and other readers. To add your annotation, select some text and then click the on the pop-up menu. To see the annotations of others, click the in the upper right hand corner of the page

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.