date: 2019-01-30
tags: machine-learning math
It is really common to use the maximum likelihood estimator (MLE) in machine learning. But have you ever thought about why? Apart from the fact that it is prevalent and simple, here is why the MLE is a very nice estimator.
Before discussing whether the MLE is a good estimator, we need a criterion. One of the most popular measures of an estimator is the mean squared error (MSE):

$$\mathrm{MSE}(\hat{\theta}) = \mathbb{E}\big[(\hat{\theta} - \theta)^2\big]$$

where $\theta$ is the parameter of the statistical model and $\hat{\theta}$ is the estimator. From a short derivation, we get a useful decomposition:

$$\mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + \mathrm{Bias}(\hat{\theta})^2$$

where

$$\mathrm{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta.$$

If the bias is 0, the estimator is called unbiased.
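To make the decomposition concrete, here is a minimal Monte Carlo sketch in Python (with a hypothetical Gaussian-mean setup of my own choosing, not from the discussion above) that checks $\mathrm{MSE} \approx \mathrm{Bias}^2 + \mathrm{Var}$ for the sample mean and for a deliberately biased, shrunken version of it.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0              # true mean of the assumed model
n, trials = 20, 100_000

# Draw many samples and compare two estimators of the mean:
# the sample mean (unbiased) and a shrunken version (biased).
samples = rng.normal(loc=theta, scale=1.0, size=(trials, n))
unbiased = samples.mean(axis=1)
biased = 0.9 * unbiased  # deliberately biased estimator

for name, est in [("sample mean", unbiased), ("shrunken mean", biased)]:
    mse = np.mean((est - theta) ** 2)
    bias = est.mean() - theta
    var = est.var()
    print(f"{name}: MSE={mse:.4f}  bias^2 + var={bias**2 + var:.4f}")
```

Both printed numbers should agree up to Monte Carlo noise, which is exactly the decomposition above.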
There is also a formal definition of the MLE. First, for an i.i.d. sample $x_1, \dots, x_n$ with density or frequency function $f(x; \theta)$, the likelihood function is

$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta),$$

and the MLE of $\theta$ is

$$\hat{\theta}_{\mathrm{MLE}} = \operatorname*{arg\,max}_{\theta} L(\theta).$$
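As a quick illustration (my own example, not from the original derivation), the following sketch fits an assumed exponential model by minimizing the negative log-likelihood numerically and compares the result with the closed-form MLE $1/\bar{x}$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
true_rate = 1.5
x = rng.exponential(scale=1.0 / true_rate, size=500)   # simulated data

def neg_log_likelihood(rate):
    # -log L(rate) for an exponential model: L = prod_i rate * exp(-rate * x_i)
    return -(len(x) * np.log(rate) - rate * x.sum())

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10), method="bounded")
print("numerical MLE:", result.x)
print("closed-form MLE (1 / sample mean):", 1.0 / x.mean())
```

In practice we almost always maximize the log-likelihood rather than the likelihood itself, since the product turns into a sum and is numerically much better behaved.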
For the MLE, there is a hard but very useful theorem.
Theorem
If $L$ is smooth and behaves in a nice way (here I omit the strict regularity conditions), then

$$\sqrt{n}\,\big(\hat{\theta}_{\mathrm{MLE}} - \theta\big) \xrightarrow{d} \mathcal{N}\big(0,\, I(\theta)^{-1}\big),$$

where $I(\theta)$ is the Fisher information matrix,

$$I(\theta) = \mathbb{E}\Big[\big(\tfrac{\partial}{\partial \theta} \log f(X; \theta)\big)\big(\tfrac{\partial}{\partial \theta} \log f(X; \theta)\big)^{\top}\Big].$$
In a word, the MLE is consistent and asymptotically normal. For the proof, please visit here: proof.
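A small simulation makes the asymptotic normality tangible. Assuming a Bernoulli($p$) model (my choice for illustration), the MLE is the sample mean and the Fisher information of one observation is $I(p) = 1/(p(1-p))$, so the variance of $\sqrt{n}(\hat{p} - p)$ should be close to $p(1-p)$ for large $n$:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, trials = 0.3, 2_000, 50_000

# MLE of a Bernoulli(p) parameter is the sample mean.
p_hat = rng.binomial(n, p, size=trials) / n
z = np.sqrt(n) * (p_hat - p)

# Fisher information of one Bernoulli observation: I(p) = 1 / (p(1-p)),
# so the limiting variance should be I(p)^{-1} = p(1-p).
print("empirical variance of sqrt(n)(p_hat - p):", z.var())
print("predicted variance p(1-p):             ", p * (1 - p))
```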
There is another theorem that gives a lower bound on the variance of an unbiased estimator.
Theorem (Cramér-Rao lower bound)
Let $X_1, \dots, X_n$ be an i.i.d. sample of random variables with density or frequency function $f(x; \theta)$, and assume the usual regularity conditions hold (for example, the support of $f$ does not depend on $\theta$). Then for an unbiased estimator $\hat{\theta}$ of $\theta$, we have

$$\mathrm{Var}(\hat{\theta}) \ge \frac{1}{n\, I(\theta)}.$$
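As a sanity check of the bound (again a sketch under a model I am assuming for illustration): for $\mathcal{N}(\mu, \sigma^2)$ with known $\sigma$, the Fisher information is $I(\mu) = 1/\sigma^2$, and the sample mean is unbiased with variance $\sigma^2/n$, so it attains the Cramér-Rao bound exactly.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, trials = 1.0, 2.0, 50, 200_000

# Unbiased estimator of mu: the sample mean.
means = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)

# For N(mu, sigma^2) with known sigma, I(mu) = 1 / sigma^2,
# so the Cramer-Rao bound is 1 / (n * I(mu)) = sigma^2 / n.
print("empirical Var(sample mean):", means.var())
print("Cramer-Rao lower bound    :", sigma**2 / n)
```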
As $n \to \infty$, the MLE becomes unbiased (because it is consistent) and its asymptotic variance attains this lower bound, so it is the asymptotically optimal estimator. In practice, we can trust the MLE to be very good when the sample size is large enough.