Statistical Estimators II: Maximum Likelihood Estimation
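P_MLE(w1,..,wn) = C(w1,..,wn)/N, where C(w1,..,wn) is the frequency of the n-gram w1,..,wn in the training corpus and N is the total number of n-gram instances in the training corpus.
P_MLE(wn|w1,..,wn-1) = C(w1,..,wn)/C(w1,..,wn-1)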
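The two relative-frequency formulas can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy corpus, the whitespace tokenization, and the helper names (ngrams, mle_joint, mle_conditional) are hypothetical and not part of the original slides.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all n-grams (as tuples) found in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def mle_joint(tokens, n):
    """Joint estimate: C(w1..wn) / N, with N = number of n-gram instances."""
    counts = Counter(ngrams(tokens, n))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

def mle_conditional(tokens, n):
    """Conditional estimate: C(w1..wn) / C(w1..wn-1)."""
    counts = Counter(ngrams(tokens, n))
    # Assumption: a history is counted only where it is followed by a word,
    # so the estimates for each history sum to 1.
    histories = Counter()
    for gram, c in counts.items():
        histories[gram[:-1]] += c
    return {gram: c / histories[gram[:-1]] for gram, c in counts.items()}

# Hypothetical toy corpus: 9 tokens, hence 8 bigram instances.
corpus = "the cat sat on the mat the cat ran".split()
joint = mle_joint(corpus, 2)
cond = mle_conditional(corpus, 2)

print(joint[("the", "cat")])  # C(the cat) / N       = 2/8
print(cond[("the", "cat")])   # C(the cat) / C(the)  = 2/3
```

In this toy corpus "the cat" occurs twice among 8 bigrams, and "the" occurs 3 times as a history, which gives the two values noted in the comments.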
This estimate is called the Maximum Likelihood Estimate (MLE) because it is the choice of parameters that gives the highest probability to the training corpus.
MLE is usually unsuitable for NLP because of the sparseness of the data: any n-gram that does not occur in the training corpus is assigned probability zero ==> Use a discounting or smoothing technique.
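The sparseness problem can be seen in a small Python sketch: one bigram that never occurred in training drives the probability of a whole word sequence to zero. The training text, the helper names (p_mle, sentence_prob), and the choice to ignore the probability of the first word are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical training corpus (whitespace-tokenized).
train = "the cat sat on the mat".split()
bigrams = Counter(zip(train, train[1:]))
histories = Counter(train[:-1])  # words that are followed by another word

def p_mle(word, prev):
    """MLE bigram probability; zero for an unseen bigram or unseen history."""
    if histories[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / histories[prev]

def sentence_prob(tokens):
    """Product of bigram probabilities (first-word probability is ignored)."""
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= p_mle(word, prev)
    return prob

seen = sentence_prob("the cat sat".split())           # all bigrams were observed
unseen = sentence_prob("the cat on the mat".split())  # "cat on" never observed

print(seen)    # 0.5
print(unseen)  # 0.0
```

The second sequence uses only known words, yet it receives probability zero because the bigram "cat on" has count 0. Discounting and smoothing methods address this by reserving some probability mass for unseen events.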