Preprocessing
Before the filters and partial taggers are applied, the text is tokenized, lemmatized, split into sentences and part-of-speech tagged using the Brill tagger.
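As a rough illustration of these steps, the following Python sketch splits text into sentences, tokenizes each sentence, lemmatizes each token and assigns it a part-of-speech tag. It makes several assumptions. The lemma and tag dictionaries (`LEMMAS`, `LEXICON`) and all function names are hypothetical toy stand-ins, and the order of the steps is our own choice. `pos_tag` only performs lexicon lookup plus a few unknown-word guesses. It is not the Brill tagger, which also applies learned contextual transformation rules. Tags follow the Penn Treebank convention used by the Brill tagger.

```python
import re
from typing import Dict, List, NamedTuple

# Hypothetical toy resources standing in for a real lemmatizer and the Brill tagger.
LEMMAS: Dict[str, str] = {"was": "be", "dogs": "dog", "barked": "bark", "ran": "run"}
LEXICON: Dict[str, str] = {
    "the": "DT", "a": "DT", "was": "VBD", "barked": "VBD", "ran": "VBD",
    "dog": "NN", "dogs": "NNS", "park": "NN", "in": "IN", "old": "JJ",
}


class Token(NamedTuple):
    text: str
    lemma: str
    pos: str


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break on whitespace that follows '.', '!' or '?'.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> List[str]:
    # A token is a run of letters, digits or apostrophes, or any single other non-space character.
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", sentence)


def lemmatize(word: str) -> str:
    # Dictionary lookup; unknown words fall back to their lowercased form.
    lower = word.lower()
    return LEMMAS.get(lower, lower)


def pos_tag(word: str) -> str:
    # Lexicon lookup followed by crude unknown-word guesses (NOT the Brill tagger).
    lower = word.lower()
    if lower in LEXICON:
        return LEXICON[lower]
    if not any(c.isalnum() for c in word):
        return "."  # punctuation
    if word[0].isupper():
        return "NNP"  # capitalized unknown word: guess proper noun
    if lower.endswith("ly"):
        return "RB"  # -ly suffix: guess adverb
    return "NN"  # default guess: common noun


def preprocess(text: str) -> List[List[Token]]:
    # One list of (text, lemma, POS) tokens per sentence.
    return [
        [Token(w, lemmatize(w), pos_tag(w)) for w in tokenize(s)]
        for s in split_sentences(text)
    ]


if __name__ == "__main__":
    for sentence in preprocess("The dog barked loudly. Rex ran."):
        print([(t.text, t.lemma, t.pos) for t in sentence])
```

A real pipeline would replace each toy function with a trained component. The sketch only shows how each step feeds the next.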
Proper names are marked and categorized.
Content words are identified, and only these are disambiguated.
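The last two steps can be sketched as simple filters over the tagged output. This sketch is not the original implementation and rests on three assumptions. Proper names are tokens tagged `NNP` or `NNPS`. They are categorized through a small hypothetical gazetteer whose categories (`person`, `location`, `organization`) are illustrative only. Content words are nouns, verbs, adjectives and adverbs, that is, Penn Treebank tags beginning with `NN`, `VB`, `JJ` or `RB`, with proper names excluded from the words passed on to disambiguation.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer; the categories are illustrative, not those of the original system.
GAZETTEER: Dict[str, str] = {"john": "person", "paris": "location", "ibm": "organization"}

CONTENT_TAG_PREFIXES = ("NN", "VB", "JJ", "RB")  # nouns, verbs, adjectives, adverbs
PROPER_NAME_TAGS = {"NNP", "NNPS"}

Tagged = List[Tuple[str, str]]  # (word, Penn Treebank tag) pairs


def mark_proper_names(tagged: Tagged) -> List[Tuple[int, str, str]]:
    # (position, word, category) for each proper-noun token; unlisted names get "unknown".
    return [
        (i, word, GAZETTEER.get(word.lower(), "unknown"))
        for i, (word, tag) in enumerate(tagged)
        if tag in PROPER_NAME_TAGS
    ]


def content_word_indices(tagged: Tagged) -> List[int]:
    # Positions of the words to disambiguate; proper names are excluded (assumption).
    return [
        i
        for i, (_, tag) in enumerate(tagged)
        if tag.startswith(CONTENT_TAG_PREFIXES) and tag not in PROPER_NAME_TAGS
    ]


if __name__ == "__main__":
    sentence = [("John", "NNP"), ("saw", "VBD"), ("the", "DT"), ("old", "JJ"),
                ("bank", "NN"), ("in", "IN"), ("Paris", "NNP"), (".", ".")]
    print(mark_proper_names(sentence))
    print(content_word_indices(sentence))
```

Because this test is purely tag-based, it also counts auxiliaries such as *be* and *have* as content words. A fuller system would probably need a stoplist for them.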