Looking at Text II: Tokenization -- What Is a Word?
An early step of processing is to divide the input text into units called tokens, where each token is either a word or something else, such as a number or a punctuation mark.
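A minimal sketch of this splitting step, using a regular expression with one alternative each for numbers, words, and punctuation (an illustration, not a full tokenizer -- the pattern and its alternation order are assumptions for this example):

```python
import re

# Alternation order matters: numbers are tried before words so that
# "3.5" is kept whole rather than split at the period.
TOKEN_RE = re.compile(r"""
    \d+(?:\.\d+)?     # numbers, optionally with a decimal part
  | \w+(?:'\w+)?      # words, allowing an internal apostrophe (didn't)
  | [^\w\s]           # any single punctuation mark
""", re.VERBOSE)

def tokenize(text):
    """Return the list of tokens found in text, left to right."""
    return TOKEN_RE.findall(text)

print(tokenize("The cost rose 3.5% in 2024, didn't it?"))
# → ['The', 'cost', 'rose', '3.5', '%', 'in', '2024', ',', "didn't", 'it', '?']
```

Note that even this tiny example already has to make policy decisions -- whether "3.5" is one token or three, whether "didn't" is one token or two -- which is exactly why "what is a word?" is a real question.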
Periods: part of an abbreviation, end of sentence, or both at once (a haplology, as when an abbreviation like "etc." falls at the end of a sentence and a single period serves both roles)?
Homographs --> two lexemes: one written form corresponding to distinct lexemes, e.g. *bank* 'financial institution' vs. *bank* 'edge of a river'.