Things that can be done with Text Corpora II: Zipf’s Law
If we count how often each word type occurs in a large corpus and then list the words in descending order of frequency, we can explore the relationship between the frequency of a word, f, and its position in the list, known as its rank, r.
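The counting and ranking step can be sketched in a few lines of Python. This is only an illustration: the helper name rank_frequency, the toy sentence, and the crude regular-expression tokenizer are assumptions made for the example, and a real corpus would need proper tokenization.

```python
import re
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, frequency) tuples, most frequent word first."""
    # Crude tokenization (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # most_common() orders by descending count; rank 1 is the most frequent word.
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(counts.most_common(), start=1)]

# Toy text standing in for a corpus (hypothetical, far too small to show Zipf's Law).
sample = "the cat sat on the mat and the dog sat on the log"
table = rank_frequency(sample)
for rank, word, freq in table[:3]:
    print(rank, word, freq)
```

Words with equal counts receive consecutive ranks here; how ties are handled is a design choice that matters little for the high-frequency end of the list.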
Zipf’s Law says that: f ∝ 1/r (frequency is inversely proportional to rank, so the product f × r is roughly constant).
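One way to check this relationship on a ranked frequency list is to look at the product of frequency and rank, or to fit a straight line to log f against log r, where a slope near -1 is what the law predicts. The sketch below assumes a hypothetical helper zipf_slope and uses made-up, perfectly Zipfian counts rather than real corpus data.

```python
import math

def zipf_slope(frequencies):
    """Least-squares slope of log(f) against log(r).

    `frequencies` must be positive counts sorted in descending order,
    so that the first entry has rank 1.
    """
    xs = [math.log(r) for r in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Made-up counts that follow f = 1200 / r exactly (not real corpus data).
ideal = [1200, 600, 400, 300, 240]
products = [f * r for r, f in enumerate(ideal, start=1)]
print(products)            # every product equals the same constant
print(zipf_slope(ideal))   # close to -1 for perfectly Zipfian data
```

Real corpora only approximate this: the products drift and the fitted slope is usually near, not exactly, -1.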
Significance of Zipf’s Law: For most words, our data about their use will be exceedingly sparse. Only for a few words will we have a lot of examples.