Things that can be done with Text Corpora I: Word Counts
Word counts can tell us:
- What the most common words in the text are.
- How many words are in the text (word tokens: running words, counting repeats; word types: distinct words).
- What the average frequency of a word in the text is (number of word tokens divided by number of word types).
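These counts can be computed in a few lines. The following is a minimal sketch, assuming a simple lowercase, regex-based tokenizer; the names `tokenize` and `word_stats` and the sample sentence are made up for illustration and are not part of the original notes. A real corpus would need a more careful tokenizer.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and pull out alphabetic words (a simplifying assumption)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def word_stats(text, top_n=3):
    """Return the most common words, token count, type count, and average frequency."""
    tokens = tokenize(text)
    counts = Counter(tokens)        # maps each word type to its number of occurrences
    n_tokens = len(tokens)          # word tokens: every running word, repeats included
    n_types = len(counts)           # word types: distinct words only
    avg_freq = n_tokens / n_types if n_types else 0.0
    return {
        "most_common": counts.most_common(top_n),
        "tokens": n_tokens,
        "types": n_types,
        "avg_frequency": avg_freq,
    }

# Hypothetical toy text, only to show the calls.
sample = "the cat sat on the mat and the dog sat on the log"
stats = word_stats(sample)
print(stats)
```

On this toy sentence there are 13 tokens but only 8 types, so the average frequency is 13 / 8 = 1.625, even though "the" alone occurs 4 times.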
Limitation of word counts: most words occur very rarely, and it is hard to predict much about the behavior of words that do not occur often in a corpus. ==> Zipf’s Law: a word's frequency is roughly inversely proportional to its rank in the frequency list.
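Zipf's Law says that a word's frequency is roughly inversely proportional to its rank in the frequency list, so rank times frequency stays roughly constant. The sketch below shows one way to inspect this, assuming a plain regex tokenizer; the names `rank_frequency` and `hapax_share` and the sample sentence are hypothetical. It builds a rank/frequency table and measures the share of word types that occur only once. A toy text is far too small to show the pattern cleanly; it only becomes visible on a large corpus.

```python
import re
from collections import Counter

def count_words(text):
    """Count lowercase alphabetic words (a simplifying tokenization assumption)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def rank_frequency(text):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent word first."""
    ranked = count_words(text).most_common()   # sorted by descending frequency
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]

def hapax_share(text):
    """Return the fraction of word types that occur exactly once."""
    counts = count_words(text)
    if not counts:
        return 0.0
    once = sum(1 for freq in counts.values() if freq == 1)
    return once / len(counts)

# Hypothetical toy text, only to show the calls.
sample = "the cat sat on the mat and the dog sat on the log"
table = rank_frequency(sample)
for row in table:
    print(row)
share = hapax_share(sample)
print("share of types seen once:", share)
```

Even in this tiny example, 5 of the 8 word types occur only once, which illustrates why counts alone say little about most words.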