The Entropy of English

We can model English using n-gram models (also known as Markov chains).

These models assume limited memory, i.e., we assume that the next word depends only on the previous k words [kth-order Markov approximation]. An n-gram model conditions on the previous n - 1 words, so it corresponds to k = n - 1.
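The kth-order assumption can be made concrete by estimating the conditional entropy of the next word given the previous k words from counts. The sketch below makes several assumptions: the text is already split into word tokens, plain relative-frequency (maximum-likelihood) estimates with no smoothing are used, and the function name `conditional_entropy` is hypothetical.

```python
import math
from collections import Counter


def conditional_entropy(tokens, k):
    """Empirical H(X_i | X_{i-k}, ..., X_{i-1}) in bits per token.

    Hypothetical helper: uses unsmoothed relative frequencies, so it is
    only a rough estimate and underestimates entropy on small samples.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    joint = Counter()    # counts of (k-word context, next word)
    context = Counter()  # counts of each k-word context
    for i in range(k, len(tokens)):
        ctx = tuple(tokens[i - k:i])
        joint[(ctx, tokens[i])] += 1
        context[ctx] += 1
    total = sum(joint.values())
    if total == 0:
        raise ValueError("sequence too short for this k")
    h = 0.0
    for (ctx, _), n in joint.items():
        p_joint = n / total        # estimate of P(context, next word)
        p_cond = n / context[ctx]  # estimate of P(next word | context)
        h -= p_joint * math.log2(p_cond)
    return h


text = "the cat sat on the mat the cat ate the rat".split()
for k in range(3):
    print(k, round(conditional_entropy(text, k), 3))
```

On a toy sample like this the estimate falls quickly as k grows, because most longer contexts occur only once and their next word then looks deterministic. Meaningful estimates for English need far more text, and usually smoothing.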