The Relation to Language: Cross-Entropy
Entropy can be thought of as a measure of how surprised we will be, on average, by the next word given the words we have already seen.
The cross-entropy between a random variable X with true probability distribution p(x) and another pmf q (normally a model of p) is defined as H(X, q) = -sum_x p(x) log q(x), and it decomposes as H(X, q) = H(X) + D(p||q), where D(p||q) is the Kullback-Leibler divergence between p and q.
Cross-entropy therefore tells us how surprised we will be, on average, by the next word when we use the model q to predict words that are actually drawn from p.
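To make the decomposition concrete, here is a minimal Python sketch; the tiny vocabulary and the distributions p and q are invented for illustration. It computes the entropy H(p), the cross-entropy H(p, q), and the KL divergence D(p||q), and checks that the cross-entropy equals the entropy plus the divergence.

```python
import math

# Hypothetical "true" next-word distribution p and model distribution q
# over a toy three-word vocabulary (values chosen only for illustration).
p = {"the": 0.5, "a": 0.3, "cat": 0.2}
q = {"the": 0.4, "a": 0.4, "cat": 0.2}

# Entropy of p: average surprise if we predicted with the true distribution.
entropy_p = -sum(px * math.log2(px) for px in p.values())

# Cross-entropy H(p, q): words are drawn from p, but surprise is measured
# with the model q.
cross_entropy = -sum(p[w] * math.log2(q[w]) for w in p)

# KL divergence D(p||q): the extra surprise incurred by using q instead of p.
kl = sum(p[w] * math.log2(p[w] / q[w]) for w in p)

# Confirms H(p, q) = H(p) + D(p||q), in bits.
print(f"H(p)           = {entropy_p:.4f} bits")
print(f"H(p, q)        = {cross_entropy:.4f} bits")
print(f"H(p) + D(p||q) = {entropy_p + kl:.4f} bits")
```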