Hufmann Encoding

questions:E-mail to
abed@kom.tu-darmstadt.de

JPEG (Joint Photographic Experts Group)

Beginner -

Hufmann Encoding:
Huffman encoding is an example of entropy encoding. It is based on statistical methods. Given the character that must be encoded, together with the probability of their occurrences, the Huffman encoding algorithm determines the optimal code using the minimum number of bits. Hence the length (number of bits) of the coded characters will differ. In text, the shortest code is assigned to those characters that occur most frequently. To determine a Huffman code, it is useful to construct a binary tree. The nodes of this tree represent the characters that are to be encoded. Every node contains the occurrece probability of one of the characters belonging to this subtree. 0 and 1 are assigned to the edges of the tree. The two characters with the lowest probabilities are combined in the first binary tree. Their root node is labeled with these characters and the combined probability. The edges are labeled with 1 and 0 resp. (This assignment is arbitrary, therefore, with the same data one can get different Huffman codes). The nodes below this root node wont be considered anymore. Again, the two nodes with the lowest probabilities are combined into a binary subtree, the root node and the edges are labeled as before. Continue in this way until the bode with the whole used alphabet and the probability 1 is reached. The encoded data for the characters are the pathes through the tree to theit nodes. The result is stored in a table. If the information of an image can be transformed into a bit stream, such a table can be used to compress the data without loss.

Click on the button below to reach the applet:

Example: Hufmann Encoding:
In the figure, characters A,B,C,D and E have the following probability of occurence:
p(A) = 0.16, p(B) = 0.51, p(C)=0.09, p(D)=0.13, p(E)=0.11

The edge from node CE to node C is assigned a 1 and the edge from CE to E becomes a 0.
The following nodes remain after the first step:
p(A) = 0.16, p(B) = 0.51, p(CE)=0.20, p(D)=0.13

Tbe edge from AD to A is assigned a 1 and the edge from AD to D a 0.
The following nodes remain after the second step:
p(AD) = 0.29, p(B) = 0.51, p(CE)=0.20

Tbe edge from ADCE to AD is assigned a 0 and the edge from ADCE to CE a 1.
The following nodes remain after the third step:
p(ADCE) = 0.49, p(B) = 0.51

Tbe edge from ADCEB to B is assigned a 1 and the edge from ADCEB to CEAD a 0.
The figure shows the resulting Huffman code in the form of a binary tree. The result is the following code that is stored in a table:
w(A)=001, w(B)=1, w(C)=011, w(D)=000, w(E)=010