Hufmann Encoding: Huffman encoding is an example of entropy encoding. It is based on
statistical methods. Given the character that must be encoded, together with the probability of their occurrences, the Huffman encoding algorithm determines the optimal code using the minimum
number of bits. Hence the length (number of bits) of the coded characters will differ. In text, the shortest code is assigned to those characters that occur most frequently. To determine a
Huffman code, it is useful to construct a binary tree. The nodes of this tree represent the characters that are to be encoded. Every node contains the occurrece probability of one of the
characters belonging to this subtree. 0 and 1 are assigned to the edges of the tree. The two characters with the lowest probabilities are combined in the first binary tree. Their root node is
labeled with these characters and the combined probability. The edges are labeled with 1 and 0 resp. (This assignment is arbitrary, therefore, with the same data one can get different Huffman
codes). The nodes below this root node wont be considered anymore. Again, the two nodes with the lowest probabilities are combined into a binary subtree, the root node and the edges are
labeled as before. Continue in this way until the bode with the whole used alphabet and the probability 1 is reached. The encoded data for the characters are the pathes through the tree to
theit nodes. The result is stored in a table. If the information of an image can be transformed into a bit stream, such a table can be used to compress the data without loss.
Click on the button below to reach the applet:
Example:
Hufmann Encoding: In the
figure, characters A,B,C,D and E have the following probability of occurence: p(A) = 0.16, p(B) = 0.51, p(C)=0.09, p(D)=0.13, p(E)=0.11
The edge from node CE to node C is assigned a
1 and the edge from CE to E becomes a 0. The following nodes remain after the first step: p(A) = 0.16, p(B) = 0.51, p(CE)=0.20, p(D)=0.13
Tbe edge from AD to A is assigned a 1 and the edge from AD to D a 0. The following nodes remain after the second step: p(AD) = 0.29, p(B) = 0.51, p(CE)=0.20
Tbe edge from ADCE to AD is assigned a 0 and the edge from ADCE to CE a 1. The following nodes remain after the third step: p(ADCE) = 0.49, p(B) = 0.51
Tbe edge from ADCEB to B
is assigned a 1 and the edge from ADCEB to CEAD a 0. The figure shows the resulting Huffman code in the form of a binary tree. The result is the following code that is stored in a table:
w(A)=001, w(B)=1, w(C)=011, w(D)=000, w(E)=010