Huffman Program and the "&" character ===================================== I think I should call up your attention to the extra character "&" which is being added by the encoder. Encoding ======== Suppose the original message is: "AAB" and the codeword for "A" is "001" and "B" is "1001" (the numbers here are hypothetical) The encoded message would consist of the following bits: "0010011001". When packing bits into bytes, we would have incomplete bytes if the total number is not a multile of 8. In the example: First byte would correspond to: "00100110" but the second one should contain: "01" If the byte if filled with 0's how could we differentiate added 0's from actual codes for other letters when decoding? This problem was solved in the program provided in the following way: an extra character is used in the encoding "&" (suppose that "&" gets codeword "111") Now, when encoding "AAB" we add "&" to the end and have encoded "AAB&". In the example above, the encoded message would become: "0010011001111" First byte: "00100110" Second byte: "01111" we then complete the byte with zeroes - "01111000" Decoding ======== By traversing the tree, I should be able to decode in order A, A, B, & (when "&" is found, I know I should stop getting bits, since they are simply padded 0's). My decoder knows "&" is a mark showing the end of message, so my output will be simply "AAB" ---------------------------------------------------------------------- I hope the above details didn't confuse you too much. The main message is that "&" is not an encoded character, but it was placed by the encoder to indicate end of message. This "&" should not be part of the output, neither whatever you may get by considering the extra 0's added. Each "byte" is serving as a buffer for reading bits. Lucia Moura