1,4 Department of Computer Science & Engineering, International Islamic University Chittagong, Bangladesh
Keywords: Tree Compression; Static Huffman Code; Data Structures; Complete Binary Tree;
Compression is accomplished by algorithms or formulas that describe how the size of the data is reduced [10, 11]. The prime performance metric of data compression is the compression ratio [12]. Data compression can be lossless or lossy. Lossless compression restores a file to its exact original state, without the loss of a single bit of data, when the file is uncompressed. Text and spreadsheet files, for example, require lossless techniques, because the loss of a single character, number, or word can change the meaning of a message entirely.
The key objective of this research is to devise a technique that reduces the tree size of Huffman compression through a memory-efficient representation of the Huffman tree [8, 9]. A significant improvement in the tree compression ratio will, in turn, enhance the overall data compression ratio. A further objective is to design encoding and decoding algorithms for the new technique so that encoding and decoding are accomplished with minimal effort.
In this paper, we propose a new encoding-decoding technique that compresses the Huffman tree size efficiently, and we compare its performance (with respect to compression ratio, saved bits, etc.) with previous works. We also investigate the limitations of some of these works. The proposed method is itself a bit-level representation technique.
The rest of the paper is organized as follows. The literature review is presented in section 4, and the methodology is described in section 5. Comparison and results are discussed in sections 6-7. Limitations and the scope of future work are outlined in the next section. Finally, we conclude the paper in section 9.
In reference [2], Choudhury and Kaykobad proposed a new data structure for Huffman tree representation in which, in addition to the symbol codes, the codewords of all circular leaf nodes are sent. They decode the text using the memory-efficient data structure proposed by Chen et al.
In reference [5], Bei Chen, Hong-Cai Zhang, Wen-Lun Cao, and Jian-Hu Feng introduced a new Huffman coding method based on number characters: the traditional 256-entry code table is replaced by the characters 0-9, the space character, and the enter character. In reference [4], Wang, Wu, Chuang, Hsiao, and Tung proposed a Lagrangian-multiplier-based penalty resource metric as the target cost function and addressed the optimization problem of minimizing the number of memory accesses subject to a rate constraint for Huffman decoding.
To reduce the memory size and speed up the search for a symbol in a Huffman tree, Pi-Chung Wang et al. proposed a memory-efficient data structure that represents the Huffman tree by exploiting properties of the encoded symbols; it uses nd bits of memory, where n is the number of source symbols and d is the depth of the Huffman tree. Based on this data structure and a single-side growing Huffman tree, an O(log n)-time Huffman decoding algorithm was introduced [3, 6, 7].
In our proposed method, the whole Huffman tree is traversed in a depth-first fashion. When a leaf containing a character is to be saved, we save the 7 low-order bits of that character, omitting the MSB. When returning from a child node to its parent, the relative position of the child is saved; this takes one bit for each edge of the tree (a brief C sketch of the traversal is given after the algorithm steps below).
b. Rule 2: When returning from a child node to its parent, whether the child is a leaf or an internal node, a single bit is saved: 0 if the child is the left child of the parent, or 1 if it is the right child.
2. Repeat steps i and ii until the traversal is finished:
i. If a leaf is encountered, save the character at that position according to Rule 1.
ii. When returning from a node to its parent node, save its relative position according to Rule 2.
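The following is a minimal C sketch of this encoding traversal. It assumes a conventional Huffman tree node with left/right child pointers and a symbol field, and it simply prints the bits as characters; the node layout and helper names are illustrative and not taken from the paper.

#include <stdio.h>

typedef struct Node {
    struct Node *left, *right;   /* both NULL for a leaf                 */
    unsigned char symbol;        /* meaningful only when the node is a leaf */
} Node;

/* Emit one bit of the encoded stream (printed as '0'/'1' for clarity). */
static void emit_bit(int bit) { putchar(bit ? '1' : '0'); }

/* Rule 1: a leaf is saved as the 7 low-order bits of its ASCII code. */
static void emit_symbol(unsigned char c) {
    for (int i = 6; i >= 0; --i)
        emit_bit((c >> i) & 1);
}

/* Depth-first traversal: leaves are saved by Rule 1; when returning to
 * the parent, one bit per edge records whether the child was the left
 * (0) or right (1) child (Rule 2). The root has no parent edge.        */
static void encode_tree(const Node *node, int is_right_child, int is_root) {
    if (node->left == NULL && node->right == NULL) {
        emit_symbol(node->symbol);
    } else {
        encode_tree(node->left, 0, 0);
        encode_tree(node->right, 1, 0);
    }
    if (!is_root)
        emit_bit(is_right_child);
}

For a two-leaf tree with left leaf 'a' and right leaf 'b', calling encode_tree(root, 0, 1) prints the 7 bits of 'a', then 0, then the 7 bits of 'b', then 1, exactly as the two rules prescribe.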
Output: Huffman tree
Decoding starts from the left of the bit stream.
Current_subtree and the subtree stack are initialized first.
Read_character() reads 7 consecutive bits and converts them into an 8-bit ASCII code by adding a 0 as the MSB.
Read_position () reads a single bit.
Decode_and_build_tree ()
1. Read_character()
2. Current_subtree:= character read
3. while scanned bit ≠ NULL do
4. if Current_subtree ≠ NULL then
5. Read_position()
6. if position = 0 then
7. Current_subtree is a left child. Push it into the stack.
8. Current_subtree := NULL
9. else
10. Pop topmost subtree from the stack
11. Create a new node with left child := popped out subtree
12. right child:=Current_subtree
13. Current_subtree := newly created node
14. end if
15. else
16. Read_character()
17. Current_subtree := character read
18. end if
19. end while
20. Current_subtree is the required Huffman tree.
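The pseudocode above translates almost line for line into C. The sketch below operates on a string of '0'/'1' characters for readability; the fixed-size stack, the bit-string input format, and the helper names are illustrative assumptions rather than details taken from the paper.

#include <stdlib.h>

typedef struct Node {
    struct Node *left, *right;
    unsigned char symbol;                  /* meaningful only for leaves */
} Node;

static Node *new_leaf(unsigned char c) {
    Node *n = calloc(1, sizeof *n);
    n->symbol = c;
    return n;
}

/* Read_character(): consume 7 bits and prepend a 0 MSB to obtain the ASCII code. */
static unsigned char read_character(const char **bits) {
    unsigned char c = 0;
    for (int i = 0; i < 7; ++i)
        c = (unsigned char)((c << 1) | (unsigned char)(*(*bits)++ - '0'));
    return c;
}

/* Decode_and_build_tree(): rebuild the Huffman tree from the encoded bit stream. */
static Node *decode_and_build_tree(const char *bits) {
    Node *stack[256];                      /* subtree stack (illustrative fixed size) */
    int top = 0;
    Node *current = new_leaf(read_character(&bits));

    while (*bits != '\0') {
        if (current != NULL) {
            if (*bits++ == '0') {          /* current is a left child: push it      */
                stack[top++] = current;
                current = NULL;
            } else {                       /* current is a right child: combine     */
                Node *parent = calloc(1, sizeof *parent);
                parent->left = stack[--top];
                parent->right = current;
                current = parent;
            }
        } else {
            current = new_leaf(read_character(&bits));
        }
    }
    return current;                        /* the required Huffman tree */
}

Applied to the bit string produced by the encoding sketch earlier (with each character written as its seven '0'/'1' digits), this routine rebuilds the original tree.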
Here we present an implementation example for clearer understanding.
This tree (Figures 1, 2) is encoded into the following bit sequence, in which every character is represented by 7 bits; this representation is used for ease of understanding.
Moreover, every ASCII-coded character comprises 8 bits, and its MSB is 0 for every character considered here. We ignore this bit while encoding, so only 7 bits are required to represent each character in the generated bit sequence (explained in Figures 3, 4). For example, the character 'a' (01100001 in ASCII) is stored as the 7 bits 1100001.
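The 9n-2 worst-case figure used in the comparisons below follows directly from the two rules: a Huffman tree for n source symbols is a full binary tree with n leaves, n-1 internal nodes and therefore 2n-2 edges, so the representation costs

7n bits (7 bits per leaf, Rule 1) + (2n-2) bits (one bit per edge, Rule 2) = 9n-2 bits.

This matches the values reported for the proposed method in Table 2 (for example, 9*13-2 = 115 bits for file1.txt with 13 symbols).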
The memory requirements of the different methods are compared below.

Methods | Memory Requirement | Space savings of the proposed method compared with the existing method
Original Huffman Coding | 2n bytes | (2n*8)-(9n-2) = 7n+2 bits
CM1 | 2n-3 bytes | (2n-3)*8-(9n-2) = 7n-22 bits
Method of Pi-Chung Wang; Yuan-Rung Yang; Chun-Liang Lee; Hung-Yi Chang | nd bits | n(n-1)-(9n-2) = n^2-10n+2 bits (approximated)
Choudhury and Kaykobad Method | 3n/2 bytes at worst case* | (3n/2)*8-(9n-2) = 3n+2 bits (approximated)
Zinnia et al. Method | 10.75n-3 bits at worst case** | (10.75n-3)-(9n-2) = 1.75n-1 bits (approximated)
Proposed Method | 9n-2 bits at worst case |

** In reference [1], the best case occurs only when the right-most branch of the tree alone is expanded, which is not a frequent occurrence. In the other cases, redundant bits are required for saving the external limb and each upper leaf node.

The following test files were used:
1. file1.txt = "Huffman Coding Huffman Coding"
2. file2.txt = "A new approach of memory efficient Huffman tree
4. file4.txt = "I've implemented my proposed algorithm using the programming language C because I like it most among all programming languages"
5. file5.txt = "Best case complexity occurs when only 1 circular leaf node is considered, or only one external limb is considered and all symbols except the right brother leaf are the left leaves of the external limb"
The following tables (Tables 2 and 3) compare the different methods for the above files.
Table 2. Number of bits needed to save the Huffman tree

File Name | Number of Symbols | Original Huffman | Proposed Method | Zinnia et al. | CM1 method | Pi Chung Wang et al. method | Choudhury and Kaykobad method
file1.txt | 13 | 208 | 115 | 134 | 184 | 156 | 152
file2.txt | 20 | 320 | 178 | 207 | 296 | 280 | 224
file3.txt | 25 | 400 | 223 | 264 | 376 | 375 | 272
file4.txt | 25 | 400 | 223 | 248 | 376 | 375 | 280
file5.txt | 25 | 400 | 223 | 269 | 376 | 400 | 264
Table 3. Number of bits required (a negative value denotes an increase over the original file size)

File name | Original file size | Original Huffman | Space savings | Proposed method | Space savings | Zinnia et al. method | Space savings | Choudhury & Kaykobad method | Space savings | Pi Chung Wang et al. method | Space savings | CM1 method | Space savings
file1.txt | 248 | 321 | -73 | 228 | 20 | 247 | 1 | 265 | -17 | 269 | -21 | 297 | -49
Compression | | 0% | | 28.97 | | 23.05 | | 17.45 | | 16.20 | | 7.48 |
1. Sultana Z, Akter S. A New Approach of a Memory Efficient Huffman Tree Representation Technique. IEEE International Conference (ICIEV). 2012. DOI: 10.1109/ICIEV.2012.6317482
2. Rezaul C, Kaykobad M, Irwin K. An Efficient Decoding Technique for Huffman Codes. Information Processing Letters. 2002;81(6):305-308.
3. Pi-Chung W, Yuan-Rung Y, Chun-Liang L, Hung-Yi C. A memory efficient Huffman decoding algorithm. Advanced Information Networking and Applications. 2005;2:475-479. DOI: 10.1109/AINA.2005.33
4. Sung-Wen W, Shang-Chih C, Chih-Chieh H, Yi-Shin T, Ja-Ling W. An Efficient Memory Construction Scheme for an Arbitrary Side Growing Huffman Table. IEEE International Conference on Multimedia and Expo. 2006:141-144. DOI: 10.1109/ICME.2006.262589
5. Bei C, Hong-Cai Z, Wen-Lun C, Jian-Hu F. Huffman Coding Method Based on Number Character. International Conference on Machine Learning and Cybernetics. 2007;4:2296-2299. DOI: 10.1109/ICMLC.2007.4370528
6. Sung-Wen W, Ja-Ling W, Shang-Chih C, Chih-Chieh H, Yi-Shin T. Memory Efficient Hierarchical Lookup Tables for Mass Arbitrary-Side Growing Huffman Trees Decoding. IEEE Transactions on Circuits and Systems for Video Technology. 2008;18(10):1335-1346. DOI: 10.1109/TCSVT.2008.920968
7. Seymour L. Theory and Problems of Data Structures. McGraw-Hill Book Company. 2002:249-255.
8. Hashemian R. Memory efficient and high-speed search Huffman coding. IEEE Transactions on Communications. 1995;43(10):2576-2581. DOI: 10.1109/26.469442
9. Yuh-He C, Ja-Ling W. An SGH-Tree Based Huffman Decoding. Fourth International Conference on Information, Communications and Signal Processing. 2003;3:1483-1487. DOI: 10.1109/ICICS.2003.1292713
10. Huffman DA. A Method for the Construction of Minimum-Redundancy Codes. Proceedings of the IRE. 1952;40(9):1098-1101.
11. Schack R. The length of a typical Huffman codeword. IEEE Transactions on Information Theory. 1994;40(4):1246-1247. DOI: 10.1109/18.335944
12. Hong-Chung C, Yue-Li W, Yu-Feng L. A memory-efficient and fast Huffman decoding algorithm. Information Processing Letters. 1999;69(3):119-122.