An Improved Decoding Technique for Efficient Huffman Coding



Abstract
Storing and transmitting large volumes of data efficiently is a significant problem because both storage space and communication bandwidth are limited. These constraints drive the need for data compression, which reduces memory requirements, transmission time, bandwidth usage and, ultimately, cost. Huffman coding is one of the most popular and widely used compression techniques. It is lossless: the original file can be restored exactly, without losing a single bit, when the file is decompressed. Huffman coding becomes more efficient when the memory required for the Huffman tree is reduced. This article aims to reduce the size of the Huffman tree and presents a new memory-efficient technique for storing it, together with the corresponding encoding and decoding algorithms. The proposed technique represents the Huffman tree structure in 9n-2 bits in the worst, average and best cases, which is more efficient than most existing techniques. The results show significant improvements over existing work.

Introduction
The world is undergoing an intense technological revolution, and we live in the era of information. At every moment we have to maintain and store large volumes of data. Storage space, however, remains limited, so storing data efficiently is an important issue. Compression is the technique used to make the best use of limited storage space.
Compression is accomplished by algorithms or formulas that describe how to shrink the size of data [10,11]. The prime performance metric of data compression is the compression ratio [12]. Data compression can be lossless or lossy. Lossless compression allows a file to be restored to its exact original state, without the loss of a single bit, when the file is decompressed. Text and spreadsheet files, for example, require lossless compression because the loss of a single character, number or word may change the meaning of a message completely.
The key objective of this research is to reduce the size of the Huffman tree through a memory-efficient representation [8,9]. A significant improvement in the tree compression ratio improves the overall data compression ratio. A further objective is to design encoding and decoding algorithms for the new technique so that both are accomplished with minimal effort.
In this paper, we propose a new encoding-decoding technique that compresses the Huffman tree efficiently, and we compare its performance (with respect to compression ratio, bits saved, etc.) with previous works. We also investigate the limitations of some previous works. The proposed method is also a bit representation technique.
The remainder of the paper is organized as follows. The literature review is presented in section 4. The methodology is described in section 5. Comparison and results are discussed in sections 6-7. Limitations and scope of future work are described in the next section. Finally, we conclude the paper in section 9.

Literature Review
Most previous works, including the original Huffman coding, are based on a byte representation of the Huffman tree. In reference [1], however, Zinnia et al. proposed a bit representation of the Huffman tree. The authors use the notions of "circular leaf node" and "node with 2 internal son nodes" and introduce the new concepts of "upper leaf node", "external limb", "internal limb" and "antenna limb" [1] to address critical limitations of existing works, and the approach showed promise for solving some existing problems.
In reference [2], Choudhury and Kaykobad proposed a new data structure for Huffman tree representation in which, in addition to the symbol codes, the codewords for all the circular leaf nodes are sent. They decoded the text using the memory-efficient data structure proposed by Chen et al.
In reference [5], to reduce the memory size and improve the process of searching for a symbol in a Huffman tree, Pi Chung Wang et al. proposed a memory-efficient data structure that represents the Huffman tree by utilizing a property of the encoded symbols. It uses nd bits of memory, where n is the number of source symbols and d is the depth of the Huffman tree. Based on this data structure and a single-side growing Huffman tree, an O(log n)-time Huffman decoding algorithm is introduced [3, 6, 7].

Proposed Method
Many applications of Huffman encoding rely on ASCII codes. ASCII characters are usually stored in 8 bits, but ASCII defines only the codes 0-127, which fit into 7 bits. The leftmost bit of an ASCII-coded character, i.e. the MSB, is therefore guaranteed to be 0. We take advantage of that bit.
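The 7-bit observation can be illustrated with a minimal Python sketch. The helper names `to_7bit` and `from_7bit` are hypothetical and are not taken from the paper; they only show that dropping the MSB of an ASCII code loses no information.

```python
def to_7bit(ch: str) -> str:
    """Return the 7 low-order bits of an ASCII character as a bit string.

    Hypothetical helper for illustration; it rejects non-ASCII input
    because such characters need the MSB.
    """
    code = ord(ch)
    if code > 127:
        raise ValueError("not a 7-bit ASCII character: %r" % ch)
    return format(code, "07b")


def from_7bit(bits: str) -> str:
    """Restore the character; the omitted MSB is implicitly 0."""
    return chr(int(bits, 2))


if __name__ == "__main__":
    print(format(ord("A"), "08b"))  # 8-bit form, MSB is 0
    print(to_7bit("A"))             # same code without the MSB
```

Since the MSB is always 0 for codes 0-127, the round trip `from_7bit(to_7bit(ch))` returns the original character.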
In our proposed method the whole Huffman tree is traversed in depth-first fashion. When a leaf containing a character is to be saved, we save 7 bits of that character, omitting the MSB. When returning to the parent from a child node, the relative position of the child is saved; this takes one bit for each edge of the tree.

Rules for encoding
a. Rule 1: When saving a character, the MSB of its 8 bits is omitted, so only the 7 rightmost bits are saved.
b. Rule 2: When returning from a child node to its parent, whether the child is a leaf or an internal node, a single bit is saved: 0 if the child is the left child of the parent, 1 if it is the right child.

Proposed encoding algorithm
1. Traverse the tree in a depth-first fashion.
2. Repeat steps i and ii until the traversal is finished.
i. If a leaf is encountered, save the character in that position according to Rule 1.
ii. When returning from a node to its parent node, save its relative position according to Rule 2.
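The encoding steps above can be sketched in Python as follows. This is an illustrative reading of the two rules, not the authors' implementation: the `Node` class and the function name `encode_tree` are assumptions, and the output is kept as a string of '0'/'1' characters for readability rather than packed bits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node: a leaf holds `char`, an internal node holds two children."""
    char: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def encode_tree(root: Node) -> str:
    """Serialize a Huffman tree by depth-first traversal (Rules 1 and 2)."""
    out = []

    def visit(node: Node) -> None:
        if node.left is None and node.right is None:
            # Rule 1: save the 7 rightmost bits of the character.
            out.append(format(ord(node.char) & 0x7F, "07b"))
            return
        visit(node.left)
        out.append("0")  # Rule 2: returning from the left child
        visit(node.right)
        out.append("1")  # Rule 2: returning from the right child

    visit(root)
    return "".join(out)


if __name__ == "__main__":
    # Small example tree: root -> (internal(a, b), c)
    tree = Node(left=Node(left=Node("a"), right=Node("b")), right=Node("c"))
    bits = encode_tree(tree)
    print(bits, len(bits))
```

For a tree with n leaves the sketch emits 7 bits per leaf and one bit per edge, so the example tree with three leaves produces 25 bits.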

Proposed decoding algorithm
Input: bit stream
Output: Huffman tree
Decoding starts from the left of the bit stream. Current_subtree and the subtree stack are initialized first.
Read_character () reads 7 consecutive bits and converts them into an 8-bit ASCII code by adding 0 as its MSB.
Read_position () reads a single bit.
This tree (Figure 1, 2) will be encoded into the following bit sequence. Here every character consists of 7 bits. This representation has been used for easy understanding.
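A possible decoder built from these pieces is sketched below in Python. The text names only Current_subtree, the subtree stack, Read_character and Read_position; the control flow shown here is an inference from the two encoding rules and is an assumption, as are the function name `decode_tree` and the nested-tuple tree format. The sketch also assumes the input contains exactly the bits of the tree and nothing else.

```python
def decode_tree(bits: str):
    """Rebuild a tree from a '0'/'1' string produced by the encoding rules.

    Leaves are returned as 1-character strings and internal nodes as
    (left, right) tuples. Inferred control flow, not the authors' listing.
    """
    pos = 0

    def read_character() -> str:
        nonlocal pos
        code = int(bits[pos:pos + 7], 2)  # the omitted MSB is implicitly 0
        pos += 7
        return chr(code)

    def read_position() -> str:
        nonlocal pos
        bit = bits[pos]
        pos += 1
        return bit

    stack = []                           # finished left subtrees awaiting a sibling
    current_subtree = read_character()   # the stream always starts with a leaf
    while pos < len(bits):
        if read_position() == "0":
            # A left child just finished; its right sibling starts with a leaf.
            stack.append(current_subtree)
            current_subtree = read_character()
        else:
            # A right child just finished; join it with its left sibling.
            current_subtree = (stack.pop(), current_subtree)
    if stack:
        raise ValueError("bit stream ended before the tree was complete")
    return current_subtree


if __name__ == "__main__":
    stream = "1100001" + "0" + "1100010" + "1" + "0" + "1100011" + "1"
    print(decode_tree(stream))
```

The design relies on the fact that every internal node of a Huffman tree has exactly two children: a 0 is always followed by a character, and a 1 is followed by another position bit or by the end of the stream.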

Complexity analysis
If there are n different characters to be encoded, the total number of edges in the Huffman tree is (2*n - 2). In our proposed method every edge is denoted by either 0 or 1, depending on whether it leads to a left or a right child. Besides, every ASCII-coded character consists of 8 bits, of which the MSB is 0. We ignore this bit while encoding, so we require 7 bits to represent each character in the generated bit sequence (explained in Figure 3, 4).

Figure 3: Line Diagram Showing Required Number of Bits for Different Methods
So the total space needed is (7*n + 2*n - 2) = 9*n - 2 bits.
*The authors of reference [2] did not mention this limitation, but their analysis does not work if the tree depth exceeds the 7th level (starting from level 0). A massive change in the algorithm would be needed in that case.
**In reference [1], the best case occurs when only the rightmost branch of the tree is expanded, which is not a very frequent occurrence. In other cases, redundant bits are required for saving the external limb and each upper leaf node.
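The 9n - 2 count can be checked empirically with a short Python sketch. It builds an ordinary Huffman tree with the standard library's `heapq` and counts leaves and edges. The function names and the nested-tuple tree format are assumptions made for illustration, and the tie-breaking order among equal frequencies is arbitrary, so the tree shape may differ from the authors' figures while the counts stay the same.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_tree(text: str):
    """Return a Huffman tree as nested (left, right) tuples; leaves are characters."""
    tiebreak = count()  # unique counter so heap entries never compare subtrees
    heap = [(freq, next(tiebreak), ch) for ch, freq in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (a, b)))
    return heap[0][2]


def count_leaves_and_edges(tree):
    """Count leaves and edges of a nested-tuple tree."""
    if isinstance(tree, str):
        return 1, 0
    l_leaves, l_edges = count_leaves_and_edges(tree[0])
    r_leaves, r_edges = count_leaves_and_edges(tree[1])
    return l_leaves + r_leaves, l_edges + r_edges + 2


def tree_bits(tree) -> int:
    """7 bits per leaf plus 1 bit per edge, as in the analysis above."""
    leaves, edges = count_leaves_and_edges(tree)
    return 7 * leaves + edges


if __name__ == "__main__":
    tree = build_huffman_tree("Huffman Coding Huffman Coding")
    n, edges = count_leaves_and_edges(tree)
    print(n, edges, tree_bits(tree), 9 * n - 2)
```

Because every internal node has exactly two children, the edge count equals 2n - 2 regardless of the tree shape, which is why the same bit count holds in the best, average and worst cases.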

Experimental Results
We have encoded the following files with our proposed method (Table 1):
1. file1.txt="Huffman Coding Huffman Coding"
2. file2.txt="A new approach of memory efficient Huffman tree
4. file4.txt="I've implemented my proposed algorithm using programming language C because I like it most among all programming languages"
5. file5.txt="Best case complexity occurs when only 1 circular leaf node is considered or only one external limb is considered and all symbols except right brother leaf are the left leafs of the external limb"
The following is the comparison with different methods for the above files (Table 2, 3):
• Original file size is 31 bytes = 248 bits.
• Total no. of bits required only for compressed text is 113 bits.
• Total no. of bits required only for tree is 2*13 bytes = 26*8 bits = 208 bits in original Huffman coding.
• Total size of compressed file along with tree in original Huffman coding is (208+113) bits = 321 bits.
• Total size of compressed file along with tree in proposed method is (115+113) bits = 228 bits.
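The arithmetic behind these figures can be reproduced with a few lines of Python. The sketch assumes n = 13 distinct symbols, as the "2*13 bytes" figure suggests, and 2 bytes per symbol for the original Huffman tree; the constant and function names are hypothetical.

```python
N_SYMBOLS = 13              # assumption: read off the "2*13 bytes" figure
COMPRESSED_TEXT_BITS = 113  # bits for the compressed text, from the list above
ORIGINAL_FILE_BITS = 31 * 8 # 31 bytes


def original_tree_bits(n: int) -> int:
    """Byte representation: 2 bytes per symbol (assumed from the 2*13 bytes figure)."""
    return 2 * n * 8


def proposed_tree_bits(n: int) -> int:
    """Proposed bit representation: 7 bits per character plus one bit per edge."""
    return 7 * n + (2 * n - 2)


def total_bits(tree_bits: int, text_bits: int) -> int:
    """Size of the compressed file including the tree."""
    return tree_bits + text_bits


if __name__ == "__main__":
    original = total_bits(original_tree_bits(N_SYMBOLS), COMPRESSED_TEXT_BITS)
    proposed = total_bits(proposed_tree_bits(N_SYMBOLS), COMPRESSED_TEXT_BITS)
    print(ORIGINAL_FILE_BITS, original, proposed, original - proposed)
```

Under these assumptions the proposed tree takes 115 bits against 208 bits, and only the proposed total falls below the 248 bits of the uncompressed file.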

Limitations and Scope of Future Work
Because each character is stored in 7 bits by omitting the MSB of the 8-bit scheme, our algorithm works only with ASCII characters and will not work properly for the extended ASCII character set, which also uses the MSB to encode special characters. An efficient mapping might overcome this limitation, since all 256 combinations are rarely used at once. Another future plan is to eliminate some redundant bits that are used to join the right sub-trees with their left counterparts. Nearly n bits can be saved in the best case by handling this efficiently.

Conclusion
This paper concentrates on reducing the space and effort needed to send data over limited communication lines. The key purpose of this paper is to reduce the space needed to represent the Huffman tree, since the tree has to be sent along with the encoded text for decoding to succeed. The method proposed here differs from previously proposed methods and allows an easy and fast implementation. It outperformed the existing methods compared, both in space complexity analysis and in experimental results, as it takes only 9n-2 bits to represent the Huffman tree.