Michael Drmota and Wojciech Szpankowski (2017), "Redundancy of Lossless Data Compression for Known Sources by Analytic Methods", Foundations and TrendsĀ® in Communications and Information Theory: Vol. 13: No. 4, pp 277-417. http://dx.doi.org/10.1561/0100000090

© 2017 M. Drmota and W. Szpankowski

Coding and compression, Computational aspects of combinatorics and graph theory, Design and analysis of algorithms

Download article
**In this article:**

1. Introduction

2. Preliminary Results

3. Redundancy of Shannon and Huffman FV Codes

4. Redundancy of Tunstall and Khodak VF Codes

5. Redundancy of Divide-and-Conquer VF Arithmetic Coding

6. Redundancy of VV Khodak Codes

7. Redundancy of Non Prefix One-to-One Codes

8. Concluding Remarks

Acknowledgements

References

Lossless data compression is a facet of source coding and a well studied problem of information theory. Its goal is to find a shortest possible code that can be unambiguously recovered. Here, we focus on rigorous analysis of code redundancy for known sources. The redundancy rate problem determines by how much the actual code length exceeds the optimal code length. We present precise analyses of three types of lossless data compression schemes, namely fixed-to-variable (FV) length codes, variable-to-fixed (VF) length codes, and variableto- variable (VV) length codes. In particular, we investigate the average redundancy of Shannon, Huffman, Tunstall, Khodak and Boncelet codes. These codes have succinct representations as trees, either as coding or parsing trees, and we analyze here some of their parameters (e.g., the average path from the root to a leaf). Such trees are precisely analyzed by analytic methods, known also as analytic combinatorics, in which complex analysis plays decisive role. These tools include generating functions, Mellin transform, Fourier series, saddle point method, analytic poissonization and depoissonization, Tauberian theorems, and singularity analysis. The term analytic information the- ory has been coined to describe problems of information theory studied by analytic tools. This approach lies on the crossroad of information theory, analysis of algorithms, and combinatorics.

1. Introduction

2. Preliminary Results

3. Redundancy of Shannon and Huffman FV Codes

4. Redundancy of Tunstall and Khodak VF Codes

5. Redundancy of Divide-and-Conquer VF Arithmetic Coding

6. Redundancy of VV Khodak Codes

7. Redundancy of Non Prefix One-to-One Codes

8. Concluding Remarks

Acknowledgements

References

The term *analytic information theory* has been coined to describe problems of information theory studied by analytic tools. The approach of applying tools from analysis of algorithms to problems of source coding and, in general, to information theory lies at the crossroad of computer science and information theory. Combining the tools from both areas often provides powerful results, such as computer scientist Abraham Lempel and information theorist Jacob Ziv working together in the late 1970s to develop compression algorithms that are now widely referred to as Lempel-Ziv algorithms and are the basis of the ZIP compression still used extensively in computing
today.

This monograph surveys the use of these techniques for the rigorous analysis of code redundancy for known sources in lossless data compression. A separate chapter is devoted to precise analyses of each of three types of lossless data compression schemes, namely fixed-to-variable length codes, variable-to-fixed length codes, and variable-to-variable length codes. Each one of these schemes is described in detail, building upon work done in the latter part of the 20th century to present new and powerful techniques. For the first time, this survey presents redundancy for universal variable-to-fixed and variable-to-variable length codes in a comprehensive and coherent manner.

The monograph will be of interest to computer scientists and information theorists working on modern coding techniques. Written by two leading experts, it provides the reader with a unique, succinct starting point for their own research into the area.