作者:
SR Kosaraju,G Manzini
关键词:
data compression ; entropy codes ; runlength codes ; LZ77 ; LZ78 ; Lempel-Ziv algorithms ; Lempel-Ziv methods ; PPM algorithms ; compression ratio ; entropic bounds
摘要:
Summary form only given, as follows. We initiate a study of parsing-based compression algorithms such as LZ77 and LZ78 by considering the empirical entropy of the input string. For any string s, we define the k-th order entropy H(s) by looking at the number of occurrences of each symbol following each k-length substring inside s. The value H(s) is a lower bound to the compression ratio of a statistical modeling algorithm which predicts the probability of the next symbol by looking at the k most recently seen characters. Therefore, our analysis provides a means for comparing Lempel-Ziv methods with the more powerful, but slower, PPM algorithms. Our main contribution is a comparison of the compression ratio of Lempel-Ziv algorithms with the zeroth order entropy H. First we show that for low entropy strings LZ78 compression ratio can be much higher than H. Then, we present a modified algorithm which combines LZ78 with run length encoding and is able to compress efficiently also low entropy strings
在线下载