
[Solved]: Compression functions are only practical because "The bit strings which occur in practice are far from random"?

Problem Detail: 

I would have made a comment, as this pertains to Andrej Bauer's answer in this thread; however, I believe it is worth a question.

Andrej explains that, given the set of all bit strings of length 3 or less, a lossless compression function can only "compress" some of them. The others, "01" for instance, would actually have to be mapped to a longer string such as "0001", which has length 4. The compression ratio is simply the average compression achieved across the input set.
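To make the counting concrete, here is a small Python sketch (my own illustration, not from Andrej's answer) that enumerates the relevant sets:

    from itertools import product

    def bit_strings(max_len):
        # All bit strings of length 0..max_len, including the empty string.
        return [''.join(bits) for n in range(max_len + 1)
                for bits in product('01', repeat=n)]

    inputs = bit_strings(3)    # 1 + 2 + 4 + 8 = 15 strings of length 3 or less
    shorter = bit_strings(2)   # 1 + 2 + 4 = 7 strings of length 2 or less

    print(len(inputs), len(shorter))   # 15 7
    # A lossless code maps distinct inputs to distinct outputs, and there are
    # 8 inputs of length exactly 3 but only 7 possible outputs shorter than
    # 3 bits, so some inputs must stay the same length or grow.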

This makes lossless compression seem impractical, but the important quote is this:

The bit strings which occur in practice are far from random and exhibit a lot of regularity.

I have a hard time believing that, for instance, multimedia files are represented by anything other than random bit strings. Is there truly a pattern that compression functions leverage to make the algorithm useful in reality?

Asked By : AlexMayle

Answered By : john_leo

First of all, you're right: multimedia files are represented (more or less) as random bit strings. The reason is that those files are already compressed (lossily). Note that MP3, for example, is nothing but a compression format!
A consequence is that further compression will not yield any noticeable reduction in size (and indeed, lossless compression of already-compressed (multimedia) files has never been a road to success).
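As a rough sanity check, here is a minimal Python sketch (my illustration, using zlib as a stand-in for a generic lossless compressor and random bytes as a proxy for already-compressed media):

    import os
    import zlib

    random_data = os.urandom(100_000)    # proxy for already-compressed media
    text_data = (b"the quick brown fox jumps over the lazy dog. " * 2300)[:100_000]

    # Highly repetitive text collapses to a few hundred bytes; the random input
    # comes back at roughly its original size (or slightly larger, due to headers).
    print(len(zlib.compress(text_data, 9)))
    print(len(zlib.compress(random_data, 9)))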

You're also right on your other point: lossless compression cannot compress on average. To see this, suppose your possible data set consists of $2^n$ different elements. How many bits do you need per file to always be able to distinguish elements of your set? Right, $n$. In total, all files together will be represented by no fewer than $n \cdot 2^n$ bits. Now if you represent some of those files by fewer than $n$ bits, some other files must be represented by more than $n$ bits. That's about all there is to say.
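The pigeonhole count behind that last step is worth spelling out: the number of bit strings strictly shorter than $n$ bits is $\sum_{k=0}^{n-1} 2^k = 2^n - 1 < 2^n$, so no injective (i.e. lossless) encoding can map all $2^n$ files to strings of fewer than $n$ bits; whatever is saved on some files has to be paid back on others.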

In short, lossless compression works because text files are far from random (just consider the letter distribution of my answer and compare the number of e's with the number of z's!), and compressing random data (e.g. already compressed data or encrypted data) does not make any sense.
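For instance, a quick letter count over a paragraph of English (a small Python sketch; the sample text is arbitrary) shows the kind of skew a compressor exploits:

    from collections import Counter

    sample = ("lossless compression works because text is far from random: "
              "some letters, words and phrases occur much more often than "
              "others, so a compressor can assign them shorter codes")

    counts = Counter(c for c in sample if c.isalpha())
    print(counts['e'], counts['z'])   # 'e' shows up many times, 'z' rarely or never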


Question Source: http://cs.stackexchange.com/questions/40684
