I'm aware of some techniques for compressing a string of characters, such as Huffman coding. Is there any technique for compressing a long string of arbitrary bits, ideally with a high compression ratio?
Asked By : user13676
Answered By : Yuval Filmus
Suppose we could always compress a string of length $n$ bits to a string of length $n/2$ bits. Apply this compression algorithm $\log_2 n + 1$ times, and we have reduced a string of length $n$ bits into the empty string. But does the empty string really contain within itself all the information needed to reconstruct the original $n$ bit string?
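The arithmetic behind "$\log_2 n + 1$ times" can be checked directly. This is a small sketch (not from the original answer) that repeatedly halves a length of $n = 1024$ bits and counts the steps until nothing is left:

```python
import math

# Hypothetical "always halves" compressor: track only the length.
n = 1024
length, steps = n, 0
while length >= 1:      # keep "compressing" until the string is empty
    length /= 2
    steps += 1

print(steps)            # 11, i.e. log2(1024) + 1
assert steps == int(math.log2(n)) + 1
```

After $\log_2 n$ halvings one bit remains, and one more application empties the string entirely, which is the absurdity the argument exploits.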
More generally, a simple counting argument shows that arbitrary strings cannot be compressed. For any lossless compression algorithm $C$ (one from which the input can always be recovered, so $C$ must be injective):
- For every $n$ we can find a string $x$ of length $n$ such that $|C(x)| \geq n$.
- For every $n$, at least half the strings of length $n$ satisfy $|C(x)| \geq n-1$.
- For every $n$, at least $1-1/n$ of the strings of length $n$ satisfy $|C(x)| \geq n-\log n$.
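The first two bullets above follow from simply counting strings: there are $2^n$ strings of length $n$ but only $2^n - 1$ strings of length strictly less than $n$, so an injective $C$ cannot shorten them all. A quick check for $n = 16$:

```python
# Counting argument for n = 16: compare the number of n-bit inputs
# with the number of possible strictly shorter outputs.
n = 16
total = 2 ** n                            # n-bit strings: 65536
shorter = sum(2 ** k for k in range(n))   # strings of length 0..n-1: 65535

print(total, shorter)                     # 65536 65535
assert shorter == total - 1               # one input cannot shrink at all

# Similarly, strings of length <= n-2 number 2^(n-1) - 1, which is
# less than half of 2^n, giving the "at least half" bullet.
assert (2 ** (n - 1) - 1) < total / 2
```

The third bullet is the same computation with length $n - \log n$: at most $2^{n - \log n} - 1 < 2^n / n$ inputs can compress that far, a $1/n$ fraction.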
In particular, an "arbitrary" string cannot be compressed appreciably.
What are compression algorithms good for, then?
Many compression algorithms are lossy, that is, they actually throw away some of the information. This is the case with JPEG, MP3 and MPEG, for example. The compressed picture doesn't look quite the same, but it does look very similar. Lossy compression algorithms exploit exactly this property of pictures: a picture can look the same while differing at the bit level. They isolate the perceptually important parts of the data, the parts that suffice to produce in the viewer essentially the same impression as the original.
Some compression algorithms are lossless but are still able to compress their input. This is because the inputs are often not random and have structure which can be exploited. For example, HTML documents are extremely non-random: for one, they don't use the entire range of characters, but only alphanumeric characters and punctuation. Even raw sound files and pictures can be compressed appreciably, since music and images one would like to compress are not arbitrary white noise, but are smooth in various ways.
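This difference between structured and random input is easy to see with a general-purpose lossless compressor. The following sketch uses Python's standard `zlib` (DEFLATE) on repetitive HTML-like text versus the same number of random bytes; the exact sizes will vary, but the contrast will not:

```python
import os
import zlib

# Structured, repetitive input (like HTML) compresses dramatically...
html = b"<html><body><p>hello</p><p>hello</p></body></html>" * 100
compressed_html = zlib.compress(html)
assert len(compressed_html) < len(html)

# ...while random bytes do not: DEFLATE falls back to storing the data
# with a little framing overhead, so the output is at least as large.
noise = os.urandom(len(html))
compressed_noise = zlib.compress(noise)
assert len(compressed_noise) >= len(noise)
```

The random input plays the role of the "arbitrary" string from the counting argument: there is simply no structure for the algorithm to exploit.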
After you have compressed a file once, you shouldn't expect to be able to compress it again. The aim of a good compression algorithm is to produce random-looking output, which, as the counting argument above shows, usually cannot be compressed any further.
There are many lossy and lossless compression algorithms out there. If you're looking for lossless compression, you can try some of them out. Many are implemented as standalone tools such as ZIP and BZIP2. Lossy compression strongly depends on the type of medium you are trying to compress. Again, standalone tools are usually available for your perusal.
Best answer from Computer Science Stack Exchange
Question Source : http://cs.stackexchange.com/questions/41377