I read on Wikipedia and in lecture notes that if a lossless data compression algorithm makes a message shorter, it must make another message longer.
For example, this set of notes says:
Consider, for example, the 8 possible 3 bit messages. If one is compressed to two bits, it is not hard to convince yourself that two messages will have to expand to 4 bits, giving an average of 3 1/8 bits.
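As a quick sanity check on that arithmetic, here is a small Python sketch (my own illustration, not part of the quoted notes) that computes the average length in that scenario:

```python
# Hypothetical codeword lengths for the 8 three-bit messages, as described
# in the quoted notes: one compressed to 2 bits, two expanded to 4 bits,
# and the remaining five left at 3 bits.
lengths = [2] + [3] * 5 + [4] * 2

average = sum(lengths) / len(lengths)
print(average)            # 3.125, i.e. 3 1/8 bits on average
assert average == 25 / 8  # matches the figure given in the notes
```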
There must be a gap in my understanding, because I thought I could compress all 3-bit messages this way (see the sketch after the list):
- Encode: If it starts with a zero, delete the leading zero.
- Decode: If message is 3 bit, do nothing. If message is 2 bit, add a leading zero.
- Compressed set: 00,01,10,11,100,101,110,111
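Here is a minimal sketch of the scheme above (my own Python illustration, not part of the original question); note that it only checks messages one at a time:

```python
# The proposed scheme, tested on each 3-bit message in isolation.
# The flaw only appears once codewords are concatenated into a stream
# (see the answer below).

def encode(msg: str) -> str:
    """Drop the leading zero if the 3-bit message starts with one."""
    return msg[1:] if msg.startswith("0") else msg

def decode(code: str) -> str:
    """2-bit codes get their leading zero back; 3-bit codes are unchanged."""
    return "0" + code if len(code) == 2 else code

for msg in ["000", "001", "010", "011", "100", "101", "110", "111"]:
    assert decode(encode(msg)) == msg  # each message survives the round trip on its own
```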
What am I getting wrong? I am new to CS, so maybe there are some rules/conventions that I missed?
Asked By : Legendre
Answered By : Shaull
You are missing an important nuance. How would you know whether a received 2-bit string is a complete message, or just the beginning of a longer one? To decode reliably, you must also mark where each message starts and where it ends. That marker has to be a new symbol, because 0 and 1 are already in use, and once you re-encode the resulting three-symbol alphabet back into binary, you end up with an even longer code.
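To make that ambiguity concrete, here is a short Python sketch (my own example, not part of the original answer) in which two different pairs of messages compress to the same bit stream, so no decoder could tell them apart without an extra delimiter:

```python
# The questioner's scheme applied to a stream of messages sent back to back.

def encode(msg: str) -> str:
    """Drop the leading zero if the 3-bit message starts with one."""
    return msg[1:] if msg.startswith("0") else msg

stream_a = encode("010") + encode("100")  # "10" + "100" -> "10100"
stream_b = encode("101") + encode("000")  # "101" + "00" -> "10100"

print(stream_a, stream_b)      # both print "10100"
assert stream_a == stream_b    # identical streams from different message pairs: ambiguous
```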
Best Answer from Stack Exchange
Question Source : http://cs.stackexchange.com/questions/9938