World's most popular travel blog for travel bloggers.

# [Solved]: Lossless data compression must make some messages longer?

, ,
Problem Detail:

I read on Wikipedia and in lecture notes that if a lossless data compression algorithm makes a message shorter, it must make another message longer.

E.g. In this set of notes, it says:

Consider, for example, the 8 possible 3 bit messages. If one is compressed to two bits, it is not hard to convince yourself that two messages will have to expand to 4 bits, giving an average of 3 1/8 bits.

There must be a gap in my understand because I thought I could compress all 3 bit messages this way:

• Encode: If it starts with a zero, delete the leading zero.
• Decode: If message is 3 bit, do nothing. If message is 2 bit, add a leading zero.
• Compressed set: 00,01,10,11,100,101,110,111

What am I getting wrong? I am new to CS, so maybe there are some rules/conventions that I missed?

You are missing an important nuance. How would you know if the message is only 2 bits, or if it's part of a bigger message? For that, you must also encode a bit that says that the message starts, and a bit that says it ends. This bit should be a new symbol, because 1 and 0 are already used. If you introduce such a symbol and then re-encode everything to binary, you will end up with an even longer code.