
"Archiving" a byte sequence into a human-readable set of characters

Problem Detail: 

OK, let's assume we have a sequence of 1000 bytes. At 8 bits per byte, the number of possible values is 2^8000.

Is there a way to "index" each variation with letters and decimal digits (A-Z, 0-9), keeping the result as short as possible?

That way a person could dictate the sequence of letters and digits over the phone, and it could be turned back into bytes on the other side.

I'm really bad at math, and found a way to shorten that string to about 100 symbols. Is there any way to make it shorter?

Asked By : Drinkins
Answered By : wvxvw

A common encoding used for such tasks is Base64. It is easy to implement and implementations already exist in most mainstream languages. However, it is not well suited to being read aloud by humans.

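The underlying idea is that 64 = 2^6, so each output character carries exactly 6 bits of the input, and every 3 input bytes (24 bits) become 4 output characters. The upper- and lowercase Latin letters and the ten digits give only 62 symbols, so the alphabet is completed with two punctuation symbols (usually + and /) to reach exactly 64.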
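To get a feel for the resulting lengths, here is a minimal sketch using Python's standard `base64` module. The random `data` is only a stand-in for the 1000-byte sequence from the question. Standard Base32 is included because its alphabet (A-Z and 2-7) stays within the letters and digits the question asks for.

```python
import base64
import os

# Stand-in for the 1000-byte sequence in the question (assumption: any bytes will do).
data = os.urandom(1000)

# Base64: 6 bits per character, 3 bytes -> 4 characters (with '=' padding).
b64 = base64.b64encode(data).decode("ascii")

# Base32: 5 bits per character, 5 bytes -> 8 characters, alphabet A-Z and 2-7.
b32 = base64.b32encode(data).decode("ascii")

print(len(b64))  # 1336
print(len(b32))  # 1600

# Both encodings are reversible.
assert base64.b64decode(b64) == data
assert base64.b32decode(b32) == data
```

Base64 is shorter, but it mixes upper- and lowercase letters and punctuation, which is awkward to dictate over the phone.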

Here is a nice, if somewhat light-hearted, proposal for an alternative Base32 encoding (it also tries to avoid accidentally producing expletives in the output, at least when the output is read as English).
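For the exact alphabet the question asks about (A-Z and 0-9, 36 symbols), one option is to treat the whole byte sequence as a single large integer and write it in base 36. The sketch below is only an illustration and not part of any existing proposal; the names `ALPHABET`, `min_symbols`, `encode` and `decode` are made up here, and it assumes both sides already agree on the number of bytes being sent.

```python
import string

# Hypothetical alphabet: digits first, then uppercase letters (36 symbols).
ALPHABET = string.digits + string.ascii_uppercase


def min_symbols(num_bytes: int, alphabet_size: int = 36) -> int:
    """Smallest k such that alphabet_size**k >= 256**num_bytes (exact integer math)."""
    total = 256 ** num_bytes
    k, power = 0, 1
    while power < total:
        power *= alphabet_size
        k += 1
    return k


def encode(data: bytes) -> str:
    """Write the bytes as one big base-36 number, zero-padded to a fixed width."""
    n = int.from_bytes(data, "big")
    chars = []
    for _ in range(min_symbols(len(data))):
        n, r = divmod(n, 36)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))


def decode(text: str, num_bytes: int) -> bytes:
    """Inverse of encode; the receiver must know num_bytes in advance."""
    n = 0
    for ch in text:
        n = n * 36 + ALPHABET.index(ch)
    return n.to_bytes(num_bytes, "big")


print(encode(b"\x00\xff"))  # 0073
print(min_symbols(1000))    # 1548
```

For 1000 bytes this gives 1548 symbols, compared with 1600 for Base32, so the gain from using all 36 symbols instead of 32 is small.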

Unfortunately, you won't be able to make it much shorter using only the Latin alphabet, unless you have more information about the nature of the data being encoded. Text compression algorithms try to discover such information about the text and then use a variable-length encoding, assigning shorter output sequences to the more common sequences in the source text. For a simple example of such an encoding, you can look into Huffman coding.
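As a rough illustration of the variable-length idea, here is a minimal Huffman coding sketch in Python. The function name `huffman_code` and the sample input are made up for this example. It builds a code per byte value only, and in practice the code table would also have to be known to the receiver.

```python
import heapq
from collections import Counter


def huffman_code(data: bytes) -> dict:
    """Map each distinct byte value to a bit string; frequent bytes get shorter codes."""
    counts = Counter(data)
    if not counts:
        return {}
    if len(counts) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


sample = b"aaaaaaaabbbc"  # skewed input: 'a' is far more common than 'b' or 'c'
code = huffman_code(sample)
encoded_bits = sum(len(code[b]) for b in sample)
print(encoded_bits, "bits instead of", 8 * len(sample))  # 16 bits instead of 96
```

This only helps when the input is skewed like the sample above. For bytes that look uniformly random there is nothing to exploit, and the output is no shorter than the input.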

Another obvious way to shorten the output is to use a larger alphabet.
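How much a larger alphabet helps can be estimated directly: each symbol from an alphabet of size k carries at most log2(k) bits. The sketch below uses a hypothetical helper, `symbols_needed`, and floating-point arithmetic, so treat it as an estimate rather than an exact bound for very large inputs.

```python
import math


def symbols_needed(num_bytes: int, alphabet_size: int) -> int:
    """Approximate minimum number of symbols to represent num_bytes arbitrary bytes."""
    bits = 8 * num_bytes
    return math.ceil(bits / math.log2(alphabet_size))


# 1000 bytes, as in the question, for a few alphabet sizes.
for size in (32, 36, 64, 256):
    print(size, symbols_needed(1000, size))
# 32 1600
# 36 1548
# 64 1334
# 256 1000
```

The length shrinks only logarithmically with the alphabet size, so doubling the alphabet saves one bit per symbol at best.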

Best Answer from StackOverflow
