
[Solved]: How many bits must be added to 100 bits of data in order to correct up to 10 bits?

Problem Detail: 

I'm trying to calculate the minimum number of bits that must be added to 100 bits of data in order to correct up to 10 bits that are messed up by:

  1. bits that are deleted (erasure correction)
  2. bits that are corrupted (error correction)

I tried Hamming codes and Reed–Solomon codes but couldn't figure out the answer for case 1 or 2.

Is there some other, more efficient algorithm for this problem? Thanks in advance!

Asked By : CS-Student

Answered By : Ran G.

This seemingly innocent question is actually quite difficult to answer. Let me try to give some more information on the parameters that come into play here, and give a partial survey of common results and bounds.

As mentioned in Qalnut's answer, to deal with $10$ bits of errors (in the worst case) we must have distance at least $d=2\cdot 10+1=21$. This is given. Another parameter implied by your question is that the basic unit is a bit. So you are asking: what is the minimal length $n$ of a (binary) code that has distance $d$ and contains at least $2^{100}$ codewords?
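To see why distance $d = 2t+1$ suffices for $t$ errors, here is a toy Python sketch (my own illustration, not a code anyone would actually use for this problem): a length-21 repetition code has only two codewords at distance 21 from each other, so nearest-codeword decoding, which here is just a majority vote, survives any 10 flipped bits.

```python
# Toy illustration: a length-21 binary repetition code has minimum
# distance d = 21 = 2*10 + 1, so nearest-codeword decoding corrects
# any pattern of up to 10 bit flips.

def encode(bit, n=21):
    return [bit] * n

def decode(word):
    # Nearest-codeword decoding; for a repetition code this is a majority vote.
    return 1 if sum(word) > len(word) // 2 else 0

word = encode(1)
for i in range(10):       # adversarially flip 10 of the 21 bits
    word[i] ^= 1
assert decode(word) == 1  # the original bit is still recovered
```

Of course, this code spends 20 extra bits to protect a single data bit; the whole question is how to protect 100 data bits with as few extra bits as possible.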

The lower bound

Let's first try to ask a slightly more general question:

Given an alphabet of size $q$, what is the maximal number of codewords in a code of length $n$ that has minimum distance $d$? (Let's denote this number by $A_q(n,d)$.)

The answer to this question is not simple at all, but there are several well-known bounds. One very useful bound is the Singleton bound, which states that $$ A_q(n,d) \le q^{n-d+1}.$$

This immediately tells us that in order to have $2^{100}$ codewords with distance $d=21$, you must take $n\ge 120$, that is, add at least 20 bits. But this is only a bound; it never says that such a code (with length 120) actually exists, or how to find it.
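The Singleton calculation can be sanity-checked in a couple of lines of Python (`singleton_min_length` is a hypothetical helper written for this post, not part of any library):

```python
# Smallest length n permitted by the Singleton bound A_q(n,d) <= q^(n-d+1):
# to fit q^k codewords we need q^(n-d+1) >= q^k, i.e. n >= k + d - 1.

def singleton_min_length(k, d):
    return k + d - 1

t = 10                    # number of errors to correct
d = 2 * t + 1             # required minimum distance: 21
n = singleton_min_length(100, d)
print(n, n - 100)         # -> 120 20: length 120, i.e. at least 20 added bits
```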

More generally, this bound tells us that the relation between the rate of the code $R=k/n$ and the relative distance of the code $\delta=d/n$ behaves, for $n\to \infty$, as $$ R \le 1-\delta + o(1).$$

Since you wished the relative distance of your code to be $d/n = 21/120 = 0.175$, the maximal rate is $\approx 0.825$, which is exactly what we got ($k/n = 100/120 \approx 0.833 = 0.825 + 1/n$). The moral of this story is: if you aim for relative distance $\delta$ (which can correct $\approx \delta n /2$ errors), you will have to add at least $$n-k = n-Rn \approx \delta n $$ more symbols. This is a nice rule of thumb. In the asymptotic case (that is, when $n$ is very, very large) that would be approximately the answer.
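A quick numeric check of this rule of thumb, using only the numbers above:

```python
# Check the Singleton arithmetic for n = 120, k = 100, d = 21.
n, k, d = 120, 100, 21

delta = d / n             # relative distance: 0.175
R = k / n                 # rate: 0.8333...

# Here the bound R <= 1 - delta + 1/n holds with equality,
# since n - k = d - 1 exactly.
assert abs(R - (1 - delta + 1 / n)) < 1e-12

# Rule of thumb: roughly delta*n extra symbols are needed.
print(n - k, round(delta * n))   # -> 20 21
```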

As said, this bound is, in many cases, not really tight. For other lower bounds, look up the Plotkin bound (for binary codes) and the Johnson bound.

The upper bound

So we know we can't get $n$ smaller than 120. But this is not very helpful from a practical point of view, since we don't even know whether a code with $n=120$ exists; maybe the shortest code with $d=21$ and $k=100$ has $n=137$?

Gilbert and Varshamov come to our aid, and show that there always exists a code for which $$ A_q(n,d) \ge \frac{q^n}{\sum_{j=0}^{d-1}\binom{n}{j}(q-1)^j}.$$

So now you just need to plug in $A_q(n,21)=2^{100}$ and solve for $n$. Let me WolframAlpha this for you: it gives $n \le 189$ (actually, a tighter version of the GV bound for binary codes gives $n\le 185$). The GV bound even tells you how to find this code, in a greedy way (so don't expect it to be efficient).
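For readers who would rather not trust WolframAlpha, the same search can be sketched as a brute-force loop in Python (`gv_min_length` is a hypothetical helper for this post; it uses the plain binary GV bound, not the tighter variant mentioned above):

```python
from math import comb

def gv_min_length(k, d):
    # Smallest n with 2^n / sum_{j=0}^{d-1} C(n,j) >= 2^k, i.e. the
    # shortest length at which the plain binary GV bound guarantees
    # a code with 2^k codewords and minimum distance d.
    n = k
    while 2 ** n < 2 ** k * sum(comb(n, j) for j in range(d)):
        n += 1
    return n

print(gv_min_length(100, 21))   # -> 189
```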

The "answer"

As I said, the real answer may be difficult to find, and depends on $q$ (why are you so fixed on binary codes? Maybe considering the message as symbols over a larger alphabet would give a better code? But then the noise pattern needs to be properly defined, so I'll skip this discussion here), on $k$ (why 100 bits? Wouldn't 127 bits make much more sense?), on efficiency (which you asked about, but which I will not get into), etc.

The bottom line is that many people have thought about such questions over the last century, for various parameters. The problem is to find the code you need. For this purpose you can look at code tables that aggregate codes of various parameters.

For instance, for short binary codes, there is the work "Bounds for binary codes of length less than 25" (IEEE Trans. Inform. Theory 24:81-93, 1978). A more extensive list can be found, for instance, at Code Tables: Bounds on the parameters of various types of codes. Searching the Table of Nonlinear Binary Codes for a code with the parameters you gave in your question, I found that the shortest code with distance 21 that has at least $2^{100}$ codewords has length $n=191$. The reference is H. J. Helgert and R. D. Stinaff, "Shortened BCH codes", IEEE Trans. Inform. Theory, 19:818-820, 1973. Darn, this is worse than the GV bound, but it allows you to encode messages of up to $k=116$ bits! If instead you are willing to settle for messages of length $k=91$ bits, you can do that with $n=173$ bits (M. Kasahara, Y. Sugiyama, S. Hirasawa, and T. Namekawa, IEEE Trans. Inform. Theory, 22:462-468, 1976). Using this table for linear binary codes, there is a constructive code with $k=100$ and $n=176$; the references are inside that link.

So back to your innocent-looking question: the answer(1) is 76 bits.


(1) At least, the best answer I was able to google. (ADDENDUM: found a better one after re-googling.)


Question Source : http://cs.stackexchange.com/questions/43390
