
[Solved]: What is the algorithm for Shannon-Fano code? Am I correct?

Problem Detail: 

I am wondering what the true algorithm for the Shannon-Fano code is. The result I am getting from the algorithm on the Wikipedia page contradicts the expected length of the produced code. According to the proof of Kraft's inequality, $l_i = \lceil \log_2{\frac{1}{P_i}} \rceil= \lceil -\log_2{P_i} \rceil$.

A Shannon–Fano tree is built according to a specification designed to define an effective code table. The actual algorithm is simple:

  1. For a given list of symbols, develop a corresponding list of probabilities or frequency counts so that each symbol's relative frequency of occurrence is known.
  2. Sort the lists of symbols according to frequency, with the most frequently occurring symbols at the left and the least common at the right.
  3. Divide the list into two parts, with the total frequency counts of the left part being as close to the total of the right as possible.
  4. The left part of the list is assigned the binary digit 0, and the right part is assigned the digit 1. This means that the codes for the symbols in the first part will all start with 0, and the codes in the second part will all start with 1.
  5. Recursively apply the steps 3 and 4 to each of the two halves, subdividing groups and adding bits to the codes until each symbol has become a corresponding code leaf on the tree.
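
The five steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `shannon_fano`, the dict-of-counts input, and the tie-breaking rule (the first cut that minimizes the difference wins) are all assumptions made for the example.

```python
def shannon_fano(weights):
    """Return {symbol: code} for a dict mapping symbol -> positive count.

    Hypothetical helper following the five quoted steps; ties between
    equally good cuts are resolved in favour of the leftmost cut.
    """
    # Step 2: sort by decreasing weight (sorted() is stable for ties).
    items = sorted(weights.items(), key=lambda kv: -kv[1])
    codes = {sym: "" for sym, _ in items}

    def split(group):
        if len(group) < 2:
            return  # a single symbol is a leaf
        total = sum(w for _, w in group)
        # Step 3: find the cut that makes the two totals as close as possible.
        running = 0
        best_cut, best_diff = 1, None
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(total - 2 * running)  # |left total - right total|
            if best_diff is None or diff < best_diff:
                best_cut, best_diff = i, diff
        left, right = group[:best_cut], group[best_cut:]
        # Step 4: append 0 to the left part and 1 to the right part.
        for sym, _ in left:
            codes[sym] += "0"
        for sym, _ in right:
            codes[sym] += "1"
        # Step 5: recurse into both parts.
        split(left)
        split(right)

    split(items)
    return codes


print(shannon_fano({"S1": 4, "S2": 2, "S3": 2, "S4": 1}))
```

With counts 4, 2, 2, 1 (probabilities $\frac{4}{9}, \frac{2}{9}, \frac{2}{9}, \frac{1}{9}$) this sketch yields the codes 0, 10, 110, 111. Note that a one-symbol alphabet gets the empty code here, which a real encoder would need to handle separately.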

For example:

$S_1$: $P_{S_1} = \frac{4}{9}$, code $0$ (length $1$), and $\lceil \log_2{\frac{9}{4}}\rceil = 2 \neq 1$. Bad!

$S_2$: $P_{S_2} = \frac{2}{9}$, code $10$ (length $2$), and $\lceil \log_2{\frac{9}{2}}\rceil = 3 \neq 2$. Bad!

$S_3$: $P_{S_3} = \frac{2}{9}$, code $110$ (length $3$), but $\lceil \log_2{\frac{9}{2}}\rceil = 3 = 3$.

$S_4$: $P_{S_4} = \frac{1}{9}$, code $111$ (length $3$), and $\lceil \log_2{\frac{9}{1}}\rceil = 4 \neq 3$. Bad!

We can see that the length of the produced Shannon-Fano code for $S_1$ is $1$, but it is supposed to be $2$. This would mean the algorithm is not correct. What is the correct algorithm, then?
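
The lengths given by the formula $\lceil \log_2{\frac{1}{P_i}} \rceil$ can be checked with a short sketch. The helper name `shannon_length` is hypothetical; it uses exact fractions instead of floating-point logarithms so that the ceiling cannot be affected by rounding.

```python
from fractions import Fraction


def shannon_length(p):
    """Smallest integer l with 2**(-l) <= p, i.e. ceil(log2(1/p))."""
    length = 0
    while Fraction(1, 2 ** length) > p:
        length += 1
    return length


# Probabilities of S1..S4 from the example above.
probs = [Fraction(4, 9), Fraction(2, 9), Fraction(2, 9), Fraction(1, 9)]
formula_lengths = [shannon_length(p) for p in probs]

# Lengths of the codes the splitting procedure produced in the example.
produced_lengths = [len(code) for code in ["0", "10", "110", "111"]]

print(formula_lengths)   # [2, 3, 3, 4]
print(produced_lengths)  # [1, 2, 3, 3]
```

The two lists differ in three of the four positions, which is exactly the mismatch described above.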

Additional note: If we look at example 1 of this document, we can see that the length of $A4$ is supposed to be $3$, not $4$. This is the same contradiction. There is another contradiction in example 1 of this other document. I think it is clear what I am talking about.

More additional note: See page 45 of the textbook Information and Coding Theory (Springer Undergraduate Mathematics Series), 2000 edition.

Asked By : Node.JS

Answered By : Yuval Filmus

You are confusing "Shannon coding" with "Shannon–Fano coding" (terminology can vary across sources). Per Wikipedia, Shannon–Fano coding is the algorithm you mention, while Shannon coding is any coding that assigns a symbol occurring with probability $p_i$ a codeword of length $\ell_i = \lceil \log_2 \frac{1}{p_i} \rceil$. Per Wikipedia, Shannon–Fano coding always leads to codewords whose lengths are within one bit of $\log_2 \frac{1}{p_i}$. This is also a feature of Shannon coding, but the two need not be the same. In particular, Shannon–Fano coding always saturates the Kraft–McMillan inequality, while Shannon coding generally doesn't.
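
To make the last point concrete, here is a small sketch that evaluates the Kraft–McMillan sum $\sum_i 2^{-\ell_i}$ for the two sets of lengths from the question. The function name `kraft_sum` is made up for this illustration, and the lengths are copied from the question's example rather than recomputed.

```python
from fractions import Fraction


def kraft_sum(lengths):
    """Exact value of sum(2**-l) over the given codeword lengths."""
    return sum(Fraction(1, 2 ** l) for l in lengths)


shannon_fano_lengths = [1, 2, 3, 3]  # codes 0, 10, 110, 111
shannon_lengths = [2, 3, 3, 4]       # ceil(log2(1/p_i)) for 4/9, 2/9, 2/9, 1/9

print(kraft_sum(shannon_fano_lengths))  # 1
print(kraft_sum(shannon_lengths))       # 9/16
```

The Shannon–Fano lengths give a sum of exactly $1$ (the inequality is tight), while the Shannon lengths give $\frac{9}{16} < 1$, so the Shannon code leaves part of the code space unused for this distribution.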


Question Source : http://cs.stackexchange.com/questions/48465
