World's most popular travel blog for travel bloggers.

[Solved]: Are nearly all natural numbers compressible?

, , No Comments
Problem Detail: 

A simple counting argument shows most strings can't be compressed to shorter strings. But, compression is usually defined using Kolmogorov complexity. A string is compressible if its Kolmogorov complexity is less than its length, $K(s) < |s|$. The Kolmogorov complexity of a string is defined as the size of the smallest Turing machine (TM) that writes the string and halts when given a blank tape. I want to define the size of a TM as the number of states the machine has. I want to use a simple type of TM known as a Busy Beaver. BB's use two symbols, 0 and 1. A BB blank tape is initialized to all 0's and is unbounded in both directions.

Let the Kolmogorov complexity of natural number $n$, $K(n)$, be the number of states in the smallest BB that writes $n$ 1's and halts when given a blank tape. I use this definition so I can use known results about busy beavers, particularly Rado's sigma function, $\sum(n)$. $\sum(n)$ is defined as the maximum number of 1's an n-state BB can write and halt when given a blank tape. It is known $\sum(n)$ grows faster than any recursive function. This shows there is no recursive limit on how much a natural number can be compressed.

$\sum(2)=4$ so $K(4)=2$. This means 4 is a compressible number. This is the definition of the BB:

A0: 1RB / B0: 1LA

A1: 1LB / B1: 1RHalt

"A0: 1RB" means in state A on input 0 write 1, move one position right, and switch to state B. We can "extend" this machine to create new machines that compress natural numbers. To extend this machine replace the halt state with a new state. On input 0 this state writes 1 and halts. On input 1 it writes 1, moves one position right, and stays in the same state.

A0: 1RB / B0: 1LA / C0: 1RHalt

A1: 1LB / B1: 1RC / C1: 1RC

This 3 state machine writes 5 1's and halts. This proves $K(5) \leq 3$ and 5 is a compressible number. In fact, all numbers greater than 3 are compressible by at least 2 because we can write the first 4 1's using only 2 states. $\forall n(n>3 \rightarrow K(n) \leq n-2)$

We know natural number $n$ can be represented as a binary number with $log_2(n)+1$ bits. Let a number be super compressible if $K(n) < log_2(n)+1$. We see 4 is super compressible because $K(4)=2 < log_2(4)+1 = 3$. It is known $\sum(5) \ge 4098$. This means $K(4098) = 5$ and 4098 is super compressible. By extending this 5-state machine we can show 4099 through 4107 are also super compressible. In general, the range $\sum(n)$ through $\sum(n)+log_2(\sum(n))-n$ will be super compressible. For example, it is known $\sum(6) \ge 3.515 * 10^{18267}$. This proves the range $3.515 * 10^{18267}$ through $3.515 * 10^{18267} + log_2(10^{18267})$ is super compressible.

Are all large enough natural numbers super compressible? If not, why not?

Note there are lots of machines besides the ones defined by $\sum(n)$. There are probably lots of halting 5-state machines that write more that $2^5$ 1's and less than 4098 1's. All such machines define ranges of super compressible numbers. I previously asked a similar question without getting an adequate answer.

Asked By : Russell Easterly

Answered By : orlp

I'm not entirely sure what you're scheme is. However, given the information in your question, I should be able to simply answer the question in your title.

It seems that in your scheme the asymptotic growth of the range of numbers that are "super compressible" is:

$$[10^n, 10^n+log(10^n)] = [10^n, 10^n+n]$$

However, $\lim_{n\to\infty} {n \over 10^n} = 0$, so I'm afraid that when the numbers get big nearly no numbers are super compressible. Additionally, since $\lim_{n\to\infty} {cn \over 10^n} = 0$ for any constant $c$, this result is irrespective of how many sequences of precomputed 'super compressible' numbers you find.


It's hard (at least for me, give a call to our friend Shannon) to show this rigorously, but even if you add more and more sequences of 'super compressible' numbers, you will never be able to represent more than double the range using one extra bit of information.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/47367

0 comments:

Post a Comment

Let us know your responses and feedback