World's most popular travel blog for travel bloggers.

# [Solved]: Minimal size of a context-free grammar which defines $L_n=\{a^k\mid 1\le k\le n\}$

, ,
Problem Detail:

I am looking for the minimal size of a context-free grammar which defines the finite language $$L_n=\{a^k\mid 1\le k\le n\}.$$ The size of a grammar is defined as the total length of all right-hand sides of the rules.
For example a grammar with $$S\to A_1A_1\\A_1\to A_2A_2\\A_2\to A_3A_3\\A_3\to A_4A_4\\A_4\to aa$$ has size $5\cdot 2=10$.
It is well-known that the minimal context-free grammar which defines the language only accepting the word $a^k$ has size $\Theta(\log k)$ (see the example above for the construction of the word $a^{32}$). Now, I am interested in the size of a grammar constructing all strings $a^k$ ($1\le k\le n$) for a given $n$. By the information above it is easy to create a grammar for $L_n$ of size $$\Theta(n+\sum_{k=1}^{n}\log k)=\Theta(\log n!)=\Theta(n\log n)$$ since we can create all words by a grammar of size $\Theta(\log k)$ and connect those grammars with a start rule of size $n$ (one symbol for each word). But since the final grammar is allowed to share nonterminals this construction is clearly not optimal. Maybe the smallest grammar has asymptotically the same size, but i don't know. Any help?

#### Answered By : Yuval Filmus

There is a grammar of size $O(\log n)$ using repeated squaring.

Let's start with the simpler case $n = 2^m$: \begin{align*} &S \to a B_0 B_1 \dots B_{m-1} \\ &B_i \to A_i | \epsilon && (0 \leq i \leq m-1) \\ &A_i \to A_{i-1}A_{i-1} && (1 \leq i \leq m-1) \\ &A_0 \to a \end{align*} Here $A_i \to a^{2^i}$ and $B_i \to a^{2^i}|\epsilon$.

More generally, suppose $n-1 = 2^{d_0} + 2^{d_1} + \cdots + 2^{d_\ell}$, where $d_0 > d_1 > \cdots > d_\ell$. The corresponding grammar is: \begin{align*} &S \to a B_{d_0} | a C_0 B_{d_1} | a C_1 B_{d_2} | \cdots | a C_{\ell-1} B_{d_\ell} | aC_\ell \\ &C_j \to C_{j-1} A_{d_j} && (1 \leq j \leq \ell) \\ &C_0 \to A_{d_0} \\ &B_i \to B_{i-1} A_{i-1} | B_{i-1} && (1 \leq i \leq d_0) \\ &B_0 \to \epsilon \\ &A_i \to A_{i-1}A_{i-1} && (1 \leq i \leq d_0) \\ &A_0 \to a \end{align*} Here $A_i \to a^{2^i}$, $B_i \to \{a^k : 0 \leq k < 2^i\}$ and $C_j \to a^{2^{d_0} + \cdots + 2^{d_j}}$.

As an extra property, both grammars are unambiguous.