World's most popular travel blog for travel bloggers.

# $FIRST_k$ set -- is algorithm given by Aho&Ullman correct?

, ,
Problem Detail:

I am writing this, because no matter how I look at the algorithm I see an error :-)

Aho & Ullman in "Theory of Parsing, vol.1" (pp. 357-359) described the algorithm for building $FIRST_k(A)$ set for any $A$ non terminal by incrementally building helper set $F_i(A)$ which has such property that every $F_i(A)$ set is contained in $F_{i+1}(A)$ and it is finally equal to $FIRST_k(A)$.

However I found a counter example for this, for $k=2$.

A := a B B := b 

$FIRST_k(A) = \{ a b \}$, while $F_0(A) = \{ a \}$.

Any help is appreciated, here is shortened content of those pages -- FIRST(k) -- and if you don't have a book, I asked the same question in longer form as blog-post -- LALR(k) FIRST sets.

###### Answered By : Hendrik Jan

In your example $F_0(A) = \varnothing$ (!) while $F_0(B) = \{b\}$. Then $F_1(A) = \{ab\}$. Voila.

By construction $F_0(A) = \{ x \in\Sigma^* \mid A\to x\alpha \text{ where } |x|=k \text{ or } \alpha = \varepsilon \}$. The only production $A\to aB$ has no terminal prefix with length two, so $F_0(A)$ must be empty.

On the other hand, also the production for $B$ is shorter than two. But here we include the string $b$ as in consists of terminals only (the $\alpha=\varepsilon$ case).

As you note, $F_{i+1}(A) = F_i(A) \cup \text{ something }$, so we have $F_0(A) \subseteq F_1(A) \subseteq F_2(A) \subseteq \dots$. The sets are increasing. The construction stops when two consecutive sets are equal (for all $A$), nothing will be added after that moment.