Answered By : Alex ten Brink
Let's consider the following grammar:
1: $S \to A b$
2: $S \to B b$
3: $A \to a a$
4: $B \to a a a$
Obviously, this language is totally uninteresting: the only words in it are $aab$ and $aaab$. However, the grammar is not $LL(1)$ or $LL(2)$: both words start with $aa$, so one or two tokens of lookahead are not enough to choose between the two productions for $S$ (if you don't know what $LL(k)$ means, that's not a problem here).
Let's try parsing the input $aaab$ by going through it one character at a time, while pretending we know absolutely nothing about the rest of the string, except the next token (essentially, we use 1 token of lookahead). We are allowed to use our knowledge of everything we've seen so far though.
Initially, we only know that the string starts with an $a$. This tells us absolutely nothing about whether we are in 'case 1' (production 1) or 'case 2' (production 2) of our grammar. Luckily, we don't really care at this point.
Let's skip over the $a$ and go to the next position. We remember the $a$ and look at the next token. This is another $a$. Note that by now, we've seen $aa$, which means that we might be in case 1: it might be the case that these first two $a$s constitute an instance of $A$! We'll just leave it at that for the moment, and first officially 'read' the $a$.
We now look at our third token and find it is again an $a$. We now want to decide whether those two $a$s we saw earlier really constitute an $A$ ($LR(k)$ algorithms never delay this decision). As an $A$ is always followed by a $b$, we decide this is not the case: our next token is an $a$.
Reading the $a$, we know we're definitely in case 2 (assuming our input is correct). Our lookahead becomes a $b$, which is precisely what we expect. We therefore decide that the three $a$s we've seen so far are in fact a $B$. Note that at this point, we no longer need to 'remember' that we've seen three $a$s: all we need to know is that we've seen a $B$. This is effectively the same situation we'd be in if we had had $Bb$ as input and had read the $B$, and as this makes everything easier to remember, we'll do precisely that (this corresponds to a reduce action of an $LR(k)$ parser).
Finishing up, we read the $b$, find that we are at the end of the input, and decide that what we've read (a $B$ followed by a $b$) must be an $S$: the input was correct.
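The walk-through above can be summarized as a shift-reduce trace (this table is my own rendering of the steps, not part of the original answer; `$` marks the end of the input):

```
Stack    Input    Action
         aaab$    shift a
a        aab$     shift a
aa       ab$      lookahead is a, so do NOT reduce A -> aa; shift a
aaa      b$       lookahead is b: reduce B -> aaa
B        b$       shift b
Bb       $        reduce S -> Bb
S        $        accept
```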
The $LR(k)$ algorithm works pretty much as described above. It remembers what it has seen so far and decides based on this information and the next token whether the last bit of what it has seen so far is 'actually' a nonterminal. It then replaces this by the nonterminal, shuffles some information around and continues as if the replacement had taken place in the input.
The crucial bit is finding out when what we've seen so far, or to be more precise the last bit of what we've seen so far, is actually an instance of a nonterminal, that is, an expansion of a production. In our example, if we had transformed the two $a$s at the start into an $A$, then everything would have gone wrong. We say that $aa$ is not a handle at this point. However, $aaa$ was a handle, as we correctly replaced it by a $B$. A grammar is $LR(k)$ if we can always make correct decisions about what is a handle using $k$ tokens of lookahead and everything we've seen so far.
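To make the shift/reduce decisions concrete, here is a minimal hand-written parser for this toy grammar, sketched in Python. The function name `parse` and the `"$"` end-of-input marker are my own choices; a real $LR(k)$ parser would drive these decisions from generated tables rather than hard-coded conditions, but the logic of "reduce only when the lookahead confirms a handle" is the same:

```python
# A hand-written shift-reduce parser for the toy grammar
#   S -> A b | B b,   A -> a a,   B -> a a a
# using one token of lookahead to decide when to reduce.

def parse(tokens):
    stack = []
    toks = list(tokens) + ["$"]  # append an end-of-input marker
    i = 0
    while True:
        look = toks[i]  # one token of lookahead
        if stack == ["S"] and look == "$":
            return True  # the whole input reduced to S: accept
        if stack[-3:] == ["a", "a", "a"] and look == "b":
            stack[-3:] = ["B"]  # reduce B -> aaa (aaa is a handle here)
        elif stack[-2:] == ["a", "a"] and look == "b":
            stack[-2:] = ["A"]  # reduce A -> aa (aa is a handle only before b)
        elif stack[-2:] in (["A", "b"], ["B", "b"]):
            stack[-2:] = ["S"]  # reduce S -> Ab or S -> Bb
        elif look in ("a", "b"):
            stack.append(look)  # shift the next token onto the stack
            i += 1
        else:
            return False  # no action applies: reject

print(parse("aaab"))  # True
print(parse("aab"))   # True
print(parse("aa"))    # False
```

Note how the decision the text describes shows up in the code: with $aa$ on the stack, we reduce to $A$ only if the lookahead is $b$; if the lookahead is another $a$, we keep shifting, because $aa$ is not a handle at that point.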
I have read a few algorithms for $LR(k)$ parsers, which frequently mention selecting a handle from the given grammar. For example, one document said: "the essence of LR parsing is identifying the handle on top of stack".
Please help me understand, what are handles in parsing?
Asked By : eshi14
Best Answer from Computer Science Stack Exchange
Question Source : http://cs.stackexchange.com/questions/7587