World's most popular travel blog for travel bloggers.

[Solved]: LALR(1) parsers and the epsilon transition

, , No Comments
Problem Detail: 

I am having trouble getting my head wrapped around epsilon transitions while creating an LALR(1) parse table.

Here's a grammar that recognizes any number of 'a' followed by a 'b'. 'S' is an artificial start state. '$' is an artificial 'EOS' token.

 0.    S -> A $ 1.    A -> B b 2.    B -> B a 3.    B -> epsilon 

Itemsets:

i0: S -> . A $     A -> .B b     B -> .B a     A -> B . b  ! because B could -> epsilon     B -> B . a  !    "  i1: S -> A . $  i2: S -> A $ .  i3: A -> B . b  ! from i0     B -> B . a  i4: A -> B b .  ! from i0 or i3; the LALR algorithm compresses identical states.  i5: B -> B a .  ! from i0 or i3: the LALR algorithm compresses identical states. 

I previously had a description on how this would work to parse a simple string. I removed it because I know less now than I did before. I can't even figure out a parse tree for 'ab'.

If someone could show me how I have mis-constructed my itemsets and how I'd reduce the epsilon transition I'd be grateful.

Asked By : Tony Ennis

Answered By : rici

Your states and itemsets are not quite correct. The epsilon production must appear in relevant itemsets, and you have combined two states into one, which would produce a shift-reduce conflict if the epsilon production were added to the itemset (which should be done).

The following was generated with bison (using the --report=all command-line option); it differs from the theoretic model because the grammar has been "augmented" with an extra start symbol and an explicit end-of-input marker ($end). Also, it has done some table compression, so in the action tables, you can think of $default as meaning "either a or b".

It is worth explaining how State 0 comes about, since it shows how epsilon productions are handled (no differently from other productions).

We start with $accept: . S $end, by definition. ($accept is the starting state). Then the closure rule is applied as long as possible. Remember that the closure rule is: If any item in the itemset, the . is immediately before a non-terminal, add all the productions for that non-terminal with an initial .. Hence we add:

S: . A 

continuing with A:

A: . B 'b' 

continuing with B:

B: . B 'a' B: . 

We can't apply closure any longer, so we're done. Since the state now has an item with the dot at the end (the epsilon production for B), a reduction is possible.

State 0     0 $accept: . S $end     1 S: . A     2 A: . B 'b'     3 B: . B 'a'     4  | .      $default  reduce using rule 4 (B)         S  go to state 1     A  go to state 2     B  go to state 3  State 1     0 $accept: S . $end      $end  shift, and go to state 4  State 2     1 S: A .      $default  reduce using rule 1 (S)  State 3     2 A: B . 'b'     3 B: B . 'a'      'b'  shift, and go to state 5     'a'  shift, and go to state 6  State 4     0 $accept: S $end .      $default  accept  State 5     2 A: B 'b' .      $default  reduce using rule 2 (A)  State 6     3 B: B 'a' .      $default  reduce using rule 3 (B) 

In State 0, the closure rule has added the epsilon production (line 4). Furthermore, no item in the state 0 itemset has the point before a terminal. So with any lookahead, the parser is forced to reduce the epsilon production, after which it will use the goto function for state 0 to decide to move to state 3. (In your state machine, states 0 and 3 are conflated, but I do not believe this is correct.) State 3 will definitely shift a terminal; with the input ab$end, it will shift the a and move to state 6, which will then reduce a B. And so on.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/33999

0 comments:

Post a Comment

Let us know your responses and feedback