Is it possible, in general, to determine whether two regular expressions admit the same set of strings?
Also, in particular, is it possible to determine whether a given regular expression admits the same set of strings as the regular expression ".*" (i.e., accepts anything).
Asked By : Paul Reiners
Answered By : Patrick87
This seems like a dupe, but I couldn't find one in a cursory search. Given that, we can solve your problem as follows:
- Generate NFAs $N_1$ and $N_2$ for the regular expressions $r_1$ and $r_2$;
- Generate DFAs $M_1$ and $M_2$ for the NFAs $N_1$ and $N_2$;
- Generate DFAs $D_1$ and $D_2$ such that $L(D_1) = L(M_1) \setminus L(M_2)$ and $L(D_2) = L(M_2) \setminus L(M_1)$;
- Determine whether $L(D_1) = L(D_2) = \emptyset$; if so, $L(r_1) = L(r_2)$; else, $L(r_1) \neq L(r_2)$.
You can do (1) by using Kleene's theorem demonstrating that regular expressions and finite automata have the same expressive power. To each of the operations union, concatenation, and Kleene closure, there corresponds an automaton-based construction which accepts what the regular expression generates. By recursivly applying these constructions to subexpressions you can build an NFA to accept the language generated by a regular expression.
To do (2), you typically will want to use the powerset construction (sometimes called the subset construction). This involves constructing a DFA whose set of states equals the set of subsets of stats from the NFA. Transitions now connect subsets of states which are reachable from each other. The resulting DFA may have up to $2^{|Q|}$ states, where $Q$ is the set of states in your NFA.
To do (3), you can use the Cartesian product machine construction to generate machines with up to $|Q_1| \times |Q_2|$ states. Every state in your product machine will correspond to a pair of states from the operand machines. You must choose the set of accepting states to effect the correct operation; in our case, set difference.
To do (4), you can try all strings of length up to and including $|Q|$. If the machine accepts anything, it must accept some string of such a length. If you check all the strings and nothing's accepted, you know that $L(M) = \emptyset$. If that's the case, by construction, then the one language is a subset of the other. If both languages are subsets of each other, the languages are equal, by definition of set equality.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/12876
0 comments:
Post a Comment
Let us know your responses and feedback