Converting regular expressions into (minimal) NFA that accept the same language is easy with standard algorithms, e.g. Thompson's algorithm. The other direction seems to be more tedious, though, and sometimes the resulting expressions are messy.
What algorithms are there for converting NFA into equivalent regular expressions? Are there advantages regarding time complexity or result size?
This is supposed to be a reference question. Please include a general decription of your method as well as a non-trivial example.
Asked By : Raphael
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/2016
Answered By : jmad
There are several methods. Here I will describe the one usually taught in school which is very visual. I believe it is the most used in practice. However, writing the algorithm is not such a good idea.
State removal method
This algorithm is about handling the graph of the automaton and is thus not very suitable for algorithms since it needs graph primitives such as ... state removal. I will describe it using higher-level primitives.
The key idea
The idea is to consider regular expressions on edges and then removing intermediate states while keeping the edges labels consistent.
The main pattern can be seen in the following to figures. The first has labels between $p,q,r$ that are regular expressions $e,f,g,h,i$ and we want to remove $q$.
Once removed, we compose $e,f,g,h,i$ together (while preserving the other edges between $p$ and $r$ but this is not displayed on this):
Example
Using the same example as in Raphael's answer:
we successively remove $q_2$:
and then $q_3$:
then we still have to apply a star on the expression from $q_1$ to $q_1$. In this case, the final state is also initial so we really just need to add a star:
$$ (ε+(b+aa)(ba)^*(a+bb))^* $$
Algorithm
L[i,j]
is the regexp of the language from $q_i$ to $q_j$. First, we remove all multi-edges:
for i = 1 to n: for j = 1 to n: if i == j then: L[i,j] := ε else: L[i,j] := ∅ for a in Σ: if trans(i, a, j): L[i,j] := L[i,j] + a
Now, the state removal. Suppose we want to remove the state $q_k$:
remove(k): for i = 1 to n: for j = 1 to n: L[i,i] += L[i,k] . star(L[k,k]) . L[k,i] L[j,j] += L[j,k] . star(L[k,k]) . L[k,j] L[i,j] += L[i,k] . star(L[k,k]) . L[k,j] L[j,i] += L[j,k] . star(L[k,k]) . L[k,i]
Note that both with a pencil of paper and with an algorithm you should simplify expressions like star(ε)=ε
, e.ε=e
, ∅+e=e
, ∅.e=∅
(By hand you just don't write the edge when it's not $∅$, or even $ε$ for a self-loop and you ignore when there is no transition between $q_i$ and $q_k$ or $q_j$ and $q_k$)
Now, how to use remove(k)
? You should not remove final or initial states lightly, otherwise you will miss parts of the language.
for i = 1 to n: if not(final(i)) and not(initial(i)): remove(i)
If you have only one final state $q_f$ and one initial state $q_s$ then the final expression is:
e := star(L[s,s]) . L[s,f] . star(L[f,s] . star(L[s,s]) . L[s,f] + L[f,f])
If you have several final states (or even initial states) then there is no simple way of merging these ones, other than applying the transitive closure method. Usually this is not a problem by hand but this is awkward when writing the algorithm. A much simpler workaround is to enumerate all pairs $(s,f)$ and run the algorithm on the (already state-removed) graph to get all expressions $e_{s,f}$ supposing $s$ is the only initial state and $f$ is the only final state, then doing the union of all $e_{s,f}$.
This, and the fact that this is modifying languages more dynamically than the first method make it more error-prone when programming. I suggest using any other method.
Cons
There are a lot of cases in this algorithm, for example for choosing which node we should remove, the number of final states at the end, the fact that a final state can be initial, too etc.
Note that now that the algorithm is written, this is a lot like the transitive closure method. Only the context of the usage is different. I do not recommend implementing the algorithm, but using the method to do that by hand is a good idea.
0 comments:
Post a Comment
Let us know your responses and feedback