It appears that "Basic Regular Expressions" as defined by POSIX.1-2008 do not support alternation, a|b (although some grep implementations recognize the escaped version, \|).
Since the regular languages are closed under union by definition, does this mean that POSIX BRE has less expressive power than a finite automaton? Or is there some way to simulate alternation using other constructs?
Asked By : Steve Kobes
Answered By : Gilles
Indeed the POSIX BRE language cannot express all regular expressions because it lacks alternation. It can't even recognize all finite languages, let alone all regular languages.
For example, $\{ab, ba\}$ is not recognizable as a BRE. To prove this, consider what the toplevel syntactic form could be:
- It can't be one of the single-character forms since the language has words of length $\gt 1$.
- It can't be $R^*$ because that would match the empty string.
- It can't be $R^{\{m,n\}}$ except with $m=n=1$ (in which case we're back to the original problem) because that would match strings of different lengths or the empty string.
- So it has to be concatenation: $R_1 R_2$. Now consider how $ab$ is recognized:
- If $R_1$ recognizes $ab$ then $R_2$ must not recognize anything anything other than the empty string. So $R_1$ must recognize $\{ab,ba\}$ and we're back to the original problem.
- If $R_1$ recognizes $a$ but not $ab$ then $R_2$ must recognize $b$. But then $R_1R_2$ recognizes all words of the form $u b$ where $R_1$ recognizes $u$, so $R_1$ must not recognize anything other than $a$. There's no way to recognize $ba$.
- If $R_1$ recognizes neither $ab$ nor $a$ then the only way for $R$ to recognize $ab$ is if $R_1$ recognizes the empty string, in which case we're back to the original problem as above, but for $R_2$ this time.
When "we're back to the original problem", that means that the only solution to find a BRE recognizes the language is to find a smaller BRE that has the same property. This is an infinite descent, so there's no BRE that has the desired property.
I don't think there's a "nice" characterization of BRE-recognizable languages, for example as languages recognizable by a "nice" class of automata.
Note that BRE-recognizable languages are actually not a subclass of regular languages, since backreferences add expressive power. For example $\{w w \mid w \in \{a,b\}^*\}$ is recognized by the BRE \(.*\)\1 but is famously not regular. BRE without backreferences are just syntactic sugar over regular expressions so the languages they can recognize are a subclass of the regular languages.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/62944
0 comments:
Post a Comment
Let us know your responses and feedback