I'm thinking about a routing system. Imagine I have the two following regexes
- pathpart1/pathpart2 => specific match that routes to controller1
- .* => catch-all that routes to controller2
And I let them match on a URL, e.g. 'pathpart1/pathpart2'.
They both match, but I would want to give prevalence to the most specific regex, i.e. the regex where the cardinality of all possible matches of that regex is the lowest.
Is there a good way to calculate that the first regex has a low cardinality on its match-set (so I want to to go with that match) and the second has a very high cardinality on its match-set (i.e. it is completely not specific, so a match is basically a catch all last resort)...?
I do not know upfront which routes are registered with the router, so I can't loop over them in order of cardinality by hand (i.e. low cardinality first, and the catch-all last if all others don't match).
Asked By : Willem Mulder
Answered By : Raphael
There are two angles I see to define "specifity" in this context.
- Specifity of a regular expression $r$ that matches input $w$ is the (inverse of the) number of words of length $|w|$ that also match $r$.
- Specifity of a regular expression $r$ that matches input $w$ is (the ratio of) the (maximum) number of symbols of $w$ that are matched by non-wildcard symbols in $r$ (and $|w|$).
The first option follows your idea: choose among all matches the regular expressions that matches the fewest others. Since there are in general infinitely many matching strings, we have to restrict ourselves to a certain length, or a finite set of lengths.
The second option follows another approach. If the input string is well described by a given expression, long parts of it should be (more or less) explicitly described in the regular expression. For example, $(aa|ab)+.^*bba$ is a more specific match for $aaababaababababba$ than $.^*$.
Option 1
In this case, we have to count all words in $\mathcal{L}(r)$ that have length $n$. Luckily, this is easy once we have an unambiguous, regular grammar¹ $G$ for $\mathcal{L}(r)$. See here how this can be done.
Option 2
Let's assume we have an unambiguous regular expression (for instance by specifying lazy wildcards). In that case, we can translate the regular expression into a DFA. Run the word through it and count how many non-wildcard transitions are taken.
If the expression is ambiguous, we can do the same with the corresponding NFA if we explore all possible paths.
- Convert $r$ to an $NFA$, determinise, convert to right-regular grammar. All conversions are standard and algorithmically feasible (if not necessarily efficient).
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/10786
0 comments:
Post a Comment
Let us know your responses and feedback