Let $L$ a regular language and define the subsequence closure of $L$ as
$\qquad \displaystyle S(L) = \{ w \mid \exists w' \in L.\ w \text{ subsequence of } w'\}$.
The problem I want to solve is to find for such subsequences $w \in S(L)$ which letters can be inserted into them so that the result is also in $S(L)$. Formally:
Given $w_1\dots w_n \in S(L)$, output all pairs $(i,a) \in \{0,\dots,n\} \times \Sigma$ for which $w_1 \dots w_{i} a w_{i+1} \dots w_n \in S(L)$.
Consider, for instance, the language$\{ab, abc, abcc\}$. The string $b$ is in $S(L)$ and inserting $a$ at the beginning -- corresponding to $(0,a)$ -- yields $ab \in S(L)$. On the other hand, the string $cb$ is not in $S(L)$; there is no way to convert it to a language string by insertion.
Using this language, if the input string is $b$ the possible insertions I am looking for are $(0,a)$ and $(1,c)$ at the end. If the input string is $bc$ the possible insertions are $(0,a), (1,c)$ and $(2,c)$.
The use of this algorithm is in a user interface: the user builds strings belonging to the language starting from an empty string and adding one character at a time in different positions. At each step the UI prompts the user with all the possible valid letters in all the possible insertion positions.
I have a working naive algorithm that involves a lot of back-tracking, and it is way too slow even in relatively simple cases. I was wondering if there is something better, or -- failing that -- if there are any available studies of this problem.
Asked By : MiMo
Answered By : MiMo
The solution is to compute the DFA of the subsequence closure of the original language, and then using this DFA I just used the 'trivial' algorithm: try to insert each possible symbol in each position, if the resulting word is valid for this new DFA - i.e. belongs to $S(L)$ - then the symbol can be inserted at that position.
Using the closure DFA it should be possible to use the faster algorithm suggested by Yuval Filmus in one of the other answers; I did not try because the 'trivial' algorithm already gives good enough performances: from tens of seconds with my previous solution to sub-second execution time (computing the closure DFA only once of course).
To compute the closure DFA:
- Create a new NFA with the same start and final states of the original DFA
- For each transition $q\stackrel{a}{\longrightarrow}q'$ in the original DFA the new NFA has two transitions: the same $q\stackrel{a}{\longrightarrow}q'$ and a null one $q\stackrel{\varepsilon}{\longrightarrow}q'$ (where $\varepsilon$ is the null transition - this is the $\varepsilon$-shortcut)
- Generate a DFA from this NFA using the usual system - this DFA is the closure one.
(Many thanks to Raphael for the help and for suggesting the solution)
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/3404
0 comments:
Post a Comment
Let us know your responses and feedback