World's most popular travel blog for travel bloggers.

[Solved]: Shortest sub-sequence of one string, that's not a sub-sequence of another string

, , No Comments
Problem Detail: 

Given two strings $x$ and $y$ over the alphabet $\{A,C,G,T\}$, I'm trying to determine a shortest string $z$ such that $z$ is a subsequence of $x$ and not a subsequence of $y$.

Example: a shortest string that is a subsequence of AATGAAC but not a subsequence of ATCGATC is AAG.

Asked By : Mike

Answered By : Aryabhata

A dynamic programming algorithm seems to work.

Given a prefix $X_k = x_1 x_2\dots x_k$ of string $X$ ($X$ is of length $s$, so $X = X_s$), a prefix $Y_m = y_1 y_2 \dots y_m$ of string $Y$ ($Y$ is of length $t$, so $Y_t = Y$), and a length $L$, we determine if there is a subsequence of $X_k$ which is also a subsequence of $Y_m$, such that the sub-sequence is of length exactly $L$ and ends at $x_k$.

Call that (boolean) value: $\text{is_there}[k,m,L]$.

This satisfies the recurrence:

$$\text{is_there}[k,m,L] = \text{is_there}[k, m-1, L] \vee (x_k = y_m \wedge \text{is_there}[k-1,m-1,L-1])$$

The smallest $L$ for which there is some $k$, such that $\text{is_there}[k, t, L] = false$ gives you the result. (Note that $t$ is the length of $Y$).

If the length of the shortest such subsequence is $S$, This can be implemented in $O(stS)$ time and $O(stS)$ space by a triply nested for-loop with $L$ on the outside, then $k$, then $m$.

Something like:

Compute isThere for L = 1. foreach L in (2 ... s)     foreach k in (1 ... s)        foreach m in (1 ... t)          is_there[k,m,L] = blah 
Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/12037

0 comments:

Post a Comment

Let us know your responses and feedback