The case of multiple strings. A slight modification of the dynamic programming algorithm for two strings is used as a subroutine. Here is the pseudo code:
# The modified dynamic programming algorithm for longest common subsequence. # Inputs: x, y - the strings xL, yL - their respective length vectors lcs(x, xL, y, yL) u = |x| v = |y| T = 2-dimensional table with default value 0 for all elements. for i in 0..u-1 for j in 0..v-1 if x[i] == y[j] T[i, j] = T[i - 1, j - 1] + 1 else T[i, j] = max(T[i - 1, j], T[i, j - 1]) # limit the value of the table element by the length vectors: T[i, j] = min(T[i, j], xL[i], yL[j]) # Take the element wise minimum of the vectors for each of the strings and the bottom row and right-most column respectively. return x, element-wise-minimum-of(xL, T[_, v - 1]), y, element-wise-minimum-of(yL, T[u - 1, _]) # Algorithm to compute the length of the longest common subsequence of multiple strings. # Inputs: strings - a list of strings mlcs(strings): if for any string x: |x| == 0 return 0 R = a dictionary indexed by a string x, where the value is initialized to be the length vector of x (initially [1,2,..,|x|]). N = initially a copy of R. while true for i in 0..|strings| for j in i + 1..|strings| x, y = strings[i], strings[j] _, xL, _, yL = lcs(x, R[x], y, R[y]) R[x], R[y] = xL, yL if for every string x: N[x] is element wise equal to R[x] break else N[x] = R[x] return the last element of any of the vectors in R
What follows is an implementation in the python language.
import collections def lcs(x, xL, y, yL): u = len(x) v = len(y) T = collections.defaultdict(int) for i in range(u): for j in range(v): if x[i] == y[j]: T[i, j] = T[i - 1, j - 1] + 1 else: T[i, j] = max(T[i - 1, j], T[i, j - 1]) T[i, j] = min(T[i, j], xL[i], yL[j]) return (x, [min(xL[i], T[i, v - 1]) for i in range(u)]), (y, [min(yL[j], T[u - 1, j]) for j in range(v)]) def mlcs(strings): if any(len(x) == 0 for x in strings): return 0 R = {x: list(range(1, len(x) + 1)) for x in strings} N = {x: R[x] for x in strings} while True: for i in range(len(strings)): for j in range(i + 1, len(strings)): x, y = strings[i], strings[j] (_, xL), (_, yL) = lcs(x, R[x], y, R[y]) R[x] = xL R[y] = yL if all(all(N[x][i] == R[x][i] for i in range(len(x))) for x in strings): break else: for x in strings: N[x] = R[x][:] return R[strings[-1]][-1]
The idea is to initialize for each input string a vector which holds information for each position of that string about what is the maximum length of a LCS up to that position. These are updated each time a pair-wise LCS computation is done. This in turn is done for all pairs until all the vectors are stable (ie. don't change) between iterations.
So far testing gives me mixed results. The correct length at first seems to be produced pretty reliably, but when trying to use the function mlcs as an oracle in an algorithm for retrieving an actual longest sequence, sometimes the result is not what is expected. This may or may not indicate that there are cases for which an incorrect length is produced.
So on to the question: Is the idea of the length computing algorithm sound?
Asked By : user72935
Answered By : user72935
The algorithm is incorrect. That is it may return a length that is greater than the actual length of a longest common subsequence. Here is a test case which fails to produce the correct length: ['baaa', 'aaba', 'baba'].
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/60584
0 comments:
Post a Comment
Let us know your responses and feedback