[Solved]: Finding the longest common subsequence algorithm using hash table Slow

posted by , on 8:45:00 AM , No Comments

Problem Detail:

I've designed an algorithm to find the longest common subsequence. these are steps:

Starts with i = 0

Picks the first letter from the first string start from ith letter.
Go to the second string looking for that picked letter.
If not found return to the first string and picks the next letter and repeat 1 to 3 until it finds a letter that is in the second string.
Now that found a common letter in the second string, adds that to $common_subsequence.
Store its position in $index.
Picks next letter from the first string and do step 2 but this time starts from $index.
Repeat 3 to 6 until reached end of string 1 or string 2.
If length $common_subsequence is greater than length of common subsequence so far add that change lcs to the $common_subsequence.
Add 1 to the i and repeat 1 to 9 while i is less that length of the first string.

This is an example:
‫‪

X=A, B, C, B, D, A, B‬‬   ‫‪Y=B, D, C, A, B, A‬‬

First pick A.
Look for A in Y.
Now that found A add that to the $common_subsequence.
Then pick B from X.
Look for B in Y but this time start searching from A.
Now pick C. It isn't there in string 2, so pick the next letter in X that is B.
...
...
...

The complexity of this algorithm is theta(n*m).

I implemented it on the two methods. The second one uses a hash table, but after implementing I found that it's much slower compared to the first algorithm. I cant undrestand why.

Here is my implementation:

First algorithm:

import time def lcs(xstr, ystr):     if not (xstr and ystr): return # if string is empty     lcs = [''] #  longest common subsequence     lcslen = 0 # length of longest common subsequence so far     for i in xrange(len(xstr)):         cs = '' # common subsequence         start = 0 # start position in ystr         for item in xstr[i:]:             index = ystr.find(item, start) # position at the common letter             if index != -1: # if common letter has found                 cs += item # add common letter to the cs                 start = index + 1             if index == len(ystr) - 1: break # if reached end of the ystr         # update lcs and lcslen if found better cs         if len(cs) > lcslen: lcs, lcslen = [cs], len(cs)          elif len(cs) == lcslen: lcs.append(cs)     return lcs  file1 = open('/home/saji/file1') file2 = open('/home/saji/file2') xstr = file1.read() ystr = file2.read()  start = time.time() lcss = lcs(xstr, ystr) elapsed = (time.time() - start) print elapsed

Second one using hash table:

import time from collections import defaultdict def lcs(xstr, ystr):     if not (xstr and ystr): return # if strings are empty     lcs = [''] #  longest common subsequence     lcslen = 0 # length of longest common subsequence so far     location = defaultdict(list) # keeps track of items in the ystr     i = 0     for k in ystr:         location[k].append(i)         i += 1     for i in xrange(len(xstr)):         cs = '' # common subsequence         index = -1         reached_index = defaultdict(int)         for item in xstr[i:]:             for new_index in location[item][reached_index[item]:]:                 reached_index[item] += 1                 if index < new_index:                     cs += item # add item to the cs                     index = new_index                     break             if index == len(ystr) - 1: break # if reached end of the ystr         # update lcs and lcslen if found better cs         if len(cs) > lcslen: lcs, lcslen = [cs], len(cs)          elif len(cs) == lcslen: lcs.append(cs)     return lcs  file1 = open('/home/saji/file1') file2 = open('/home/saji/file2') xstr = file1.read() ystr = file2.read()  start = time.time() lcss = lcs(xstr, ystr) elapsed = (time.time() - start) print elapsed

Asked By : Sajjad

Answered By : Yuval Filmus

One optimization is to replace the access to reached_index[item] by a local variable which is initialized from the hash table at the beginning of the for item loop, and stored back to the hash table at the end of the loop.

Another optimization which will speed up things is to replace the hash tables with arrays of size 256.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/7851

MCA IGNOU GROUP

[Solved]: Finding the longest common subsequence algorithm using hash table Slow

Asked By : Sajjad

Answered By : Yuval Filmus

Best Answer from StackOverflow

Akshay Mishra

0 comments:

Post a Comment

Search By Course Code

Blog Archive

Popular Posts

Label Cloud

Ad Blocker

[Solved]: Finding the longest common subsequence algorithm using hash table Slow

Asked By : Sajjad

Answered By : Yuval Filmus

Best Answer from StackOverflow

Akshay Mishra

0 comments:

Post a Comment

Search By Course Code

Blog Archive

Popular Posts

Label Cloud

Ad Blocker

Click to Download Now