World's most popular travel blog for travel bloggers.

[Solved]: Micro-optimisation for edit distance computation: is it valid?

, , No Comments
Problem Detail: 

On Wikipedia, an implementation for the bottom-up dynamic programming scheme for the edit distance is given. It does not follow the definition completely; inner cells are computed thus:

if s[i] = t[j] then     d[i, j] := d[i-1, j-1]       // no operation required else   d[i, j] := minimum              (                d[i-1, j] + 1,  // a deletion                d[i, j-1] + 1,  // an insertion                d[i-1, j-1] + 1 // a substitution              ) } 

As you can see, the algorithm always chooses the value from the upper-left neighbour if there is a match, saving some memory accesses, ALU operations and comparisons.

However, deletion (or insertion) may result in a smaller value, thus the algorithm is locally incorrect, i.e. it breaks with the optimality criterion. But maybe the mistake does not change the end result -- it might be cancelled out.

Is this micro-optimisation valid, and why (not)?

Asked By : Raphael

Answered By : A.Schulz

I don't think that the algorithm is flawed. If two strings are matched, we compare first its last two characters (and then recurse). If they are the same, we can match them to get an optimal alignment. For example, consider the strings test and testat. If you don't match the two last ts, than one of the ts remains unmatched, since otherwise your matching would look like this:

enter image description here

This is impossible, since the arrows are not allowed to "cross". The matched t induces several inserts (green boxes in the figure), as depicted on the left:

enter image description here

But then you can simply find an equally good alignment, depicted on the right. In both cases you match a t and you have two inserts.

The argument for a substitution of one of the last ts is the same. So if you substitute one of the last ts, then you can instead match the last two t, and get a better alignment (see the picture).

enter image description here

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/2985

0 comments:

Post a Comment

Let us know your responses and feedback