World's most popular travel blog for travel bloggers.

Edit distance (Levenshtein-Distance) algorithm explanation

, , No Comments
Problem Detail: 

I want to calculate the edit distance (aka Levenshtein-Distance) between two words: «solo» and «oslo».

According to this site we'll get the result matrix:

Edit distance (Levenshtein-Distance) calculation matrix

What I don't understand is: In case of comparison the last «o» from «solo» with the first «o» of «oslo» will see the submatrix:

3 2
4
3

As far as I understand, in order to calculate the bottom right value, which is equal in this example to 3 of this submatrix, we'll get the min(3, 2, 4) + (0 if the letters are the same, otherwise 1). So, why in this example the bottom right value is equal to 3 and not to 2?

Asked By : Mike B.

Answered By : Mike B.

If we look deeply at algorithm implementation in C: http://en.m.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#C

we'll see the main string which fills the matrix:

matrix[x][y] = MIN3(matrix[x-1][y] + 1, matrix[x][y-1] + 1,                      matrix[x-1][y-1] + (s1[y-1] == s2[x-1] ? 0 : 1)); 

as we can see, the value of cells with indexes matrix[x-1][y] (upper cell) and matrix[x][y-1] (left cell) in automatically increased, the upper-left cell (by diagonal) with index matrix[x-1][y-1] is increased by 1 only if the letters are different and only after that we'll take a minimum from these 3 values. In other words:

3 2
4
min(3+0, 2+1, 4+1) = 3

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/28734

0 comments:

Post a Comment

Let us know your responses and feedback