The global alignment problem can be generalized by setting the cost of some boundary gaps to 0.
- Gaps at both end in both strings have 0 cost, then we get the semiglobal alignment.
- Gaps are at the left end of the first string, and right end of second string have 0 cost, then we get overlap alignment.
- If both boundary gaps for the second string has no cost, we get the fitting alignment.
But this can't capture local alignment. Seems strange to let the local alignment to be the odd man out, are there a general (but not too general) framework that captures all the above alignment problems?
Asked By : Chao Xu
Answered By : Hendrik Jan
Despite the different uses of the algoritms, and the various names, I personally would say this is a single algorithm, with different variations. To make this answer more readable, I will add some details.
The basic algoritm global alignment (aka Needleman-Wunsch, below left) compares two strings $x=x_1\dots x_m$ and $y = y_1 \dots y_n$ (with match/mismatch score $\sigma$ between letters and negative gap cost $g$ between letter and missing letter). It is computed using an array $A$ and the recursion $$A[i,j] = \max \left\{\begin{array}{lll}A[i,j-1]&+&g\\A[i-1,j-1]&+&\sigma(x_i,y_j)\\A[i-1,j]&+&g\end{array}\right.$$
 The starting values along the borders of the matrix are $A[i,0]=i\cdot g$ and $A[0,j]=j\cdot g$. 
The value of the alignment is found in the last cell of the computation, and the alignment itself can be found by tracing back how this maximum value is obtained.
For local alignment (aka Smith-Waterman, below right) we need not a complete match but only the optimal value of matches between substrings of $x$ and $y$. That can be obtained by considering only part of the 'full' computation, ignoring any prefix or suffix with a negative value. This requires two changes. First any negative value obtained by the basic recursion is set to $0$, and also we look for a maximum inside the matrix (not the last cell). As any negative values are set to $0$ this effects the boundary values in the same way: they are set to $0$.
The three variants of semi-global alignment you mention (including overlap and fitting alignment) are somewhat in the middle of these two, as they carefully select which prefixes and suffixes do not count towards cost. They each follow the basic recursion (allowing negative values). Such alignments follow a path from one border of the matrix to another one. Starting at a left or top border means ignoring prefixes, and setting the corresponding border to $0$. Ending at a right or bottom border (i.e. choosing the maximum value at that side) corresponds to ignoring that suffix.
The two pictures below represent overlap and containment/fitting.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/13354
 
0 comments:
Post a Comment
Let us know your responses and feedback