World's most popular travel blog for travel bloggers.

# optimal lifted phylogenetic alignment - how to improve time complexity?

, ,
Problem Detail:

optimal lifted alignment - is a dynamic programming algorithm. Its input is a tree $T$ with $k$ strings assigned to its leaf nodes (their length is $n$).

The algorithm then assigns strings to the internal nodes s.t. the sum of distance (global alignment distance) on all the edges would be minimal, under the constraint that the string assigned to each node is one of the strings of its immediate children (hence the name "lifted").

The algorithm first computes distances $D$ between all pairs $O(k^2n^2)$. For each leaf node $v$ assigned $S_v$ we initialize: $d(S, v) = 0 | S = S_v$ and $d(S, v) = \infty | S \neq S_v$.

We then traverse post-order and for each internal node $v$, for each string $S$ assigned to one of its leaf descendants (all strings in its subtree) does $d(v,S) = \sum{\min(D(S,T) + d(w,T))}$ where $w$ are $v$'s immediate children. Overall = $O(k^2n^2 + k^3)$.

I've been told its possible to improve the complexity to $O(k^2n^2 + k^2)$ but I can't see how? Would appreciate any hint.

###### Answered By : Yuval Filmus

Consider a node $x$ with children $x_1,\ldots,x_d$, and denote by $N(v)$ the number of leaves in the subtree $T(v)$ rooted at $v$. We can implement the processing of $x$ as follows:

• Go over $1 \leq i \leq d$:
• Choose a leaf $f_i \in T(x_i)$ as the label of $x$ and $x_i$.
• For all $j \neq i$, choose the leaf $f_j \in T(x_j)$ that minimizes $D(f_i,f_j) + d(x_j,f_j)$.
• Store $d(x,f_i) = \sum_{j \neq i} [D(f_i,f_j) + d(x_j,f_j)]$.

The processing time of $x$ is proportional to $$\sum_{i=1}^d N(x_i) \times \sum_{j \neq i} N(x_j) = \left(\sum_{i=1}^d N(x_i)\right)^2 - \sum_{i=1}^d N(x_i)^2 = N(x)^2 - \sum_{i=1}^d N(x_i)^2.$$ Summing over all nodes $x$ up to the root $r$, we see that the total processing time is proportional to $N(r)^2 = k^2$.