I have a question regarding recursion in Viterbi algorithm.
Define $\pi(k; u; v)$ which is the maximum probability for any sequence of length $k$, ending in the tag bigram $(u; v)$.
The base case if obvious $\pi(0,*,*)=1$
The general case.
$\pi(k,u,v) = max_{w \in K_{k-2} } \pi(k-1,w,u) \cdot q(v|w,u) \cdot e(x_k|v)$
The author justifies the recursion as folllows:
How can we justify this recurrence? Recall that $\pi(k, u, v)$ is the highest probability for any sequence $y_{−1}...y_k$ ending in the bigram $(u, v)$. Any such sequence must have $y_{k−2} = w$ for some state $w$. The highest probability for any sequence of length $k − 1$ ending in the bigram $(w, u)$ is $\pi(k − 1, w, u)$, hence the highest probability for any sequence of length $k$ ending in the trigram $(w, u, v)$ must be $\pi(k − 1,w, u) \cdot q(v|w, u) \cdot e(x_k |v)$
I do not understand why it's actually true, I think it's possible to reach $\pi(n,u, v)$ from any $(n-1,w, u)$ not actually the maximum one $\pi(n-1,w, u)$ just because $q(v|w, u) \cdot e(x_k |v)$ might have a higher influence on the resulting $(n,u, v)$ than any $\pi(n-1,w, u)$.
I would appreciate if anyone could explain me why it's true.
Asked By : user16168
Answered By : Yuval Filmus
The maximum is over the entire expression $\pi(k-1,w,u) \cdot q(v|w,u) \cdot e(x_k|v)$, for exactly the same reason that is worrying you. Moreover, under your original interpretation, there is no reason to keep $\pi(k-1,w,u)$ for the non-optimal $w$. We are forced to store $\pi(k-1,w,u)$ for all $w$ exactly for the reason you mention.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/19093
0 comments:
Post a Comment
Let us know your responses and feedback