[Solved]: Why is the transform in Schönhage–Strassen's multiplication algorithm cheap?

Problem Detail: 

The Schönhage–Strassen multiplication algorithm works by turning multiplications of size $N$ into many multiplications of size $\lg N$ with a number-theoretic transform, and recursing. At least I think that's what it does; there's some other cleverness, and I really don't understand it well enough to summarize accurately. It finishes in $O(N \cdot \lg N \cdot \lg \lg N)$ time.

A number-theoretic transform is exactly like a Discrete Fourier Transform, except it's done in the ring $F_{2^N+1}$ of integers modulo $2^N+1$. This makes the operations a lot cheaper, since the Fourier transform does a lot of multiplying by roots of unity, and the roots of unity in $F_{2^N+1}$ are all powers of 2, so multiplying by them is just a shift! Also, integers are a lot easier to work with than floating-point complex numbers.
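To make the shift trick concrete, here's a minimal Python sketch (my own illustration, not code from the paper): since $2^N \equiv -1 \pmod{2^N+1}$, multiplying by $2^s$ is a left shift followed by a signed fold of the overflowing bits.

```python
# A minimal sketch (not from the paper): multiplication by the root of
# unity 2^s in Z/(2^N + 1) is just a shift plus a signed fold, because
# 2^N = -1 (mod 2^N + 1).
def mul_by_power_of_two(x, s, N):
    """Return (x * 2**s) mod (2**N + 1) without a real multiplication."""
    m = (1 << N) + 1            # modulus 2^N + 1
    s %= 2 * N                  # 2 has order 2N: 2^N = -1, so 2^(2N) = 1
    if s >= N:                  # shifting past N bits flips the sign
        return (m - mul_by_power_of_two(x, s - N, N)) % m
    y = x << s                  # the actual "multiplication" is a shift
    lo, hi = y & ((1 << N) - 1), y >> N
    return (lo - hi) % m        # fold the overflow: hi * 2^N = -hi

# Sanity check against ordinary modular arithmetic:
N = 8
assert all(mul_by_power_of_two(x, s, N) == (x * 2**s) % (2**N + 1)
           for x in range(2**N + 1) for s in range(4 * N))
```

Note that the shift-and-fold still touches $O(N)$ bits, the same as an addition; the saving is that no recursive multiplication is needed for the twiddle factors.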

Anyways, the thing that confuses me is that $F_{2^N+1}$ is very large. If I give you a random element from $F_{2^N+1}$, it takes $O(N)$ bits to specify it. So adding two elements should take $O(N)$ time. And the DFT does a lot of adding.

Schönhage–Strassen splits the input into $\frac{N}{\lg N}$ groups of $\lg N$ bits. These groups are the values of $F_{2^N+1}$ that it will transform. Each pass of the DFT will have $O(\frac{N}{\lg N})$ additions/subtractions, and there are $O(\lg \frac{N}{\lg N})$ passes. So, based on addition taking $O(N)$ time, it seems like the cost of all those additions should be $O(N \cdot \frac{N}{\lg N} \cdot \lg \frac{N}{\lg N})$, which is asymptotically the same as $O(N^2)$.

We can do a bit better than that: because the values start out so small, the additions are quite sparse. The first pass's additions really only cost $O(\lg N)$ each, the second pass's cost $O(2 \lg N)$ each, and in general the $i$-th pass's cost $O(\min(N, 2^i \lg N))$ each. But that still totals out to a terrible $O(N^2 / \lg N)$.
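As a quick numeric illustration of this accounting (a sketch of the two estimates above, not of the actual algorithm), here's Python comparing them against $N^2$ and $N^2/\lg N$:

```python
import math

# Cost accounting from the question, for illustration only:
# N / lg N coefficients, lg(N / lg N) butterfly passes.
def naive_cost(N):
    lg = int(math.log2(N))
    groups = N // lg                        # N / lg N values to transform
    passes = int(math.log2(groups))         # lg(N / lg N) passes
    return passes * groups * N              # every addition billed at O(N)

def sparse_cost(N):
    lg = int(math.log2(N))
    groups = N // lg
    passes = int(math.log2(groups))
    # pass i: each addition only touches min(N, 2^i * lg N) bits
    return sum(groups * min(N, (1 << i) * lg) for i in range(passes))

for N in (2**10, 2**15, 2**20):
    print(N, naive_cost(N) / N**2, sparse_cost(N) * math.log2(N) / N**2)
# Both ratios stay bounded, consistent with growth like N^2 and N^2 / lg N.
```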

How is Schönhage–Strassen making the additions cheap? How much do they cost overall?

Is it because the algorithm actually uses $F_{N+1}$ (with $N$ guaranteed to be a power of 2)? There's enough stacked $2^{2^{n}}$ and German in the paper that I'm really not sure. On the other hand, I don't think that guarantees enough roots of unity for things to work out.

Asked By : Craig Gidney

Answered By : Yuval Filmus

According to the Wikipedia article, at each step the length of the integers is reduced from $N$ to (roughly) $\sqrt{N}$, and there are (roughly) $\sqrt{N}$ of them, so each pass of additions costs only $O(N)$. There is a detailed analysis of the running time in the final paragraph of the linked section, copied here in case it changes:

In the recursive step, the observation is used that:

  • Each element of the input vectors has at most $N/2^k$ bits;
  • The product of any two input vector elements has at most $2N/2^k$ bits;
  • Each element of the convolution is the sum of at most $2^k$ such products, and so cannot exceed $2N/2^k + k$ bits.

Here $N$ is the current input length and $2^k = \Theta(\sqrt{N})$. Arithmetic is done modulo $2^n+1$, where $n$ is some multiple of $2^k$ which is larger than $2N/2^k + k$; note that $n = \Theta(\sqrt{N})$.
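To see how those choices make the additions cheap, here's an illustrative Python sketch of the parameter selection described in the quote (the exact choice of $k$ here is my assumption; the description only requires $2^k \approx \sqrt{N}$):

```python
import math

# Illustrative parameter choice per recursion level, following the quote:
# split the N-bit input into 2^k pieces with 2^k ~ sqrt(N), and work
# modulo 2^n + 1 where n is a multiple of 2^k exceeding 2N/2^k + k.
def recursion_parameters(N):
    k = round(math.log2(N) / 2)              # so that 2^k ~ sqrt(N)
    pieces = 1 << k
    piece_bits = N // pieces                 # each input element: N/2^k bits
    bound = 2 * N // pieces + k              # bits needed by a convolution sum
    n = -(-bound // pieces) * pieces         # smallest multiple of 2^k >= bound
    return pieces, piece_bits, n

for N in (2**10, 2**16, 2**20):
    pieces, piece_bits, n = recursion_parameters(N)
    print(f"N={N}: {pieces} pieces of {piece_bits} bits, arithmetic mod 2^{n}+1")
# n = Theta(sqrt(N)): one addition costs O(sqrt(N)) bit operations, and a
# pass of ~sqrt(N) additions costs O(N) total -- not O(N) per addition.
```

Roughly speaking, each of the $k = \Theta(\lg N)$ butterfly passes then costs $O(N)$ in additions, so the additions contribute only $O(N \lg N)$ per level; it's the recursion on the $\Theta(\sqrt{N})$-bit pointwise multiplications that drives the $O(N \lg N \lg \lg N)$ total.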


Question source: http://cs.stackexchange.com/questions/28339
