World's most popular travel blog for travel bloggers.

Computing the mode of XOR subsequences

, , No Comments
Problem Detail: 

I was confronted with this problem in an online programming challenge and it has been bugging me since:

In the problem, you are given a list of 16-bit numbers, say $a_0, a_1, ..., a_n$. An "XOR subsequence" is defined as the exclusive-or combination of every element in a consecutive subsequence of the list. The problem is that you must find the most commonly occurring XOR subsequence out of every possible consecutive subsequence in the list. In the event of a tie, the lowest numeric XOR value out of the most frequently occurring ones wins.

The first solution I came up with was an $O(n^2)$ solution where I compute the XOR for consecutive subsequences $a_0,...,a_i$ from $i=0$ to $i=n$, which can be done in $O(n)$, and doing the same thing starting at $a_1$, etc, and incrementing a counter in a map for every result. I know this repeats a lot of work, so I tackled the problem from a different angle and tried to define it mathematically. I defined $X_{i,j}$ to be the XOR of $a_i,...,a_j$ so that I could write the following:

$X_{i,j} = \begin{cases} a_i & \text{if } i = j \\ a_j \oplus X_{0, j-1} & \text{if } i = 0 \text{ and } j > 0 \\ X_{0,i-1} \oplus X_{0,j} & \text{if } 0 < i < j \end{cases} $

I implemented that with memoization, which I believe eliminates all redundant calculations, but using memoization turns out to be slow and use $O(n^2)$ space when attempting to calculate all $X_{i,j}$ for $0 \leq i \leq n $ and $i \leq j \leq n$, which is unacceptable.

Is there any way I can make this any faster than $O(n^2)$? Or if not, is there any efficient way to reduce the number of computations? Or am I missing some simplification to the problem that doesn't require me to compute every $X_{i,j}$? I'm sort of new to dynamic programming, so it's been hard to wrap my brain around this problem. My original solution was too slow by a factor of about 5 seconds (vs the time limit of 4 seconds for the challenge), so I believe even a constant factor reduction in time would make the solution acceptable.

Asked By : chbaker0

Answered By : D.W.

Yes, this can be solved much more efficiently. You can solve this problem in $O(b 2^b)$ time using a FFT, if each number is a $b$-bit number. In the comments, you said $b=16$, so this should be an efficient solution: something like $2^{21}$ steps of computation.

Consider the sequence $b_{-1},b_0,b_1,\dots,b_n$ defined by $b_{-1}=0$ and $b_i = a_0 \oplus a_1 \oplus \dots \oplus a_i$. Now your problem is equivalent to asking for the value that appears most frequently, out of all of the values $b_i \oplus b_j$ with $i<j$.

Let $x \in \mathbb{N}^{2^b}$ be the characteristic vector of $b_{-1},b_0,\dots,b_n$. In other words, $x_v$ is equal to the number of times that $v$ appears in the values $b_{-1},b_0,\dots,b_n$; or, equivalently, $x_v = \#\{i : -1 \le i \le n, b_i = v\}$.

Let $y \in \mathbb{N}^{2^b}$ be the characteristic vector of $\{b_i \oplus b_j : i<j\}$. In other words, $y_v$ is equal to the number of times that $v$ appears in the values $b_i \oplus b_j$; or, equivalently, $y_v = \#\{(i,j) : -1 \le i < j \le n, b_i \oplus b_j = v\}$.

We see that $y = x * x$, i.e., $y$ is the convolution of $x$ with itself:

$$y_v = \sum_u x_u x_{u \oplus v}.$$

This convolution can be computed using a discrete Fourier transform on the group $(\mathbb{Z}/2\mathbb{Z})^b$. We find

$$y = \mathcal{F}^{-1}(\mathcal{F}(x)^2),$$

where here $\mathcal{F}$ is the Fourier transform and $\mathcal{F}^{-1}$ is the inverse Fourier transform.

Moreover, there is a fast Fourier transform (FFT) for this group. The FFT can be computed in $O(N \lg N)$ time, on sequences of length $N$. Here $N=2^b$, since the characteristic vector has length $2^b$, so the running time to compute $\mathcal{F}$ and $\mathcal{F}^{-1}$ is $O(b 2^b)$.

Once you have the characteristic vector $y$, you just look for the index $i$ such that $y_i$ is maximized. This is the desired output value.

All in all, this gives an algorithm with $O(b 2^b)$ running time. This is probably much better than your $O(n^2)$ algorithm, for your particular parameters, but implementing it takes considerably more sophistication (you have to know how to implement a non-standard FFT, for a particular finite group).

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/30595

0 comments:

Post a Comment

Let us know your responses and feedback