Problem: Given an array of $n$ integers, $A[1 \dots n]$, in which any integer occurs at most 2 times, we have to find the number of unique increasing subsequences of length 3 (duplicate subsequences must be counted only once). In other words, we have to count the number of unique triples $(A[i], A[j], A[k])$ such that $A[i] < A[j] < A[k]$ with $i < j < k$.
I have been stuck on this for quite a while now. I did look up this question, which tests for the existence of such a triplet. But I think my question is different because it needs the count, and because it needs unique triplets (up to 1 repetition of any number is allowed).
Idea: $O(n^2)$ algorithm. For each number, we can scan the rest of the array and count how many unique numbers are greater than it and occur after it. This can be done in $O(n^2)$ by naive brute force. For every pair of numbers ($O(n^2)$ pairs), we then know how many triplets can be formed using that pair as the first two numbers of the triplet (since we know how many unique numbers greater than the second number occur after it).
Unfortunately, $O(n^2)$ is too slow, and I need a faster solution - $O(n\log n)$ or $O(n)$. Space complexity also has to be sub-quadratic. I was thinking along the lines of sorting a copy of the array to find out the relative rank of each element, but could not go any further. Any help or hints are greatly appreciated!
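For reference, here is a brute-force counter that follows the problem statement directly: enumerate every index triple and collect each distinct value triple in a set. This is only a sketch of a baseline for checking faster solutions on small inputs (it is $O(n^3)$, not the $O(n^2)$ pair-scan described above), and the function name is mine:

```python
from itertools import combinations

def count_unique_triples_brute_force(A):
    """O(n^3) reference: count distinct value triples (x, y, z) with x < y < z
    that appear as an increasing subsequence of length 3 in A."""
    triples = set()
    for i, j, k in combinations(range(len(A)), 3):
        if A[i] < A[j] < A[k]:
            triples.add((A[i], A[j], A[k]))
    return len(triples)

# Example: (1, 2, 3) is counted once even though 2 occurs twice.
print(count_unique_triples_brute_force([1, 2, 2, 3]))  # 1
```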
Asked By : mayank
Answered By : Niel de Beaudrap
Here is an algorithm to count distinct increasing subsequences of length $3$ in time $O(n \log n)$, regardless of how many times any value occurs in $A$.
First, create a binary search tree $T$ containing all possible values $a = A[j]$, storing at each node how many times that value occurs in $A$, and how many nodes follow it in the tree (are further to the right), i.e. represent larger candidate values $A[k]$. Note that with each insertion (and later, removal), the tally of further-right values will become out of date for many of the nodes. However, we can maintain these values for each node lazily, by maintaining at each parent node a count of the pending counter updates for its left and right subtrees. (When we later traverse to a child node, its own counter of larger elements, and its pending-update counters for its children, can be adjusted by the amount indicated by its parent.) Similarly, any insertion or removal in the right subtree of a node leads to a pending update for the left subtree. This can also be implemented for balanced trees, such as AVL trees or red-black trees. We will henceforth suppose that, as we traverse any particular node, its counter is thereby kept up to date, though it may become out of date between visits. Constructing $T$ takes time $O(n \log n)$.
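The quantity $T$ has to provide at each visit — how many distinct values larger than $a$ still occur later in $A$ — can also be maintained with a Fenwick (binary indexed) tree over coordinate-compressed values, which avoids the lazy pending-update bookkeeping altogether. The following is a sketch of that substitute rather than the augmented BST described above, and the names are mine:

```python
class Fenwick:
    """Fenwick (binary indexed) tree, 1-based: point update and prefix sum in O(log n)."""
    def __init__(self, n):
        self.n = n
        self.t = [0] * (n + 1)

    def update(self, i, delta):
        while i <= self.n:
            self.t[i] += delta
            i += i & -i

    def query(self, i):
        """Sum over positions 1..i."""
        s = 0
        while i > 0:
            s += self.t[i]
            i -= i & -i
        return s

def build_later(A):
    """Indicator over value ranks: 1 at rank r while that value still has an
    occurrence we have not swept past yet (i.e. it can still serve as A[k])."""
    rank = {v: r for r, v in enumerate(sorted(set(A)), start=1)}
    later = Fenwick(len(rank))
    for v in rank:
        later.update(rank[v], 1)
    return rank, later

# Distinct values greater than a that still occur later:
#   later.query(len(rank)) - later.query(rank[a])
# When the last remaining occurrence of a value v is consumed: later.update(rank[v], -1)
```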
We traverse $A$ once more, creating a tree $F$ which also contains values $a = A[j]$, together with the number of nodes to the left of each node, similarly to $T$. (We also maintain a separate counter at each node of $F$ of how many nodes have been added to its left since the last time we looked up that node in the tree; this uses a similar update method to the total counter, though because we would clear this counter each time the node is visited, in practice we just use the pending-update value at its parent node.) At the same time we will be disassembling the tree $T$, and counting distinct increasing subsequences of length 3 in $A$ by considering, using $F$ and $T$, how many different values $a'$ and $a''$ can serve as the first and third values, respectively, in an ordered triple $(a', A[j], a'')$. We do this as follows.
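The bookkeeping $F$ needs — how many distinct smaller values have been added since a value's last visit — can be illustrated without any tree at all: remember, per value, the smaller-value count observed at its previous visit and subtract. A toy sketch of just that bookkeeping (deliberately linear-time per step for clarity; in the algorithm proper this count would come from the search tree or a Fenwick tree, and the names are mine):

```python
def demo_newly_added_left(A):
    """Illustrates only the F-side bookkeeping: for each position j, report how many
    distinct values smaller than A[j] have appeared since A[j]'s previous visit."""
    seen = set()         # distinct values that have appeared so far (the role of F)
    last_left = {}       # value -> smaller-value count recorded at its previous visit
    for j, a in enumerate(A):
        left = sum(1 for v in seen if v < a)      # distinct smaller values seen so far
        newly = left - last_left.get(a, 0)        # on a first visit, this equals left
        last_left[a] = left
        seen.add(a)
        print(f"j={j}, a={a}: left={left}, newly added on the left={newly}")

demo_newly_added_left([2, 1, 2, 3])
```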
Set $N := 0$. For each $1 \leqslant j \leqslant n$ in order, do the following:
Let $a := A[j]$. Decrement the occurrence counter of $a$ in $T$. This takes time $O(\log n)$.
Search for $a$ in $F$. If it is not present, insert it, and set $$\text{incr} := \#(\text{nodes left of $a$ in $F$}) \cdot \#(\text{nodes right of $a$ in $T$}).$$ Otherwise, if $a$ is already present, set $$\text{incr} := \#(\text{\emph{newly added} nodes left of $a$ in $F$}) \cdot \#(\text{nodes right of $a$ in $T$}).$$ This takes time $O(\log n)$ as well.
Add $\text{incr}$ to $N$.
If the occurrence counter of $a$ in $T$ has dropped to $0$, remove $a$ from $T$. This takes time $O(\log n)$.
[Edited to add: as Peter Shor notes in the comments, rather than storing in $T$ the number of times $a$ occurs in $A$, we can simply store the last index $1 \leqslant j \leqslant n$ such that $a = A[j]$ for the purposes of this algorithm. Then, rather than actually removing $a$ from $T$, we can just perform the decrement of the further-right tallies for all of the nodes to its left. The decrease in arithmetic and tree-modification operations would give rise to some modest savings in run-time.]
The total run-time of this algorithm is $O(n \log n)$. In each iteration of the loop above, we only count the number of ways in which elements smaller than $a$ that have not yet been counted as a subsequence starter for $a$ can be combined with elements larger than $a$ that still have at least one occurrence later in the list, with $a$ itself in the middle. Thus, for each $A[j]$ we count the "newly possible" subsequences that arise as new first elements become available for each remaining available third element, accumulating these counts for $1 \leqslant j \leqslant n$.
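Putting the pieces together, here is a self-contained sketch of the counting loop above, with Fenwick trees over coordinate-compressed values standing in for the augmented trees $T$ and $F$ (the per-node lazy counters become prefix/suffix sums plus a dictionary of per-value smaller-counts). The Fenwick helper is repeated so the snippet runs on its own; function and variable names are my own, and the run-time remains $O(n \log n)$:

```python
from collections import Counter

class Fenwick:
    """Point update / prefix sum in O(log n), 1-based."""
    def __init__(self, n):
        self.n = n
        self.t = [0] * (n + 1)
    def update(self, i, delta):
        while i <= self.n:
            self.t[i] += delta
            i += i & -i
    def query(self, i):
        s = 0
        while i > 0:
            s += self.t[i]
            i -= i & -i
        return s

def count_unique_increasing_triples(A):
    rank = {v: r for r, v in enumerate(sorted(set(A)), start=1)}
    m = len(rank)
    remaining = Counter(A)          # occurrence counters kept in T
    later = Fenwick(m)              # 1 at rank r while that value still occurs later
    for v in rank:
        later.update(rank[v], 1)
    seen = Fenwick(m)               # 1 at rank r once that value has appeared (tree F)
    last_left = {}                  # value -> smaller-count at its previous visit
    N = 0
    for a in A:
        r = rank[a]
        remaining[a] -= 1                            # this occurrence is no longer "to the right"
        if remaining[a] == 0:
            later.update(r, -1)                      # remove a from T
        right = later.query(m) - later.query(r)      # distinct values > a still to come
        left = seen.query(r - 1)                     # distinct values < a seen so far
        incr = (left - last_left.get(a, 0)) * right  # first visit: whole left count
        if a not in last_left:
            seen.update(r, 1)                        # insert a into F
        last_left[a] = left
        N += incr
    return N

print(count_unique_increasing_triples([2, 1, 2, 3]))   # 1: only (1, 2, 3)
print(count_unique_increasing_triples([1, 2, 3, 4]))   # 4: all C(4, 3) value triples
```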
Question Source : http://cs.stackexchange.com/questions/7409