[Solved]: Finding farthest item in an array with duplicates

posted by , on 6:11:00 AM , No Comments

Problem Detail:

I have an array $A[]$ of size $L$, which contains numbers in the range $1 \ldots N$. Here $L>N$, so the array will contain repetitions. If $x,y$ are two numbers that are both present in the array, define the distance $d(x,y)$ to be the minimum difference in positions where $x,y$ appear, i.e.,

$$d(x,y) = \min \{|i-j| : A[i]=x, A[j]=y\}.$$

Given a number $x$ that is present in the array, I need to find the number $y$ in the array distance from $x$ is as large as possible. In other words, given $x$, I am trying to find $y$ that makes $d(x,y)$ as large as possible (subject to the restriction that $y$ is a number in $A[]$).

Can this be done in $o(L)$ time per query? It's OK to do some pre-processing of the array.

For example, suppose the array is $[1,3,2,3,4,5,3]$. Then $d(5,3)=1$, since there is a $5$ one position away from a $3$ (a $5$ appears at index $5$ and a $3$ appears at index $6$). If the query is $x=5$, the correct answer is $y=1$, since this makes the value of $d(5,y)$ as large as possible.

Asked By : SHB

Answered By : D.W.

Here's a scheme that does $O(L \lg L)$ preprocessing, and thereafter lets you answer query for the value $x$ in $O(rN \lg L)$ time, where $r$ counts the number of times that $x$ is repeated. This will be slightly more efficient than the naive algorithm for values $x$ that don't occur too many times in the array, but inefficient for common values.

Here's how. During preprocessing, we're going to build a map $f$ defined by

$$f(y,i) = \min \{ j>i : A[j]=y \}.$$

Similarly, we're going to build a map $g$ defined by

$$g(y,i) = \min \{j<i : A[j]=y \}.$$

(We'll also build a set of inverted indices that, given $x$, enables us to enumerate all indices where $x$ appears in $A$.)

Note that these maps $f,g$ help us compute the distance function $d$ efficiently. In particular,

$$d(x,y) = \min \{ f(y,i)-i : A[i]=x \} \cup \{ i-g(y,i) : A[i]=x \},$$

which can be computed with $r$ evaluations of $f$ and $r$ evaluations of $g$. Consequently, given $x$, we'll be able to find the value $y$ that maximizes $d(x,y)$ using $rN$ evaluations of $f,g$, by simply trying all possibilities for $y$.

We're going to use the preprocessing to arrange things so we can evaluate $f,g$ at an input of our choice in $O(\lg L)$ time. How do we do that?

Unfortunately, storing $f$ explicitly would take $NL$ space, which is too much. Fortunately, there is a lot of redundancy in $f$: there are long ranges of indices $i,i+1,\dots,i'$ such that $f(y,i)=f(y,i+1)=\dots=f(y,i')$. This is going to help us represent $f$ more compactly. Similarly for $g$.

In particular, we'll store $f(y,\cdot)$ as an interval tree: whenever we have an interval $[i,i']$ such that $f(y,i)=f(y,i+1)=\dots=f(y,i')=\alpha$, then we add a single interval to the tree with the mapping $[i,i'] \to \alpha$. In this way, $f$ is stored as $N$ different interval trees. It is possible to show that the total number of leaves, across all of these interval trees, is at most $L+N=O(L)$ (in particular, the tree for $f(y,\cdot)$ has at most $r'+1$ leaves, where $r'$ counts the number of occurrences of $y$ in $A$). If we store these trees as balanced binary trees, their height will be upper-bounded by $O(\lg L)$. Consequently, the total amount of space needed to store these trees is $O(L \lg L)$. You can also show how to generate these trees in $O(L \lg L)$ time. Finally, evaluating $f$ at a point of your choice amounts to a stabbing query in the corresponding interval tree, which takes $O(\lg L)$ time. This shows how to achieve the time complexity quoted in the first sentence of my answer.

As an optimization, if $x$ appears in $A$ more than $L/(N \lg L)$ times, you can answer the query in $O(L)$ time using a standard linear scan (skipping the above cleverness). Of course, you can detect quickly which values of $x$ occur that often by using the inverted indices built during preprocessing.

With this idea, we get a scheme that requires $O(L \lg L)$ preprocessing and lets us answer each subsequent query in $O(\min(L, rN \lg L))$ time, where $r$ counts the number of times that the queried value occurs in $A$.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/16739

Ask a Question

Download Related Notes/Documents

MCA IGNOU GROUP

[Solved]: Finding farthest item in an array with duplicates

Asked By : SHB

Answered By : D.W.

Best Answer from StackOverflow

Akshay Mishra

0 comments:

Post a Comment

Search By Course Code

Blog Archive

Popular Posts

Label Cloud

Ad Blocker

[Solved]: Finding farthest item in an array with duplicates

Asked By : SHB

Answered By : D.W.

Best Answer from StackOverflow

Akshay Mishra

0 comments:

Post a Comment

Search By Course Code

Blog Archive

Popular Posts

Label Cloud

Ad Blocker

Click to Download Now