World's most popular travel blog for travel bloggers.

[Solved]: Efficiently locating the maximum value in interval over large amounts of data points

, , No Comments
Problem Detail: 

I have a set of $n$ two-dimensional data points $(x_i,y_i)$. I want to efficiently answer the following query: Given an arbitrary interval of $x$-values, find the highest and lowest points within that interval. In other words, for this query we are given an interval $[\ell,u]$, and we want to find the point $(x_i,y_i)$ such that $x_i \in [\ell,u]$ and $y_i$ is maximal (or minimal).

What data structure would be appropriate for this task? I'm interested in both the static case (where the set of points doesn't change) and the dynamic case (where we also want to be able to insert and/or delete points from the set).

Asked By : Realz Slaw

Answered By : D.W.

This can be solved with a balanced binary tree. Each such query can be answered in $O(\lg n)$ time.

Build a balanced binary tree, where each leaf holds a point, and the tree is keyed on the $x$-coordinate of the points (so the leaves contain the points, sorted from left to right). Augment the tree, so that in each node $t$ of the tree, you store a pointer to the highest and lowest of all the points within the subtree rooted at $t$.

Now we are given an interval $[\ell,u]$, and asked for the highest point whose $x$-coordinate is within that interval. It turns out that any such interval can be expressed as a union of $O(\lg n)$ subtree. In other words, you can find nodes $t_1,t_2,\dots,t_k$ such that (1) every leaf that is a descendant of some $t_i$ has an $x$-coordinate in the interval $[\ell,u]$, (2) every leaf whose $x$-coordinate is in the interval $[\ell,u]$ is the descendant of one of these $t_i$'s, and (3) $k = O(\lg n)$. Recall that the augmentation tells you what is the highest leaf under each $t_i$. So, to answer the query, take the highest of all those: that will be the highest leaf whose $x$-coordinate is within $[\ell,u]$. Since we only need to examine $O(\lg n)$ nodes, the running time is $O(\lg n)$.

How do you find those $k$ nodes? That's described here: Delete a consecutive range of leaves from a binary tree. You can verify that you can find those $k$ nodes explicitly in $O(\lg n)$ time, given $\ell$ and $u$.

Conveniently, this data structure works both for the static case (where the set of points doesn't change) as well as the dynamic case (where we want to insert and delete points). You can insert and delete points in $O(\lg n)$ time: in particular, you can update the augmented information within those time bounds (you just have to touch each node on the path to the root from the inserted/deleted node).

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/52539

3.2K people like this

 Download Related Notes/Documents

0 comments:

Post a Comment

Let us know your responses and feedback