Assume we have a (multi)set of nontrivial intervals $\mathcal{I} = \{I_1,...,I_n\}$ and for any two $I_i, I_j \in \mathcal{I}$, we have that $I_i \cap I_j$ is trivial (that is: contains at most one point), or one of them contains the other. Just to clarify, an example would be $\mathcal{I} = \{[0,10], [2,6],[6,8], [2,4],[2,3],[5,6],[6,7]\}$. For the sake of simplicity, assume there is a longest interval that contains all other intervals, call that the root.
We can arrange such (multi)sets as a tree where $I_i$ is a descendant of $I_j$ iff $I_i \subseteq I_j$ (if an interval has multiplicity $> 1$, then its multiple occurrences can form a path in the tree). What I want to do is find such a tree. It's clearly possible in $\mathcal{O}(n^2)$ time: sort the intervals by length in descending order and process them in that order. Whenever we insert the next interval $I_i$ into the tree, find the shortest interval already in the tree $I_j$ such that $I_i \subseteq I_j$; such an $I_j$ must exist since the root contains all other intervals.
I would like to do this in $\mathcal{O}(n \log n)$ time and it seems to me that should be possible. Rough ideas include using interval trees and trying to construct the tree from the bottom up starting from its leaves, or figuring out how to annotate search trees to do something like this for me, but nothing successful so far.
Asked By : G. Bach
Answered By : D.W.
You can solve it in $O(n \lg n)$ time using a divide-and-conquer algorithm.
Explaining the main idea: a cleaner variant
To help me explain the main idea, I'm going to change the problem slightly. I'll assume we're given a set of intervals, with no interval appearing more than once in the set, and where for any pair of overlapping intervals $I,I'$, one of them contains the other (i.e., either $I \subset I'$ or $I' \subset I$). The goal is to output a forest (a collection of trees).
This variant of the problem is slightly more general, in that we're not assuming we have a single root interval that all others are contained in. It's also slightly more specific in that I'm assuming we don't have the case of a trivial overlap containing a single point.
I'll show later in my answer how to remove these extra assumptions, but for now, let's go with this variant of the problem, as it makes explaining the main ideas easier.
Main idea: an algorithm for the cleaner variant
We'll use divide-and-conquer. Let $x$ denote the median of the $2n$ endpoints of the intervals. This can be computed in $O(n)$ time.
Use $x$ to partition the intervals into three collections:
The intervals that are wholly to the left of $x$ (i.e., their right endpoint is strictly less than $x$).
The intervals that are wholly to the right of $x$ (i.e., their left endpoint is strictly greater than $x$).
The intervals that contain $x$.
We'll first compute three forests, one forest for each of these three collections. Call these forests $\mathcal{L}$ (the forest of the intervals on the left), $\mathcal{R}$ (for the intervals on the right), and $\mathcal{M}$ (for the intervals that contain $x$). You can compute $\mathcal{L}$ and $\mathcal{R}$ by invoking the algorithm recursively. You can compute $\mathcal{M}$ directly by sorting the intervals that contain $x$ by decreasing width and building a path of the intervals by descending size (each node in $\mathcal{M}$ has exactly one child; it is the next-smallest interval).
Now we need to join together the forests $\mathcal{L},\mathcal{R},\mathcal{M}$. Since $\mathcal{M}$ is a path, given an interval $I$, you can find the smallest interval $I' \in \mathcal{M}$ such that $I \subset I'$ by using binary search. Therefore, to join the forests, take the roots of $\mathcal{L}$ and $\mathcal{R}$ and use binary search in $\mathcal{M}$ to find the smallest interval containing it, and add a corresponding child relationship. This is all you need to do to join them into a single forest, as there can be no other child relationships between $\mathcal{L},\mathcal{R},\mathcal{M}$. Output the resulting forest.
What's the running time of this procedure? Well, it takes $O(n)$ time to partition the intervals into these three collections. You can compute $\mathcal{M}$ in $O(n)$ time (if you've done a single up-front precomputation to sort the intervals by decreasing width). It will take at most $O(n \lg n)$ time to join $\mathcal{L}, \mathcal{R}, \mathcal{M}$ into a single forest, since each binary search operation can be done in $O(\lg n)$ time. With a little more cleverness, you can speed up the merging to run in $O(n)$ time (pre-sort the intervals by their left endpoint once, and by the right endpoint; now you can join $\mathcal{L}$ and $\mathcal{M}$ by doing a merge on the sorted list of intervals of $\mathcal{L}$ and the sorted list of intervals of $\mathcal{M}$, both sorted by their left endpoint; join $\mathcal{R}$ and $\mathcal{M}$ similarly). Therefore, the running time is $O(n)$ plus the time for the two recursive calls. Finally, note that each recursive call is on at most $n/2$ intervals, due to the way we chose $x$ (there can be at most $n/2$ intervals wholly to the left of $x$, as each such interval contributes 2 endpoints to the $2n$ endpoints, and at most $n/2$ intervals wholly to the right of $x$).
Therefore, we get the recurrence relation
$$T(n) = 2 T(n/2) + O(n),$$
whose solution is $T(n) = O(n \lg n)$.
Solving your original problem
Your original problem allows a pair of intervals to overlap in a single point, without either one containing the other. This doesn't fundamentally change the problem. You need to generalize the method for computing $\mathcal{M}$ slightly, as it is no longer guaranteed to be a simple path, but this is not hard to do. Also, now $\mathcal{M}$ might have two leaves. If so, for each root of $\mathcal{L},\mathcal{R}$, you need to compare it to both leaves to see whether it is contained in either one. This doesn't change the overall running time.
Also, your original problem allows some intervals to appear in the input twice. This is easy to accommodate: remove duplicates, count how many times each interval appears in the input, run the algorithm above, and then adjust the tree to account for multiplicities (e.g., by duplicating nodes, or however you would like them to appear in the final output).
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/38045
0 comments:
Post a Comment
Let us know your responses and feedback