I'm looking for an online algorithm that takes a stream of elements and preserves the elements that are on the Pareto frontier (e.g. all non-dominated elements).
For instance. Given the following inputs, the retained Pareto frontier set would evolved as follows:
(3,7)
- insert element b/c it's the first element
- pareto set now includes
{(3,7)}
(7,3)
- insert element b/c it's not dominated in the first
- pareto set now includes
{(3,7), (7,3)}
(8,4)
- insert element b/c it's not dominated; remove
(7,3)
which it is dominated in both dimensions - pareto set now includes
{(3,7), (8,4)}
- insert element b/c it's not dominated; remove
(1,1)
- do not insert because it's dominate in both dimensions
- pareto set now includes
{(3,7), (8,4)}
(9,9)
- insert element b/c it's not dominated; remove all other elements because this dominates them in both dimensions
- pareto set now includes
{(9,9)}
In my example I'm using 2-tuples, but I'm looking for an algorithm that could handle N-tuples for "small" N (say <10).
The naive solution is to just to compare each element with all elements currently in the set. In practice the naive approach might not be so bad (e.g. sub $O(n^2)$) because elements will regularly be expelled by the comparison set. But I was wondering if there was a known efficient algorithm for this. I'm interested in efficiency in memory and in computational complexity. (Ha! And as a matter of fact, I'm looking for the set of algorithms that are Pareto optimal with respect to memory and computational complexity.)
My current application of this is in building a Lucene search-document Collector
that doesn't collect the most relevant documents (the typical use case for a search engine), but collects the Pareto optimal documents along specified dimensions.
Asked By : JnBrymn
Answered By : D.W.
In two dimensions, each update can be done in $O(\lg n)$ time, by using a balanced binary tree data structure. But when you are working in a high-dimensional space, I don't know of any efficient solution.
Let me describe an efficient algorithm for the 2D case. Let $F$ denote the set of points in the Pareto frontier. Store $F$ in a balanced binary tree, using the $x$-coordinate of each point as its key. Note that when you sort $F$ by increasing $x$-coordinate, they'll also be sorted by decreasing $y$-coordinate.
Now, given a new point $(x_q,y_q)$, you can check efficiently whether it is Pareto-dominated by any element of $F$. Find the first element of $F$ to the right of $(x_q,y_q)$ (i.e., the element $(x,y) \in F$ such that $x \ge x_q$ and $x$ is minimal); then checking whether it dominates $(x_q,y_q)$.
Also, given a new point $(x_q,y_q)$, you can efficiently find whether it Pareto-dominates any element of $F$. In particular, you can find indices $i,j$ such that the points $(x_i,y_i),(x_{i+1},y_{i+1}),\dots,(x_j,y_j)$ of $F$ are all Pareto-dominated by $(x_q,y_q)$ (assuming that the points of $F$ have been ordered by $x$-coordinate, Pareto-dominated points will be in a consecutive interval). Here's how. Find the first element of $F$ to the left of $(x_q,y_q)$ (i.e., the element $(x_j,y_j) \in F$ such that $x_j \le x_q$ and $x_j$ is as large as possible), and check whether $(x_q,y_q)$ dominates it. If yes, find the smallest index $i$ such that $i<j$ (so $x_i<x_j$) and $y_i \le y_q$. Both of these steps can be done in $O(\lg n)$ time. (Finding $i$ can be done in $O(\lg n)$ time by treating the tree as branching on the $y$-coordinate of points, and taking advantage of the fact that the points of $F$ are sorted by decreasing $y$-coordinate.)
Now this tell us what to do. If $(x_q,y_q)$ is dominated by some point of $F$, then don't add it to $F$; you're done. Alternatively, if $(x_q,y_q)$ dominate points $i..j$ of $F$, then you need to delete those points from $F$ and add $(x_q,y_q)$ into $F$. This can be done in $O(\lg n)$ time, by noting that any interval of consecutive indices can be expressed as the union of $O(\lg n)$ subtrees of the binary tree (roughly speaking, you work with the siblings of the nodes along the path from $i$ to the root, and the same for the path from $j$ to the root); you can delete each subtree in $O(1)$ time. This lets us delete an entire range of consecutive points in $F$ in $O(\lg n)$ time, no matter how large the range is. For details, see Delete a consecutive range of leaves from a binary tree.
All of this can be done in $O(\lg n)$ time, using a balanced binary tree data structure.
This works in 2 dimensions (i.e., 2-tuples). In higher dimensions, the problem gets much harder. You can find references to the literature, with techniques for higher dimensions, at How to find a subset of potentially maximal vectors (of numbers) in a set of vectors; but I'm afraid that in high dimensions, all the known algorithms are likely to be fairly slow (they have a factor that is something like $O((\lg n)^{d-1})$ where $d$ is the number of dimensions).
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/52201
0 comments:
Post a Comment
Let us know your responses and feedback