World's most popular travel blog for travel bloggers.

[Solved]: An online algorithm to find the Pareto frontier elements

, , No Comments
Problem Detail: 

I'm looking for an online algorithm that takes a stream of elements and preserves the elements that are on the Pareto frontier (e.g. all non-dominated elements).

For instance. Given the following inputs, the retained Pareto frontier set would evolved as follows:

  • (3,7)
    • insert element b/c it's the first element
    • pareto set now includes {(3,7)}
  • (7,3)
    • insert element b/c it's not dominated in the first
    • pareto set now includes {(3,7), (7,3)}
  • (8,4)
    • insert element b/c it's not dominated; remove (7,3) which it is dominated in both dimensions
    • pareto set now includes {(3,7), (8,4)}
  • (1,1)
    • do not insert because it's dominate in both dimensions
    • pareto set now includes {(3,7), (8,4)}
  • (9,9)
    • insert element b/c it's not dominated; remove all other elements because this dominates them in both dimensions
    • pareto set now includes {(9,9)}

In my example I'm using 2-tuples, but I'm looking for an algorithm that could handle N-tuples for "small" N (say <10).

The naive solution is to just to compare each element with all elements currently in the set. In practice the naive approach might not be so bad (e.g. sub $O(n^2)$) because elements will regularly be expelled by the comparison set. But I was wondering if there was a known efficient algorithm for this. I'm interested in efficiency in memory and in computational complexity. (Ha! And as a matter of fact, I'm looking for the set of algorithms that are Pareto optimal with respect to memory and computational complexity.)

My current application of this is in building a Lucene search-document Collector that doesn't collect the most relevant documents (the typical use case for a search engine), but collects the Pareto optimal documents along specified dimensions.

Asked By : JnBrymn

Answered By : D.W.

In two dimensions, each update can be done in $O(\lg n)$ time, by using a balanced binary tree data structure. But when you are working in a high-dimensional space, I don't know of any efficient solution.

Let me describe an efficient algorithm for the 2D case. Let $F$ denote the set of points in the Pareto frontier. Store $F$ in a balanced binary tree, using the $x$-coordinate of each point as its key. Note that when you sort $F$ by increasing $x$-coordinate, they'll also be sorted by decreasing $y$-coordinate.

Now, given a new point $(x_q,y_q)$, you can check efficiently whether it is Pareto-dominated by any element of $F$. Find the first element of $F$ to the right of $(x_q,y_q)$ (i.e., the element $(x,y) \in F$ such that $x \ge x_q$ and $x$ is minimal); then checking whether it dominates $(x_q,y_q)$.

Also, given a new point $(x_q,y_q)$, you can efficiently find whether it Pareto-dominates any element of $F$. In particular, you can find indices $i,j$ such that the points $(x_i,y_i),(x_{i+1},y_{i+1}),\dots,(x_j,y_j)$ of $F$ are all Pareto-dominated by $(x_q,y_q)$ (assuming that the points of $F$ have been ordered by $x$-coordinate, Pareto-dominated points will be in a consecutive interval). Here's how. Find the first element of $F$ to the left of $(x_q,y_q)$ (i.e., the element $(x_j,y_j) \in F$ such that $x_j \le x_q$ and $x_j$ is as large as possible), and check whether $(x_q,y_q)$ dominates it. If yes, find the smallest index $i$ such that $i<j$ (so $x_i<x_j$) and $y_i \le y_q$. Both of these steps can be done in $O(\lg n)$ time. (Finding $i$ can be done in $O(\lg n)$ time by treating the tree as branching on the $y$-coordinate of points, and taking advantage of the fact that the points of $F$ are sorted by decreasing $y$-coordinate.)

Now this tell us what to do. If $(x_q,y_q)$ is dominated by some point of $F$, then don't add it to $F$; you're done. Alternatively, if $(x_q,y_q)$ dominate points $i..j$ of $F$, then you need to delete those points from $F$ and add $(x_q,y_q)$ into $F$. This can be done in $O(\lg n)$ time, by noting that any interval of consecutive indices can be expressed as the union of $O(\lg n)$ subtrees of the binary tree (roughly speaking, you work with the siblings of the nodes along the path from $i$ to the root, and the same for the path from $j$ to the root); you can delete each subtree in $O(1)$ time. This lets us delete an entire range of consecutive points in $F$ in $O(\lg n)$ time, no matter how large the range is. For details, see Delete a consecutive range of leaves from a binary tree.

All of this can be done in $O(\lg n)$ time, using a balanced binary tree data structure.

This works in 2 dimensions (i.e., 2-tuples). In higher dimensions, the problem gets much harder. You can find references to the literature, with techniques for higher dimensions, at How to find a subset of potentially maximal vectors (of numbers) in a set of vectors; but I'm afraid that in high dimensions, all the known algorithms are likely to be fairly slow (they have a factor that is something like $O((\lg n)^{d-1})$ where $d$ is the number of dimensions).

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/52201

3.2K people like this

 Download Related Notes/Documents

0 comments:

Post a Comment

Let us know your responses and feedback