Description
Say I have a source list like ["a","b","c","d"] and want to run a capitalization filter, so that if I change "b" to "B", my derived list looks like ["B"]. Next, if I change my source list's "d" to "D", the derived list should look like ["B","D"].
Is it possible to make the insertions and removals in the derived list, while maintaining order, in faster than $O(n)$ time, $n$ being the size of the list?
Finally, I'd like to be able to know the index of where the insertion or removal takes place in the derived list in faster than $O(n)$ time.
An idea that doesn't work
I could keep the derived list in a tree, sorted by the source index. This way, when I needed to insert my item into the derived tree-list, it would take $O(\log n)$. However, I would not be able to know the index of that insertion without walking the entire tree.
Thanks for your help!
Problem Statement
- Let $S$ be an ordered list
- Let $S_i.\mathit{filterProperty}$ be true for some elements $S_i$ in $S$
- Let $F$ be an ordered list of all elements in $S$ where $S_i.\mathit{filterProperty}$ is true
Let $T(i)$ be a function that:
- Sets $S_i.\mathit{filterProperty}$ to true
- Inserts $S_i$ into $F$, ordered in the same fashion as $S$
- Returns the index in $F$ where $S_i$ was inserted
Can $T$ and its side effects be faster than $O(n)$?
Any list-like data structures can be used within these constraints:
- Insertion and side effects of insertion are faster than $O(n)$
- We must know the index in $F$ of the newly inserted item
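For concreteness, here is a minimal Python sketch of the interface I have in mind (the names FilteredView and set_filter are just illustrative); this naive list-based version is exactly the $O(n)$ behaviour I would like to beat:

    # Naive illustration of the interface only; insert() and the position
    # count below are both O(n), which is what the question asks to improve.
    class FilteredView:
        def __init__(self, source):
            self.source = source                 # S: ordered list of elements
            self.flags = [False] * len(source)   # filterProperty per element
            self.derived = []                    # F: filtered elements, in S-order

        def set_filter(self, i):                 # plays the role of T(i)
            self.flags[i] = True
            # index of S_i in F = number of flagged elements before i in S
            pos = sum(1 for j in range(i) if self.flags[j])
            self.derived.insert(pos, self.source[i])
            return pos

    view = FilteredView(["a", "b", "c", "d"])
    view.source[1] = "B"
    print(view.set_filter(1))   # prints 0; view.derived == ["B"]
    view.source[3] = "D"
    print(view.set_filter(3))   # prints 1; view.derived == ["B", "D"]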
Asked By : Justin Meyer
Answered By : babou
At the time of writing I was not absolutely sure what the problem was. This led to a more general second answer. See details in discussion at the end of this answer.
Apparently, the "idea that does not work" does not work because you do not know the index of an element $S_i$ in $F$ when $F$ is organized as a tree.
This is what is addressed here. The problem is restated for clarity, as your statement of it is a bit hard to follow.
Restatement of the problem:
Given a set $S$ with an order relation, maintain an ordered list $F$ where elements of $S$ can be inserted or removed, and such that the index of an element in $F$ is returned when the element is inserted.
Actually, the solution will do more, as it is possible to access any element of $F$ given its index, or to retrieve the index given an element of $F$. All operations are in time $\log n$.
Solution:
You keep the ordered derived list $F$ in a self-balancing binary tree. I checked with AVL trees, but other types (such as red-black trees) should work as well. In each node $N$ of the tree you keep a count $\text{weight}(N)$ of the number of elements of $F$ in the subtree rooted at $N$, node $N$ included.
This can be done by updating weight counts as you walk down from the root of the AVL tree for $F$ to reach the place for the insertion or deletion. Then, during the retracing phase that rebalances the tree when needed, you also update the weight counts according to the rebalancing. That only adds a constant cost to each elementary step, and thus does not change the $\log n$ time complexity of the algorithm.
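As a rough illustration (not the full AVL code), here is a Python sketch of such a weight-augmented node, with the weights bumped on the way down during an insertion; the rebalancing, and the matching weight fix-ups during rotations, are deliberately left out:

    class Node:
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None
            self.weight = 1   # number of nodes in the subtree rooted at this node

    def insert(node, key):
        # Plain BST insertion with distinct keys; a real AVL version would also
        # rebalance on the way back up and repair weights during rotations.
        if node is None:
            return Node(key)
        node.weight += 1      # the new element ends up somewhere below this node
        if key < node.key:
            node.left = insert(node.left, key)
        else:
            node.right = insert(node.right, key)
        return node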
Hence it is easy to use this weight to compute in $\log n$ time the index of an element of the list $F$ when accessing it, either on the way down from the AVL tree root, or by walking up to the root when accessing the element directly.
The index is computed on the fly by adding the weights of all the immediate siblings on the left of the path from the AVL tree root to the concerned element of $F$.
Conversely, it is easy to find in $\log n$ time an element of $F$ from its (left to right) index by walking down from the root and leaving enough elements on the left.
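A small Python sketch of that downward walk, reusing the Node and weight fields from the sketch above (zero-based indices, weights assumed up to date):

    def select(node, index):
        # Return the element of F at the given zero-based index by leaving
        # exactly `index` elements to the left of the path taken.
        if node is None:
            raise IndexError(index)
        left_weight = node.left.weight if node.left else 0
        if index < left_weight:
            return select(node.left, index)
        if index == left_weight:
            return node.key
        return select(node.right, index - left_weight - 1)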
Computing the index of $S_i$ in $F$ using weights:
Recall that the weight of a node $N$ is the number of nodes in the subtree rooted at $N$, node $N$ included.
Computing the index of $S_i$ from the weights in the tree is precisely done as follows: you simply do a search for $S_i$ in the standard way of binary search trees, with a modification to compute the index. Below is a pseudo-code recursive version, adapted from the searching algorithm in Wikipedia. It raises an exception if $S_i$ is not in the AVL tree, and the index count starts at 0 for the first element. To have the count start at 1 instead, add 1 to the values returned at the node holding $S_i$: return 1 when node.left is Null, and node.left.weight+1 otherwise.
    function Find-recursive(S_i, node):   // call initially with node = root
        if node = Null then
            raise Key_Not_Found
        else if node.key = S_i then
            if node.left = Null then
                return 0
            else
                return node.left.weight
        else if S_i < node.key then
            return Find-recursive(S_i, node.left)
        else if node.left = Null then
            return Find-recursive(S_i, node.right) + 1
        else
            return Find-recursive(S_i, node.right) + node.left.weight + 1
When the path from the root goes to the right daughter, that means that all nodes in the left daughter subtree, plus the current node, precede $S_i$, and have to be added to whatever other predecessors will be found while searching in the right daughter subtree.
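For reference, a directly runnable Python equivalent of the pseudocode above, reusing the Node fields from the earlier sketch (it collapses the two right-branch cases by treating an absent left child as having weight 0):

    def rank(node, key):
        # Zero-based index of `key` in F; raises KeyError if the key is absent.
        if node is None:
            raise KeyError(key)
        left_weight = node.left.weight if node.left else 0
        if key == node.key:
            return left_weight
        if key < node.key:
            return rank(node.left, key)
        return rank(node.right, key) + left_weight + 1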
I do not give here the modifications of the insertion and deletion code necessary to update the weight count, as this is a bit too long for an answer here, and not so hard to do.
About the question, and the two proposed answers
The precise specification of the problem took some time to agree upon.
This first answer to the question was written on the basis of an initial specification that did not mention explicitly that the source list $S$ could be modified dynamically, by insertion or deletion, thus possibly inducing a similar modification in the filtered list $F$.
The elements of list $S$ are mutable objects and have a filter-property $P_F$ that may be true or false depending on operations performed on an element. The purpose of the question is to maintain a sublist $F$ of elements $s\in S$ such that $P_F(s)$, similarly ordered, such that the rank in $F$ of an element $s\in S$ is known when it is inserted into $F$ due to a mutation.
I assume in this first answer that the list $S$ is constant, except for changes to its mutable elements. Thus one may see the list $S$ as a set (of list elements), totally ordered by the order of the list, where the relative order of two elements can be checked in constant time $O(1)$, for example by attaching its $S$-list rank to each element.
In a much later comment to this answer, the author of the question made it clear that he also wanted to be able to modify the list $S$ with insertion or deletion of elements. This more complex question is addressed separately in a second answer.
Why keep both answers? The first algorithm here is slightly more efficient as all operations can be performed in time $O(\log|F|)$, while the more general algorithm (second answer) requires time $O(\log|S|)$. With a constant ratio $|F|/|S|$, the difference is only an additive constant $\log|S|-\log|F|$. However this constant is to be compared to a logarithmic complexity $\log|F|$, which may not be very large itself. And this complexity is that of elementary operations on a data structure, to be repeated often in larger algorithms. So this apparently negligible difference may have a measurable impact on performance of a larger algorithm using the structure.
To see it numerically, assume that $|S|=256$ and $|F|=16$. Then $\log|S|=8$ and $\log|F|=4$, which gives a performance ratio of $2$, to which one should add that the structure of the more general algorithm may be more costly to maintain.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/43447