Here is the description of the data structure I am looking for:
Initial Original Data
Index | Frequency | 1 3 2 1 3 7 4 2 5 6
Now it should be kept Sort-ed by their frequency inside the data structure like this,
Index | Frequency | Cumulative Frequency | 2 1 1 4 2 3 1 3 6 5 6 12 3 7 19
Now, if I Insert a new entry item (index 6) with frequency of 2, the data structure should maintain it like this:
Index | Frequency | Cumulative Frequency | 2 1 1 4 2 3 ->6 2 5 1 3 8 5 6 14 3 7 21
Now, a frequency increase of 3 at Index 2 should Update the structure as following:
Index | Frequency | Cumulative Frequency | |2 1 1| 4 2 2 | 6 2 4 | 1 3 7 |-> 2 1+3 = 4 11 5 6 17 3 7 24
You can see above that the frequencies can repeat in many indices. (see index 4 and 6 above).
Now my Query to find the index that contains a cumulative frequency should be resulted in the following way
Query(cumulative frequency) | Result (Index at the data structure) 3 6 1 4 20 3 7 1
I have already explored Fenwick Tree (Binary Indexed Tree). It has a very efficient, O(log(n)), way to insert and update frequency at an index and to find the index for a specfic cumulative frequency. But I did not find a way to keep the Indexes sorted by their frequency.
Other structures like Treap, Red-Black tree can store the frequency in a sorted way but I could not find a way to search the index with the specific cumulative frequency using those data structures.
Asked By : Syed Arefinul Haque
Answered By : D.W.
This can be easily handled using a standard balanced tree data structure, augmented with a Fenwick tree and a hashmap:
Store the items in a binary tree. Each leaf stores information about a single item. The leaves should be stored in increasing frequency, i.e., sorted by frequency, from least frequent to most frequent.
To help compute the cumulative frequency, augment the binary tree to store some additional information in each node. In each node, store the sum of the frequencies of all of the items underneath it (i.e., of all of the leaves under that node). These values can be easily updated as the tree changes. This is basically the idea behind Fenwick trees for computing prefix sums.
Also, add a hashmap that maps from the value (what you call "index") to the associated leaf where that index is found in the binary tree.
Now you can support all of the operations you mentioned efficiently. If you use a standard balanced binary tree data structure, the running time for all operations is $O(\log n)$:
Sort: All of the items are already stored in sorted order, sorted by increasing frequency, so this happens automatically.
Insert: You can insert a new item into the tree in $O(\log n)$ time.
Update: You can increase the frequency of a particular item by deleting it from the tree, increasing its frequency, and inserting it into the new location. In a balanced binary tree, this takes $O(\log n)$ time, since deletion can be done in $O(\log n)$ time, as can insertion.
Query: You can now find the first index $i$ where the sum of the frequencies of the first $i$ leaves is at least $t$, by using binary search. Note that you can efficiently compute prefix sums of the frequencies by using the extra values stored in each node of the tree. Therefore, binary search suffices, and can be done in $O(\log n)$ time.
There are many balanced binary tree data structures, including red-black trees, AVL trees, 2-3 trees, etc. They all have the property that standard operations are guaranteed to take at most $O(\log n)$ time. You can use any of them for your purposes.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/30652
0 comments:
Post a Comment
Let us know your responses and feedback