World's most popular travel blog for travel bloggers.

[Solved]: Good snapshottable data structure for an in-memory index

, , No Comments
Problem Detail: 

I'm designing an in-memory object database for a very specific use case. It is single writer, but must support efficient concurrent reads. Reads must be isolated. There is no query language, the database only supports:

  • get object/-s by attribute/set of attributes (there may be support for expressions, e.g. x.count < 5)
  • get object's attribute

A query is an imperative script composed of an arbitrary number of the above operations. The data size will be << memory, so all the object and indices on most of the attributes should fit comfortably without swapping.

What I need is a data structure for the object's attribute index, which can be O(n) on writes, not support write concurrency, but should support ideally O(1) snapshots (maybe copy on write) and O(logN) access. Ideally it would allow high concurrency on reads with maximal structural sharing between the versions.

I was looking at CTries, Concurrent BSTs and Concurrent Splay Trees but I'm not sure if I'm really looking in the right direction here. The above structures pay a lot of attention to the complexity of inserts which I don't care about.

The question: is there a known data structure that is a good fit for my use case out of the box?

EDIT: after thinking some more it seems that a persistent BST/Splay tree would work. The writer would update the 'master' copy and the queries would get the tree as of the start of execution and throw it away after they are done. However, I'm still interested if there's a better solution.

Asked By : dm3

Answered By : Wandering Logic

Use any kind of persistent/immutable (i.e., functional) tree-based data structure. The key is getting the locking right, as @Raphael pointed out in the comments.

The nice thing about functional/persistent tree-based data structures, is that you get "snapshots" for free. Let's suppose you use a treap (randomized binary search tree) for your data structure. Here's an example of one written in Go: https://github.com/steveyen/gtreap. The author describes it thusly:

By immutable, any updates/deletes to a treap will return a new treap which can share internal nodes with the previous treap. All nodes in this implementation are read-only after their creation. This allows concurrent readers to operate safely with concurrent writers as modifications only create new data structures and never modify existing data structures. This is a simple approach to achieving MVCC or multi-version concurrency control.

At any point in time the "current" state of the tree is represented by a pointer to the root of the tree. Insertions are non-mutating. Instead an insertion keeps the previous version of the tree completely intact, creates $O(\log n)$ new nodes for the path from the root to the correct insertion point, with pointers back to the nodes of the previous version that can be shared.

You use a lock to protect the pointer to the root. Since the data structure is immutable reads can be done concurrently, and you can save pointers to old snapshots. A read is:

lock tmp = ptr_to_root unlock value = search(tmp, <value to search for>) return value 

Even though the search may take a while, you only hold the lock while copying the pointer, so the searches can happen concurrently.

A write is:

lock old_ptr_to_root = ptr_to_root ptr_to_root = insert(old_ptr_to_root, <new key/value pair>) unlock 

In this version, writes do need to hold the lock during the entire process of creating the new version of the tree. You can improve read performance (at the cost of sometimes having the write transaction fail) by changing the write to something like this:

top:   lock   old_ptr_to_root = ptr_to_root   unlock   new_ptr_to_root = insert(old_ptr_to_root, <new key/value pair>)   lock   if (ptr_to_root == old_ptr_to_root)   # make sure no other write happened in the interim     ptr_to_root = new_ptr_to_root     unlock   else                                  # transaction fails, try again     unlock     goto top 

You may be able to do even slightly better (make it "lock free") if your programming language has atomic variables with an atomic compare-and-swap operation. (For example by using C++11's atomic<T*>.)

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/44349

 Ask a Question

 Download Related Notes/Documents

0 comments:

Post a Comment

Let us know your responses and feedback