World's most popular travel blog for travel bloggers.

# Using B trees to search inside dynamically changing array segments

, ,
Problem Detail:

Let $A$ be a sorted array. Suppose that this array is huge and can only be stored in a disk. In the I/O model and algorithms built in this model, every time we access the disk we spend an I/O and we get $B$ elements to the RAM. Since accessing the disk is much more expensive than accessing the main memory, we assume that what ever operations we perform in main memory, they come for free. So even if you scan 1 million times the entire RAM, in the I/O model, this would cost $O(1)$ I/Os, and that's the only thing we care about.

Suppose that you now receive queries in the form $[l,r, key]$ where $l$ and $r$ are indices which define a continguous chunk of the array $A$, and $key$ is a value that you want to search inside this chunk. So you want to know whether the key is somewhere in that chunk or not.

If you expect many queries to come, and we will make this assumption as well, it might be worth spending $O(\frac{N}{B})$ I/Os to build a B tree on top of the array $A$, and use the $B$ tree for your searches. Since the height of a B tree is $O(\log_{B}(N))$ each search will require $O(\log_{B}(N))$ I/Os.

However I was wondering, if the size of the chunk defined by the query is $n$, is it possible to use the same B tree to achieve $O(\log_{B}(n))$ I/Os? Because if we always start searching from the root of the B tree, we would first have to find where our chunk is, and then search inside the chunk, which is why the total number of I/Os would be $O(\log_{B}(N))$.

My idea would be to spend 1 I/O to access the leaf block in the B tree containing element $A[l]$, 1 I/O to access the leaf block in the B tree containing element $A[r]$, and then spend $O(\log_{B}(n))$ I/Os to find the least common ancestor block in the B tree of these two leaf blocks. Then since we found the least common ancestor block in the B tree, we can start searching from there, spending in total $2 + O(\log_{B}(n)) + \log_{B}(n)$ I/Os.

Unfortunately I am very novice in the I/O model and this approach sounds too good to be correct. Is it really that easy for this problem or am I missing something?

###### Answered By : NP-hard

I assume the "B-tree" in your post is in fact a "B+-tree" and array $\small A$ is stored in an increasing manner. Your idea is almost good and needs some small revision.

• You need to test whether $\small A[l] \leq key$ and $\small A[r] \geq key$. If $\small A[l] = key$ or $\small A[r] = key$, it is done.

• If it is found that $\small A[l] < key$ and $\small A[r] > key$, find the LCA of the blocks containing $\small A[l]$ and $\small A[r]$, as you have done. The height of the LCA is of $\small \mathcal{O}(\log_B(n))$.

• If $\small A[l] > key$ or $\small A[r] < key$, then obviously $\small key$ does not exist in $\small [l, r]$.

Why testing $\small A[l] \leq key$ and $\small A[r] \geq key$ is important?

The LCA block may contain entries that direct to blocks outside $\small [l, r]$. If you start from the LCA, you may find a $\small key$ even if $\small key$ does not exist in $\small [l, r]$.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/64296

3200 people like this