World's most popular travel blog for travel bloggers.

[Solved]: Saving on array initialization

, , No Comments
Problem Detail: 

I recently read that it is possible to have arrays which need not be initialized, i.e. it is possible to use them without having to spend any time trying to set each member to the default value. i.e. you can start using the array as if it has been initialized by the default value without having to initialize it. (Sorry, I don't remember where I read this).

For example as to why that can be surprising:

Say you are trying to model a worst case $\mathcal{O}(1)$ hashtable (for each of insert/delete/search) of integers in the range $[1, n^2]$.

You can allocate an array of size $n^2$ bits and use individual bits to represent the existence of an integer in the hashtable. Note: allocating memory is considered $\mathcal{O}(1)$ time.

Now, if you did not have to initialize this array at all, any sequence of say $n$ operations on this hashtable is now worst case $\mathcal{O}(n)$.

So in effect, you have a "perfect" hash implementation, which for a sequence of $n$ operations uses $\Theta(n^2)$ space, but runs in $\mathcal{O}(n)$ time!

Normally one would expect your runtime to be at least as bad as your space usage!

Note: The example above might be used for an implementation of a sparse set or sparse matrix, so it is not only of theoretical interest, I suppose.

So the question is:

How is it possible to have an array like data-structure which allows us to skip the initialization step?

Asked By : Aryabhata

Answered By : zotachidil

This is a very general trick, which can be used for other purposes than hashing. Below I give an implementation (in pseudo-code).

Let three uninitialized vectors $A$, $P$ and $V$ of size $n$ each. We will use these to do the operations requested by our data structure. We also maintain a variable $pos$. The operations are implemented as following:

init:   pos <- 0  set(i,x): if not(V[i] < pos and P[V[i]] = i)    V[i] <- pos, P[pos] <- i, pos <- pos + 1 A[i] <- x  get(i): if (V[i] < pos and P[V[i]] = i)    return A[i]  else    return empty  

The array $A$ simply stores the values that are passed through the $set$ procedure. The arrays $V$ and $P$ work as certificates that can tell if a given position in $A$ has been initialized.

Note that at every moment the elements in $P$ ranging from $0$ to $pos-1$ are initialized. We can therefore safely use these values as a certificate for the initialized values in $A$. For every position $i$ in $A$ that is initialized, there is a corresponding element in the vector $P$ whose value is equal to $i$. This is pointed by $V[i]$. Therefore, if we look at the corresponding element, $P[V[i]]$ and its value is $i$, we know that $A[i]$ has been initialized (since $P$ never lies, because all the elements that we're considering are initialized). Similarly, if $A[i]$ is not initialized, then $V[i]$ may point either to a position in $P$ outside the range $0..pos-1$, when we know for sure that it's not initialized, or may point to a position within that range. But this particular $P[j]$ corresponds to a different position in $A$, and therefore $P[j] \neq i$, so we know that $A[i]$ has not been initialized.

It's easy to see that all these operations are done in constant time. Also, the space used is $O(n)$ for each of the vectors, and $O(1)$ for the variable $pos$, therefore $O(n)$ in total.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/492

0 comments:

Post a Comment

Let us know your responses and feedback