World's most popular travel blog for travel bloggers.

[Answers] Hashing a Specific Range Of a Character Array

, , No Comments
Problem Detail: 

I need to process queries to Hash various ranges of a character array. I am currently using the Arrays.hashCode from the standard java library. But the problem is that this method is too slow. Also my array remains the same throughout the process of hashing, I only am changing the range. To deal with this, I have to make an entire copy of the array everytime I process a query, and then compute the hash from the above function.

I am using Arrays.copyOfRange to create a copy everytime I process a query. I need to avoid this. So I was thinking of devising a hashing scheme of my own. This scheme should be such that I whould get a unique hash for each array range. Hashes should be same if all characters in the range are same.

Any Help on how to proceed with the making of such a hash function will be appreciated.

Asked By : Alice

Answered By : D.W.

Use a rolling hash function, such as the Rabin-Karp hash. This allows you to compute hashes of any subrange of $A$ very efficiently, e.g., in $O(1)$ time, assuming you use some extra memory to keep track of all of the intermediate states of the rolling hash.


If $A[0..n-1]$ is your array, a rolling hash defines the hash of the entire array using a recurrence like this:

$$x_i = f(x_{i-1}, A[i])$$

where $x_{-1}$ is some fixed constant and $f$ is some function. Now here's what you can do. You can store the $x$ values in an array $X[0..n-1]$ (where $X[i]=x_i$). This will allow you to compute the hash of any prefix of $A$.

If you want to be able to compute the hash of any consecutive range of $A$, choose the function $f$ carefully to be reversible.


A simple instantiation: let $X[-1] = 0$ (for some constant $c$) and

$$X[i+1] = \alpha \cdot X[i] + A[i] \bmod 2^{32},$$

where $\alpha$ is some 32-bit constant (you'll want it to be odd). Preprocess the array $A$ once to compute the array $X[0..n-1]$.

Now, the hash of $A[i..j-1]$ can be very efficiently computed: it is the following:

$$X[j-1] - \alpha^{j-i} X[i-1] \bmod 2^{32}.$$

Notice that $\alpha^{j-i} \bmod 2^{32}$ can be computed efficiently using fast exponentiation algorithms, i.e., repeated squaring modulo $2^{32}$. Thus, assuming you've stored the array $X$ alongside the array $A$, a hash of any range of $A$ can be done in $O(1)$ time.


If you want something even simpler and easier to implement, set $\alpha$ to 1. Then the hash of $A[i..j-1]$ just becomes $A[i]+ A[i+1]+ \cdots + A[j-1]$, and $X[i]$ stores $A[0] + A[1] + \cdots + A[i]$, so the hash of $A[i..j-1]$ is just $X[j]-X[i]$.
However, the quality of this hash degrades and you might see an increase in hash collisions.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/18873

3.2K people like this

 Download Related Notes/Documents

0 comments:

Post a Comment

Let us know your responses and feedback