World's most popular travel blog for travel bloggers.

[Solved]: Finding Hash of Substring [i, j] in O(1) using O(|S|) pre computation

, , No Comments
Problem Detail: 

Given a string S of length n characters, is it possible to calculate the Hash of its substring [i, j] (From index i to index j. Inclusive) in O(1) using some form of precomputation ? Maybe a modification of the Rolling Hash ?

Similar Problem

Problem Statement

I have seen it being used in a similar problem where in a string was given in a compressed form. Meaning, e.g. if the string is "aaabccdeeee" then the compressed form is:

3 a 1 b 2 c 1 d 4 e 

How data was stored

They are stored in an str[] array as :

str[] = [{'a','3'}, {'b','1'}, {'c','2'}....] 

HASHING Concept that was used in the solutions

And programmers had used the following hash concept to find if the given substring is a Palindrome or not. Given a substring of string S as (i,j), they computed the hash of substring [i , (i+j)/2] and the reverse hash of substring [(i+j+2)/2, j] and checked if they were equal or not. So if they wanted to check if in string S = "daabac" whether substring [1, 5] is a a palindrome or not, they computed the following :

h1 = forward_hash("aa")  h2 = reverse_hash("ba") h1 == h2 

Code for the Hashing Concept

The hash precomputation was done as follows :

/* Computing the Prefix Hash Table */ pre_hash[0] = 0; for(int i=1;i<=len(str);i++) {     pre_hash[i] = pre_hash[i-1]*very_large_prime + str[i].first;     pre_hash[i] = pre_hash[i]*very_large_prime + str[i].second; }  /* Computing the Suffix Hash Table */ suff_hash[0] = 0; for(int i=1;i<=len(str);i++) {     suff_hash[i] = suff_hash[i-1]*very_large_prime + str[K-i+1].first;     suff_hash[i] = suff_hash[i]*very_large_prime + str[K-i+1].second; } 

And then the hash was computed using the following functions :

/* Calculates the Forward hash of substring [i,j] */ unsigned long long CalculateHash(int i, int j) {     if(i>j)         return -1;     unsigned long long ret = pre_hash[j] - POW(very_large_prime, [2*(j-i+1)])*pre_hash[i-1];     return ret; } /* Calculates the reverse hash of substring [i,j] */ unsigned long long CalculateHash_Reverse(int i, int j) {     unsigned long long ret = suff_hash[j] - POW(very_large_prime,[2*(j-i+1)])*suff_hash[i-1];     return ret; } 

What I am trying to do

I am looking for a general approach to the above concept. Given a Pattern P, I want to check if the pattern P is present in a string S. I know the index (i) to check where it may be present. And I also know the length of pattern P represented as |P|. In short I want to check if hash of S[i, i+|P|] and hash of P match or not in O(1) using some form of pre computation on S.

Ignoring the time taken to compute hash of P else it would be O(1+|P|)

Asked By : Kyuubi

Answered By : D.W.

Yes, you can solve your problem, roughly speaking, by precomputing prefix sums.

In particular, if you are willing to do $O(n)$ precomputation and to use $O(n)$ space, we can solve your problem, for most standard rolling hashes. We just need the rolling hash to have one extra property, the inverse property. Most rolling hashes do have this property, including the well-known Rabin-Karp rolling hash.

Let me explain the details, below.

Some notation: let $S[i..j]$ denote the substring of $S$ from index $i$ to $j-1$ (note: it doesn't include index $j$; it is a substring of length $j-i$). Let $H(S[i..j])$ denote the rolling hash of that substring.

The intuition

Imagine that we used a very simple hash, which is just the sum of the bytes modulo 256:

$$H(S[i..j]) = S[i] + S[i+1] + S[i+2] + \dots + S[j-1] \bmod 256.$$

This is a crummy hash function (it has lots of collisions), but let's go with it, since the purpose is just to get some intuition. This is a rolling hash; given the hash of a prefix of the string, it is easy to extend it by one more byte (just add that byte to the hash value you've got so far).

Now with this hash function, we could easily solve your problem. During the precomputation, we'd fill in an array $A[]$ so that $A[j] = S[0] + S[1] + \dots + S[j-1] \bmod 256$ for each $j$; this can be done in $O(n)$ time. Now, we can compute $H(S[i..j])$ via the simple relation

$$H(S[i..j]) = A[j] - A[i] \bmod 256.$$

This can be computed in $O(1)$ time, given the precomputed array $A$.

Of course, you'd never use this in practice, because it's such a crummy hash function. But the same idea can be generalized to work with many other rolling hash functions, which you would want to use. (Intuitively, that's because they are based upon a binary operation that is associative and has an inverse, which is all you need for this trick to work.) I'll describe the details next.

The inverse property

All rolling hashes have the following property:

The accumulation property: Given $H(S[i..j])$ and $S[j]$, we can compute $H(S[i..j+1])$.

We need one additional property from the rolling hash:

The inverse property: Given $H(S[i..j])$ and $H(S[i..k])$, where $j < k$, it should be possible to derive $H(S[j+1..k])$.

Many standard rolling hashes do have this inverse property. For instance, the Rabin-Karp rolling hash has the inverse property. The same is true for hashing with cyclic polynomials (Buzhash).

How to use the inverse property for your problem

Create an array $A[]$ of $n$ elements. During precomputation, we will fill in the elements of $A$ so that $A[j] = H(S[0..j])$. This precomputation can be done in $O(n)$.

Now, at some later point, you would like to compute the rolling hash of $S[i..j]$, i.e., to compute $H(S[i..j])$. How do you do that? It turns out it is easy. First, look up $H(S[0..i])$ and $H(S[0..j])$ using the array; this involves reading the value of $A[i]$ and $A[j]$. Next, use the inverse property to compute $H(S[i..j])$ from these two values. Done! That's all there is to it.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/28163

0 comments:

Post a Comment

Let us know your responses and feedback