The Count-Min Sketch is an awesome data structure for estimating the frequencies of different elements in a data stream. Intuitively, it works by picking several hash functions, each with its own table of counters, hashing each incoming element with every hash function, and incrementing the counter in the slot that each function maps the element to. To estimate the frequency of an element, the Count-Min Sketch applies the hash functions to that element and takes the minimum value out of all the slots that are hashed to.
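To make the description above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's reference implementation: keys are assumed to be non-negative integers smaller than `PRIME`, and the names `CountMinSketch`, `width`, `depth`, `add` and `estimate` are chosen here for illustration. Each row draws its hash function from the family h(x) = ((a*x + b) mod p) mod width, which is one standard way to get a universal family.

```python
import random


class CountMinSketch:
    """Illustrative Count-Min sketch for integer keys in [0, PRIME)."""

    # Assumption: a Mersenne prime larger than any key we will insert.
    PRIME = (1 << 61) - 1

    def __init__(self, width, depth, seed=None):
        rng = random.Random(seed)
        self.width = width
        self.depth = depth
        # One independently drawn (a, b) pair per row.
        self.params = [
            (rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
            for _ in range(depth)
        ]
        # One table of `width` counters per hash function.
        self.table = [[0] * width for _ in range(depth)]

    def _slot(self, row, key):
        a, b = self.params[row]
        return ((a * key + b) % self.PRIME) % self.width

    def add(self, key, count=1):
        # Increment the slot that each row's hash function maps the key to.
        for row in range(self.depth):
            self.table[row][self._slot(row, key)] += count

    def estimate(self, key):
        # Collisions can only inflate a counter, so the minimum is the
        # tightest of the per-row overestimates.
        return min(
            self.table[row][self._slot(row, key)]
            for row in range(self.depth)
        )
```

For example, after `cms = CountMinSketch(width=64, depth=4)` and three calls to `cms.add(7)`, `cms.estimate(7)` is at least 3, and it exceeds 3 only if some other inserted key collides with 7 in every row.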
The original paper on the Count-Min Sketch mentions that the data structure requires pairwise independent hash functions in order to get the necessary guarantees on its expected performance. However, looking over the structure, I don't see why pairwise independence is necessary. Intuitively, I would think that all that would be required would be that the hash function be a universal hash function, since universal hash functions are hash functions with low probabilities of collisions. The analysis of the collision probabilities in the Count-Min Sketch looks remarkably similar to the analysis of collision probabilities in a chained hash table (which only requires a family of universal hash functions, not pairwise independent hash functions), and I can't spot the difference in the analyses.
Why is it necessary for the hash functions in the Count-Min Sketch to be pairwise independent?
Thanks!
Asked By : templatetypedef
Answered By : Sasho Nikolov
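You are right: universal hashing suffices. The analysis only uses the fact that two distinct elements collide in a given row with probability at most 1/w, where w is the number of slots in the row, and that is exactly the guarantee a universal family provides. Pairwise independence is a stronger property, and it is the usual way to construct a universal hash family. The paper also emphasizes pairwise independence in contrast to the 4-wise independence required by previous methods, such as the AMS sketch.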
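To see where the collision bound enters, here is a short version of the per-row error analysis. The notation is an assumption made for this sketch and may differ from the paper's: $a_y \ge 0$ is the true count of element $y$, $N = \sum_y a_y$ is the total count, $w$ is the row width, $h$ is the row's hash function, and $\hat{a}_x$ is the row's counter for $x$.

```latex
\begin{align*}
\hat{a}_x &= a_x + \sum_{y \neq x} a_y \,\mathbf{1}[h(y) = h(x)], \\
\mathbb{E}[\hat{a}_x - a_x]
  &= \sum_{y \neq x} a_y \Pr[h(y) = h(x)]
  \le \frac{1}{w} \sum_{y \neq x} a_y
  \le \frac{N}{w}, \\
\Pr[\hat{a}_x - a_x \ge \varepsilon N]
  &\le \frac{\mathbb{E}[\hat{a}_x - a_x]}{\varepsilon N}
  \le \frac{1}{\varepsilon w}.
\end{align*}
```

The only property of $h$ used is $\Pr[h(y) = h(x)] \le 1/w$ for $x \neq y$, followed by Markov's inequality, which applies because the error is non-negative when all counts are non-negative. With $w = \lceil e/\varepsilon \rceil$ each row fails with probability at most $1/e$, and with $d$ rows whose hash functions are drawn independently of one another, the minimum fails with probability at most $e^{-d}$.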
Question Source : http://cs.stackexchange.com/questions/7275