World's most popular travel blog for travel bloggers.

[Solved]: Why do Bloom filters work?

, , No Comments
Problem Detail: 

Let's say I am using Bloom filters to create a function to check if a word exists in a document or not. If I pick a hash function to fill out a bit bucket for all words in my document. Then if for a given number of words, wouldn't the whole bit bucket be all 1s? If so then checking for any word will return true? What am I missing here?

Asked By : user220201

Answered By : Wandering Logic

Given that you want to insert $n$ words into the Bloom filter, and you want a false positive probability of $p$, the wikipedia page on Bloom filters gives the following formulas for choosing $m$, the number of bits in your table and $k$, the number of hash functions that you are going to use. They give $ m = - \frac{n \ln p}{(\ln 2)^2} $ and $$ k = \frac{m}{n}\ln 2=-\frac{\ln p}{\ln 2}=-\lg_2p, $$ so you should choose $$ m=\frac{nk}{\ln 2}. $$

That actually works out quite nicely. You are going to get a table with about half the bits set and half cleared, so the entropy per bit is going to be maximal, and the probability of a false positive is going to be $0.5^k$.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/12834

0 comments:

Post a Comment

Let us know your responses and feedback