A Bloom filter is a probabilistic data structure designed to tell, rapidly and memory-efficiently, whether an element is in a set or not.
If we can use hash tables, which give O(1) lookups in the expected case and O(n) only in very bad situations, why do we need a new data structure that looks up elements with less certainty? In what scenarios do hash tables fail and demand the use of Bloom filters?
When should Bloom filters be used over hash tables, and vice versa?
Asked By : user3378649
Answered By : David Richerby
The second paragraph of the Wikipedia article on Bloom filters says the following, with a citation to Bloom's original 1970 paper.
Bloom proposed the technique for applications where the amount of source data would require an impracticably large hash area in memory if "conventional" error-free hashing techniques were applied. He gave the example of a hyphenation algorithm for a dictionary of 500,000 words, out of which 90% follow simple hyphenation rules, but the remaining 10% require expensive disk accesses to retrieve specific hyphenation patterns. With sufficient core memory, an error-free hash could be used to eliminate all unnecessary disk accesses; on the other hand, with limited core memory, Bloom's technique uses a smaller hash area but still eliminates most unnecessary accesses. For example, a hash area only 15% of the size needed by an ideal error-free hash still eliminates 85% of the disk accesses.
Burton H. Bloom, Space/Time Trade-offs in Hash Coding with Allowable Errors. Communications of the ACM 13(7): 422–426, 1970. (DOI)
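To make the trade-off concrete, here is a minimal Bloom filter sketch in Python. All names and parameters (`BloomFilter`, `m`, `k`, the use of SHA-256 with Kirsch–Mitzenmacher double hashing) are illustrative assumptions, not taken from Bloom's paper; in practice `m` and `k` would be derived from the expected number of items and the target false-positive rate.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit array.

    Illustrative sketch only. A real implementation would choose
    m = -n*ln(p)/(ln 2)^2 and k = (m/n)*ln 2 for n expected items
    and target false-positive rate p.
    """

    def __init__(self, m=1024, k=5):
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for simplicity

    def _indexes(self, item):
        # Derive k indexes from two base hashes (double hashing),
        # rather than computing k independent hash functions.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for i in self._indexes(item):
            self.bits[i] = 1

    def __contains__(self, item):
        # May return True for an item never added (false positive),
        # but never False for an item that was added.
        return all(self.bits[i] for i in self._indexes(item))


bf = BloomFilter()
for word in ["hyphenation", "dictionary", "bloom"]:
    bf.add(word)

print("bloom" in bf)      # True: the element was added
print("hashtable" in bf)  # probably False, but a false positive is possible
```

Note the memory angle from Bloom's example: the filter stores only a fixed-size bit array, never the elements themselves, which is exactly why it can use a fraction of the space an error-free hash table needs while still filtering out most unnecessary lookups.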
Best Answer from Stack Exchange
Question Source : http://cs.stackexchange.com/questions/32264