Bloom filters speedrun
Bloom filters are space-efficient, probabilistic data structures for testing set membership, using multiple hash functions to map elements to bits in a bit array, allowing for false positives but not negatives, and not supporting deletion.
- Prepare a bit array of length M
- Have a N(1) number of hash functions
- Take your set of items you want to check query against
- Hash each item with hash functions and set the corresponding bit to 1
- Now you have your bloom filter
To test if an item is in the set, do following:
- hash item with all hash functions
- check for each hash result if there is a set corresponding bit
- if all bits are 1, the element is probably(2) in
- if a single bit is 0, the element not in the set
There can be false positives with bloom filters.
Bloom filters aren’t hash map replacement. It is efficient way to test presence of items in the set.
Example use-case
Password Strength Checking: Instead of maintaining a large set of all possible weak passwords, a bloom filter can be used to check if a password is likely to be weak. This approach significantly reduces the space required for storing all possible weak passwords.
(1) There is a way to calculate optimal number of hash functions and other parameters
(2) There can be false positives with bloom filters