## Wednesday, February 13, 2008

### [Python]Simple Impl. of Bloom Filter

Bloom filter is a data structure to tell whether an object is in given list or not. Of cource, if you hold the given list you can do it. The merit of bloom filter is it doesn't need to hold the original list. As a result you can save a memory space. However, it lost an accuracy of result as the price of space-efficiency. It is possible to answer 'yes' while a query is not in the list in a probability. Because you can estimate the probability, bloom filter remains useful.

The following source is a simple implementation of a bloom filter to learn algorithm.

SIZE = 1987
def hashes(s):
xs = [0, 0, 0]
for c in s:
o = ord(c)
xs[0] = xs[0] * 137 + o
xs[1] = xs[1] * 69 + o
xs[2] = xs[2] * 545 + o

return [x % SIZE for x in xs]

class BloomFilter(object):
def __init__(self):
self.bitarray = [0] * SIZE

for x in hashes(s):
self.bitarray[x] = 1

def query(self, s):
return all(
self.bitarray[x] == 1
for x in hashes(s))

The function 'hashes' calculates three hash value using three different hash function. Class 'BloomFilter' has a bit-array. For simple implementation, I use a list as a substitute for a bit array. It decleases the bloom filter's merit much so you shouldn't use the code in practical use.

The following list is the result of interactive execution on a python's shell.

>>> bf = BloomFilter()
>>> bf.query("hoge")
True
>>> bf.query("hoga")
False