NISHIO Hirokazu's blog: February 2008

Tuesday, February 26, 2008

Why I cancelled today's meeting

Why did I cancel today's meeting (Asiajin Meeting Tokyo #1) ?

I'm sorry. I found it was double-booked with another paid event. The event is from 26 to 28 I thought.

And this morning I found it was wrong. The event is from 27 to 29. I should go to my company to change my application of leisure. My coworker, amachang, had suffering large stress from making ready a presentaiton in English. I feel very sorry.

I'll make sure I will speak on the second meeting. I promise.

Friday, February 15, 2008

[Python]Draw a fractal pattern(Julia set)

I wrote a program to draw a fractal pattern.
Julia set - Wikipedia, the free encyclopedia.
I found it is very easy to implement; it took much time to check the mathematical definition of Julia set than to implement. If you already know Python and don't know Python Imaging Library (PIL) yet, I strongly recommend to learn it!

Using PIL you can generate images like above from very simple code:


import Image
import ImageDraw

SIZE = 128
image = Image.new("L", (SIZE, SIZE))
d = ImageDraw.Draw(image)

c = 0.5 + 0.2j
for x in range(SIZE):
    for y in range(SIZE):
        re = (x * 2.0 / SIZE) - 1.0
        im = (y * 2.0 / SIZE) - 1.0

        z=re+im*1j
        for i in range(128):
            if abs(z) > 2.0: break
            z = z * z + c
        d.point((x, y), i * 2)

image.save(r"c:\julia.png", "PNG")

Here is a movie(2MB, c = -0.78 + i * 0.002 + 0.23j (i < 40), center = -0.5 + 0.2j, scale = 1.0). I think this type of movie isn't suitable for MPEG compression, it is raw avi.

Thursday, February 14, 2008

[Python]Eliminate assignment before conditional statement

Python doesn't allow to write a statement in a condition clause.
The limitation keeps beginners away from the well-known bug;
using an assignment while thay want to evaluate equality of two values.
However, it is a little pain to be forced an assignment before a conditional statement for me as below.


import re

data = "aaaabbbbaaaa"

m = re.search("b+", data)
if m:
    print "'b+' is found at", m.start()

One possible solution is introducing a stack.
By using my 'bigstack' library you can write as below.


import re
import bigstack

data = "aaaabbbbaaaa"

if push(re.search("b+", data)):
    print "'b+' is found at", pop().start()

The library introduces two function 'push' and 'pop' to the built-in namespace. You may easily imagine how it works. And the temporary variable 'm' is no longer needed.

The source code of the 'bigstack' library is as below. It is very simple.
I doesn't think it is the perfect solution, but it could make your code more pretty especially in a 'while' statement.


"""bigstack.py: singleton stack to eliminate temporary variables"""

import __builtin__

BIG_STACK = []

def push(x):
    BIG_STACK.append(x)
    return x

def pop():
    return BIG_STACK.pop()

__builtin__.__dict__.update(
    push=push,
    pop=pop
)

-----
p.s.
I'm sorry. It requires extra "else" clause to pop the value. worthless...

Wednesday, February 13, 2008

[Python]Simple Impl. of Bloom Filter

Bloom filter is a data structure to tell whether an object is in given list or not. Of cource, if you hold the given list you can do it. The merit of bloom filter is it doesn't need to hold the original list. As a result you can save a memory space. However, it lost an accuracy of result as the price of space-efficiency. It is possible to answer 'yes' while a query is not in the list in a probability. Because you can estimate the probability, bloom filter remains useful.

The following source is a simple implementation of a bloom filter to learn algorithm.


SIZE = 1987
def hashes(s):
    xs = [0, 0, 0]
    for c in s:
        o = ord(c)
        xs[0] = xs[0] * 137 + o
        xs[1] = xs[1] * 69 + o
        xs[2] = xs[2] * 545 + o

    return [x % SIZE for x in xs]

class BloomFilter(object):
    def __init__(self):
        self.bitarray = [0] * SIZE

    def add(self, s):
        for x in hashes(s):
            self.bitarray[x] = 1

    def query(self, s):
        return all(
            self.bitarray[x] == 1
            for x in hashes(s))

The function 'hashes' calculates three hash value using three different hash function. Class 'BloomFilter' has a bit-array. For simple implementation, I use a list as a substitute for a bit array. It decleases the bloom filter's merit much so you shouldn't use the code in practical use.

The following list is the result of interactive execution on a python's shell.


>>> bf = BloomFilter()
>>> bf.add("hoge")
>>> bf.query("hoge")
True
>>> bf.query("hoga")
False
>>> bf.add("foo")
>>> bf.add("bar")
>>> bf.add("baz")
>>> bf.query("hoga")
False
>>> bf.query("foo")
True

When a string added, the bloom filter write '1' on the position of bit array corresponds to each hash value. When a string queried, the bloom filter checks each position and if and only if all of them are '1' it says 'True'. You can imagine more strings are added, the bit array has more '1', and bloom filter more possibly says 'True'. The probability can easily estimate. Given m is the size of bit array, k is the number of hash functions and n is the number of strings you added, the probability is (1 - exp(-float(k * n) / m)) ** k. In this case, m equals 1987 and k equals 3, so when n is 100 the prob. is 0.003 and when n is 1000 the prob. is 0.47. You have to increase the size of bit array if you want to add 1000 strings to a bloom filter.

I'll speak on Asiajin Meeting Tokyo #1

On 2/26(Tue), an English meeting will be held in Tokyo.

Asiajin Meeting Tokyo #1 on 26th(Tue), Feb. | Asiajin

I'll attend and speak on my web-service 'doukaku.org'. It is a community site for programmers. I'm nervous whether I can make myself understood, but to challenge is better than to hesitate, I think.

After the meeting, I'll upload my slide and wrote an entry about it.

NISHIO Hirokazu's blog