Tuesday, January 31, 2012

Monte Carlo Simulation in Python

Problem:
Suppose you flip coins for a certain times, say n. You want to know your chance of getting certain number of heads in a row, say k heads in a row.

Let's start with the function that'll do a certain number of coin flips:
Let's do 10 flips right away:
do_n_flips(10)
'0100111001'

Let's suppose you get 1 euro if you get the given number of heads in a row while doing a certain number of flips. Here's the function that'll calculate your payoff:
Let's try our payoff function with doing 10 flips and getting payed for 3 heads in a row. sometimes it is 0, sometimes it is 1:
payoff(10, 3)
0.0
payoff(10, 3)
1.0

Now, what is your chance to get your 1 euro for 3 heads in a row from 10 flips? This brings us to the Monte Carlo simulation, which I'wont describe here at all, just give you a function which answers our question regarding the chance:
Running with a million iteration gives us a good approximation for the chance:
monte_carlo_solve(10, 3, 1000000)
0.507753

This post was inspired by a similar post by Remis.

Wednesday, January 4, 2012

How to use wildcard in Google search

This example demonstrates how to use a wildcard in Google search.

Lets suppose we have this idiom: "if you have a somebody for a friend, you don't need an enemy", but we do not remember exactly what that  somebody would be.

Let's ask Google: "If you have a * for a friend, you don't need an enemy".

And here we go. For the record, at the time being, Google thinks this somebody is either a Hungarian or a politician. I assume, in fact it is a Hungarian politician :)

Friday, December 16, 2011

Twibet: a recursive solution in Ptyhon

Today a friend told me she's struggling with solving the EuroPython 2011 Problem D, called Twibet. I wrote a quick solution and I'm publishing it here for those who are curious about it.
The key to my solution was the recursive traversal (traverse_graph) of a directed graph. The representation of the graph is a dictionary of the monks as keys and a list of the following monks as values.
Because running on the large input file raised a runtime error of  maximum recursion depth exceeded, I used a this little hack: sys.setrecursionlimit(10000).
UPDATE: Here are the solutions of the contestants.

Thursday, August 25, 2011

Concurrent matrix multiplication in Python

We're going to demonstrate parallel matrix multiplication in Python.

Lets suppose we are performing the multiplication: P = A * B. We're going to calculate the result matrix row-by-row. The following function returns a row of a result matrix P. The arguments are a row of matrix A and the matrix B.

def calc_row_of_product_matrix(a_row, b):
    return map(lambda col: sum(starmap(mul,izip(a_row,col))), izip(*b))


We're going to use the multiprocessing module of Python to create a process pool containing as many processes as the number of CPUs we have and calculate each result row in a different process. Here are the two lines that does this trick:

def __mul__(self, b):
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    return pool.map(eval_func_tuple, izip(repeat(calc_row_of_product_matrix), self, repeat(b)))


Some explanation why is this code a bit nasty. Functions passed to the pool has to be picklable and because only the functions defined at the top level of a module are picklable, we needed to get the the row calculation function calc_row_of_product_matrix out of the class. Furthermore, to be able to pass two arguments to calc_row_of_product_matrix we also needed a helper function, which takes a tuple of a function and args, evaluates and returns the result:

def eval_func_tuple(f_args):
    return f_args[0](*f_args[1:])


Note that passing around the whole matrix B to all processes calculating result rows, or more precisely passing itertools.repeat(b) to  pool.map() visibly increases the memory consumption of the multiprocess version. This was not a real issue for me, as the bottleneck was CPU; anyway, this issue could be addressed by using the shared memory module of multiprocessing. For now we'll leave that as an exercise to the reader.

Here are the running times on my Intel Core2 Duo 3.16GHz box. For sufficiently large matrix (above 500*500) the running time of the multiprocess version is nearly half of the single process version.


100*100 matrix, single process: 0.0835670982343 
100*100 matrix, multiprocess: 0.351096555199 
200*200 matrix, single process: 0.79961114284 
200*200 matrix, multiprocess: 0.700980680681 
500*500 matrix, single process: 14.4003259234 
500*500 matrix, multiprocess: 7.99582457187 
1000*1000 matrix, single process: 118.078526896 
1000*1000 matrix, multiprocess: 66.8809939919 

Here you can see the single process version running with CPU usage of 50%:

Here you can see  the multiprocessing version running with CPU usage of 100%:



Here's the full code:

Wednesday, June 22, 2011

Ngrams with coroutines in Python

This is how I define ngrams with coroutines

I need to filter text before generating ngrams and also, I want to process ngrams (in this case count bigrams)

I combine my coroutines together

Full source can be found in my fork of rrenaud's gibberish detector.

Thursday, June 9, 2011

stackverflow: the most upvoted Q/A tagged as Perl vs as Python

The most upvoted question tagged as Perl
The most upvoted question tagged as Python

Currently, for Python, it is the hidden gems in the language.
You just cannot upvote some of them enough.

On the other side, the most upvoted Perl question at this moment is about Unicode oddities. And the most upvoted answer will scare the shit out of you.

Wednesday, June 8, 2011

Start programming, learn Python


Recently I've been asked to send a few links to absolute beginners who want to start programming. As a language choice, I always recommend to start with Python. So I ended up to collect a list useful links for Python beginners:


Tutorials


Ask for help


Official tutorials 


Read some code
Students need to read real code but it is hard to find production code that is readable for novices.

Practice, write code