Thursday, June 9, 2011

stackoverflow: the most upvoted Q/A tagged as Perl vs. as Python

The most upvoted question tagged as Perl
The most upvoted question tagged as Python

Currently, for Python, it is about the hidden gems of the language.
You just cannot upvote some of them enough.

On the other hand, the most upvoted Perl question at this moment is about Unicode oddities. And the most upvoted answer will scare the shit out of you.

Wednesday, June 8, 2011

Start programming, learn Python


Recently I was asked to send a few links to absolute beginners who want to start programming. As a language choice, I always recommend starting with Python. So I ended up collecting a list of useful links for Python beginners:


Tutorials


Ask for help


Official tutorials 


Read some code
Students need to read real code but it is hard to find production code that is readable for novices.

Practice, write code

Tuesday, March 1, 2011

Hunglish 2.0




Recently Daniel Varga and I have been working hard in our free time on the Hunglish project, and we have reached a milestone: the whole system around the sentence aligner has been streamlined and/or rewritten. It has also been deployed onto a test machine and can now be tried by anyone. This new system is going to replace the old one soon, hence any feedback is highly appreciated.


New features

  • Help us build a bigger corpus
You can upload a pair of documents (one in English, the other one in Hungarian). The uploaded documents will appear in the search results in a few minutes.
  • Duplicate filtering
Duplicate results were a problem in the previous version. We have tried to filter them out in the new Hunglish.
  • Extended corpus
The original sentence database contained 2 million sentence pairs, which has been nearly doubled.

Planned features 
  • Upvote / downvote
This is only implemented in the back-end, but we plan to add it to the user-interface.
  • Other language pairs
We also plan to add other language pairs. You can make a suggestion for a new language pair. Bonus points if you recommend a decent open source stemmer (preferably implemented in Java) for the new languages.
  • Your suggestion
Feel free to suggest a new feature!

So we're encouraging you guys to give it a try! Upload documents and search the new Hunglish corpus.
You will be considered a hero if you find and report a bug on the project page.

Wednesday, February 2, 2011

Politicians, please thermalize!

Inspired by an UP rant and this photo.

Thursday, December 30, 2010

Linguistic sloppiness

The NMHH has gone after Tilos. The Internetz and the press are full of it, so I will not add my own opinion on the case.

Here I only quote two excerpts from the letter (mirror) of the National Media and Infocommunications Authority (Nemzeti Média- és Hírközlési Hatóság, NMHH). The first is the beginning of the disputed track, fittingly titled "It's on", in English; the second is the same passage in Hungarian. Whoever produced this translation at the NMHH apparently had no idea that the passage is about large-scale drug dealing. They surely thought that everything revolves around the obscene expressions heard in the songs.
Yo, Ice, the organization say they can't stay in business with us any longer. What you gonna do?
We always knew we were gonna come to this point sooner or later ... we have absolutely no option but to move forward. We'll have to set up our own distribution, manufacturing, run totally independent organization and operation. We still got our connections in Texas, Miami, New York, Chicago, Detroit and soldiers on the street willing to die. I can't put any cut on the product.
The NMHH's Hungarian translation (rendered back into English here):
Yo, Ice, the organization says they will not do business with us anymore. What can we do?
We always knew that this moment would come sooner or later, we have no other choice, we move forward. We will set up our own distribution network, our manufacturing, an independent company. We still have our connections in Texas, Miami, New York, Chicago, Detroit. Our soldiers are willing to die for us on the street. I cannot cut anything out of the songs. (In the NMHH's Hungarian: "Nem vághatok ki semmit a dalokból.")

Monday, August 23, 2010

Convert CSV or TAB delimited text to SQL Insert with Python

Basic task: convert your comma-separated or tab-delimited text file into an SQL insert script.
It is done in a functional manner. Define the fields of the input file and of the destination table: excel_fields, sql_fields
Define a mapping between the two field lists: mapping
You can also assign a function to each element of this mapping. The function is applied to the value of the input field to produce the sanitized or derived db field value in your insert SQL: map_func
The code that does the actual work is quite simple:
import sys

def print_insert(splittedline):
    # Sanitize each mapped input value and join them into one INSERT statement.
    values = map(lambda x: map_func[x](get(mapping[x], splittedline)), sql_fields)
    print(''.join([insert_start, ','.join(values), insert_end]))

with open(sys.argv[1], 'r') as f:
    for row in filter(filter_lines, map(lambda x: x.split(field_delimiter), f.readlines())):
        print_insert(row)
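The snippet relies on several definitions that are not shown in this post. To make the idea easier to try, here is a minimal, self-contained Python 3 sketch of how the pieces could fit together. Everything concrete in it is an assumption made for illustration: the column names, the `people` table, the `quote` and `number` sanitizers, and the bodies of `get` and `filter_lines` are hypothetical stand-ins, not the original definitions. It reads from an in-memory sample instead of `sys.argv[1]` so it runs as is.

```python
import io

# Hypothetical column layout of the input file and of the destination table.
excel_fields = ['Name', 'Birth Year', 'City']
sql_fields = ['name', 'city', 'birth_year']

# Which input column feeds each SQL column.
mapping = {'name': 'Name', 'city': 'City', 'birth_year': 'Birth Year'}

def quote(value):
    # Naive SQL string literal: strip whitespace, double embedded single quotes.
    return "'" + value.strip().replace("'", "''") + "'"

def number(value):
    # Keep plain non-negative integers; anything else becomes NULL.
    value = value.strip()
    return value if value.isdigit() else 'NULL'

# Sanitizing function applied to the input value of each SQL column.
map_func = {'name': quote, 'city': quote, 'birth_year': number}

field_delimiter = '\t'
insert_start = 'INSERT INTO people (' + ', '.join(sql_fields) + ') VALUES ('
insert_end = ');'

def get(field, splittedline):
    # Look up an input value by its column name.
    return splittedline[excel_fields.index(field)]

def filter_lines(splittedline):
    # Skip blank or malformed rows and the header row.
    return (len(splittedline) == len(excel_fields)
            and splittedline[0] != excel_fields[0])

def make_insert(splittedline):
    # Same pipeline as print_insert, but returns the statement instead of printing.
    values = map(lambda x: map_func[x](get(mapping[x], splittedline)), sql_fields)
    return ''.join([insert_start, ', '.join(values), insert_end])

def convert(lines):
    # Split every line, drop unwanted rows, and turn the rest into INSERTs.
    rows = map(lambda x: x.rstrip('\n').split(field_delimiter), lines)
    return list(map(make_insert, filter(filter_lines, rows)))

# In-memory stand-in for the input file: header, a valid row, a blank line,
# and a row with a non-numeric year.
sample = io.StringIO("Name\tBirth Year\tCity\nO'Brien\t1980\tDublin\n\nAda\tn/a\tLondon\n")
statements = convert(sample)
for statement in statements:
    print(statement)
```

Building SQL by string concatenation like this is only reasonable for trusted input files; if the data comes from an untrusted source, parameterized queries in the database driver are the safer route.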