Tuesday, March 1, 2011

Hunglish 2.0

Recently Daniel Varga and I have been working hard in our free time on the Hunglish project, and we have reached a milestone: the whole system around the sentence aligner has been streamlined and/or rewritten. It has also been deployed onto a test machine and can now be tried by anyone. This new system is going to replace the old one soon, so any feedback is highly appreciated.

New features

  • Help us build a bigger corpus
You can upload a pair of documents (one in English, the other in Hungarian). The uploaded documents will appear in the search results within a few minutes.
  • Duplicate filtering
Duplicate results were a problem in the previous version. We have tried to filter them out in the new Hunglish.
  • Extended corpus
The original sentence database contained 2 million sentence pairs; the new one is nearly twice that size.

Planned features

  • Upvote / downvote
So far this is implemented only in the back end, but we plan to add it to the user interface.
  • Other language pairs
We also plan to add other language pairs. You can suggest a new language pair; bonus points if you recommend a decent open source stemmer (preferably implemented in Java) for the new languages.
  • Your suggestion
Feel free to suggest a new feature!

So we're encouraging you guys to give it a try! Upload documents and search the new Hunglish corpus.
You will be considered a hero if you find and report a bug on the project page.