Welcome to Qin Gao's software page. I hope you find something useful here.
2011/12/29 Uploaded new force alignment scripts that are compatible with Moses. They can be downloaded here.
2010/05/10 Updated instructions for force alignment, thanks to Arek.
2010/03/08 Bug fix for Chaski Download
2010/01/23 Releases of Chaski and MGIZA will be hosted on SourceForge
2010/01/11 Maintenance release of Chaski (0.2.3) and MGIZA (0.6.3)
Important bug fix for MGIZA: if you encounter a segmentation fault during Model 3 training, please use the latest version, 0.6.3.
2009/12/07 Maintenance release of Chaski (0.2.2) and MGIZA (0.6.2)
2009/11/27 Maintenance release of Chaski (0.2.1)
2009/11/24 Configuration documentation for MGIZA++
An (almost) complete list of MGIZA++ configuration options is online now: MGIZA++ Configuration
2009/11/11 New version of Chaski!
I am glad to release the new version of Chaski, in which the functionality of the package is greatly extended. PGIZA and a new distributed word clustering tool are now integrated into Chaski, which means you can start from a raw corpus and build the complete phrase table and lexicalized reordering model compatible with Moses. Instead of taking weeks on a single machine, the full training of a corpus of 6 million sentence pairs now takes half a day on a Hadoop cluster.
Please see the overview for more details on how to download and install the new version. Any suggestions or bug reports are appreciated.
Chaski : A software package for training phrase-based machine translation systems on Hadoop clusters. Together with MGIZA, it can train large-scale models in hours.
HadoopDaemon : A simple interface to help you run ANY program using Hadoop, i.e., it makes Hadoop behave more like Condor or Maui/Torque. That may seem like a bad idea, but sometimes you need it, because going through the MapReduce framework may just screw up your files. (And you don't have other choices when Hadoop is the only way to run your job…)
MGIZA++ : Multi-threaded GIZA++. It is an extended and optimized version of GIZA++ that can run multi-threaded and provides additional functionality and optimizations, such as: