The Apertium project (http://www.apertium.org) develops open-source machine translation and language technology. We try to focus our efforts on lesser-resourced and marginalised languages, but we also work with larger languages.
To date, we have released translators for 17 language pairs, covering languages spoken by 1.1bn people, from English (est. 500m speakers) to Aranese (est. 4,000 speakers). A similar number of other language pairs are in development.
The Apertium software is licensed under the GPL and, more unusually in the machine translation field, so is the data for all these language pairs. This means that the data can be reused by other language projects (e.g. in developing spelling or grammar checkers, thesauri, etc.).
A growing team of people is working on this project, so if you are interested in details like sentence alignment or trigram tagging, or just want to use your knowledge of a language to create a useful application for it, come and join us.
The code samples from our completed student projects are available here: http://code.google.com/p/google-summer-of-code-2009-apertium/downloads/list
These projects have been accepted into The Apertium Project. You can learn more about each project by visiting the links below.
| Student | Title | Mentor | Status |
|---|---|---|---|
| | Conversion of Anubadok: Building an English-Bengali Language Pair For Apertium | Francis Tyers | accepted |
| | Multi-Engine Machine Translation | Francis Tyers | accepted |
| | Apertium nb2nn: machine translation between Norwegian Bokmål and Nynorsk | Trond Trosterud | accepted |
| | Apertium-sv-da: Machine translation between Swedish and Danish | Jacob Nordfalk | accepted |
| | Apertium going SOA | Jimmy O'Regan | accepted |
| | Java port of Apertium lttoolbox proposal | Sergio Ortiz Rojas | accepted |
| | Highly scalable web service architecture for Apertium | Juan Antonio Perez-Ortiz | accepted |
| | Implement a Trigram Tagger for Apertium and support-tools for training it | Felipe Sanchez-Martinez | accepted |