Case study: AI translation software training using IdeaText resources from Russian to English

Last year we took part in a groundbreaking AI learning project in which texts translated by IdeaText translators were used to teach AI software to translate colloquial phrases and conversational language from Russian into English.


IdeaText was tasked with translating tweets on various topics from Russian into English. The content was then used to train an artificial intelligence translation engine for this language combination. Since the content was to be used to train AI, machine translation was not an option, and everything had to be translated and reviewed by human translators.


The client requested a very short turnaround time. As a result, we needed to translate 350 000 words in two weeks.


Our Senior Project Manager Liva Vucina was tasked with creating a schedule and establishing communication channels between 10 translators and 10 reviewers. Not only was it essential to have certified, high-quality translators on the task, but communication speed also turned out to be a deciding factor when forming this translation team.


The project was set up on the Wordbee translation software platform. It kicked off with ten translators and one in-house team tackling the translation part. Each of the ten batches had a scheduled four-stage delivery plan that translators had to follow to the hour. Once the translation of a batch was finished, the reviewers took over and finalized it. Our project manager then ran a QA check on each batch and communicated all the issues back to the reviewers for final updates.


The first issue we encountered was the choice of translation software. In our experience, the Wordbee translation platform slowed down drastically due to both the massive file sizes and the number of concurrent users. We offered an alternative software choice that is better optimized for large batch file processing; however, the client had already invested in Wordbee, so our teams pushed through to the best of their ability. We observed some data loss due to synchronization issues, but this was resolved by retranslating the affected parts.

Another problem that persisted throughout the project was the content itself. The language of the tweets ranged from short opinions to emojis, swear words and sometimes utter nonsense. Our translation teams agreed to unify the register and style to a certain degree, although the different levels of language register made this especially difficult. The unification was achieved through a two-filter principle: translators first did their work in accordance with the language register guidelines, and editors then polished the text using the same set of guidelines.

Lastly, a common challenge for large-volume, short-turnaround tasks is consistency. Our project manager came up with the great idea of creating a separate live channel for glossary tracking and reviewer discussion. Our reviewers used this channel to exchange ideas, discuss terms and agree on the way forward. This open channel massively improved consistency across the whole assignment, which matters especially when more than 20 people are involved in a single project.


Thanks to brilliant planning and scheduling by our project manager and immense research by our vendor manager, we managed to assemble a team of 10 reliable translators and 10 reviewers who helped us deliver 350 000 words a day ahead of the two-week deadline. The client was satisfied, and we were happy to undertake such a challenge and leave our mark on the training of an AI translation engine!