“From sheep to Doggy Style traceability of milk chain in Tuscany.”
Confused? Well, you are not the only one.
This phrase appeared on the website of the Italian Ministry of Education and Research some years ago, when online translation tools, forgive my rudeness, frankly sucked.
How did we get from that kind of abomination to Google Translate’s most recent (and pretty impressive) performance?
Short answer? AI and neural networks applied to Linguistics.
Long answer? Read the article!
Google Translate’s neural network revolution
In September 2016, researchers announced the greatest leap in the history of Google Translate: the development of the Google Neural Machine Translation system (GNMT), based on a powerful artificial neural network with deep learning capabilities.
Its architecture was first tested on more than a hundred languages supported by Google Translate.
The system went live two months later, replacing the old statistical approach used since 2007.
GNMT represented a massive improvement thanks to its ability to perform zero-shot translations, in which one language is translated directly into another.
The previous version, by contrast, first translated the source language into English and then translated the English text into the target language.
But English, like every human language, is ambiguous and context-dependent, and this intermediate step could cause funny (or embarrassing) translation errors.
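To make the pivot problem concrete, here is a toy sketch (assumed mini-dictionaries, not anything from Google's actual system): French distinguishes "avocate" (female lawyer) from "avocat" (male lawyer), but both collapse to English "lawyer", so a pivot through English can lose the distinction that a direct translation would keep.

```python
# Toy illustration of pivot vs. direct translation.
# Hypothetical mini-dictionaries; gender is lost at the English pivot.
fr_to_en = {"avocate": "lawyer", "avocat": "lawyer"}  # both genders -> "lawyer"
en_to_es = {"lawyer": "abogado"}                      # ambiguous English defaults to masculine

# A direct French -> Spanish mapping keeps the distinction.
fr_to_es = {"avocate": "abogada", "avocat": "abogado"}

def pivot_translate(word: str) -> str:
    """French -> English -> Spanish: two hops, information lost at the pivot."""
    return en_to_es[fr_to_en[word]]

def direct_translate(word: str) -> str:
    """French -> Spanish in one hop, the kind of path zero-shot translation enables."""
    return fr_to_es[word]

print(pivot_translate("avocate"))   # "abogado" — feminine form lost in the pivot
print(direct_translate("avocate"))  # "abogada" — gender preserved
```

The real systems are of course statistical rather than dictionary lookups, but the information loss at the pivot works the same way.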
Why is GNMT so damn efficient?
GNMT achieves its impressive performance by considering the broader context of a text to infer the most fitting translation.
How? GNMT applies an example-based machine translation (EBMT) method, in which the system learns over time from millions of examples to produce more natural translations.
That’s the essence of deep learning!
The result is then rearranged to fit the grammar of the target language.
This reflects GNMT’s ability to translate whole sentences at a time, rather than just piece by piece. In addition, it can handle interlingual translation by encoding the semantics of sentences, instead of memorizing phrase-to-phrase translations.
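A toy sketch of why whole-sentence translation matters (assumed vocabulary tables, standing in for what a neural model learns): English adjectives precede nouns, while French adjectives like "blanche" usually follow them, so translating word by word gets the order wrong.

```python
# Word-by-word table: each English word mapped independently to French.
word_table = {"the": "la", "white": "blanche", "house": "maison"}

# Hypothetical sentence-level mapping, standing in for a learned
# whole-sentence translation.
sentence_table = {"the white house": "la maison blanche"}

def word_by_word(sentence: str) -> str:
    """Translate each word independently — target word order comes out wrong."""
    return " ".join(word_table[w] for w in sentence.split())

def whole_sentence(sentence: str) -> str:
    """Translate the sentence as one unit, preserving target-language word order."""
    return sentence_table[sentence]

print(word_by_word("the white house"))    # "la blanche maison" — wrong order
print(whole_sentence("the white house"))  # "la maison blanche" — correct
```

GNMT obviously doesn't store a lookup table of sentences; the point is only that treating the sentence as the unit of translation lets word order fall out naturally.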
After this brief overview, let me show you in detail how the first statistical approach worked and how neural networks made it obsolete.
How did Statistical Machine Translation work?
Before the implementation of neural networks, statistical machine translation (SMT) was the most successful machine translation method.
It is an approach in which translations are processed considering statistical models whose parameters are derived from the analysis of bilingual text corpora (large and structured sets of texts).
A document is translated according to the probability that a string in the target language is the translation of a string in the source language.
To do so, the system searches for patterns in millions of documents and decides which words to choose and how to arrange them in the target language.
The first statistical translation models were word-based, but the introduction of phrase-based models represented a big step forward, followed more recently by the incorporation of syntactic or quasi-syntactic structures.
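The scoring idea behind classic SMT can be sketched in a few lines. In the standard noisy-channel formulation, the system picks the target string e that maximizes P(f | e) · P(e): how well the candidate explains the source (translation model) times how fluent it is in the target language (language model). The probabilities below are made-up toy numbers, not real model parameters.

```python
# Minimal noisy-channel sketch of SMT scoring (toy numbers).
# Candidates for translating French "maison blanche" into English.

translation_model = {      # P(source | candidate): how well each candidate explains the input
    "white house": 0.6,
    "house white": 0.6,    # the word-for-word gloss explains the source equally well...
}
language_model = {         # P(candidate): fluency in the target language
    "white house": 0.05,
    "house white": 0.0001, # ...but is far less fluent English
}

def best_translation(candidates):
    """Pick the candidate maximizing translation-model * language-model probability."""
    return max(candidates, key=lambda e: translation_model[e] * language_model[e])

print(best_translation(["white house", "house white"]))  # "white house"
```

In a real phrase-based system the candidates come from a phrase table extracted from aligned corpora, and the language model is trained on billions of target-language words, but the argmax over a product of model scores is the same.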
How to find the best corpora for Google Translate?
Google got the necessary amount of linguistic data from United Nations and European Parliament documents; UN documents in particular are usually published in all six official UN languages, making them a ready-made set of six-language parallel corpora.
Bravo Google, getting something useful from the United Nations seemed impossible but you did it!
Statistical translation pros
“Old chicken makes good soup”. Is that actually true?
Well, partially yes. The old statistical approach had some tricks up its sleeve.
First, it represented a significant improvement over the even older rule-based approach, which required an expensive manual development of linguistic rules and didn’t generalize to other languages.
Even Franz Josef Och, Google Translate’s original creator, questioned the effectiveness of rule-based algorithms in favor of statistical approaches.
In fact, SMT systems aren’t built around specific language pairings, and they make far more efficient use of data and human resources.
Second, parallel corpora are widely available in machine-readable formats. Practically a giant menu to choose from.
The SMT cons
On the other hand, statistical translation can have a hard time with pairs of languages that differ significantly, especially in word order.
The problem is compounded by the limited availability of training corpora for pairs of non-European languages.
Other issues are the cost of creating text corpora and the general difficulty of this approach in predicting and fixing specific mistakes.
Fun Fact of the day: even with all these flaws, Google still uses statistical translation for some languages in which the neural network system has not yet been implemented.
A touch of vintage, I guess.