How Does Machine Translation Work?

Machine translation systems are applications or online services that use machine learning technologies to translate large volumes of text from one of their supported languages into another.

The service translates a "source" text from one language into a different "target" language.

Although the concepts behind machine translation technology and the interfaces to use it are relatively simple, the science and technologies behind it are extremely complex, bringing together several cutting-edge technologies, notably deep learning (artificial intelligence), big data, linguistics, cloud computing, and web APIs.

Since the beginning of the 2010s, a new artificial intelligence technology, deep neural networks (also known as deep learning), has allowed speech recognition to reach a quality level that enabled the Microsoft Translator team to combine speech recognition with its core text translation technology to launch a new speech translation technology.

Historically, the primary machine learning technique used in the industry has been statistical machine translation (SMT). SMT uses advanced statistical analysis to estimate the best possible translations for a word, in the context of a few words. SMT has been used by all major translation service providers, including Microsoft, since the mid-2000s.

The advent of neural machine translation (NMT) caused a radical shift in translation technology, resulting in much higher quality translations. This translation technology became available to users and developers in late 2016.

Both SMT and NMT translation technologies have two elements in common:

Both require large amounts of pre-existing human-translated content (up to millions of translated sentences) to train the systems.
Neither acts as a bilingual dictionary, translating words from a list of potential translations; instead, both translate based on the context in which a word is used in a sentence.

The Microsoft Translator Text API and Microsoft Speech Services, part of the Cognitive Services Collection of APIs, are machine translation services provided by Microsoft.

The Microsoft Translator Text API has been used by Microsoft groups since 2007 and has been available as an API to customers since 2011. It is used extensively within Microsoft, integrated across product localization, support, and online communication teams (e.g., the Windows Blog). This same service is also accessible, at no additional cost, from familiar Microsoft products such as Bing, Cortana, Microsoft Edge, Office, SharePoint, Skype, and Yammer.

Microsoft Translator can be used in Web or client applications on any hardware platform and with any operating system to perform language translation and other language-related operations such as speech recognition, text to speech, or dictionary.

Using industry-standard REST technology, the developer sends source text (or audio, for speech translation) to the service with a parameter indicating the target language, and the service sends the translated text back for the client or web app to use.
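As a rough sketch of what such a REST call looks like, the snippet below assembles a request in the style of the public Translator Text API v3. The endpoint URL, query parameters, and header names follow that API's documented conventions, but treat them as assumptions and check the current service documentation before relying on them; the subscription key is a placeholder.

```python
import json
from urllib.parse import urlencode

# Illustrative Translator-style REST request (v3 conventions assumed).
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"

def build_translate_request(text, target_lang, api_version="3.0"):
    """Assemble the URL, headers, and JSON body for one translation call."""
    query = urlencode({"api-version": api_version, "to": target_lang})
    url = f"{ENDPOINT}?{query}"
    headers = {
        "Ocp-Apim-Subscription-Key": "<your-subscription-key>",  # placeholder
        "Content-Type": "application/json",
    }
    body = json.dumps([{"Text": text}])  # the API accepts a list of texts
    return url, headers, body

url, headers, body = build_translate_request("Hello, world!", "fr")
# An HTTP client (e.g. urllib.request) would POST `body` to `url` with
# `headers`; the service responds with the translated text as JSON.
```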

The Microsoft Translator Service is an Azure service that is hosted in Microsoft data centers and benefits from the security, scalability, reliability, and non-stop availability that other Microsoft Cloud services can provide.

Microsoft Translator's speech translation technology was launched at the end of 2014, starting with Skype Translator, and has been available as an open API to customers since early 2016. It is integrated into the Microsoft Translator live feature, Skype, Skype Meeting Broadcast, and the Microsoft Translator apps for Android, iOS, and Windows.

Speech translation is now available through Microsoft Speech services, an end-to-end set of fully customizable services for speech recognition, speech translation, and speech synthesis (text-to-speech).

How does text translation work?

There are two major technologies used for text translation: the legacy one, statistical machine translation (SMT), and the newer-generation one, neural machine translation (NMT).

Statistical machine translation

Microsoft Translator's implementation of statistical machine translation (SMT) is built on more than a decade of natural language research at Microsoft. Rather than writing handcrafted rules to translate between languages, modern translation systems approach translation as a problem of learning to transform text between languages from existing human translations, taking advantage of recent advances in applied statistics and machine learning.

So-called "parallel corpora" act as a modern Rosetta Stone on a massive scale, providing word, phrase, and idiomatic translations in context for many language pairs and domains. Statistical modeling techniques and efficient algorithms help the computer tackle the problems of decipherment (detecting the correspondences between source and target language in the training data) and decoding (finding the best translation of a new input sentence). Microsoft Translator unites the power of statistical methods with linguistic information to produce models that generalize better and lead to more understandable translations.

Because this approach does not rely on dictionaries or grammatical rules, it provides the best translations of phrases, as it can use the context around a given word rather than trying to translate each word individually. For single-word translations, a bilingual dictionary was developed and is also available.
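To make the parallel-corpus idea concrete, here is a toy sketch of the core SMT intuition: from aligned example translations, count which target phrase most often corresponds to a source phrase, and prefer whole-phrase matches over word-by-word lookup. Real SMT systems use far richer statistical models (alignment, language models, decoding search); the aligned data below is invented for illustration.

```python
from collections import Counter

# Tiny invented "parallel corpus" of aligned source/target phrases.
aligned_examples = [
    ("the house", "la maison"),
    ("the house", "la maison"),
    ("the book", "le livre"),
    ("house", "maison"),
]

# Build a phrase table: for each source phrase, count observed translations.
phrase_table = {}
for src, tgt in aligned_examples:
    phrase_table.setdefault(src, Counter())[tgt] += 1

def translate_phrase(src):
    """Return the most frequently observed translation of a phrase, if any."""
    candidates = phrase_table.get(src)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

print(translate_phrase("the house"))  # -> la maison
```

Translating "the house" as a phrase yields "la maison" directly, whereas a naive word-by-word dictionary lookup would have to guess the article's gender.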

Neural machine translation

Continuous improvements in translation are important. However, since the mid-2010s, performance improvements had plateaued with SMT technology. By leveraging the scale and power of Microsoft's AI supercomputer, notably the Microsoft Cognitive Toolkit, Microsoft Translator now offers neural network (LSTM) based translation, which enables a new decade of translation quality improvement.

These neural network models are available for all speech languages through Microsoft Speech services and through the text API using the "generalnn" category ID.

Neural network translations fundamentally differ from traditional SMT in how they perform the translation.

In neural network training, each word is encoded as a 500-dimension vector (a) representing its unique characteristics within a particular language pair (e.g., English and Chinese). Based on the language pairs used for training, the neural network self-defines what these dimensions should be. They could encode simple concepts like gender (feminine, masculine, neuter), politeness level (slang, casual, written, formal, etc.), or type of word (verb, noun, etc.), but also any other non-obvious characteristics derived from the training data.

The steps a neural network translation goes through are the following:

Each word, or more specifically the 500-dimension vector representing it, goes through a first layer of "neurons" that will encode it into a 1000-dimension vector (b) representing the word within the context of the other words in the sentence.
Once all words have been encoded into these 1000-dimension vectors, the process is repeated several times, each layer allowing better fine-tuning of this 1000-dimension representation of the word within the context of the full sentence (in contrast with SMT technology, which can only take into account a window of 3 to 5 words).
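The repeated encoding step above can be sketched as follows. Each word vector is re-encoded as a blend of itself and the rest of the sentence, so deeper layers capture more whole-sentence context. The 2-dimension vectors and the averaging rule are toy stand-ins for the learned 1000-dimension layers.

```python
def encode_layer(vectors):
    """One toy layer: blend each word vector with the sentence average."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    return [[0.5 * v[d] + 0.5 * mean[d] for d in range(dim)] for v in vectors]

sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-dim word vectors
for _ in range(4):            # repeat the encoding several times
    sentence = encode_layer(sentence)
# After several layers, every vector reflects the full sentence's context,
# unlike SMT's fixed 3-to-5-word window.
```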

The final output matrix is then used by the attention layer (i.e., a software algorithm), which will use both this final output matrix and the output of previously translated words to define which word from the source sentence should be translated next. It will also use these calculations to potentially drop words that are unnecessary in the target language.

The decoder (translation) layer translates the selected word (or more specifically, the 1000-dimension vector representing this word within the context of the full sentence) into its most appropriate target language equivalent. The output of this last layer (c) is then fed back into the attention layer to calculate which word from the source sentence should be translated next.
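The attention step described above can be sketched in a few lines: score each encoded source word against the decoder's current state, turn the scores into weights with a softmax, and pick the source position to translate next. Real attention layers are learned; the vectors and dot-product scoring here are purely illustrative.

```python
import math

def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoded_words):
    """Score each encoded source word against the decoder state."""
    scores = [sum(d * e for d, e in zip(decoder_state, vec))
              for vec in encoded_words]
    return softmax(scores)

encoded = [[0.9, 0.1], [0.1, 0.9]]   # two encoded source words (toy values)
state = [1.0, 0.0]                    # decoder state after the last output
weights = attend(state, encoded)
next_word = max(range(len(weights)), key=weights.__getitem__)
# The highest-weighted source position is the word to translate next.
```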

In the example depicted in the animation, the context-aware 1000-dimension modeling of "the" will encode that the noun (house) is a feminine word in French (la maison). This will make the appropriate translation of "the" be "la" and not "le" (singular, masculine) or "les" (plural) once it reaches the decoder (translation) layer.

The attention algorithm will also calculate, based on the word(s) previously translated (in this case "the"), that the next word to be translated should be the subject ("house") and not an adjective ("blue"). It can achieve this because the system learned that English and French reverse the order of these words in sentences. It would also have calculated that if the adjective were "big" instead of a color, it should not reverse them ("the big house" => "la grande maison").


How does speech translation work?

Microsoft Translator is also capable of translating speech. This technology is integrated into Microsoft Translator live, the Translator apps, and Skype Translator. Initially available only through the Skype Translator feature and the Microsoft Translator apps on iOS and Android, this functionality is now available to developers with the latest version of the open API available on the Azure portal.

Although at first glance it may seem like a straightforward process to build a speech translation technology from the existing technology building blocks, it required much more work than simply plugging an existing "traditional" human-to-machine speech recognition engine into the existing text translation engine.

To translate the “source” speech from one language to another “target language”, the system goes through a four-step process.

Speech recognition, to convert audio to text
TrueText: a Microsoft technology that normalizes the text to make it more suitable for translation
Translation through the text translation engine described above, but with translation models specifically developed for real-life spoken conversations
Text-to-speech, when necessary, to produce the translated audio.
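The four-step pipeline above can be sketched as a chain of functions. Each stage here is a stub standing in for the real speech recognition, TrueText normalization, translation, and text-to-speech components; the filler-word rule and the lookup table are invented for illustration.

```python
def recognize_speech(audio):
    # 1. Speech recognition: audio -> raw transcript (stubbed here).
    return "so um hello how are you"

def true_text(transcript):
    # 2. TrueText-style normalization: drop disfluencies (toy rule only).
    fillers = {"um", "uh", "so"}
    return " ".join(w for w in transcript.split() if w not in fillers)

def translate(text, target_lang):
    # 3. Translation with conversation-tuned models (stubbed lookup).
    toy_model = {("hello how are you", "fr"): "bonjour comment allez-vous"}
    return toy_model.get((text, target_lang), text)

def synthesize(text):
    # 4. Optional text-to-speech; here we just tag the output as audio.
    return f"<audio:{text}>"

def speech_to_speech(audio, target_lang):
    """Run the full four-step pipeline on one utterance."""
    text = true_text(recognize_speech(audio))
    return synthesize(translate(text, target_lang))

print(speech_to_speech(b"...", "fr"))  # -> <audio:bonjour comment allez-vous>
```

In a speech-to-text scenario, the final synthesis step would simply be skipped and the translated text returned directly.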



The text is then translated into any of the 60+ languages supported by Microsoft Translator.

Translations through the speech translation API (as a developer) or in a speech translation app or service are powered by the newest neural-network-based translations for all the speech-supported languages (see here for the full list). These models were built by expanding the current, mostly written-text-trained translation models with more spoken-text corpora, to build a better model for spoken, conversational types of translations. These models are also available through the "speech" standard category of the traditional text translation API.

For any language not supported by neural translation, traditional SMT translation is used.

Text-to-speech
If the target language is one of the 18 supported text-to-speech languages, and the use case requires an audio output, the text is then converted to speech output with speech synthesis. This phase is omitted in speech-to-text translation scenarios.

