Mithun Prasad, PhD and Jaya Mathew, Data Scientists at Microsoft
There is a growing demand for applications which support speech, language identification, translation or transliteration from one language to another. Common questions that customers encounter as they consider possible solutions are:
- How do we identify which language is being spoken/typed by the user?
- How do we translate one language to another for the user?
- Is there a way to transliterate text from one language to another?
- Can we determine the intent of a user utterance?
- How do we incorporate natural language understanding into our chatbot application for multiple languages?
Complex problems such as these can now be solved using advanced APIs that are readily available without having to reinvent the wheel – no machine learning expertise required!
This blog starts off with a brief introduction to machine translation and then explores topics such as identifying the language and translating or transliterating spoken or typed text using Microsoft’s Translator Text API. In addition, we discuss how translated or transliterated text can be integrated with LUIS.
Machine Translation
Machine Translation (MT) encompasses the various tasks involved in converting source text from one language to another. Over the years, the domain of MT has evolved from Rule-based-MT (RMT) to Statistical-MT (SMT) to Neural-MT (NMT). Briefly, using a large artificial neural network, we predict the likelihood of a sequence of words, typically modeling entire sentences in an integrated model. A basic NMT model consists of two stages where the first step is encoding of the words in the source text based on the context and then the second stage is the decoding of the encoded text from the first step into the output target text.
Microsoft Translator Text API
Microsoft’s Translator Text API can be used in any web or client application, on any hardware platform and with any operating system, to perform language translation and other language-related operations such as language detection, text-to-speech conversion, or dictionary functionality. Like other Microsoft cloud services, this API is an Azure service hosted in Microsoft data centers, and it benefits from their security, scalability, reliability, and nonstop availability.
Figure-1: Execution steps of a neural network for sentence translation
Figure-1 depicts the execution steps of a neural network for sentence translation. Based on the neural network training, each word is coded along a 500-dimensional vector (a) representing its unique characteristics within a language pair (e.g. English and French). Based on the language pairs used for training, the neural network will self-define what these dimensions should be. They could encode simple concepts like gender (feminine, masculine, neutral), politeness level (slang, casual, written, formal, etc.), or type of word (verb, noun, etc.), but also any other implicit characteristics, manifested as statistical regularities in the training data.
The steps that occur when using a trained model to translate new sentences are the following:
- Each token, i.e. its 500-dimensional vector representation, goes through a first layer of "neurons" that will encode it in a 1000-dimensional vector (b) representing the word within the context of the other words in the sentence.
- Once all words have been encoded one time into these 1000-dimension vectors, the process is repeated several times, each layer allowing better fine-tuning of this 1000-dimension representation of the word within the context of the full sentence.
- The final output matrix is then used by the attention layer, which uses both this matrix and the output of the previously translated words to decide which word from the source sentence should be translated next.
- The decoder (translation) layer translates the selected word (or, more specifically, the 1000-dimension vector representing this word within the context of the full sentence) into its most appropriate target language equivalent. The output of this last layer (c) is then fed back into the attention layer to calculate which word from the source sentence should be translated next.
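To make the attention step above more concrete, here is a minimal toy sketch. It assumes dot-product scoring, 3-dimensional vectors instead of 1000-dimensional ones, and a hard choice of a single source word; none of these details come from the description above, and production NMT systems typically use a weighted combination of all source words rather than picking one.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(encoded_words, decoder_state):
    """Score each encoded source word against the decoder state.

    Dot-product scoring is an assumption made for this illustration.
    """
    scores = [
        sum(a * b for a, b in zip(vector, decoder_state))
        for vector in encoded_words
    ]
    return softmax(scores)

# Toy stand-ins for the context-aware word vectors (b) and for the
# decoder output (c) that is fed back into the attention layer.
encoded = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
state = [0.1, 2.0, 0.3]

weights = attend(encoded, state)
# Index of the source word the decoder would translate next.
next_word = max(range(len(weights)), key=weights.__getitem__)
```

Here the second source word receives the largest weight because its vector aligns best with the decoder state.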
With the NMT approach, the final output is, in most cases, more fluent and closer to a human translation than an SMT-based translation could have ever been.
Using Microsoft’s Translator Text API
This versatile API from Microsoft can be used for the following:
- Translate text from one language to another.
- Transliterate text from one script to another.
- Detect the language of the input text.
- Find alternate translations for a specific text.
- Determine the sentence length.
In addition, a custom translator model can be built that leverages the models built by Microsoft and further enhances them with customer data.
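As a rough sketch of how the operations listed above map onto HTTP calls, the snippet below builds (but does not send) requests for version 3.0 of the REST API. The helper name `build_request` and the placeholder key are our own; the host, paths and header reflect our understanding of the v3 API and should be checked against the current documentation before use.

```python
import json
import urllib.parse
import urllib.request

# Global endpoint of the Translator Text API v3 (verify for your resource).
BASE_URL = "https://api.cognitive.microsofttranslator.com"

def build_request(operation, texts, subscription_key, params=None):
    """Build a POST request for one operation; hypothetical helper."""
    query = urllib.parse.urlencode({"api-version": "3.0", **(params or {})})
    body = json.dumps([{"Text": text} for text in texts]).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{operation}?{query}",
        data=body,
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "application/json; charset=UTF-8",
        },
        method="POST",
    )

KEY = "YOUR_SUBSCRIPTION_KEY"  # placeholder, not a real key
hindi = ["मणि को सांबर दो"]

translate = build_request("translate", hindi, KEY, {"to": "en"})
transliterate = build_request(
    "transliterate", hindi, KEY,
    {"language": "hi", "fromScript": "Deva", "toScript": "Latn"},
)
detect = build_request("detect", hindi, KEY)
sentence_length = build_request("breaksentence", hindi, KEY)

# To actually call the service (requires a valid key and network access):
# with urllib.request.urlopen(translate) as response:
#     print(json.load(response))
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before spending any quota.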
LUIS Integration
Natural-language understanding (NLU) is a subtopic of Natural Language Processing that is related to machine reading comprehension and is vital to the field of machine translation. NLU interprets the meaning that the user communicates and classifies it into proper intents.
For example, it is relatively easy for humans who speak the same language to understand each other, although mispronunciations, choice of vocabulary or phrasings may complicate this. NLU is responsible for this task of distinguishing what is meant by applying a range of processes such as text categorization, content analysis and sentiment analysis.
Language Understanding (LUIS) offers a fast and effective way of adding language understanding to applications. Designed to identify valuable information in conversations, LUIS interprets user goals (intents) and distills valuable information from sentences (entities), for a high quality, nuanced language model. If you need a multi-language LUIS client application such as a chatbot, you have a few options. If LUIS supports all the languages you need, you can develop a separate LUIS app for each language. Each LUIS app has a unique app ID and query endpoint.
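One way to organize such a client is to keep a table of app IDs per language and fall back to a default language for anything unsupported. The sketch below is only illustrative: the app IDs are placeholders, the function names are ours, and the URL follows the v2.0 endpoint format as we understand it, so verify it against your own published endpoint.

```python
import urllib.parse

# Hypothetical mapping from language code to LUIS app ID.
LUIS_APPS = {"en": "<english-app-id>", "fr": "<french-app-id>"}
FALLBACK_LANGUAGE = "en"

def route_utterance(language):
    """Return (app_id, needs_translation) for a detected language."""
    if language in LUIS_APPS:
        return LUIS_APPS[language], False
    # Unsupported language: translate or transliterate first, then
    # query the app for the fallback language.
    return LUIS_APPS[FALLBACK_LANGUAGE], True

def luis_query_url(region, app_id, subscription_key, utterance):
    """Build a LUIS query URL (assumed v2.0 format)."""
    query = urllib.parse.urlencode(
        {"subscription-key": subscription_key, "q": utterance}
    )
    return (
        f"https://{region}.api.cognitive.microsoft.com"
        f"/luis/v2.0/apps/{app_id}?{query}"
    )

app_id, needs_translation = route_utterance("hi")
url = luis_query_url("westus", app_id, "YOUR_KEY", "deliver food to Vivek")
```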
To provide language understanding for a language LUIS does not currently support, you can either translate or transliterate the utterance into a supported language. After translating or transliterating, you can submit the utterance to the LUIS endpoint and receive the resulting scores. If we were building a restaurant ordering chatbot, for example, a Hindi utterance (labeled with intents and entities) could be integrated into LUIS in transliterated or translated form as follows:
Original snippet in English:
{
Transliterated option:
{ "text": "vivek ko khanna deliver karo", "intent": "Food.Delivery", "entities": [ { "entity": "People.Name", "startPos": 0, "endPos": 4 } ] }
The option of transliteration works since all the utterances are tagged with intents/entities and are used to build the model.
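When labeling transliterated utterances by hand, it is easy to get the character offsets wrong. The short check below reads the labeled example above and prints the text each entity covers; it treats `endPos` as inclusive, which is what the offsets 0 to 4 for "vivek" imply. The function name is our own.

```python
labeled = {
    "text": "vivek ko khanna deliver karo",
    "intent": "Food.Delivery",
    "entities": [{"entity": "People.Name", "startPos": 0, "endPos": 4}],
}

def entity_spans(example):
    """Return (entity type, covered text) pairs; endPos is inclusive."""
    return [
        (e["entity"], example["text"][e["startPos"]: e["endPos"] + 1])
        for e in example["entities"]
    ]

spans = entity_spans(labeled)
print(spans)  # [('People.Name', 'vivek')]
```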
Translation option:
Alternatively, the native text in Hindi ("विवेक को खन्ना डिलीवर कारो") can be translated into English ("deliver food to Vivek") and used for intent/entity extraction via the LUIS endpoint.
Examples like the ones above can be expanded by looking up a list of common names and substituting them into a range of utterances. For example, the training data can be augmented using "{name} ko khanna deliver karo" or "Deliver food to {name}" in the above scenario.
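A minimal sketch of this kind of augmentation is shown below. It fills the `{name}` placeholder and computes the entity offsets for each generated utterance. The function name and the default intent and entity labels are taken from the earlier example or chosen for illustration, and the sketch assumes exactly one `{name}` placeholder per template.

```python
def expand_template(template, names,
                    intent="Food.Delivery", entity="People.Name"):
    """Generate labeled utterances from a template with one {name} slot."""
    start = template.index("{name}")  # the name begins where the slot was
    examples = []
    for name in names:
        examples.append({
            "text": template.replace("{name}", name),
            "intent": intent,
            "entities": [{
                "entity": entity,
                "startPos": start,
                "endPos": start + len(name) - 1,  # inclusive end offset
            }],
        })
    return examples

names = ["vivek", "mani"]
transliterated = expand_template("{name} ko khanna deliver karo", names)
translated = expand_template("Deliver food to {name}", names)
```

Each generated example has the same shape as the hand-labeled utterance, so the two sets can be combined into a single training file.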
In addition, you can create model_features (phrase lists) from the transliterated/translated text and use the words they contain in your utterances for better NLU. For example, the model_features below contain various menu items that can be referenced in utterances for a restaurant ordering chatbot.
"model_features": [ "name": "food", |
The advantage of using a phrase list is that you do not need many utterance data points. Mentions of a handful of words in utterances will suffice. Similarly, all the variants of food items can be captured as part of closed lists as shown below. With the below closed list, the variants are all mapped to the name.
"closedLists": [ {
On training and publishing the model, the endpoint can be called to obtain the intents and entities for transliterated utterances. For example, the utterance "mani ko sambar do" ("मणि को सांबर दो"), which means "Give Mani Sambar" in the restaurant space, can produce:
Transliterated option: { "query": " mani ko sambar do ", }
Alternatively, the native text in Hindi ("मणि को सांबर दो") can be translated into English as "Give Mani sambar". One thing to keep in mind is that regional person names are often translated literally and may need to be corrected manually before the utterance can be used. For example, Mani could get translated to Gem, as in the utterance ‘Give The Gem Sambar’, where we need to manually edit the name in the context of the sentence.
Translation option: {
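The manual name correction described above can be partly automated once you know which names tend to be mistranslated. The sketch below applies a small, hand-maintained table of corrections to the translated text before it is sent for intent/entity extraction. The function name and the table are our own illustration; only the "The Gem" to "Mani" case comes from the example above.

```python
import re

def restore_names(translated, corrections):
    """Replace known mistranslations of person names (case-insensitive)."""
    for wrong, right in corrections.items():
        # A function replacement keeps `right` literal even if it
        # contains backslashes.
        translated = re.sub(
            re.escape(wrong), lambda _match, r=right: r,
            translated, flags=re.IGNORECASE,
        )
    return translated

corrections = {"The Gem": "Mani"}
fixed = restore_names("Give The Gem Sambar", corrections)
print(fixed)  # Give Mani Sambar
```

A table like this only catches mistranslations you have already seen, so spot checks of translated utterances remain useful.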
In summary, NLU capabilities can power chatbots or drive intelligence into search queries. To provide language understanding for a language that LUIS does not currently support, you can either translate or transliterate the utterance into a supported language.
References https://www.microsoft.com/en-us/translator/