Translator announces Document Translation (Preview)
Published Feb 17 2021 01:30 PM 11.5K Views
Microsoft

We are announcing Document Translation, a new feature in Azure Translator service which enables enterprises, translation agencies, and consumers who require volumes of complex documents to be translated into one or more languages preserving structure and format in the original document. Document Translation is an asynchronous batch feature offering translation of large documents eliminating limits on input text size. It supports documents with rich content in different file formats including Text, HTML, Word, Excel, PowerPoint, Outlook Message, PDF, etc. It reconstructs translated documents preserving layout and format as present in the source.

DocTransImage (1).png

Standard translation offerings in the market accept only plain text, or HTML, and limits count of characters in a request. Users translating large documents must parse the documents to extract text, split them into smaller sections and translate them separately. If sentences are split in an unnatural breakpoint it can lose the context resulting in suboptimal translations. Upon receipt of the translation results, the customer has to merge the translated pieces into the translated document. This involves keeping track of which translated piece corresponds to the equivalent section in the original document.


The problem gets complicated when customers want to translate complex documents having rich content. They convert the original file in variety of formats to either .html or .txt file format and reconvert translated content from html or txt files into original document file format. The transformation may result in various issues. The problem gets compounded when customer needs to translate a) large quantity of documents, b) documents in variety of file formats, c) documents while preserving the original layout and format, d) documents into multiple target languages.

 

Document Translation is an asynchronous offering to which the user makes a request specifying location of source and target documents, and the list of target output languages. Document Translation returns a job identifier enabling the user to track the status of the translation. Asynchronously, Document Translation pulls each document from the source location, recognizes the document format, applies right parsing technique to extract textual content in the document, translates the textual content into target languages. It then reconstructs the translated document preserving layout and format as present in the source documents, and stores translated document in a specified location. Document Translation updates the status of translation at the document level. Document Translation makes it easy for the customer to translate volumes of large documents in a variety of document formats, into a list of target languages thus eliminating all the challenges customers face today and improving their productivity.

 

Document Translation enables users to customize translation of documents by providing custom glossaries, a custom translation model id built using customer translator, or both as part of the request. Such customization retains specific terminologies and provides domain specific translations in translated documents.


“Translation of documents with rich formatting is a tricky business. We need the translation to be fluent and matching the context, while maintaining high fidelity in the visual appearance of complex documents. Document Translation is designed to address those goals, relieving client applications from having to disassemble and reassemble the documents after translation, making it easy for developers to build workflows that process full documents with a few simple steps.”, said Chris Wendt, Principal Program Manager.

 

To learn more about Translator and the Document Translation feature in the video below

 

References

 

Send your feedback to translator@microsoft.com  

 

 

Co-Authors
Version history
Last update:
‎Feb 17 2021 12:35 PM
Updated by: