Document Translation is generally available now

Krishna_Doss
Microsoft
May 25, 2021

Today, May 25, 2021, at the //build 2021 conference, we are announcing the general availability of the Document Translation feature in the Microsoft Translator service. Document Translation enables users to translate volumes of large documents in a variety of file formats, including Text, HTML, Word, PowerPoint, Excel, Outlook message, Adobe PDF, and legacy file formats, into one or more target languages while preserving the layout and structure of the original file.

 

Text translation offerings in the market accept only plain text or HTML and limit the number of characters per request. Users translating large, rich documents must parse the documents to extract the text, split it into smaller sections, and translate each section separately. Splitting sentences at unnatural breakpoints can remove context and result in suboptimal translations. Upon receiving the translation results, the user must reassemble the translated pieces into a translated document. This complex task involves keeping track of which translated piece corresponds to which section of the original document and reconstructing the layout and format of the original document. The problem is compounded when the customer needs to translate a large volume of documents in a variety of file formats into multiple target languages.

 

Document Translation improves user productivity by handling all this complexity, making it simple to translate a single document or multiple documents, in a variety of formats, into one or more languages.

Document Translation - Work Flow

 

Data Security:

Users give the service secure access to the documents to translate by either:

  • enabling Managed Identity in the Translator resource and assigning it the ‘Storage Blob Data Contributor’ role on the Azure storage account, or
  • generating a Shared Access Signature (SAS) token with restricted rights for a limited period and passing it in the request.

The code samples in this blog assume that Managed Identity is enabled and that the ‘Storage Blob Data Contributor’ role is assigned on the Azure storage account.
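
For the SAS alternative, the token is typically passed by appending it as a query string to the source and target URLs in the request body. The following is a minimal Python sketch of that step, not an official helper: the function name `with_sas` and the placeholder token are hypothetical, and a real token must be generated by Azure Storage with the permissions the service needs.

```python
from urllib.parse import urlsplit, urlunsplit


def with_sas(url: str, sas_token: str) -> str:
    """Append a SAS token (a URL query string) to a blob or container URL."""
    token = sas_token.lstrip("?")  # tokens are sometimes copied with a leading "?"
    parts = urlsplit(url)
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


# Placeholder only: a real token is issued by Azure Storage and carries a signature.
SAS_PLACEHOLDER = "sv=<version>&sp=<permissions>&se=<expiry>&sig=<signature>"

source = {
    "sourceUrl": with_sas("https://myblob.blob.core.windows.net/source", SAS_PLACEHOLDER)
}
```

With Managed Identity, this step is unnecessary and the plain container URLs are used as they are.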

Document translation doesn’t persist customer data submitted for translation. Learn more about Translator confidentiality.

 

Language detection:

Document Translation automatically detects the language of the document content, which enables users to translate a multilingual document into a target language.

 

 

#Example: Translate single document
{
    "inputs": [
        {
            "storageType": "File",
            "source": {
                "sourceUrl": "https://myblob.blob.core.windows.net/source/multi-lingual-doc.docx"
            },
            "targets": [
                {
                    "targetUrl": "https://myblob.blob.core.windows.net/target/translated-doc.es.docx",
                    "language": "es"
                }
            ]
        }
    ]
}
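
A request body like the one above is sent as a POST to the Translator resource. The sketch below shows one way to prepare that call with the Python standard library. The resource name, the key, and the helper `build_request` are hypothetical; the `/translator/text/batch/v1.0/batches` path and the `Ocp-Apim-Subscription-Key` header are assumptions based on the service's REST reference rather than this post, so check them against the current documentation.

```python
import json
import urllib.request

# Hypothetical resource endpoint; replace with your own Translator resource.
ENDPOINT = "https://my-translator.cognitiveservices.azure.com"
# Assumed path of the v1.0 batch endpoint (not stated in this post).
BATCHES_PATH = "/translator/text/batch/v1.0/batches"


def build_request(endpoint: str, key: str, body: dict) -> urllib.request.Request:
    """Create (but do not send) the POST request that starts a translation job."""
    return urllib.request.Request(
        url=endpoint.rstrip("/") + BATCHES_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": key,
        },
        method="POST",
    )


# Same body as the single-document example.
body = {
    "inputs": [
        {
            "storageType": "File",
            "source": {
                "sourceUrl": "https://myblob.blob.core.windows.net/source/multi-lingual-doc.docx"
            },
            "targets": [
                {
                    "targetUrl": "https://myblob.blob.core.windows.net/target/translated-doc.es.docx",
                    "language": "es",
                }
            ],
        }
    ]
}

request = build_request(ENDPOINT, "<your-key>", body)

# Sending is commented out so the sketch runs without credentials:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.headers.get("Operation-Location"))
```

Translation runs asynchronously, so the response to this call is expected to acknowledge the job rather than return translated content; the translated file is written to the target URL.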

 

 

 

Customization:

Users can translate documents with custom models they have built using Custom Translator.

 

 

#Example: Translate all documents in a container using custom model
{
    "inputs": [
        {
            "source": {
                "sourceUrl": "https://myblob.blob.core.windows.net/source"
            },
            "targets": [
                {
                    "targetUrl": "https://myblob.blob.core.windows.net/target",
                    "language": "es",
                    "category": "a2eb72f9-43a8-46bd-82fa-4693c8b64c3c-GENERAL"
                }
            ]
        }
    ]
}

 

 

 

Users can also apply custom glossaries when translating documents.

 

 

#Example: Translate all documents in a folder within a container using custom glossary
{
    "inputs": [
        {
            "source": {
                "sourceUrl": "https://myblob.blob.core.windows.net/source",
                "filter": {
                    "prefix": "myfolder/"
                }
            },
            "targets": [
                {
                    "targetUrl": "https://myblob.blob.core.windows.net/target",
                    "language": "es",
                    "glossaries": [
                        {
                            "glossaryUrl": "https://myblob.blob.core.windows.net/glossary/en-es.xlf",
                            "format": "xliff"
                        }
                    ]
                }
            ]
        }
    ]
}
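
The container-level request bodies above differ only in their optional fields: a folder prefix filter, a custom model category, and a glossary. As a rough sketch, a small Python helper can assemble one entry of the `inputs` array and add each optional field only when it is supplied. The names `build_input` and `_check_url` are hypothetical, and the whitespace check is just a guard against pasted URLs with stray spaces.

```python
from typing import Optional


def _check_url(url: str) -> str:
    """Reject URLs that contain whitespace, an easy mistake when pasting."""
    if any(ch.isspace() for ch in url):
        raise ValueError(f"URL contains whitespace: {url!r}")
    return url


def build_input(
    source_url: str,
    target_url: str,
    language: str,
    prefix: Optional[str] = None,
    category: Optional[str] = None,
    glossary_url: Optional[str] = None,
    glossary_format: Optional[str] = None,
) -> dict:
    """Build one entry of the "inputs" array, adding optional fields only when set."""
    source = {"sourceUrl": _check_url(source_url)}
    if prefix is not None:
        source["filter"] = {"prefix": prefix}

    target = {"targetUrl": _check_url(target_url), "language": language}
    if category is not None:
        target["category"] = category
    if glossary_url is not None:
        if glossary_format is None:
            raise ValueError("glossary_format is required when glossary_url is set")
        target["glossaries"] = [
            {"glossaryUrl": _check_url(glossary_url), "format": glossary_format}
        ]

    return {"source": source, "targets": [target]}


# Reproduces the folder-plus-glossary example.
body = {
    "inputs": [
        build_input(
            "https://myblob.blob.core.windows.net/source",
            "https://myblob.blob.core.windows.net/target",
            "es",
            prefix="myfolder/",
            glossary_url="https://myblob.blob.core.windows.net/glossary/en-es.xlf",
            glossary_format="xliff",
        )
    ]
}
```

Passing `category="a2eb72f9-43a8-46bd-82fa-4693c8b64c3c-GENERAL"` instead of the glossary arguments would produce the custom model request shown earlier.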

 

 

 

Document Translation is the outcome of co-creation with customers who actively participated in the private and public preview programs and gave us insight into their use cases. Our partners showed tremendous confidence in us as the product evolved to address their needs. We are pleased to share a few quotes from customers on adopting Document Translation in their workflows.

 

"By adding document translation to RelativityOne, we remove the obstacle presented by different languages, enabling our customers to accelerate their investigations and reviews based on a proper understanding of their data."
- Andrea Beckman, Director of Product Management, Relativity, a leading e-discovery solution provider.

"Utilizing Document Translation, Denso aims to reduce the time spent creating documents for communications between the global peers."
- Tetsuhiro Nakane, IT Digital Division IT Architect, Denso, a global automotive components manufacturer.

 

The Document Translation API is accompanied by an SDK and code samples that help users build document translation solutions quickly and easily.

Updated May 27, 2021
Version 3.0