Azure AI Translator announces Synchronous Document Translation
Published Feb 21 2024 09:10 AM
Microsoft

Seattle, Feb 21, 2024: Today, we are pleased to announce the public preview of the synchronous operation for the document translation feature in the Azure AI Translator service. This new synchronous operation lets users translate a document into a target language in real time.


Document translation enables users to translate complex documents in a variety of file formats, including Text, HTML, Word, Excel, PowerPoint, and Outlook messages, while preserving the source document's format and layout. If the source language is unknown to the user, the service automatically detects the language of the text in the source document. The request can also optionally include a glossary of terms to apply when translating the document.


Enterprise customers using the asynchronous batch operation of document translation have told us that employees who manage highly confidential documents are hesitant to upload them to their organization's shared cloud storage for translation. The new synchronous operation addresses this need by processing and translating the entire document in memory, so the document never has to be placed in storage, even temporarily. The synchronous operation takes a document as part of the request, translates the textual content of the document into the specified target language, and returns the translated document as part of the response. It supplements the asynchronous batch operation of document translation, which has been generally available since May 2021.

 

 

Document Translation

| Asynchronous batch operation | Synchronous operation (preview) |
| --- | --- |
| Asynchronously translates batches of up to 1000 documents into up to 10 target languages in a single request. | Synchronously translates a single document into a single target language. |
| Upload the document to translate into Azure blob storage. In the request, send the Azure blob storage location URLs of the source and target documents. | In the request, send the source document and get the translated document in the response. |
| Translates large documents of up to 40 MB. | Translates a document of up to 10 MB. |
| Supports translation of document formats including Text, HTML, Markdown, Office, Outlook message, PDF, and legacy Office and Open document formats. | Supports translation of document formats including Text, HTML, Markdown, Office, and Outlook message. |
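The limits in the comparison above can be turned into a quick pre-flight check before choosing an operation. The following is a minimal sketch, not part of the service or any SDK: the function name `choose_operation` and the constants are hypothetical, the sizes are assumed to be counted in units of 1024 × 1024 bytes, and file formats are not checked (for example, PDF is listed only for the asynchronous batch operation).

```python
MB = 1024 * 1024  # assumption: the documented limits are read as 1024 * 1024 bytes

# Limits taken from the comparison above
SYNC_MAX_BYTES = 10 * MB
ASYNC_MAX_BYTES = 40 * MB
ASYNC_MAX_DOCUMENTS = 1000
ASYNC_MAX_TARGETS = 10


def choose_operation(doc_sizes, target_count):
    """Return "synchronous", "asynchronous", or None if neither operation fits.

    doc_sizes is a list of document sizes in bytes; target_count is the
    number of target languages requested. Hypothetical helper for illustration.
    """
    if not doc_sizes or target_count < 1:
        return None
    # Synchronous: exactly one document, one target language, up to 10 MB
    if len(doc_sizes) == 1 and target_count == 1 and doc_sizes[0] <= SYNC_MAX_BYTES:
        return "synchronous"
    # Asynchronous batch: up to 1000 documents, 10 targets, 40 MB per document
    if (len(doc_sizes) <= ASYNC_MAX_DOCUMENTS
            and target_count <= ASYNC_MAX_TARGETS
            and all(size <= ASYNC_MAX_BYTES for size in doc_sizes)):
        return "asynchronous"
    return None
```

A check like this prefers the synchronous operation whenever a request fits it, which matches the confidentiality scenario described earlier; callers who want batch processing for small documents would simply skip the first branch.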

 

The synchronous operation of document translation is priced at the same rate as the asynchronous batch operation.

 

To try the synchronous operation of document translation, you need an active Azure subscription and an Azure AI Translator resource. Use the following code samples to try it out.

 

Sample curl command to translate a document:

 

curl -X POST "{document-translation-endpoint}/translator/document:translate?sourceLanguage={source_language_code}&targetLanguage={target_language_code}&api-version=2023-11-01-preview" \
-H "Ocp-Apim-Subscription-Key:{Your resource key}" \
--form "document=@{full-path-to-source-file};type={content-type}/{file-extension}" \
--output "{full-path-to-translated-file}"

 

Python code sample to translate a document:

 

import requests
import os

# Construct the request URL
endpoint = "<Your document translation endpoint>"
path = "/translator/document:translate"
url = endpoint + path

headers = {
    "Ocp-Apim-Subscription-Key": "<Your resource key>"
}

# Define the parameters 
# Get list of supported languages and code here: https://aka.ms/TranslatorLanguageCodes 
params = {
    "sourceLanguage": "<source language code>",
    "targetLanguage": "<target language code>",
    "api-version": "2023-11-01-preview"
}

# Include full path, file name and extension
input_file = "<full path to source file>"
output_file = "<full path to translated file>"

# Open the input file in binary mode
with open(input_file, "rb") as document:
    # Define the data to be sent
    # Find list of supported content types here: https://aka.ms/dtsync-content-type
    data = {
        "document": (os.path.basename(input_file), document, "<Your file content type>")
    }

    # Send the POST request
    response = requests.post(url, headers=headers, files=data, params=params)

# Stop on an HTTP error so an error payload is not saved as the translated document
response.raise_for_status()

# Write the response content to a file
with open(output_file, "wb") as output_document:
    output_document.write(response.content)
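The sample above always sends a source language and only the document. As described earlier, the service can detect the source language and can apply an optional glossary. The sketch below shows one way the request pieces might be assembled for those cases. It rests on assumptions that should be checked against the service documentation: that leaving out `sourceLanguage` triggers autodetection, and that the glossary is sent as a second multipart form field named `glossary`. The helper names `build_sync_request` and `form_part` are hypothetical, and the content type guessed by Python's `mimetypes` module should be compared with the supported list at https://aka.ms/dtsync-content-type.

```python
import mimetypes
import os


def build_sync_request(endpoint, target_language, source_language=None,
                       api_version="2023-11-01-preview"):
    """Return (url, params) for a synchronous document translation call."""
    url = endpoint.rstrip("/") + "/translator/document:translate"
    params = {"targetLanguage": target_language, "api-version": api_version}
    # Assumption: omitting sourceLanguage lets the service autodetect it
    if source_language:
        params["sourceLanguage"] = source_language
    return url, params


def form_part(handle, path, content_type=None):
    """Build one multipart entry as (file name, file object, content type)."""
    guessed = content_type or mimetypes.guess_type(path)[0]
    if guessed is None:
        raise ValueError(f"Cannot determine a content type for {path}")
    return (os.path.basename(path), handle, guessed)


# Usage sketch (not executed here; needs the requests package and real values):
#
#   url, params = build_sync_request("<Your document translation endpoint>", "fr")
#   with open(doc_path, "rb") as doc, open(glossary_path, "rb") as glossary:
#       files = {
#           "document": form_part(doc, doc_path),
#           # Assumption: the form field for the glossary is named "glossary"
#           "glossary": form_part(glossary, glossary_path),
#       }
#       response = requests.post(url, headers=headers, params=params, files=files)
```

Keeping the URL and form construction separate from the network call makes it easy to inspect exactly what will be sent before any confidential document leaves the machine.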

 

Last update: Feb 20 2024 03:50 PM