Announcing Native Document support for PII Redaction and Summarization
Published Feb 07 2024 09:10 AM
Microsoft

Customers applying AI capabilities to documents often face substantial pre- and post-processing work. For tasks such as Personally Identifiable Information (PII) redaction and summarization, they typically have to open each document, extract and format its text as input data, and then recreate the document from the results. This is time-consuming, expensive, and inconvenient.

 

To alleviate this challenge, we are delighted to announce the availability of native document support in Azure AI Language. This is available in public preview with limited access (apply for access).

 

This significantly streamlines operations by eliminating the need for pre-processing steps such as extracting pertinent text from native documents and post-processing tasks like reconstructing the original document after undergoing processing. Here’s a simple diagram of how the feature works.

Flow.png

PII Redaction

This enhancement allows sensitive information to be identified, categorized, and redacted directly in complex documents, without extracting the text first. It simplifies data privacy and compliance work and reinforces security.

 

Here’s a sample input and output document:

Screenshot for email.png

 

Some top use cases which can benefit from this feature include:

  • Insurance: Automate PII redaction in policy documents and claims records for compliance and customer privacy, then securely encrypt and store the redacted documents.
  • Healthcare: Scan medical records for sensitive information so that practitioners can automatically redact PII before archiving, supporting compliance and patient data security.
  • Finance: Automatically identify and redact PII in loan applications and account statements so that documents can be archived securely.


REST API - PII Redaction

The REST API for PII redaction requires three parameters: ‘source location’ (the URL of the input document), ‘target location’ (the URL of the target container), and ‘language’ (the language of the content in the source document). Please refer to our documentation for additional information on the API.

 

Additionally, customers can tailor the service to their scenarios with the following optional parameters:

  • excludePiiCategories: Exclude irrelevant PII categories for your use case.
  • piiCategories: Redact specific PII categories from documents.
  • redactionCharacter: Choose the character for redaction.
  • redactionPolicy: Decide whether PII should be redacted with characters or respective entity names.
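
As a rough illustration of how the required and optional parameters fit together, the sketch below assembles a request body in Python. Only the parameter names (source and target location, language, and the four optional parameters) come from this post. The JSON layout (`analysisInput`, `documents`, `tasks`), the task kind string `PiiEntityRecognition`, the helper name `build_pii_request`, and the placeholder URLs are assumptions made for illustration; check the documentation for the exact schema before calling the service.

```python
import json
from typing import Optional


def build_pii_request(
    source_url: str,
    target_url: str,
    language: str,
    pii_categories: Optional[list] = None,
    exclude_pii_categories: Optional[list] = None,
    redaction_character: Optional[str] = None,
    redaction_policy: Optional[str] = None,
) -> dict:
    """Assemble a PII redaction request body (hypothetical layout)."""
    # Only send the optional parameters the caller actually set.
    optional = {
        "piiCategories": pii_categories,
        "excludePiiCategories": exclude_pii_categories,
        "redactionCharacter": redaction_character,
        "redactionPolicy": redaction_policy,
    }
    parameters = {k: v for k, v in optional.items() if v is not None}
    return {
        "analysisInput": {
            "documents": [
                {
                    "id": "doc-1",  # arbitrary identifier for this example
                    "language": language,
                    "source": {"location": source_url},
                    "target": {"location": target_url},
                }
            ]
        },
        # Task kind string is an assumption; confirm it in the documentation.
        "tasks": [{"kind": "PiiEntityRecognition", "parameters": parameters}],
    }


# Placeholder URLs; real calls need the actual document and container URLs.
body = build_pii_request(
    source_url="https://example.invalid/source/claim.pdf",
    target_url="https://example.invalid/target-container",
    language="en-US",
    exclude_pii_categories=["PersonType"],
    redaction_character="*",
)
print(json.dumps(body, indent=2))
```

Building the body in one place keeps unset optional parameters out of the request, so the service defaults apply unless a value is given explicitly.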

 

Summarization

Using the same input document shown in the PII redaction example above, here’s the sample Extractive Summarization output:

 

{
  . . .
  "sentences": [
    {
      "text": "Mateo Gomez, 28-year-old man, suffered a car accident driving near his home on Hollywood Boulevard on August 17th, 2022, and was admitted to Contoso General Hospital in Los Angeles California at 7:45 PM.",
      "rankScore": 1,
      "offset": 36,
      "length": 203
    },
    {
      . . .
    },
    {
      "text": "Results showed a pseudoaneurysm of the thoracic aorta with minor fractures to the first and third right ribs.",
      "rankScore": 0.71,
      "offset": 432,
      "length": 109
    }
  ]
}
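
To show how a client might consume output of this shape, here is a minimal Python sketch that orders the extracted sentences either by rank or by their position in the document. The two entries are copied from the sample above (the elided entries are left out), and the helper names `by_rank` and `by_offset` are hypothetical.

```python
# Sentences copied from the sample output; elided entries are omitted.
sentences = [
    {
        "text": (
            "Mateo Gomez, 28-year-old man, suffered a car accident driving "
            "near his home on Hollywood Boulevard on August 17th, 2022, and "
            "was admitted to Contoso General Hospital in Los Angeles "
            "California at 7:45 PM."
        ),
        "rankScore": 1,
        "offset": 36,
        "length": 203,
    },
    {
        "text": (
            "Results showed a pseudoaneurysm of the thoracic aorta with minor "
            "fractures to the first and third right ribs."
        ),
        "rankScore": 0.71,
        "offset": 432,
        "length": 109,
    },
]


def by_rank(items: list) -> list:
    """Most relevant sentence first (highest rankScore)."""
    return sorted(items, key=lambda s: s["rankScore"], reverse=True)


def by_offset(items: list) -> list:
    """Sentences in the order they appear in the source document."""
    return sorted(items, key=lambda s: s["offset"])


for s in by_rank(sentences):
    # Print the score, the character offset, and a short preview of the text.
    print(f'{s["rankScore"]:.2f} @ {s["offset"]}: {s["text"][:40]}...')
```

Ordering by offset reads like a condensed version of the document, while ordering by rank puts the sentence the service scored highest at the top.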

 

REST API - Summarization

The REST API for summarization requires three parameters: ‘source location’ (the URL of the input document), ‘target location’ (the URL of the target container), and ‘kind’ (the type of summarization required). Please refer to our documentation for additional information.

Additionally, customers can tailor the service to their scenarios with the following optional parameters:

  • sentenceCount: Guide how many sentences are returned.
  • sortBy: Specify in what order the extracted sentences are returned.
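
A summarization request could be assembled along the following lines. This is a sketch under assumptions: the post names only the source and target locations, `kind`, `sentenceCount`, and `sortBy`. The JSON layout, the kind string `ExtractiveSummarization`, the `sortBy` value `Rank`, the helper name `build_summarization_request`, and the placeholder URLs are illustrative and should be checked against the documentation.

```python
from typing import Optional


def build_summarization_request(
    source_url: str,
    target_url: str,
    kind: str,
    sentence_count: Optional[int] = None,
    sort_by: Optional[str] = None,
) -> dict:
    """Assemble a summarization request body (hypothetical layout)."""
    parameters = {}
    if sentence_count is not None:
        parameters["sentenceCount"] = sentence_count  # how many sentences to return
    if sort_by is not None:
        parameters["sortBy"] = sort_by  # order of the returned sentences
    return {
        "analysisInput": {
            "documents": [
                {
                    "id": "doc-1",  # arbitrary identifier for this example
                    "source": {"location": source_url},
                    "target": {"location": target_url},
                }
            ]
        },
        "tasks": [{"kind": kind, "parameters": parameters}],
    }


# Placeholder URLs and assumed enum values; confirm both in the documentation.
request_body = build_summarization_request(
    source_url="https://example.invalid/source/report.pdf",
    target_url="https://example.invalid/target-container",
    kind="ExtractiveSummarization",
    sentence_count=3,
    sort_by="Rank",
)
print(request_body["tasks"][0])
```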

We are excited about the possibilities this new capability brings to developers around the globe, empowering our customers to create scenario-based AI solutions that enhance and complement their workflows with large language models (LLMs) and generative AI.

 

Learn more & get started: https://aka.ms/document-language-ai

Sign-up form to request access to the feature: https://aka.ms/gating-native-document

 

Have thoughts or questions? We value your feedback! Feel free to share your comments and insights below – we're eager to hear from you.

 

Last update: Feb 06 2024 10:02 PM