Healthcare and Life Sciences Blog

Protect patient privacy across languages with the de-identification service's preview expansion

leakassab
Microsoft
Nov 18, 2025

Machine learning and analytics are transforming healthcare by streamlining clinical workflows, powering AI models and unlocking new insights from patient data. These innovations are fueled by textual data rich in Protected Health Information (PHI). To be used for research, innovation and operational improvements, this data must be responsibly de-identified to protect patient privacy. Manual de-identification can be slow, expensive, and error-prone, creating bottlenecks that delay progress and limit collaboration. De-identification is more than a compliance standard; it is the key to unlocking healthcare data’s full potential while maintaining patient privacy and trust. 

Today, we are excited to announce the expansion of the Azure Health Data Services de-identification service to support five new preview language-locale combinations: 

  • Spanish (United States)
  • German (Germany) 
  • French (France)
  • French (Canada) 
  • English (United Kingdom)  

This language expansion enables global healthcare organizations to unlock insights from data beyond English while continuing to adhere to regulatory standards.   
 
Why Language Support Matters 

Healthcare data is generated in many languages around the world, and each one comes with its own linguistic structure, formatting, and privacy considerations. By expanding support to multiple preview languages such as Spanish, French, German, and English, our de-identification service allows organizations to unlock data from a broader range of countries and regions. 

But language alone isn’t the whole story. Different locales within the same language (French in France vs. Canada, or English in the UK vs. the US) often format PHI in unique ways. Addresses, medical institutions, and identifiers can all look different depending on the region. Our service is designed to recognize and accurately de-identify these locale-specific patterns, supporting privacy and compliance wherever the data originates. 
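
To make the locale point concrete, here is a minimal sketch of how one kind of identifier, a postal code, differs by region. It is an illustration only: the locale tags used as dictionary keys, the `POSTAL_PATTERNS` table, and the `find_postal_codes` helper are hypothetical names, the regular expressions are deliberately simplified, and none of this reflects how the de-identification service itself detects PHI.

```python
import re

# Hypothetical, simplified postal-code patterns keyed by locale tag.
# Real-world formats have more edge cases than these expressions cover.
POSTAL_PATTERNS = {
    "en-US": re.compile(r"\b\d{5}(?:-\d{4})?\b"),                   # 98052 or 98052-6399
    "en-GB": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}\b"),   # SW1A 2AA
    "fr-CA": re.compile(r"\b[A-Z]\d[A-Z] ?\d[A-Z]\d\b"),            # H3A 1B1
    "fr-FR": re.compile(r"\b\d{5}\b"),                              # 75008
}


def find_postal_codes(text: str, locale: str) -> list[str]:
    """Return postal-code-like substrings using the pattern for one locale."""
    return POSTAL_PATTERNS[locale].findall(text)


uk_note = "Referred from a clinic in London SW1A 2AA."
print(find_postal_codes(uk_note, "en-GB"))  # pattern written for UK postcodes
print(find_postal_codes(uk_note, "en-US"))  # US pattern applied to the same text
```

The same text yields a match under the UK pattern and nothing under the US one, which is why a single English-language rule set would miss region-specific identifiers.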

 

How It Works  
The Azure Health Data Services de-identification service empowers healthcare organizations to protect patient data through three key operations: 

  • TAG detects and annotates PHI from unstructured text. 
  • REDACT obfuscates PHI to prevent exposure. 
  • SURROGATE replaces PHI with realistic, synthetic surrogates, preserving data utility while ensuring privacy. 
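
The difference between the three operations is easiest to see on a single sentence. The sketch below is a toy that runs locally and does not call the service: two regular expressions stand in for the machine learning models, and the category names, the bracketed placeholder format, and the fixed surrogate values are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Toy stand-in for PHI detection: one regex for "Dr. <Surname>" and one for
# ISO dates. The actual service uses machine learning models, not these rules.
PATTERNS = {
    "DOCTOR": re.compile(r"Dr\. [A-Z][a-z]+"),
    "DATE": re.compile(r"\d{4}-\d{2}-\d{2}"),
}

# Hypothetical fixed surrogates, chosen only to keep the example deterministic.
SURROGATES = {"DOCTOR": "Dr. Rivera", "DATE": "2021-03-14"}


@dataclass
class Entity:
    category: str
    start: int
    end: int
    text: str


def tag(text: str) -> list[Entity]:
    """TAG: locate PHI spans and label them, leaving the text unchanged."""
    found = [
        Entity(category, m.start(), m.end(), m.group())
        for category, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(found, key=lambda e: e.start)


def _rewrite(text: str, replacement_for) -> str:
    # Replace spans right to left so earlier offsets stay valid.
    for e in reversed(tag(text)):
        text = text[: e.start] + replacement_for(e) + text[e.end:]
    return text


def redact(text: str) -> str:
    """REDACT: swap each PHI span for a category placeholder."""
    return _rewrite(text, lambda e: f"[{e.category}]")


def surrogate(text: str) -> str:
    """SURROGATE: swap each PHI span for a realistic-looking stand-in."""
    return _rewrite(text, lambda e: SURROGATES[e.category])


note = "Seen by Dr. Smith on 2024-05-02."
print(tag(note))
print(redact(note))
print(surrogate(note))
```

TAG returns annotations and leaves the note alone, REDACT removes the information entirely, and SURROGATE keeps the sentence readable for downstream analytics.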
     

Our service uses state-of-the-art machine learning models to identify and handle sensitive information, supporting compliance with HIPAA's Safe Harbor standards and unlinked pseudonymization aligned with GDPR principles. By maintaining entity consistency and temporal relationships, organizations can use de-identified data for research, analytics, and machine learning without compromising patient privacy. 
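
As a rough illustration of what entity consistency and preserved temporal relationships mean in practice, consider the sketch below. It is not the service's algorithm: the `ConsistentSurrogates` class, its name pool, and the single fixed date offset are hypothetical simplifications chosen to show the two properties.

```python
from datetime import date, timedelta


class ConsistentSurrogates:
    """Toy mapper: the same input always gets the same surrogate,
    and all dates move by one shared offset."""

    def __init__(self, names: list[str], shift_days: int):
        self._pool = list(names)  # hypothetical surrogate name pool
        self._assigned: dict[str, str] = {}
        self._shift = timedelta(days=shift_days)

    def name(self, original: str) -> str:
        # Reuse the earlier surrogate when this name has been seen before.
        # The pool wraps around, so a real system would need a larger pool.
        if original not in self._assigned:
            index = len(self._assigned) % len(self._pool)
            self._assigned[original] = self._pool[index]
        return self._assigned[original]

    def shift(self, d: date) -> date:
        # One fixed offset for every date keeps intervals between events intact.
        return d + self._shift


mapper = ConsistentSurrogates(["Alex Morgan", "Sam Lee"], shift_days=-37)
admitted, discharged = date(2024, 1, 10), date(2024, 1, 15)

print(mapper.name("Jane Doe"), mapper.name("Jane Doe"))
print(mapper.shift(discharged) - mapper.shift(admitted))
```

Because every mention of one person maps to one surrogate and the length of stay is unchanged after shifting, longitudinal analyses still work on the de-identified records.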
 
Unlocking New Use Cases 

By expanding the service's language support, organizations can now address some of the most pressing data challenges in healthcare: 

  • Reduce organizational liability by meeting evolving privacy standards. 
  • Enable secure data sharing across institutions and regions. 
  • Unlock AI opportunities by training models on multilingual, de-identified data.
  • Share de-identified data across institutions to create larger, more diverse datasets. 
  • Conduct longitudinal research while preserving patient privacy. 
     

     

Proven Accuracy 
Researchers at the University of Oxford recently conducted a comprehensive comparative study evaluating multiple automated de-identification systems across 3,650 UK hospital records. Their analysis compared both task-specific transformer models and general-purpose large language models. The Azure Health Data Services de-identification service achieved the highest overall performance among the nine evaluated tools, with a recall score of 0.95. The study highlights how robust de-identification enables large-scale, privacy-preserving EHR research and supports the responsible use of AI in healthcare. Read the full study: "Benchmarking transformer-based models for medical record deidentification".
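
For readers less familiar with the metric, recall is the share of PHI items actually present that a tool manages to find. The short sketch below shows the standard formula; the counts passed to it are hypothetical round numbers chosen to land on 0.95 and are not taken from the study.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of real PHI items that the tool found: TP / (TP + FN)."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("recall is undefined when there are no PHI items")
    return true_positives / total


# Hypothetical counts, not taken from the study.
print(recall(true_positives=950, false_negatives=50))
```

In de-identification, a missed item (a false negative) is PHI left in the text, which is why recall is the headline number.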
 
Preview: Your Feedback Matters 

This multilingual feature is now available in preview. We invite healthcare organizations, research institutions, and clinicians to try it and share their feedback. 

At Microsoft, we are committed to helping healthcare providers, payors, researchers, and life sciences companies unlock the value of data while maintaining the highest standards of patient privacy. The Azure Health Data Services de-identification service empowers organizations to accelerate AI and analytics initiatives safely, supporting innovation and improving patient outcomes across the healthcare ecosystem. 

Explore Azure Health Data Services to see how our solutions help organizations transform care, research, and operational efficiency. 

Updated Nov 18, 2025
Version 2.0