Introducing Text Analytics for Health
Published Jul 08 2020 08:00 AM

Hadas Bitran, Group Manager, Microsoft Healthcare


The healthcare industry is overwhelmed with data. Much of it is unstructured text, such as doctors’ notes, medical publications, electronic health records, clinical trial protocols, medical encounter transcripts, and more. Healthcare organizations, providers, researchers, pharmaceutical companies, and others face an incredible challenge in trying to identify and draw insights from all that information. Unlocking insights from this data has massive potential for improving healthcare services and patient outcomes.


Today, we are excited to introduce Text Analytics for health, a new preview feature of Text Analytics in Azure Cognitive Services that enables developers to process and extract insights from unstructured medical data. Trained on a diverse range of medical data—covering various formats of clinical notes, clinical trials protocols, and more—the health feature is capable of processing a broad range of data types and tasks, without the need for time-intensive, manual development of custom models to extract insights from the data.


Uncover deep insights and relationships in medical data


With Text Analytics for health, users can detect words and phrases in unstructured text as entities associated with semantic types in the healthcare and biomedical domain, such as diagnosis, medication name, symptom/sign, examinations, treatments, dosage, and route of administration. (For the full list of health entity types and relationships, see the documentation.) In addition, users can extract more than 100 types of personally identifiable information (PII), including protected health information (PHI), from unstructured text.


TA for health image 1.png

Text Analytics also links entities to medical ontologies and domain-specific coding systems (for example, the Unified Medical Language System), and identifies meaningful connections between concepts mentioned in text (for example, finding the relationship between a medication name and the dosage associated with it).


TA for health image 2.png
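
To make the entity linking and relation extraction described above more concrete, here is a minimal Python sketch that walks a response-like structure and pairs each relation with the text of the entities it connects. The field names (`entities`, `links`, `dataSource`, `relations`, `relationType`, `source`, `target`), the relation label, and the ontology identifier are assumptions made for illustration, not the documented response schema; check the service documentation for the actual shape.

```python
# Hypothetical response fragment. Field names and values are illustrative
# assumptions, not the documented schema of Text Analytics for health.
sample_document = {
    "entities": [
        {"id": "0", "text": "ibuprofen", "type": "MedicationName",
         "links": [{"dataSource": "UMLS", "id": "CUI_PLACEHOLDER"}]},
        {"id": "1", "text": "200 mg", "type": "Dosage", "links": []},
    ],
    "relations": [
        {"relationType": "DosageOfMedication", "source": "1", "target": "0"},
    ],
}


def describe_relations(document):
    """Return (relation type, source text, target text) for each relation."""
    by_id = {entity["id"]: entity for entity in document["entities"]}
    described = []
    for relation in document["relations"]:
        source = by_id[relation["source"]]
        target = by_id[relation["target"]]
        described.append((relation["relationType"], source["text"], target["text"]))
    return described


def linked_codes(document):
    """Map each entity's text to its (ontology, identifier) pairs."""
    return {
        entity["text"]: [(link["dataSource"], link["id"])
                         for link in entity.get("links", [])]
        for entity in document["entities"]
    }


if __name__ == "__main__":
    for relation_type, source_text, target_text in describe_relations(sample_document):
        print(f"{relation_type}: {source_text} -> {target_text}")
    print(linked_codes(sample_document))
```

Indexing entities by identifier first keeps the relation lookup simple, and the same dictionary can be reused when joining entities to ontology codes.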

The meaning of medical content is also highly affected by modifiers, such as negation, which can have critical implications if misinterpreted. For example, it is important for healthcare professionals to determine when a patient “has not been diagnosed with something” or “does not experience a certain symptom.” The health feature supports negation detection for the different entities mentioned in text.
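
As a rough illustration of why negation matters downstream, the sketch below separates affirmed findings from negated ones. It assumes each entity carries a boolean flag, here called `isNegated`; that name and the sample entities are assumptions for illustration rather than a confirmed part of the service's response.

```python
# Hypothetical entities, as they might look for a note such as
# "Patient has a fever but denies cough."
# The "isNegated" flag name is an assumption made for this sketch.
sample_entities = [
    {"text": "fever", "type": "SymptomOrSign", "isNegated": False},
    {"text": "cough", "type": "SymptomOrSign", "isNegated": True},
]


def split_by_negation(entities):
    """Return (affirmed, negated) lists of entity texts."""
    affirmed = [e["text"] for e in entities if not e.get("isNegated", False)]
    negated = [e["text"] for e in entities if e.get("isNegated", False)]
    return affirmed, negated


if __name__ == "__main__":
    affirmed, negated = split_by_negation(sample_entities)
    print("Affirmed:", affirmed)
    print("Negated:", negated)
```

Treating a missing flag as "not negated" is a design choice of this sketch; a stricter pipeline might reject entities that lack the flag instead.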


Speed time to healthcare insights


Text Analytics for health enables researchers, data analysts, medical professionals and ISVs in the healthcare and biomedical space to unlock a wide range of scenarios—like producing analytics on historical medical data and creating prediction models, matching patients to clinical trials, or assisting in clinical quality reviews.


In response to the COVID-19 pandemic, Microsoft partnered with the Allen Institute for AI and leading research groups to prepare the COVID-19 Open Research Dataset, a free resource of over 47,000 scholarly articles for use by the global research community. With Cognitive Search and Text Analytics, we developed the COVID-19 search engine, which enables researchers to more quickly evaluate and gain insights from the overwhelming amount of information about COVID-19.


Learn more about using AI to mine unstructured research papers to fight COVID-19.


We are working closely with organizations such as University College London (UCL), which is conducting reviews of medical research reports.


“One of our focuses as a research group is undertaking systematic reviews across a range of policy areas,” says Professor James Thomas of UCL, Director of the EPPI-Centre’s Reviews Facility for the Department of Health, England. “We have been partnering with engineers at Microsoft and data scientists to build a ‘living’ reviews system – that automatically identifies relevant research for reviews as they are published. Text Analytics for health provides a powerful tool for extracting insights from clinical literature, with rich support for a wide range of healthcare terminology so that we can more quickly and accurately identify relevant information.”


At Microsoft, our goal within healthcare is to empower people and organizations to address the complex challenges facing the healthcare industry today, working closely with our customers and partners to bring healthcare solutions to life. We’re excited to make Text Analytics for health available in support of this mission.


Get started with Text Analytics for health


Text Analytics for health is currently available in containers. With containers, you can deploy resources in your own development environment, which lets you meet your specific security and data governance requirements.


The container provides REST-based query prediction endpoint APIs. Below is an example API request and response body:

TA for health container GIF.gif
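
For readers who prefer text to an animation, here is a minimal Python sketch of how a request to a locally running container might be assembled and sent. The host, port, and route in `ENDPOINT` are placeholders, and the `documents` / `id` / `language` / `text` body layout is assumed from the general Text Analytics request format; take the exact route and schema from the container documentation.

```python
import json
import urllib.request

# Placeholder endpoint: host, port, and route are assumptions for this sketch.
# Replace them with the values for your own container deployment.
ENDPOINT = "http://localhost:5000/<health-endpoint-path>"


def build_request_body(texts, language="en"):
    """Wrap raw strings in a documents list, giving each a sequential id."""
    return {
        "documents": [
            {"id": str(index), "language": language, "text": text}
            for index, text in enumerate(texts, start=1)
        ]
    }


def analyze(texts, endpoint=ENDPOINT):
    """POST the documents as JSON and return the decoded JSON response."""
    payload = json.dumps(build_request_body(texts)).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the request body; call analyze(...) once a container is running.
    body = build_request_body(["Patient was prescribed 200 mg of ibuprofen."])
    print(json.dumps(body, indent=2))
```

Keeping the body construction separate from the network call makes it possible to inspect or test the payload without a running container.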

For more resources:

Last update: Jul 08 2020 11:04 AM