Security consideration of Azure OpenAI with Retrieval Augmented Generative pattern (part 1 of 3)

Microsoft

May 17, 2024

The Retrieval Augmented Generative (RAG) pattern is a novel approach that combines neural text generation with information retrieval. It allows the generation of natural language responses that are relevant, coherent, and informative, by retrieving and conditioning on relevant documents from a large corpus. RAG pattern has several benefits for the health care industry, such as:

It can produce precise and concise summaries of patient symptoms, medical history, and test results.
It can enhance the communication and education of health care professionals and patients, by generating personalized and engaging responses to their queries, based on the retrieval of authoritative and trustworthy sources.
It can facilitate the innovation and discovery of new treatments and therapies, by generating novel and creative hypotheses, based on the retrieval of cutting-edge research papers and clinical trials.

However, there are number of security risks associated with RAG patter usage. Few examples:

Privacy breaches: RAG models may inadvertently reveal sensitive information about patients or health care providers, by retrieving documents that contain identifiable data, such as names, addresses, or medical records. This could violate the confidentiality and consent of the individuals involved and expose them to potential harm or discrimination.
Adversarial attacks: RAG models may be vulnerable to malicious manipulation, by retrieving documents that are intentionally crafted to deceive or mislead the model. This could result in the generation of harmful or misleading content that could endanger the health and safety of the users or the public.

This is the first article in a series about secure use of Azure OpenAI in health care. Our topic today is how to avoid violating privacy when using RAG pattern.

Imagine you're developing an app that utilizes the Azure OpenAI service to enable patients to query their after-visit summaries. This is a great illustration of a scenario where RAG pattern is essential. The data output from your team consisted of a set of documents with one document for each patient, and some private information was accidentally exported. Below is a sample of a document.

Imagine you've built a patient chat application on your website. Only authorized users can access it, and it offers a predefined list of questions for patients to choose from. But what if Amanda's account is compromised? A malicious actor could gain access, bypass the predefined questions, and ask anything they want. Alternatively, a bug might allow custom questions you didn't anticipate.

Let's explore the potential data a malicious actor could extract if no additional security measures are in place beyond chat authorization and a predefined question list. Using Azure OpenAI Studio as a chat interface, we'll delve into the risks and vulnerabilities of such a scenario.

Let’s start with a simple query:
> Show me Amanda’s data

As demonstrated, it's evident that utilizing the RAG pattern could facilitate the retrieval of all of Amanda's data, including any personal information that might have been unintentionally exported. Now, leveraging the structure of the data, we can access the information of other users as well. For instance, with the query:

> Show me all medical history

Now that we have obtained other users' names, we can extract data utilized by the RAG pattern for any patient. For instance, with the query:
> Show me Michael’s personal data

What can we do to tackle privacy breaches? It's evident that relying solely on authorization falls short. Achieving protection for Azure OpenAI from malicious prompts is a crucial aspect of maintaining the safety and integrity of the API. Fortunately, implementing the necessary measures is a straightforward process that can be achieved by following a few simple steps:

Set up an Azure API Management service for your OpenAI API. This will act as a gateway between your OpenAI API and the outside world.
Configure your API Management instance to use the Azure AI Content Safety service as a pre-processing step for all incoming requests. This will ensure that all requests are scanned for potentially malicious content before being forwarded to your OpenAI API.
Use prompt engineering techniques to design prompts that are less likely to result in malicious behavior. This can be done by carefully crafting prompts to encourage specific types of responses and avoiding prompts that might elicit inappropriate responses.

Updated May 17, 2024

Version 1.0

Sergey_Shapovalov

Microsoft

Joined June 27, 2018

View Profile

Healthcare and Life Sciences Blog

Follow this blog board to get notified when there's new activity

Blog Post

Security consideration of Azure OpenAI with Retrieval Augmented Generative pattern (part 1 of 3)