Announcing new AI Safety & Responsible AI features in Azure OpenAI Service at Ignite 2023
Published Nov 15 2023 08:00 AM
Microsoft

Our Azure OpenAI Service team is excited to announce new AI safety and responsible AI features at Ignite 2023. These features include jailbreak risk detection, protected material detection, and configurations for expanded customer control, all available in public preview, as well as an asynchronous modified content filter that is coming soon to public preview.

These features build on the existing content filters and AI safety mechanisms, further enhancing the security of LLM deployments and helping protect against outputting known third-party natural language content and code.

 

Jailbreak risk detection (Preview) 

Jailbreak risk detection is a feature in Azure OpenAI Service that focuses on detecting jailbreak attacks, which pose significant risks to Large Language Model (LLM) deployments. A Jailbreak Attack, also known as a User Prompt Injection Attack (UPIA), is an intentional attempt by a user to exploit the vulnerabilities of an LLM-powered system, bypass its safety mechanisms, and provoke restricted behaviors. These attacks can lead to the LLM generating inappropriate content or performing actions restricted by the System Prompt or by Reinforcement Learning from Human Feedback (RLHF).

Jailbreak risk detection uses a state-of-the-art model that identifies anomalies in user prompts as potential Jailbreak Attacks. This tool enhances the security of LLM deployments by detecting jailbreak attempts based on their patterns and intent, rather than on the outcomes or the harmful completions that might follow. Attack classes include user prompts that ask the model to change or bypass rules set in the System Message, role play, encoding attacks, and more.

This new filter complements existing content filters and safety mechanisms that prevent AI systems from responding to inappropriate or dangerous requests. In addition to being available in public preview in Azure OpenAI Service, it is now also in public preview in Azure AI Content Safety. 
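As with other content filters, a detection surfaces as an annotation on the API response. The sketch below shows how a client might check for a flagged prompt; the payload is a hand-written illustration assuming a `jailbreak` entry with `filtered` and `detected` flags under `prompt_filter_results`, not real service output, and field names may differ across API versions.

```python
# Hypothetical example payload mirroring the annotation shape for jailbreak
# risk detection; real responses come from the Azure OpenAI API.
sample_response = {
    "choices": [],  # completion content omitted for brevity
    "prompt_filter_results": [
        {
            "prompt_index": 0,
            "content_filter_results": {
                "jailbreak": {"filtered": True, "detected": True},
                "hate": {"filtered": False, "severity": "safe"},
            },
        }
    ],
}

def jailbreak_detected(response: dict) -> bool:
    """Return True if any prompt in the request was flagged as a jailbreak attempt."""
    for prompt_result in response.get("prompt_filter_results", []):
        annotation = prompt_result.get("content_filter_results", {}).get("jailbreak", {})
        if annotation.get("detected"):
            return True
    return False
```

A client could use such a check to log flagged prompts or short-circuit the request before presenting the completion.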

 

 

Protected material detection (Preview) 

Protected material detection is a feature in Azure OpenAI Service that helps detect and protect against outputting known natural language content and code. It checks for matches with an index of third-party text content and public source code in GitHub repositories. The feature was designed to help flag certain known third-party content for customers when integrating and using generative AI. 

The feature uses low-latency "snippet matching" against the index: the user provides some text, and the model finds sequences of tokens in that text that also appear in the index. There are two independent indexes, one for code, built from public GitHub repositories, and one for natural language text. In the UI, these two filters are referred to as protected material detection for text and protected material detection for code.

Protected material detection filters are optional filters in Azure OpenAI Service and are off by default. When turned on, the annotations in Azure OpenAI Service provide information on matching content. For code, if a match is identified, the annotations also provide example citations of public GitHub repositories where the code snippet appears, along with the licenses of those repositories.
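Reading a code-match annotation might look like the following sketch. The payload is hypothetical but follows the shape described above: a `protected_material_code` entry that carries an example citation with a repository URL and license; the repository name is illustrative.

```python
# Hypothetical annotation payload for protected material detection for code;
# the repository and license shown are illustrative, not real service output.
sample_completion_filter_results = {
    "protected_material_code": {
        "filtered": False,
        "detected": True,
        "citation": {
            "URL": "https://github.com/example-org/example-repo",
            "license": "MIT",
        },
    }
}

def code_match_citation(filter_results: dict):
    """Return the example citation for a detected code match, or None."""
    match = filter_results.get("protected_material_code", {})
    return match.get("citation") if match.get("detected") else None
```

An application could surface the citation and license to the user alongside the generated code.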

The protected material detection for text model is now also in public preview in Azure AI Content Safety.  

 

Expanded customer control 

Azure OpenAI Service customers can now configure all content filtering severity levels and create custom policies, including filters less restrictive than the default, based on their use case needs. The content filtering system, which covers the categories hate and fairness, violence, sexual, and self-harm, is integrated in Azure OpenAI Service and enabled by default. For each category, it detects one of four severity levels (safe, low, medium, high) and takes action on potentially harmful content in both user prompts and model completions. The default filters are triggered when content is flagged at the medium severity level. While customers could previously only configure filters to be stricter than the default, customer control now extends to all severity levels, including configuring filters to be less restrictive and only filter high-severity content where that aligns with use case requirements. Disabling content filters entirely requires an application and approval for modified content filters.
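The threshold behavior described above can be sketched as a simple comparison over the four severity levels. This is an illustration of the policy logic only, not the service's implementation; the actual configuration is applied per deployment in Azure OpenAI Service.

```python
# The four documented severity levels, ordered from least to most severe.
SEVERITY_ORDER = ["safe", "low", "medium", "high"]

def should_filter(severity: str, threshold: str = "medium") -> bool:
    """Filter content whose detected severity meets or exceeds the configured
    threshold for its category. "medium" mirrors the default policy; a less
    restrictive (approved) policy might raise the threshold to "high"."""
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(threshold)

blocked_by_default = should_filter("medium")          # medium meets the default threshold
allowed_when_relaxed = should_filter("medium", "high")  # a high-only policy lets medium through
```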

 


 

Customers can customize content filter behavior for Azure OpenAI Service prompts and completions further by creating a custom blocklist in their filters. The custom blocklist allows the filter to take action on a customized list of patterns, such as specific terms or regex patterns. This feature enables granular control over a customer's Azure OpenAI Service deployment so it is tailored to their specific use case. 
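The blocklist behavior can be sketched as matching against a customized list of exact terms and regex patterns. The terms and pattern below are hypothetical examples; in the service, blocklists are managed as part of the content filter configuration rather than in client code.

```python
import re

# Hypothetical custom blocklist: exact terms plus regex patterns.
BLOCKED_TERMS = ["internal-codename"]
BLOCKED_PATTERNS = [re.compile(r"\b\d{3}-\d{2}-\d{4}\b")]  # e.g. an SSN-like pattern

def blocklist_match(text: str) -> bool:
    """Return True if the text contains any blocked term (case-insensitive)
    or matches any blocked regex pattern."""
    lowered = text.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return True
    return any(pattern.search(text) for pattern in BLOCKED_PATTERNS)
```

A matching prompt or completion would then be acted on by the filter, just as content flagged by the built-in categories is.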

 

Asynchronous Modified Content Filter (Coming Soon to Preview)

Asynchronous Modified Content Filter is a new feature in Azure OpenAI Service that allows content filters to run asynchronously with significant latency improvements for streaming scenarios. When content filters are run asynchronously, completion content is returned immediately without being buffered first, resulting in a smooth and fast token-by-token streaming experience.  

This feature is available only to customers who apply and are approved to use modified content filters. While the feature improves latency, it brings a tradeoff in the safety and real-time vetting of small sections of model output. Because content filters run asynchronously, content moderation messages and the content filtering signal in case of a policy violation are delayed, which means some sections of harmful content that would otherwise have been filtered immediately may be returned to the user.

 

For this reason, to ensure responsible use, this feature requires an application and approval. It is an exclusive feature for approved, trusted customers whose use case aligns with predefined categories like summarization, data reasoning, search, Q&A, code generation, writing assistance, chat, and conversation. 
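Under the asynchronous filter, a streaming client should surface tokens as they arrive and handle a content-filter signal that may show up in a later chunk. The sketch below simulates such a chunk stream; real chunks come from the streaming API, and the exact field names are illustrative.

```python
# Simulated stream: tokens arrive unbuffered, and the filter signal for a
# span of output may arrive in a later chunk (hypothetical chunk shape).
simulated_stream = [
    {"delta": "Hello"},
    {"delta": ", world"},
    {"delta": "", "content_filter_results": {"violence": {"filtered": False, "severity": "safe"}}},
]

def consume(stream):
    """Accumulate streamed text immediately and collect any delayed
    content-filter violations that arrive in later chunks."""
    text, violations = "", []
    for chunk in stream:
        text += chunk.get("delta", "")  # surface tokens without buffering
        for category, result in chunk.get("content_filter_results", {}).items():
            if result.get("filtered"):
                violations.append(category)  # signal arrives after the tokens
    return text, violations
```

An application using the asynchronous filter would need its own handling for the case where a violation arrives after some flagged text has already been shown.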

 

 

Getting started 

For more information on these features and how to get started, visit the Azure OpenAI Service documentation.

Last update: Nov 15 2023 10:04 AM