We’re thrilled to introduce a new accelerator solution in the GitHub Azure-Samples library, designed specifically for creating and enhancing GenAI-based conversational assistants with robust, human-controllable workflows. The accelerator uses key services from Azure AI Language alongside Azure OpenAI: PII detection to protect sensitive information, Conversational Language Understanding (CLU) to predict users’ top intents, and Custom Question Answering (CQA) to respond to top questions with deterministic answers. Together with Azure OpenAI and Large Language Models (LLMs), the solution orchestrates a smooth, human-guided, controllable, and deterministic conversational experience (the LLM integration is coming soon). It’s ideal for developers and organizations looking to build assistants that can handle complex queries, route tasks, and provide reliable answers, all within a controlled, scalable architecture.
Why This Accelerator
While many customers appreciate LLMs for building conversational assistants that deliver natural, engaging, and context-aware interactions, challenges remain: improving the quality of a Retrieval-Augmented Generation (RAG) solution takes significant effort in prompt engineering, document chunking, and reducing hallucinations. And when an AI quality issue is discovered in production, customers need an effective way to address it promptly. This solution helps customers utilize offerings across the Azure AI portfolio to address these key challenges when building Generative AI (GenAI) assistants.
Designed for flexibility and reliability, this accelerator enables human-controllable workflows that meet real-world customer needs. It minimizes the need for extensive prompt engineering by using a structured workflow: the top questions and custom intents that are critical to your business are answered exactly and deterministically, while an LLM handles lower-priority topics in the conversation. This architecture not only enhances answer quality and control but also ensures that complex queries are handled efficiently.
If you need to quickly fix an incorrect answer in a chatbot built with RAG, you can also attach this accelerator to your existing RAG solution and add a QA pair with the correct response in CQA to resolve the issue for your users, as sketched below.
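For illustration, here is a minimal sketch of what such a hotfix could look like with the azure-ai-language-questionanswering authoring client in Python. The endpoint, key, project name, deployment name, and QA content are placeholders, and the operation shapes should be verified against the SDK documentation.

```python
# Hedged sketch: add a corrective QA pair to a CQA project and redeploy it.
# All endpoints, keys, names, and QA content below are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.ai.language.questionanswering.authoring import AuthoringClient

client = AuthoringClient(
    "https://<your-language-resource>.cognitiveservices.azure.com",
    AzureKeyCredential("<your-key>"),
)

# Add the QA pair that fixes the incorrect RAG answer.
poller = client.begin_update_qnas(
    project_name="<your-cqa-project>",
    qnas=[{
        "op": "add",
        "value": {
            "id": 1,
            "answer": "The corrected answer you want users to see.",
            "questions": ["The question that was being answered incorrectly"],
            "source": "hotfix",
        },
    }],
)
poller.result()

# Redeploy so the runtime starts serving the new answer.
client.begin_deploy_project(
    project_name="<your-cqa-project>",
    deployment_name="production",
).result()
```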
What This Accelerator Delivers
This accelerator provides and demonstrates an end-to-end orchestration for conversational assistants using capabilities from Azure AI Language and Azure OpenAI. It can be applied in any scenario where control over assistant behavior and response quality is essential, such as call centers, help desks, and other customer support applications.
Below is a reference architecture of the solution:
Key components of this solution include (components in dashed boxes are coming soon):
- Client-Side User Interface for Demonstration (coming soon)
A web-based client-side interface is included to showcase the accelerator in an interactive, user-friendly format. This web UI lets you quickly explore and test the solution, including its orchestration routing behavior and functionality.
- Workflow Orchestration for Human-Controllable Conversations
By combining services like CLU, CQA, and LLMs, the accelerator allows for a dynamic, adaptable workflow. CLU recognizes and routes customer-defined intents, while CQA provides exact answers from predefined QA pairs. If a question falls outside the predefined scope, the workflow seamlessly falls back to LLMs, enhanced with RAG for contextually relevant, accurate responses. This workflow ensures human-like adaptability while maintaining control over assistant responses.
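To make the routing behavior concrete, here is a minimal, illustrative sketch of this kind of orchestration in Python. It is not the accelerator's actual implementation: the helper functions are hypothetical stand-ins for the CLU, CQA, intent-handling, and LLM-with-RAG calls (SDK sketches for the individual services follow in the sections below), and the confidence thresholds are assumed values you would tune for your project.

```python
# Illustrative routing sketch only; not the accelerator's actual code.
# The helpers below are hypothetical stand-ins for the real service calls.

CLU_CONFIDENCE_THRESHOLD = 0.8   # assumed values, tune per project
CQA_CONFIDENCE_THRESHOLD = 0.7

def get_top_intent(user_text: str) -> tuple[str, float]:
    """Stand-in for a CLU call returning (top_intent, confidence)."""
    return "None", 0.0

def get_exact_answer(user_text: str) -> tuple[str, float]:
    """Stand-in for a CQA call returning (answer, confidence)."""
    return "", 0.0

def handle_intent(intent: str, user_text: str) -> str:
    """Stand-in for the business logic attached to a recognized intent."""
    return f"Handled intent: {intent}"

def ask_llm_with_rag(user_text: str) -> str:
    """Stand-in for an LLM + RAG call."""
    return "LLM-generated answer grounded in retrieved documents."

def route(user_text: str) -> str:
    # 1. Ask CLU for the top predefined intent.
    intent, intent_confidence = get_top_intent(user_text)
    if intent != "None" and intent_confidence >= CLU_CONFIDENCE_THRESHOLD:
        return handle_intent(intent, user_text)      # business-critical flow

    # 2. Ask CQA for an exact, predefined answer.
    answer, answer_confidence = get_exact_answer(user_text)
    if answer and answer_confidence >= CQA_CONFIDENCE_THRESHOLD:
        return answer                                # deterministic response

    # 3. Fall back to the LLM with RAG for everything else.
    return ask_llm_with_rag(user_text)

print(route("I want to check the status of my order"))
```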
- Conversational Language Understanding (CLU) for Intent Routing
The CLU service lets you define the top intents you want the assistant to handle, typically those critical to your business and/or those your users ask about most. This component plays a central role in directing conversations: it interprets user intents and routes them to the right action or AI agent. Whether completing a task or addressing a specific customer need, CLU ensures the assistant accurately understands and executes custom-defined intents.
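As an illustration of the CLU step, the sketch below queries a deployed CLU project with the azure-ai-language-conversations Python SDK; the endpoint, key, project name, deployment name, and utterance are placeholders.

```python
# Hedged sketch: ask a deployed CLU project for the top intent of an utterance.
from azure.core.credentials import AzureKeyCredential
from azure.ai.language.conversations import ConversationAnalysisClient

client = ConversationAnalysisClient(
    "https://<your-language-resource>.cognitiveservices.azure.com",
    AzureKeyCredential("<your-key>"),
)

result = client.analyze_conversation(
    task={
        "kind": "Conversation",
        "analysisInput": {
            "conversationItem": {
                "id": "1",
                "participantId": "user",
                "text": "I want to check the status of my order",
            }
        },
        "parameters": {
            "projectName": "<your-clu-project>",
            "deploymentName": "<your-clu-deployment>",
        },
    }
)

prediction = result["result"]["prediction"]
print(prediction["topIntent"])                      # e.g. a custom "CheckOrderStatus" intent
print(prediction["intents"][0]["confidenceScore"])  # confidence used for routing decisions
```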
- Custom Question Answering (CQA) for Exact Answers with No Hallucinations
CQA lets you create and manage predefined QA pairs to deliver precise responses, reducing ambiguity and ensuring that the assistant aligns closely with defined answers. This controlled response approach keeps interactions consistent and reliable, particularly for high-stakes or regulatory-sensitive conversations. You can also attach CQA to an existing RAG solution to quickly fix incorrect answers.
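For illustration, a minimal query against a deployed CQA project with the azure-ai-language-questionanswering Python SDK might look like the sketch below; the endpoint, key, project and deployment names, question, and threshold are placeholders.

```python
# Hedged sketch: fetch a deterministic answer from a deployed CQA project.
from azure.core.credentials import AzureKeyCredential
from azure.ai.language.questionanswering import QuestionAnsweringClient

client = QuestionAnsweringClient(
    "https://<your-language-resource>.cognitiveservices.azure.com",
    AzureKeyCredential("<your-key>"),
)

output = client.get_answers(
    question="How long is the warranty period?",
    project_name="<your-cqa-project>",
    deployment_name="production",
    top=1,
    confidence_threshold=0.7,  # only accept answers above this score
)

for answer in output.answers:
    print(answer.confidence, answer.answer)
```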
- PII Detection and Redaction for Privacy Protection (coming soon)
Protecting user privacy is a top priority, especially in conversational AI. This accelerator showcases an optional integration of Azure AI Language’s Personally Identifiable Information (PII) detection to automatically identify and redact sensitive information when compliance with privacy standards and regulations is required.
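As an example of text-based PII redaction, the sketch below uses the azure-ai-textanalytics Python SDK; the endpoint, key, and sample text are placeholders, and the conversational PII flow uses a different API.

```python
# Hedged sketch: detect and redact PII in text before it leaves your boundary.
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

client = TextAnalyticsClient(
    "https://<your-language-resource>.cognitiveservices.azure.com",
    AzureKeyCredential("<your-key>"),
)

documents = ["John Doe received a call from 424-878-9192."]
results = client.recognize_pii_entities(documents)

for doc in results:
    if not doc.is_error:
        print(doc.redacted_text)  # text with detected PII masked out
        for entity in doc.entities:
            print(entity.category, entity.text, entity.confidence_score)
```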
- LLM with RAG to Handle Everything Else (coming soon)
In this accelerator, a RAG solution handles missed intents and user queries on lower-priority topics. This RAG solution can be replaced with your existing one. The predefined intents and question-answer pairs can be appended and updated over time based on evolving business needs and DSATs (dissatisfaction cases) discovered in the RAG responses. This approach ensures controlled and deterministic experiences for high-value or high-priority topics while maintaining flexibility and extensibility for lower-priority interactions.
- Components Configuration for "Plug-and-Play"
One of the standout features of this accelerator is its flexibility through a "plug-and-play" component configuration. The architecture is designed so you can easily swap, add, or remove components to tailor the solution to your specific needs. Whether you want to add custom intents, adjust fallback mechanisms, or incorporate additional data sources, the modular nature of the accelerator makes configuration simple.
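As a purely illustrative example (not the accelerator's actual configuration schema), a plug-and-play setup of this kind could be expressed as a simple mapping that the orchestrator reads at startup, enabling or disabling components and reordering fallbacks without code changes.

```python
# Hypothetical, illustrative configuration only; the accelerator defines its
# own configuration format. All project, deployment, and index names are placeholders.
PIPELINE_CONFIG = {
    "pii_redaction": {"enabled": True},
    "routing_order": ["clu", "cqa", "llm_rag"],  # try CLU first, then CQA, then LLM+RAG
    "clu": {
        "project": "<your-clu-project>",
        "deployment": "<your-clu-deployment>",
        "confidence_threshold": 0.8,
    },
    "cqa": {
        "project": "<your-cqa-project>",
        "deployment": "production",
        "confidence_threshold": 0.7,
    },
    "llm_rag": {"enabled": True, "search_index": "<your-search-index>"},
}

print(PIPELINE_CONFIG["routing_order"])
```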
Get Started Building Your GenAI-Powered Assistant Today
Our new accelerator is available on GitHub, ready for developers to deploy, customize, and use as a foundation for their own needs.
Join us as we move towards a future where GenAI can empower organizations to meet business needs with intelligent, adaptable, and human-controllable assistants.
What’s more: Other New Azure AI Language Releases This Ignite
Beyond this accelerator, Azure AI Language provides additional capabilities to support GenAI customers in more scenarios, ensuring quality, privacy, and flexible deployment in any type of environment, whether in the cloud or on premises. We are also excited to announce the following new features launching at Ignite.
- Azure AI Language in Azure AI Studio:
Azure AI Language is moving to AI Studio. Extract PII from text, Extract PII from conversation, Summarize text, Summarize conversation, Summarize for call center, and Text Analytics for health are now available in the AI Studio playground, with more skills to follow.
- Conversational Language Understanding (CLU):
Today, customers use CLU to build custom natural language understanding models hosted by Azure to predict the overall intention of an incoming utterance and extract important information from it. However, some customers have specific needs that require an on-premises connection. We are excited to announce runtime containers for CLU for these use cases.
- PII Detection and Redaction:
Azure AI Language offers Text PII and Conversational PII services to extract personally identifiable information from input text and conversations to enhance privacy and security, often before sending data to the cloud or an LLM. We are excited to announce new improvements to these services: the preview API (version 2024-11-15-preview) now supports the option to mask detected sensitive entities with a label (e.g., "John Doe received a call from 424-878-9192" can now be masked as "[PERSON_1] received a call from [PHONENUMBER_1]"). More on how to specify the redaction policy style for your outputs can be found in our documentation.
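For illustration, a request to the preview API asking for entity-label redaction might look like the sketch below, written with the Python requests library. The endpoint and key are placeholders, and the redactionPolicy field names follow the preview REST reference as we understand it, so please verify the exact shape in the documentation.

```python
# Hedged sketch: call the 2024-11-15-preview text analysis REST API and ask
# for detected PII to be replaced with entity labels such as [PERSON_1].
# Endpoint and key are placeholders; verify field names against the docs.
import requests

endpoint = "https://<your-language-resource>.cognitiveservices.azure.com"
url = f"{endpoint}/language/:analyze-text?api-version=2024-11-15-preview"

body = {
    "kind": "PiiEntityRecognition",
    "analysisInput": {
        "documents": [
            {"id": "1", "language": "en",
             "text": "John Doe received a call from 424-878-9192"}
        ]
    },
    "parameters": {"redactionPolicy": {"policyKind": "entityMask"}},
}

response = requests.post(
    url, json=body, headers={"Ocp-Apim-Subscription-Key": "<your-key>"}
)
print(response.json())  # the redacted text should read like "[PERSON_1] received a call from [PHONENUMBER_1]"
```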
- Native document support:
The gating control is removed with the latest API version, 2024-11-15-preview, allowing customers to access native document support for PII redaction and summarization. Key updates in this version include:
- Increased Maximum File Size Limits (from 1 MB to 10 MB).
- Enhanced PII Redaction Customization: Customers can now specify whether they want only the redacted document or both the redacted document and a JSON file containing the detected entities.
- Language detection:
Language detection is a preconfigured feature that detects the language a document is written in and returns a language code for a wide range of languages, variants, dialects, and some regional/cultural languages. We are happy to announce the general availability of the script detection capability, as well as support for 16 additional languages, bringing the total to 139 supported languages.
- Named entity recognition (NER):
The Named Entity Recognition (NER) service supports customer scenarios for identifying and analyzing entities such as addresses, names, and phone numbers from input text. NER’s Generally Available API (version 2024-11-01) now supports several optional input parameters (inclusionList, exclusionList, inferenceOptions, and overlapPolicy) as well as an updated output structure (with new fields tags, type, and metadata) to enable richer customization and deeper analysis. More on how to use these parameters can be found in our documentation.
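As an illustration of these new parameters, the sketch below sends an EntityRecognition request with an inclusionList via the 2024-11-01 REST API, using the Python requests library. The endpoint, key, and entity types are placeholders, and the exact field shapes for exclusionList, inferenceOptions, and overlapPolicy should be confirmed in the documentation.

```python
# Hedged sketch: restrict NER output to selected entity types with inclusionList.
# Endpoint and key are placeholders; exclusionList, inferenceOptions, and
# overlapPolicy are passed in "parameters" as well (see the documentation).
import requests

endpoint = "https://<your-language-resource>.cognitiveservices.azure.com"
url = f"{endpoint}/language/:analyze-text?api-version=2024-11-01"

body = {
    "kind": "EntityRecognition",
    "analysisInput": {
        "documents": [
            {"id": "1", "language": "en",
             "text": "Contoso opened a new office in Seattle last month."}
        ]
    },
    "parameters": {"inclusionList": ["Organization", "Location"]},
}

response = requests.post(
    url, json=body, headers={"Ocp-Apim-Subscription-Key": "<your-key>"}
)
print(response.json())
```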
- Text analytics for health:
Text analytics for health (TA4H) is a preconfigured feature that extracts and labels relevant medical information from unstructured text such as doctor's notes, discharge summaries, clinical documents, and electronic health records. Today, we released support for Fast Healthcare Interoperability Resources (FHIR) structuring and temporal assertion detection in the Generally Available API.