azure ai search
118 TopicsSimplify Search Development with the New Azure AI Search Wizard
Azure AI Search has introduced the new “Import Data” wizard—a unified, modernized experience that streamlines index creation across keyword, RAG, and multimodal RAG workflows. By merging the legacy keyword search wizard with the vectorization flow used for advanced AI scenarios, this update simplifies how users connect to data sources, configure skillsets, and build query-ready indexes. The new wizard supports semantic ranking, integrated vectorization, and multimodal enrichment, with expanded connector options like Azure Queues, OneDrive for Business, and SharePoint Online via Logic Apps. During the phased rollout, both the classic and new wizards will coexist, but users are encouraged to switch early to take advantage of enhanced capabilities and prepare for the eventual retirement of the legacy experience. Whether you're building traditional search or intelligent retrieval systems, the new wizard offers a faster, more intuitive path to production-ready indexes.316Views0likes0CommentsThe Future of AI: From Noise to Insight - An AI Agent for Customer Feedback
This post explores how Microsoft’s AI Futures team built a multi-agent system to transform scattered customer feedback into actionable insights. The solution aggregates feedback from multiple channels, uses advanced language models to cluster themes, summarize content, and identify sentiment, and delivers prioritized insights directly in Microsoft Teams. With human-in-the-loop safeguards, the system accelerates triage, prioritization, and follow-ups while maintaining compliance and traceability. Future enhancements include richer automation, trend visualization, and expanded feedback sources.258Views0likes0CommentsInteractive AI Avatars: Building Voice Agents with Azure Voice Live API
Azure Voice Live API recently reached General Availability, marking a significant milestone in conversational AI technology. This unified API surface doesn't just enable speech-to-speech capabilities for AI agents—it revolutionizes the entire experience by streaming interactions through lifelike avatars. Built on the powerful speech-to-speech capabilities of the GPT-4 Realtime model, Azure Voice Live API offers developers unprecedented flexibility: - Out-of-the-box or custom avatars from Azure AI Services - Wide range of neural voices, including Indic languages like the one featured in this demo - Single API interface that handles both audio processing and avatar streaming - Real-time responsiveness with sub-second latency In this post, I'll walk you through building a retail e-commerce voice agent that demonstrates this technology. While this implementation focuses on retail apparel, the architecture is entirely generic and can be adapted to any domain—healthcare, banking, education, or customer support—by simply changing the system prompt and implementing domain-specific tools integration. The Challenge: Navigating Uncharted Territory At the time of writing, documentation for implementing avatar features with Azure Voice Live API is minimal. The protocol-specific intricacies around avatar video streaming and the complex sequence of steps required to establish a live avatar connection were quite overwhelming. This is where Agent mode in GitHub Copilot in Visual Studio Code proved extremely useful. Through iterative conversations with the AI agent, I successfully discovered the approach to implement avatar streaming without getting lost in low-level protocol details. Here's how different AI models contributed to this solution: - Claude Sonnet 4.5: Rapidly architected the application structure, designing the hybrid WebSocket + WebRTC architecture with TypeScript/Vite frontend and FastAPI backend - GPT-5-Codex (Preview): Instrumental in implementing the complex avatar streaming components, handling WebRTC peer connections, and managing the bidirectional audio flow Architecture Overview: A Hybrid Approach The architecture comprises of these components 🐳 Container Application Architecture Vite Server: Node.js-based development server that serves the React application. In development, it provides hot module replacement and proxies API calls to `FastAPI`. In production, the React app is built into static files served by FastAPI. FastAPI with ASGI: Python web framework running on `uvicorn ASGI server`. ASGI (Asynchronous Server Gateway Interface) enables handling multiple concurrent connections efficiently, crucial for WebSocket connections and real-time audio processing. 🤖 AI & Voice Services Integration Azure Voice Live API: Primary service that manages the connection to GPT-4 Realtime Model, provides avatar video generation, neural text-to-speech, and WebSocket gateway functionality GPT-4 Realtime Model: Accessed through Azure Voice Live API for real-time audio processing, function calling, and intelligent conversation management 🔄 Communication Flows Audio Flow: Browser → WebSocket → FastAPI → WebSocket → Azure Voice Live API → GPT-4 Realtime Model Video Flow: Browser ↔ WebRTC Direct Connection ↔ Azure Voice Live API (bypasses backend for performance) Function Calls: GPT-4 Realtime (via Voice Live) → FastAPI Tools → Business APIs → Response → GPT-4 Realtime (via Voice Live) 🤖 Business process automation Workflows / RAG Shipment Logic App Agent: Analyzes orders, validates data, creates shipping labels, and updates tracking information Conversation Analysis Agent: Azure Logic App Reviews complete conversations, performs sentiment analysis, generates quality scores with justification, and stores insights for continuous improvement Knowledge Retrieval: Azure AI Search is used to reason over manuals and help respond to Customer queries on policies, products The solution implements a hybrid architecture that leverages both WebSocket proxying and direct WebRTC connections for optimal performance. This design ensures the conversational audio flow remains manageable and secure through the backend, while the bandwidth-intensive avatar video streams directly to the browser for optimal performance. The flow used in the Avatar communication: ``` Frontend FastAPI Backend Azure Voice Live API │ │ │ │ 1. Request Session │ │ │─────────────────────────►│ │ │ │ 2. Create Session │ │ │─────────────────────────►│ │ │ │ │ │ 3. Session Config │ │ │ (with avatar settings)│ │ │─────────────────────────►│ │ │ │ │ │ 4. session.updated │ │ │ (ICE servers) │ │ 5. ICE servers │◄─────────────────────────│ │◄─────────────────────────│ │ │ │ │ │ 6. Click "Start Avatar" │ │ │ │ │ │ 7. Create RTCPeerConn │ │ │ with ICE servers │ │ │ │ │ │ 8. Generate SDP Offer │ │ │ │ │ │ 9. POST /avatar-offer │ │ │─────────────────────────►│ │ │ │ 10. Encode & Send SDP │ │ │─────────────────────────►│ │ │ │ │ │ 11. session.avatar. │ │ │ connecting │ │ │ (SDP answer) │ │ 12. SDP Answer │◄─────────────────────────│ │◄─────────────────────────│ │ │ │ │ │ 13. setRemoteDescription │ │ │ │ │ │ 14. WebRTC Handshake │ │ │◄─────────────────────────┼─────────────────────────►│ │ (Direct Connection) │ │ │ │ │ │ 15. Video/Audio Stream │ │ │◄────────────────────────────────────────────────────│ │ (Bypasses Backend) │ │ ``` For more technical details, refer to the technical details behind the implementation, refer to the GitHub Repo shared in this post. Here is a video of the demo of the application in action.455Views2likes0CommentsUsing the Voice Live API in Azure AI Foundry
In this blog post, we’ll explore the Voice Live API from Azure AI Foundry. Officially released for general availability on October 1, 2025, this API unifies speech recognition, generative AI, and text-to-speech capabilities into a single, streamlined interface. It removes the complexity of manually orchestrating multiple components and ensures a consistent developer experience across all models, making it easy to switch and experiment. What sets Voice Live API apart are its advanced conversational enhancements, including: Semantic Voice Activity Detection (VAD) that’s robust against background noise and accurately detects when a user intends to speak. Semantic end-of-turn detection that supports natural pauses in conversation. Server-side audio processing features like noise suppression and echo cancellation, simplifying client-side development. Let’s get started. 1. Getting Started with Voice Live API The Voice Live API ships with an SDKthat lets you open a single realtime WebSocket connection and then do everything—stream microphone audio up, receive synthesized audio/text/function‑call events down— without writing any of the low-level networking plumbing. This is how the connection is opened with the Python SDK. from azure.ai.voicelive.aio import connect from azure.core.credentials import AzureKeyCredential async with connect( endpoint=VOICE_LIVE_ENDPOINT, # https://<your-foundry-resource>.cognitiveservices.azure.com/ credential=AzureKeyCredential(VOICE_LIVE_KEY), model="gpt-4o-realtime", connection_options={ "max_msg_size": 10 * 1024 * 1024, # allow streamed PCM "heartbeat": 20, # keep socket alive "timeout": 20, # network resilience }, ) as connection: Notice that you don't need an underlying deployment nor manage any generative AI models, as the API handles all the underlying infrastructure. Immediately after connecting, declare what kind of conversation you want. This is where you “teach” the session the model instructions, which voice to synthesize, what tool functions it may call, and how to detect speech turns: from azure.ai.voicelive.models import ( RequestSession, Modality, AzureStandardVoice, InputAudioFormat, OutputAudioFormat, AzureSemanticVad, ToolChoiceLiteral, AudioInputTranscriptionOptions ) session_config = RequestSession( modalities=[Modality.TEXT, Modality.AUDIO], instructions="Assist the user with account questions succinctly.", voice=AzureStandardVoice(name="alloy", type="azure-standard"), input_audio_format=InputAudioFormat.PCM16, output_audio_format=OutputAudioFormat.PCM16, turn_detection=AzureSemanticVad( threshold=0.5, prefix_padding_ms=300, silence_duration_ms=500 ), tools=[ # optional { "name": "get_user_information", "description": "Retrieve profile and limits for a user", "input_schema": { "type": "object", "properties": {"user_id": {"type": "string"}}, "required": ["user_id"] } } ], tool_choice=ToolChoiceLiteral.AUTO, input_audio_transcription=AudioInputTranscriptionOptions(model="whisper-1"), ) await connection.session.update(session=session_config) After session setup, it is pure event-driven flow: async for event in connection: if event.type == ServerEventType.RESPONSE_AUDIO_DELTA: playback_queue.put(event.delta) elif event.type == ServerEventType.CONVERSATION_ITEM_CREATED and event.item.type == ItemType.FUNCTION_CALL: handle_function_call(event) That’s the core: one connection, one session config message, then pure event-driven flow. 2. Deep Dive: Tool (Function) Handling in the Voice Live SDK In the Voice Live context, “tools” are model-invocable functions you expose with a JSON schema. The SDK streams a structured function call request (name + incrementally streamed arguments), you execute real code locally, then feed the JSON result back so the model can incorporate it into its next spoken (and/or textual) turn. Let’s unpack the full lifecycle. First, the model emits a CONVERSATION_ITEM_CREATED event whose item.type == FUNCTION_CALL if event.item.type == ItemType.FUNCTION_CALL: await self._handle_function_call_with_improved_pattern(event, connection) Arguments stream (possibly token-by-token) until the SDK signals RESPONSE_FUNCTION_CALL_ARGUMENTS_DONE. Optionally, the SDK may also complete the “response” segment with RESPONSE_DONE before you run the tool. Then we execute the local Python function, and explicitly request a new model response via connection.response.create(), telling the model to incorporate the tool result into a natural-language (and audio) answer. async def _handle_function_call(self, created_evt, connection): call_item = created_evt.item # ResponseFunctionCallItem name = call_item.name call_id = call_item.call_id prev_id = call_item.id # 1. Wait until arguments are fully streamed args_done = await _wait_for_event( connection, {ServerEventType.RESPONSE_FUNCTION_CALL_ARGUMENTS_DONE} ) assert args_done.call_id == call_id arguments = args_done.arguments # JSON string # 2. (Optional) Wait for RESPONSE_DONE to avoid race with model finishing segment await _wait_for_event(connection, {ServerEventType.RESPONSE_DONE}) # 3. Execute func = self.available_functions.get(name) if not func: # Optionally send an error function output return result = await func(arguments) # Implementations are async in this sample # 4. Send output output_item = FunctionCallOutputItem(call_id=call_id, output=json.dumps(result)) await connection.conversation.item.create( previous_item_id=prev_id, item=output_item ) # 5. Trigger follow-up model response await connection.response.create() 3. Sample App: Try the repo with sample app we have created, together with all the infrastructure required already automated. This sample app simulates a friendly real‑time contact‑center rep who can listen continuously, understand you as you speak, instantly look up things like your credit card’s upcoming due date or a product detail via function calls, and then answer back naturally in a Brazilian Portuguese neural voice with almost no lag. Behind the scenes it streams your microphone audio to Azure’s Voice Live (GPT‑4o realtime) model, transcribes and reasons on the fly, selectively triggers lightweight “get user information” or “get product information” lookups to Azure AI Search , and speaks responses right back to you. Happy Coding!378Views0likes0CommentsThe Future of AI: The paradigm shifts in Generative AI Operations
Dive into the transformative world of Generative AI Operations (GenAIOps) with Microsoft Azure. Discover how businesses are overcoming the challenges of deploying and scaling generative AI applications. Learn about the innovative tools and services Azure AI offers, and how they empower developers to create high-quality, scalable AI solutions. Explore the paradigm shift from MLOps to GenAIOps and see how continuous improvement practices ensure your AI applications remain cutting-edge. Join us on this journey to harness the full potential of generative AI and drive operational excellence.7.3KViews1like1CommentThe Future Of AI: Deconstructing Contoso Chat - Learning GenAIOps in practice
How can AI engineers build applied knowledge for GenAIOps practices? By deconstructing working samples! In this multi-part series, we deconstruct Contoso Chat (a RAG-based retail copilot sample) and use it to learn the tools and workflows to streamline out end-to-end developer journey using Azure AI Foundry.905Views0likes0CommentsThe Future of AI: Harnessing AI for E-commerce - personalized shopping agents
Explore the development of personalized shopping agents that enhance user experience by providing tailored product recommendations based on uploaded images. Leveraging Azure AI Foundry, these agents analyze images for apparel recognition and generate intelligent product recommendations, creating a seamless and intuitive shopping experience for retail customers.1.3KViews5likes3CommentsThe Future of AI: Reduce AI Provisioning Effort - Jumpstart your solutions with AI App Templates
In the previous post, we introduced Contoso Chat – an open-source RAG-based retail chat sample for Azure AI Foundry, that serves as both an AI App template (for builders) and the basis for a hands-on workshop (for learners). And we briefly talked about five stages in the developer workflow (provision, setup, ideate, evaluate, deploy) that take them from the initial prompt to a deployed product. But how can that sample help you build your app? The answer lies in developer tools and AI App templates that jumpstart productivity by giving you a fast start and a solid foundation to build on. In this post, we answer that question with a closer look at Azure AI App templates - what they are, and how we can jumpstart our productivity with a reuse-and-extend approach that builds on open-source samples for core application architectures.483Views0likes0CommentsThe Future of AI: Autonomous Agents for Identifying the Root Cause of Cloud Service Incidents
Discover how Microsoft is transforming cloud service incident management with autonomous AI agents. Learn how AI-enhanced troubleshooting guides and agentic workflows are reducing downtime and empowering on-call engineers.2.7KViews3likes1Comment