GitHub
Interactive AI Avatars: Building Voice Agents with Azure Voice Live API
Azure Voice Live API recently reached General Availability, marking a significant milestone in conversational AI technology. This unified API surface doesn't just enable speech-to-speech capabilities for AI agents; it revolutionizes the entire experience by streaming interactions through lifelike avatars. Built on the powerful speech-to-speech capabilities of the GPT-4 Realtime model, Azure Voice Live API offers developers unprecedented flexibility:
- Out-of-the-box or custom avatars from Azure AI Services
- A wide range of neural voices, including Indic languages like the one featured in this demo
- A single API interface that handles both audio processing and avatar streaming
- Real-time responsiveness with sub-second latency

In this post, I'll walk you through building a retail e-commerce voice agent that demonstrates this technology. While this implementation focuses on retail apparel, the architecture is entirely generic and can be adapted to any domain (healthcare, banking, education, or customer support) by simply changing the system prompt and implementing domain-specific tools integration.

The Challenge: Navigating Uncharted Territory
At the time of writing, documentation for implementing avatar features with Azure Voice Live API is minimal. The protocol-specific intricacies around avatar video streaming, and the complex sequence of steps required to establish a live avatar connection, were quite overwhelming. This is where Agent mode in GitHub Copilot in Visual Studio Code proved extremely useful. Through iterative conversations with the AI agent, I discovered a working approach to avatar streaming without getting lost in low-level protocol details. Here's how different AI models contributed to this solution:
- Claude Sonnet 4.5: Rapidly architected the application structure, designing the hybrid WebSocket + WebRTC architecture with a TypeScript/Vite frontend and FastAPI backend
- GPT-5-Codex (Preview): Instrumental in implementing the complex avatar streaming components, handling WebRTC peer connections, and managing the bidirectional audio flow

Architecture Overview: A Hybrid Approach
The architecture comprises these components.

Container Application Architecture
Vite Server: Node.js-based development server that serves the React application. In development, it provides hot module replacement and proxies API calls to `FastAPI`. In production, the React app is built into static files served by FastAPI.
FastAPI with ASGI: Python web framework running on the `uvicorn` ASGI server. ASGI (Asynchronous Server Gateway Interface) enables handling many concurrent connections efficiently, which is crucial for WebSocket connections and real-time audio processing.
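The audio leg of this design is essentially a WebSocket-to-WebSocket relay. Here is a minimal sketch of that proxy; the route name, environment variables, and API version are assumptions for illustration and may differ from the actual repo:

```python
# Minimal sketch of the audio proxy leg: browser <-> FastAPI <-> Azure Voice Live API.
# Route, environment variables, and API version are assumptions for illustration.
import asyncio
import os

import websockets
from fastapi import FastAPI, WebSocket

app = FastAPI()

VOICE_LIVE_URL = (
    f"wss://{os.environ['AZURE_VOICE_LIVE_ENDPOINT']}/voice-live/realtime"
    "?api-version=2025-05-01-preview&model=gpt-4o-realtime-preview"
)

@app.websocket("/ws/audio")
async def audio_proxy(browser_ws: WebSocket) -> None:
    await browser_ws.accept()
    headers = {"api-key": os.environ["AZURE_VOICE_LIVE_KEY"]}
    # additional_headers is the keyword on recent websockets releases
    # (older releases call it extra_headers).
    async with websockets.connect(VOICE_LIVE_URL, additional_headers=headers) as azure_ws:

        async def browser_to_azure() -> None:
            while True:
                await azure_ws.send(await browser_ws.receive_text())

        async def azure_to_browser() -> None:
            async for message in azure_ws:
                await browser_ws.send_text(message)

        # Pump both directions until either side disconnects.
        await asyncio.gather(browser_to_azure(), azure_to_browser())
```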
AI & Voice Services Integration
Azure Voice Live API: Primary service that manages the connection to the GPT-4 Realtime model, provides avatar video generation, neural text-to-speech, and WebSocket gateway functionality
GPT-4 Realtime Model: Accessed through Azure Voice Live API for real-time audio processing, function calling, and intelligent conversation management

Communication Flows
Audio Flow: Browser → WebSocket → FastAPI → WebSocket → Azure Voice Live API → GPT-4 Realtime Model
Video Flow: Browser → WebRTC Direct Connection → Azure Voice Live API (bypasses the backend for performance)
Function Calls: GPT-4 Realtime (via Voice Live) → FastAPI Tools → Business APIs → Response → GPT-4 Realtime (via Voice Live)

Business Process Automation Workflows / RAG
Shipment Logic App Agent: Analyzes orders, validates data, creates shipping labels, and updates tracking information
Conversation Analysis Agent: An Azure Logic App that reviews complete conversations, performs sentiment analysis, generates quality scores with justification, and stores insights for continuous improvement
Knowledge Retrieval: Azure AI Search is used to reason over manuals and help respond to customer queries on policies and products

The solution implements a hybrid architecture that leverages both WebSocket proxying and direct WebRTC connections. This design keeps the conversational audio flow manageable and secure through the backend, while the bandwidth-intensive avatar video streams directly to the browser for optimal performance. The flow used in the avatar communication:

```
Frontend                         FastAPI Backend                   Azure Voice Live API
    |                                 |                                  |
    |-- 1. Request session --------->|                                  |
    |                                 |-- 2. Create session ----------->|
    |                                 |-- 3. Session config ----------->|
    |                                 |      (with avatar settings)     |
    |                                 |<-- 4. session.updated ----------|
    |<-- 5. ICE servers -------------|       (ICE servers)              |
    |                                 |                                  |
    | 6. Click "Start Avatar"         |                                  |
    | 7. Create RTCPeerConnection     |                                  |
    |    with ICE servers             |                                  |
    | 8. Generate SDP offer           |                                  |
    |-- 9. POST /avatar-offer ------>|                                  |
    |                                 |-- 10. Encode & send SDP ------->|
    |                                 |<-- 11. session.avatar.connecting|
    |<-- 12. SDP answer -------------|        (SDP answer)              |
    | 13. setRemoteDescription        |                                  |
    |-- 14. WebRTC handshake (direct connection) ---------------------->|
    |<== 15. Video/audio stream (bypasses backend) ====================>|
```

A Python sketch of the backend's offer/answer handling (steps 9-12) follows below. For more technical details behind the implementation, refer to the GitHub repo shared in this post. Here is a demo video of the application in action.
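To make steps 9-12 concrete, here is a minimal sketch of how the backend might relay the browser's SDP offer to Voice Live and return the answer. The event and field names are inferred from the flow above, and the session_ws handle is a placeholder; treat all of it as assumptions rather than the repo's verbatim code:

```python
# Sketch of the backend relay for the avatar SDP exchange (steps 9-12 above).
# Event and field names are inferred from the flow; treat them as assumptions.
import base64
import json

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Placeholder: in the real app this is the WebSocket already opened to
# Azure Voice Live API when the session was created.
session_ws = None

class AvatarOffer(BaseModel):
    sdp: str  # SDP offer from the browser's RTCPeerConnection

@app.post("/avatar-offer")
async def avatar_offer(offer: AvatarOffer):
    # The client SDP is base64-encoded before being sent to the service.
    encoded = base64.b64encode(offer.sdp.encode("utf-8")).decode("ascii")
    await session_ws.send(json.dumps({
        "type": "session.avatar.connect",  # assumed event name
        "client_sdp": encoded,
    }))
    # Wait for the service's answer event and hand the SDP back to the browser,
    # which applies it via setRemoteDescription (step 13).
    async for raw in session_ws:
        event = json.loads(raw)
        if event.get("type") == "session.avatar.connecting":
            return {"sdp": event["server_sdp"]}
```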
CI/CD GitHub Deployment from Dev to UAT Synapse Workspace not Picking Up UAT Resources
Hello, I am setting up CI/CD for Azure Synapse Analytics using GitHub Actions with multiple environments (Dev, UAT, Prod). My Synapse resources are:
Dev: ************-dev, azcalsbdatalakedev, calsbvaultdev, SQL DB azcalsbazuresqldev / MetaData
UAT: ************-uat, azcalsbdatalakeuat, calsbvaultuat, SQL DB azcalsbazuresqluat / MetaData
Prod: ***********-prod, azcalsbdatalakeprod, azcalsbvaultprod, SQL DB azcalsbazuresqlprod / MetaData
I have environment-specific parameter override files such as uat.json and prod.json. My GitHub workflows (synapse-dev.yml, synapse-uat.yml, etc.) deploy the Synapse publish artifacts (TemplateForWorkspace.json and TemplateParametersForWorkspace.json) with those overrides.
Issue: When I run the UAT workflow, the deployment completes successfully, but the UAT Synapse workspace still shows Dev resources. For example, linked services like LS_ADLS still point to azcalsbdatalakedev instead of azcalsbdatalakeuat.
What I have tried:
- Created overrides for UAT (uat.json) with the correct workspace name and connection strings
- Checked the GitHub workflow YAML to confirm the override file is being passed in the az deployment group create step
- Verified that the Dev deployment works fine
- Tried changing default values in the linked services JSON, but the behavior is inconsistent
Questions:
- Is there a specific way to structure override files (uat.json) for Synapse CI/CD deployments so environment values are correctly replaced?
- Do I need separate branches in GitHub for Dev, UAT, and Prod, or can I deploy to all environments from main with overrides?
- Has anyone else seen linked services or parameters still pointing to Dev even after a UAT deployment?
Any guidance, best practices, or sample YAML and override examples would be very helpful. Thanks in advance.
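For illustration, here is a simplified sketch of the uat.json structure in question. Values are placeholders, the LS_KeyVault entry is hypothetical, and each parameter name must exactly match a parameter defined in TemplateParametersForWorkspace.json (Synapse publish generates names like LS_ADLS_properties_typeProperties_url):

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "workspaceName": { "value": "<synapse-workspace>-uat" },
    "LS_ADLS_properties_typeProperties_url": {
      "value": "https://azcalsbdatalakeuat.dfs.core.windows.net/"
    },
    "LS_KeyVault_properties_typeProperties_baseUrl": {
      "value": "https://calsbvaultuat.vault.azure.net/"
    }
  }
}
```

The workflow then passes both files, roughly as `az deployment group create --resource-group <rg> --template-file TemplateForWorkspace.json --parameters @TemplateParametersForWorkspace.json --parameters @uat.json`; with the Azure CLI, values in later `--parameters` arguments override earlier ones.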
Unlock the Power of AI with GitHub Models: A Hands-On Guide
Ready to elevate your coding game? Imagine having the power of advanced AI at your fingertips, ready to integrate into your projects with just a few clicks. Whether you're building a smart assistant, automating workflows, or creating the next big thing, GitHub Models are here to make it happen. Dive into our guide and discover how to get started, customize responses, and even build your own AI-powered applications, all from within the familiar GitHub interface. Your journey into the world of AI starts now. Click to explore and let your creativity take flight!
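As a taste of what the guide covers, here is a minimal sketch of calling a model through GitHub Models from Python. It assumes the OpenAI-compatible inference endpoint and a GITHUB_TOKEN with models access; the model name is illustrative:

```python
# Minimal sketch: chat completion against GitHub Models' OpenAI-compatible
# endpoint. Endpoint URL and model name are assumptions; check the guide.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://models.inference.ai.azure.com",  # GitHub Models inference endpoint
    api_key=os.environ["GITHUB_TOKEN"],                # a GitHub token, not an OpenAI key
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Suggest three project ideas using GitHub Models."},
    ],
)
print(response.choices[0].message.content)
```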
Study Buddy: Learning Data Science and Machine Learning with an AI Sidekick
If you've ever wished for a friendly companion to guide you through the world of data science and machine learning, you're not alone. As part of the "For Beginners" curriculum, I recently built a Study Buddy Agent, an AI-powered assistant designed to help learners explore data science interactively, intuitively, and joyfully.
Why a Study Buddy? Learning something new can be overwhelming, especially when you're navigating complex topics like machine learning, statistics, or Python programming. The Study Buddy Agent is here to change that. It brings the curriculum to life by answering questions, offering explanations, and nudging learners toward deeper understanding, all in a conversational format. Think of it as your AI-powered lab partner: always available, never judgmental, and endlessly curious.
Built with chatmodes, Powered by Purpose
The agent is defined in a chat mode file in the Data-Science-For-Beginners repo: https://github.com/microsoft/Data-Science-For-Beginners/blob/main/.github/chatmodes/study-mode.chatmode.md. This file defines how the agent behaves, what tone it uses, and how it interacts with learners. I designed it to be friendly, encouraging, and beginner-first, just like the curriculum itself. It's not just about answering questions. The Study Buddy is trained to:
- Reinforce key concepts from the curriculum
- Offer hints and nudges when learners get stuck
- Encourage exploration and experimentation
- Celebrate progress and milestones
What's Under the Hood? The agent uses GitHub Copilot's chatmode, which allows developers to define custom behaviors for AI agents. By aligning the agent's responses with the curriculum's learning objectives, we ensure that learners stay on track while enjoying the flexibility of conversational learning.
How You Can Use It
YouTube video here: Study Buddy - Data Science AI Sidekick
Clone the repo: Head to the https://github.com/microsoft/Data-Science-For-Beginners repo and clone it locally or use Codespaces.
Open the GitHub Copilot Chat, and select Study Buddy: This will activate the Study Buddy.
Start chatting: Ask questions, explore topics, and let the agent guide you.
What's Next? This is just the beginning. I'm exploring ways to:
- Expand the agent to other beginner curriculums (Web Dev, AI, IoT)
- Integrate feedback loops so learners can shape the agent's evolution
Final Thoughts: In my role, I believe learning should be inclusive, empowering, and fun. The Study Buddy Agent is a small step toward that vision, a way to make data science feel less like a mountain and more like a hike with a good friend. Try it out, share your feedback, and let's keep building tools that make learning magical. Join us on Discord to share your feedback.
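For a sense of what such a definition contains, here is a minimal sketch of a `.chatmode.md` file; the frontmatter fields and instructions are illustrative, so see study-mode.chatmode.md in the repo for the real one:

```markdown
---
description: 'Study buddy that guides learners through Data Science for Beginners'
tools: []
---
You are a friendly, encouraging study buddy for beginner data scientists.
- Reinforce key concepts from the lesson the learner is currently reading.
- Offer hints and nudges rather than complete answers when a learner is stuck.
- Celebrate progress and suggest small experiments to try next.
```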
Safeguard data on third-party collaboration platforms
I am exploring options to safeguard sensitive data in third-party collaboration platforms like GitHub and Confluence. Does Microsoft Purview provide any native integration for these platforms? Do I need to rely on third-party connectors/integrations to extend Purview's capabilities into these environments?
Introducing the Microsoft Agent Framework: A Unified Foundation for AI Agents and Workflows
The landscape of AI development is evolving rapidly, and Microsoft is at the forefront with the release of the Microsoft Agent Framework, an open-source SDK designed to empower developers to build intelligent, multi-agent systems with ease and precision. Whether you're working in .NET or Python, this framework offers a unified, extensible foundation that merges the best of Semantic Kernel and AutoGen, while introducing powerful new capabilities for agent orchestration and workflow design.
Introducing Microsoft Agent Framework: The Open-Source Engine for Agentic AI Apps | Azure AI Foundry Blog
Introducing Microsoft Agent Framework | Microsoft Azure Blog

Why Another Agent Framework?
Both Semantic Kernel and AutoGen have pioneered agentic development: Semantic Kernel with its enterprise-grade features and AutoGen with its research-driven abstractions. The Microsoft Agent Framework is the next generation of both, built by the same teams to unify their strengths:
- AutoGen's simplicity in multi-agent orchestration
- Semantic Kernel's robustness in thread-based state management, telemetry, and type safety
- New capabilities like graph-based workflows, checkpointing, and human-in-the-loop support
This convergence means developers no longer have to choose between experimentation and production. The Agent Framework is designed to scale from single-agent prototypes to complex, enterprise-ready systems.

Core Capabilities
AI Agents
AI agents are autonomous entities powered by LLMs that can process user inputs, make decisions, call tools and MCP servers, and generate responses. They support providers like Azure OpenAI, OpenAI, and Azure AI, and can be enhanced with:
- Agent threads for state management
- Context providers for memory
- Middleware for action interception
- MCP clients for tool integration
Use cases include customer support, education, code generation, research assistance, and more, especially where tasks are dynamic and underspecified.
Workflows
Workflows are graph-based orchestrations that connect multiple agents and functions to perform complex, multi-step tasks. They support:
- Type-based routing
- Conditional logic
- Checkpointing
- Human-in-the-loop interactions
- Multi-agent orchestration patterns (sequential, concurrent, hand-off, Magentic)
Workflows are ideal for structured, long-running processes that require reliability and modularity.

Developer Experience
The Agent Framework is designed to be intuitive and powerful:
Installation:
- Python: pip install agent-framework
- .NET: dotnet add package Microsoft.Agents.AI
Integration: Works with the Foundry SDK, MCP SDK, A2A SDK, and M365 Copilot Agents
Samples and Manifests: Explore declarative agent manifests and code samples
Learning Resources: Microsoft Learn modules, AI Agents for Beginners, AI Show demos, and the Azure AI Foundry Discord community
A short Python sketch of creating and running an agent appears at the end of this post.

Migration and Compatibility
If you're currently using Semantic Kernel or AutoGen, migration guides are available to help you transition smoothly. The framework is designed to be backward-compatible where possible, and future updates will continue to support community contributions via the GitHub repository.

Important Considerations
The Agent Framework is in public preview. Feedback and issues are welcome on the GitHub repository. When integrating with third-party servers or agents, review data sharing practices and compliance boundaries carefully.
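To give a flavor of the Python developer experience, here is a minimal sketch based on the preview quickstarts; since the framework is in public preview, class and method names may shift, so treat them as assumptions and check the docs:

```python
# Minimal sketch of a single agent, based on the public-preview quickstart.
# APIs are preview and may change; names here are best-effort assumptions.
import asyncio

from agent_framework.azure import AzureOpenAIChatClient
from azure.identity import AzureCliCredential

async def main() -> None:
    # Create an agent backed by an Azure OpenAI chat deployment.
    agent = AzureOpenAIChatClient(credential=AzureCliCredential()).create_agent(
        name="SupportAgent",
        instructions="You are a concise, friendly customer-support agent.",
    )
    result = await agent.run("A customer asks how to reset their password.")
    print(result.text)

asyncio.run(main())
```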
The Microsoft Agent Framework marks a pivotal moment in AI development, bringing together research innovation and enterprise readiness into a single, open-source foundation. Whether you're building your first agent or orchestrating a fleet of them, this framework gives you the tools to do it safely, scalably, and intelligently. Ready to get started? Download the SDK, explore the documentation, and join the community shaping the future of AI agents.
From Cloud to Chip: Building Smarter AI at the Edge with Windows AI PCs
As AI engineers, we've spent years optimizing models for the cloud: scaling inference, wrangling latency, and chasing compute across clusters. But the frontier is shifting. With the rise of Windows AI PCs and powerful local accelerators, the edge is no longer a constraint; it's now a canvas. Whether you're deploying vision models to industrial cameras, optimizing speech interfaces for offline assistants, or building privacy-preserving apps for healthcare, Edge AI is where real-world intelligence meets real-time performance.

Why Edge AI, Why Now?
Edge AI isn't just about running models locally; it's about rethinking the entire lifecycle:
- Latency: Decisions in milliseconds, not round-trips to the cloud.
- Privacy: Sensitive data stays on-device, enabling HIPAA/GDPR compliance.
- Resilience: Offline-first apps that don't break when the network does.
- Cost: Reduced cloud compute and bandwidth overhead.
With Windows AI PCs powered by Intel and Qualcomm NPUs, and tools like ONNX Runtime, DirectML, and Olive, developers can now optimize and deploy models with unprecedented efficiency.

What You'll Learn in Edge AI for Beginners
The Edge AI for Beginners curriculum is a hands-on, open-source guide designed for engineers ready to move from theory to deployment.
Multi-Language Support: This content is available in over 48 languages, so you can read and study in your native language.
What You'll Master: This course takes you from fundamental concepts to production-ready implementations, covering:
- Small Language Models (SLMs) optimized for edge deployment
- Hardware-aware optimization across diverse platforms
- Real-time inference with privacy-preserving capabilities
- Production deployment strategies for enterprise applications

Why EdgeAI Matters
Edge AI represents a paradigm shift that addresses critical modern challenges:
- Privacy & Security: Process sensitive data locally without cloud exposure
- Real-time Performance: Eliminate network latency for time-critical applications
- Cost Efficiency: Reduce bandwidth and cloud computing expenses
- Resilient Operations: Maintain functionality during network outages
- Regulatory Compliance: Meet data sovereignty requirements

Edge AI
Edge AI refers to running AI algorithms and language models locally on hardware, close to where data is generated, without relying on cloud resources for inference. It reduces latency, enhances privacy, and enables real-time decision-making.
Core Principles:
- On-device inference: AI models run on edge devices (phones, routers, microcontrollers, industrial PCs)
- Offline capability: Functions without persistent internet connectivity
- Low latency: Immediate responses suited for real-time systems
- Data sovereignty: Keeps sensitive data local, improving security and compliance

Small Language Models (SLMs)
SLMs like Phi-4, Mistral-7B, Qwen, and Gemma are optimized versions of larger LLMs, trained or distilled for:
- Reduced memory footprint: Efficient use of limited edge device memory
- Lower compute demand: Optimized for CPU and edge GPU performance
- Faster startup times: Quick initialization for responsive applications
They unlock powerful NLP capabilities while meeting the constraints of:
- Embedded systems: IoT devices and industrial controllers
- Mobile devices: Smartphones and tablets with offline capabilities
- IoT devices: Sensors and smart devices with limited resources
- Edge servers: Local processing units with limited GPU resources
- Personal computers: Desktop and laptop deployment scenarios
Course Modules & Navigation
Course duration: 10 hours of content.

| Module | Topic | Focus Area | Key Content | Level | Duration |
|---|---|---|---|---|---|
| 00 | Introduction to EdgeAI | Foundation & Context | EdgeAI Overview • Industry Applications • SLM Introduction • Learning Objectives | Beginner | 1-2 hrs |
| 01 | EdgeAI Fundamentals | Cloud vs Edge AI comparison | EdgeAI Fundamentals • Real World Case Studies • Implementation Guide • Edge Deployment | Beginner | 3-4 hrs |
| 02 | SLM Model Foundations | Model families & architecture | Phi Family • Qwen Family • Gemma Family • BitNET • μModel • Phi-Silica | Beginner | 4-5 hrs |
| 03 | SLM Deployment Practice | Local & cloud deployment | Advanced Learning • Local Environment • Cloud Deployment | Intermediate | 4-5 hrs |
| 04 | Model Optimization Toolkit | Cross-platform optimization | Introduction • Llama.cpp • Microsoft Olive • OpenVINO • Apple MLX • Workflow Synthesis | Intermediate | 5-6 hrs |
| 05 | SLMOps Production | Production operations | SLMOps Introduction • Model Distillation • Fine-tuning • Production Deployment | Advanced | 5-6 hrs |
| 06 | AI Agents & Function Calling | Agent frameworks & MCP | Agent Introduction • Function Calling • Model Context Protocol | Advanced | 4-5 hrs |
| 07 | Platform Implementation | Cross-platform samples | AI Toolkit • Foundry Local • Windows Development | Advanced | 3-4 hrs |
| 08 | Foundry Local Toolkit | Production-ready samples | Sample applications (see details below) | Expert | 8-10 hrs |

Each module includes Jupyter notebooks, code samples, and deployment walkthroughs, perfect for engineers who learn by doing.

Developer Highlights
- Olive: Microsoft's optimization toolchain for quantization, pruning, and acceleration.
- ONNX Runtime: Cross-platform inference engine with support for CPU, GPU, and NPU (see the inference sketch at the end of this topic).
- DirectML: GPU-accelerated ML API for Windows, ideal for gaming and real-time apps.
- Windows AI PCs: Devices with built-in NPUs for low-power, high-performance inference.

Local AI: Beyond the Edge
Local AI isn't just about inference; it's about autonomy. Imagine agents that:
- Learn from local context
- Adapt to user behavior
- Respect privacy by design
With tools like Agent Framework, Azure AI Foundry, Windows Copilot Studio, and Foundry Local, developers can orchestrate local agents that blend LLMs, sensors, and user preferences, all without cloud dependency.

Try It Yourself
Ready to get started? Clone the Edge AI for Beginners GitHub repo, run the notebooks, and deploy your first model to a Windows AI PC or an IoT device. Whether you're building smart kiosks, offline assistants, or industrial monitors, this curriculum gives you the scaffolding to go from prototype to production.
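As a starting point, here is a minimal sketch of local inference with ONNX Runtime, preferring the DirectML execution provider on Windows and falling back to CPU; the model path and input shape are placeholders:

```python
# Minimal sketch: local ONNX inference with DirectML (GPU/NPU on Windows),
# falling back to CPU. Model path and input shape are placeholders.
import numpy as np
import onnxruntime as ort

# DmlExecutionProvider requires the onnxruntime-directml package on Windows.
session = ort.InferenceSession(
    "model.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)

input_name = session.get_inputs()[0].name
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)  # e.g. a vision model

outputs = session.run(None, {input_name: dummy_input})
print("Output shape:", outputs[0].shape)
```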
Create an Active Student badge on Microsoft Learn
Description: I suggest adding an official 'Active Student' badge to the Microsoft Community and Microsoft Learn platforms. This badge would:
- Highlight students' commitment to learning.
- Encourage continuous participation through visible recognition.
- Connect learning achievements (Learn) with community contributions (Community Hub).
- Provide a public credential that can be showcased on a CV or professional profile.
Such a symbolic addition would strengthen motivation, visibility, and the bridge between Microsoft Learn and the Community.
The Future of AI: The paradigm shifts in Generative AI Operations
Dive into the transformative world of Generative AI Operations (GenAIOps) with Microsoft Azure. Discover how businesses are overcoming the challenges of deploying and scaling generative AI applications. Learn about the innovative tools and services Azure AI offers, and how they empower developers to create high-quality, scalable AI solutions. Explore the paradigm shift from MLOps to GenAIOps and see how continuous improvement practices ensure your AI applications remain cutting-edge. Join us on this journey to harness the full potential of generative AI and drive operational excellence.
The Future of AI: Reduce AI Provisioning Effort - Jumpstart your solutions with AI App Templates
In the previous post, we introduced Contoso Chat, an open-source RAG-based retail chat sample for Azure AI Foundry that serves as both an AI App template (for builders) and the basis for a hands-on workshop (for learners). And we briefly talked about five stages in the developer workflow (provision, setup, ideate, evaluate, deploy) that take them from the initial prompt to a deployed product. But how can that sample help you build your app? The answer lies in developer tools and AI App templates that jumpstart productivity by giving you a fast start and a solid foundation to build on. In this post, we answer that question with a closer look at Azure AI App templates: what they are, and how we can jumpstart our productivity with a reuse-and-extend approach that builds on open-source samples for core application architectures.