Simplifying Microservice Reliability with Dapr
What is Dapr?

Dapr is an open-source runtime developed by Microsoft for building resilient, event-driven, and portable applications. It works using the sidecar pattern: every microservice gets a small companion container, the Dapr sidecar, which handles communication, retries, secrets, state, and more.

What is a sidecar?

A sidecar is a helper process that runs beside your app, handling system tasks so your code can focus on business logic.

Let's look at some of Dapr's building blocks, with examples.

#1. Bindings

Connects your app to external systems (such as queues, email, or storage) with zero SDK or protocol handling.

Without Dapr ❌

```csharp
var httpClient = new HttpClient();
await httpClient.PostAsJsonAsync("https://api.sendgrid.com/send", email);
```

* You manage HTTP endpoints and credentials
* Changing provider means rewriting logic

With Dapr ✅

```csharp
await daprClient.InvokeBindingAsync("send-email", "create", email);
```

* One call, no SDK
* Swap SendGrid → SMTP → Twilio just by editing the component config
* No code change, no redeploy

How to enable a binding for an Azure Container App

1. Open the Azure portal and go to your Container App Environment.
2. From the left pane, click Container Apps and choose your desired app (e.g., orders-api).
3. In the Settings section, select Dapr.
4. Switch the Enable Dapr toggle ON.
5. Provide the basic Dapr settings:
   * App ID: a unique name (e.g., orders-app).
   * App Port: the internal port your API listens on (e.g., 8080).
   * App Protocol: HTTP or gRPC (usually HTTP).
6. Click Save to apply.
7. Under the same Container App Environment, go to Dapr Components.
8. Click Create → select Binding → choose the binding type (e.g., bindings.azure.storagequeues).

#2. Configuration

Centralizes app settings, allowing live configuration updates without redeploying services.

Without Dapr ❌

```csharp
var featureFlag = Configuration["FeatureX"];
```

* Requires a redeploy for every config change
* No centralized versioning or dynamic updates

With Dapr ✅

```csharp
var config = await daprClient.GetConfiguration("appconfigstore", new[] { "FeatureX" });
```

* Use Azure App Configuration, Consul, or any other provider
* Centralized updates, no redeploys
* Consistent access via the Dapr SDK

How to enable configuration for an Azure Container App

The Dapr setup is the same as for bindings (steps 1–7 above). In the final step, click Create → select Configuration → choose the component type (e.g., configuration.azure.appconfig).

#3. Pub/Sub

Enables event-driven communication between microservices without them needing to know each other's endpoints.

Without Dapr ❌

```csharp
var client = new ServiceBusClient("<connection-string>");
var sender = client.CreateSender("order-topic");
await sender.SendMessageAsync(new ServiceBusMessage(orderJson));
```

* Tied to Azure Service Bus
* Must manage SDKs, connections, retries
* Hard to switch to another broker (Kafka, RabbitMQ)

With Dapr ✅

```csharp
await daprClient.PublishEventAsync("pubsub", "order-created", order);
```

* The pubsub component is defined in YAML (and can be Kafka, Redis Streams, etc.)
* No SDK, no broker dependency
* Just publish the event; Dapr handles transport and retries (a subscriber sketch follows below)

How to enable pub/sub for an Azure Container App

The Dapr setup is the same as for bindings (steps 1–7 above). In the final step, click Create → select Pub/Sub → choose the component type (e.g., pubsub.azure.servicebus.topics).
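The article shows the publishing side; for completeness, here is a minimal sketch of the consuming side using the Dapr ASP.NET Core integration (Dapr.AspNetCore package). The `pubsub` component name and `order-created` topic mirror the snippet above; the endpoint route and the `Order` record are illustrative assumptions, not part of the original sample.

```csharp
using Dapr;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Unwrap CloudEvents envelopes and expose the /dapr/subscribe endpoint
// that the sidecar calls to discover topic subscriptions.
app.UseCloudEvents();
app.MapSubscribeHandler();

// The sidecar delivers "order-created" events from the "pubsub" component here.
app.MapPost("/orders/created", [Topic("pubsub", "order-created")] (Order order) =>
{
    Console.WriteLine($"Received order {order.Id}");
    return Results.Ok();
});

app.Run();

public record Order(string Id, string Item, int Quantity);
```

Because the subscription is declared against the component name rather than a broker SDK, the same handler keeps working if the underlying broker is swapped in the component YAML.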
#4. Secret Stores

Securely retrieves credentials and secrets from vaults, keeping them out of config files and code.

Without Dapr ❌

```csharp
var connString = Configuration["ConnectionStrings:DB"];
```

* Secrets stored in configs or environment variables
* Risk of leaks and manual rotation

With Dapr ✅

```csharp
var secret = await daprClient.GetSecretAsync("vault", "dbConnection");
```

* Fetch directly from Azure Key Vault, AWS Secrets Manager, etc.
* No secrets in configs
* Secure by default, consistent across services

How to enable secret stores for an Azure Container App

The Dapr setup is the same as for bindings (steps 1–7 above). In the final step, click Create → select Secret stores → choose the component type (e.g., secretstores.azure.keyvault).

#5. State

Provides a consistent way to store and retrieve application data across services using a simple API.

Without Dapr ❌

```csharp
var cosmosClient = new CosmosClient(connStr);
var container = cosmosClient.GetContainer("db", "state");
await container.UpsertItemAsync(order);
```

* Direct dependency on Cosmos DB
* Manual retry logic
* Tight coupling to the storage type

With Dapr ✅

```csharp
await daprClient.SaveStateAsync("statestore", "order-101", order);
var data = await daprClient.GetStateAsync<Order>("statestore", "order-101");
```

* Plug in any backend (Redis, Cosmos DB, PostgreSQL)
* Dapr handles retries and consistency
* Same code, different backend — total flexibility

How to enable state for an Azure Container App

The Dapr setup is the same as for bindings (steps 1–7 above). In the final step, click Create → select State → choose the component type (e.g., state.azure.cosmosdb).
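The snippets above all assume a `daprClient` instance. As a minimal sketch of how it could be wired up with dependency injection in an ASP.NET Core app (via the Dapr.AspNetCore package); the route, component names, and `Order` record here are illustrative assumptions that match the examples above:

```csharp
using Dapr.Client;

var builder = WebApplication.CreateBuilder(args);

// Registers DaprClient with the container so it can be injected anywhere.
builder.Services.AddDaprClient();

var app = builder.Build();

// Example endpoint using the injected client for state and secrets.
app.MapGet("/orders/{id}", async (string id, DaprClient daprClient) =>
{
    var order = await daprClient.GetStateAsync<Order>("statestore", id);
    if (order is null) return Results.NotFound();

    // Returned as a dictionary of key/value pairs from the "vault" component.
    var dbSecret = await daprClient.GetSecretAsync("vault", "dbConnection");
    // ...use the secret to call a downstream dependency if needed...

    return Results.Ok(order);
});

app.Run();

public record Order(string Id, string Item, int Quantity);
```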
🧩 Summary

Think of Dapr as your invisible co-pilot for building distributed apps. It abstracts away all the repetitive plumbing — state management, pub/sub messaging, secret handling, and external bindings — letting you focus on writing the features that matter.

With Dapr, you don't just write code that runs locally; you write code that works across clouds, containers, and environments, without having to wire up retries, event delivery, or service-to-service communication manually.

🧰 Demo Source Code

I've prepared a complete .NET Core sample that touches all the major Dapr features:

* State Store
* Pub/Sub
* Bindings
* Configuration
* Secret Store

You can explore it at Github-Dapr-Api. Clone it, run it locally, and experiment; the project uses in-memory storage to keep things lightweight for testing and learning.

📚 References for Deep Dive

* Official Dapr Docs
* Dapr for .NET Developers — Microsoft Learn
* Dapr .NET SDK GitHub
On‑Device AI with Windows AI Foundry

From “waiting” to “instant”, without sending data away

AI is everywhere, but speed, privacy, and reliability are critical. Users expect instant answers without compromise. On-device AI makes that possible: fast, private, and available even when the network isn't, empowering apps to deliver seamless experiences. Imagine an intelligent assistant that responds in seconds, without sending text to the cloud. This approach brings speed and data control to the places that need them most, while still letting you tap into cloud power when it makes sense.

Windows AI Foundry: A Local Home for Models

Windows AI Foundry is a developer toolkit that makes it simple to run AI models directly on Windows devices. It uses ONNX Runtime under the hood and can leverage CPU, GPU (via DirectML), or NPU acceleration, without requiring you to manage those details. The principle is straightforward: keep the model and the data on the same device. Inference becomes faster, and data stays local by default unless you explicitly choose to use the cloud.

Foundry Local

Foundry Local is the engine that powers this experience. Think of it as a local AI runtime: fast, private, and easy to integrate into an app.

Why Adopt On‑Device AI?

* Faster, more responsive apps: local inference often reduces perceived latency and improves user experience.
* Privacy‑first by design: keep sensitive data on the device; avoid cloud round trips unless the user opts in.
* Offline capability: an app can provide AI features even without a network connection.
* Cost control: reduce cloud compute and data costs for common, high‑volume tasks.

This approach is especially useful in regulated industries, field‑work tools, and any app where users expect quick, on‑device responses.

Hybrid Pattern for Real Apps

On-device AI doesn't replace the cloud; it complements it. Here's how:

* Standalone on‑device: quick, private actions like document summarization, local search, and offline assistants.
* Cloud‑enhanced (optional): large-context models, up-to-date knowledge, or heavy multimodal workloads.

Design an app to keep data local by default and surface cloud options transparently, with user consent and clear disclosures. Windows AI Foundry supports hybrid workflows:

* Use Foundry Local for real-time inference.
* Sync with Azure AI services for model updates, telemetry, and advanced analytics.
* Implement fallback strategies for resource-intensive scenarios.

Application Workflow Code Examples

1. On-device only: try Foundry Local first, fall back to ONNX

```python
if foundry_runtime.check_foundry_available():
    # Use on-device Foundry Local models
    try:
        answer = foundry_runtime.run_inference(question, context)
        return answer, "Foundry Local (On-Device)"
    except Exception as e:
        logger.warning(f"Foundry failed: {e}, trying ONNX...")

if onnx_model.is_loaded():
    # Fallback to local BERT ONNX model
    try:
        answer = bert_model.get_answer(question, context)
        return answer, "BERT ONNX (On-Device)"
    except Exception as e:
        logger.warning(f"ONNX failed: {e}")

return "Error: No local AI available"
```
2. Hybrid approach: on-device first, cloud as a last resort

```python
def get_answer(question, context):
    """
    Priority order:
      1. Foundry Local (best: advanced + private)
      2. ONNX Runtime (good: fast + private)
      3. Cloud API (fallback: requires internet, less private)
    Use the hybrid approach when the real-time scenario calls for it.
    """
    if foundry_runtime.check_foundry_available():
        # Use on-device Foundry Local models
        try:
            answer = foundry_runtime.run_inference(question, context)
            return answer, "Foundry Local (On-Device)"
        except Exception as e:
            logger.warning(f"Foundry failed: {e}, trying ONNX...")

    if onnx_model.is_loaded():
        # Fallback to local BERT ONNX model
        try:
            answer = bert_model.get_answer(question, context)
            return answer, "BERT ONNX (On-Device)"
        except Exception as e:
            logger.warning(f"ONNX failed: {e}, trying cloud...")

    # Last resort: Cloud API (requires internet)
    if network_available():
        try:
            import requests
            response = requests.post(
                '{BASE_URL_AI_CHAT_COMPLETION}',
                headers={'Authorization': f'Bearer {API_KEY}'},
                json={
                    'model': '{MODEL_NAME}',
                    'messages': [{
                        'role': 'user',
                        'content': f'Context: {context}\n\nQuestion: {question}'
                    }]
                },
                timeout=10
            )
            answer = response.json()['choices'][0]['message']['content']
            return answer, "Cloud API (Online)"
        except Exception as e:
            return "Error: No AI runtime available", "Failed"
    else:
        return "Error: No internet and no local AI available", "Offline"
```

Demo Project Output

Foundry Local answering context-based questions offline:

* The Foundry Local engine ran the Phi-4-mini model offline and retrieved context-based data.
* The Foundry Local engine ran the Phi-4-mini model offline and indicated that no answer was available.

Practical Use Cases

* Privacy-first reading assistant: summarize documents locally without sending text to the cloud.
* Healthcare apps: analyze medical data on-device for compliance.
* Financial tools: risk scoring without exposing sensitive financial data.
* IoT and edge devices: real-time anomaly detection without network dependency.

Conclusion

On-device AI isn't just a trend; it's a shift toward smarter, faster, and more secure applications. With Windows AI Foundry and Foundry Local, developers can deliver experiences that respect user data, reduce latency, and keep working when connectivity fails. By combining local inference with optional cloud enhancements, you get the best of both worlds: instant performance and scalable intelligence. Whether you're creating document summarizers, offline assistants, or compliance-ready solutions, this approach keeps your apps responsive, reliable, and user-centric.

References

* Get started with Foundry Local - Foundry Local | Microsoft Learn
* What is Windows AI Foundry? | Microsoft Learn
* https://devblogs.microsoft.com/foundry/unlock-instant-on-device-ai-with-foundry-local/
Understanding Small Language Models

Small Language Models (SLMs) bring AI from the cloud to your device. Unlike Large Language Models, which require massive compute and energy, SLMs run locally, offering speed, privacy, and efficiency. They're ideal for edge applications like mobile, robotics, and IoT.
Announcing Public Preview: AI Toolkit for GitHub Copilot Prompt-First Agent Development

This week at GitHub Universe, we're announcing the public preview of GitHub Copilot prompt-first agent development in the AI Toolkit for Visual Studio Code. With this release, building powerful AI agents is now simpler and faster: no need to wrestle with complex frameworks or orchestrators. Just start with natural language prompts and let GitHub Copilot guide you from concept to working agent code.

Accelerate Agent Development in VS Code

The AI Toolkit embeds agent development workflows directly into Visual Studio Code and GitHub Copilot, enabling you to transform ideas into production-ready agents within minutes. This unified experience empowers developers and product teams to:

* Select the best model for your agent scenario
* Build and orchestrate agents using the Microsoft Agent Framework
* Trace agent behaviors
* Evaluate agent response quality

Select the best model for your scenario

Models are the foundation for building powerful agents. Using the AI Toolkit, you can already explore and experiment with a wide range of local and remote models. Copilot now recommends models tailored to your agent's needs, helping you make informed choices quickly.

Build and orchestrate agents

Whether you're creating a single agent or designing a multi-agent workflow, Copilot leverages the latest Microsoft Agent Framework to generate robust agent code. You can initiate agent creation with simple prompts and visualize workflows for greater clarity and control:

* Create a single agent using Copilot
* Create a multi-agent workflow using Copilot and visualize workflow execution

Trace agent behaviors

As agents become more sophisticated, understanding their actions is crucial. The AI Toolkit enables tracing via Copilot, collecting local traces and displaying detailed agent calls, all within VS Code.

Evaluate agent response quality

Copilot guides you through structured evaluation, recommending metrics and generating test datasets. Integrate evaluations into your CI/CD pipeline for continuous quality assurance and confident deployments.

Get started and share feedback

This release marks a significant step toward making AI agent development easier and more accessible in Visual Studio Code. Try out the AI Toolkit for Visual Studio Code, share your thoughts, and file issues and suggest features on our GitHub repo. Thank you for being a part of this journey with us!
Serverless MCP Agent with LangChain.js v1 — Burgers, Tools, and Traces 🍔

AI agents that can actually do stuff (not just chat) are the fun part nowadays, but wiring them cleanly into real APIs, keeping things observable, and shipping them to the cloud can get... messy. So we built a fresh end‑to‑end sample to show how to do it right with the brand new LangChain.js v1 and the Model Context Protocol (MCP). In case you missed it, MCP is a recent open standard that makes it easy for LLM agents to consume tools and APIs, and LangChain.js, a great framework for building GenAI apps and agents, has first-class support for it. You can quickly get up to speed with the MCP for Beginners course and the AI Agents for Beginners course.

This new sample gives you:

* A LangChain.js v1 agent that streams its result, along with reasoning + tool steps
* An MCP server exposing real tools (burger menu + ordering) from a business API
* A web interface with authentication, session history, and a debug panel (for developers)
* A production-ready multi-service architecture
* Serverless deployment on Azure in one command ( azd up )

Yes, it's a burger ordering system. Who doesn't like burgers? Grab your favorite beverage ☕, and let's dive in for a quick tour!

TL;DR key takeaways

* New sample: full-stack Node.js AI agent using LangChain.js v1 + MCP tools
* Architecture: web app → agent API → MCP server → burger API
* Runs locally with a single npm start , deploys with azd up
* Uses streaming (NDJSON) with intermediate tool + LLM steps surfaced to the UI
* Ready to fork, extend, and plug into your own domain / tools

What will you learn here?

* What this sample is about and its high-level architecture
* What LangChain.js v1 brings to the table for agents
* How to deploy and run the sample
* How MCP tools can expose real-world APIs

Reference links for everything we use:

* GitHub repo
* LangChain.js docs
* Model Context Protocol
* Azure Developer CLI
* MCP Inspector

Use case

You want an AI assistant that can take a natural language request like “Order two spicy burgers and show me my pending orders” and:

* Understand intent (query the menu, then place the order)
* Call the right MCP tools in sequence, which in turn call the necessary APIs
* Stream progress (LLM tokens + tool steps)
* Return a clean final answer

Swap “burgers” for “inventory”, “bookings”, “support tickets”, or “IoT devices” and you've got a reusable pattern!

Sample overview

Before we play with the sample, let's have a look at the main services implemented here:

| Service | Role | Tech |
|---|---|---|
| Agent Web App ( agent-webapp ) | Chat UI + streaming + session history | Azure Static Web Apps, Lit web components |
| Agent API ( agent-api ) | LangChain.js v1 agent orchestration + auth + history | Azure Functions, Node.js |
| Burger MCP Server ( burger-mcp ) | Exposes the burger API as tools over MCP (Streamable HTTP + SSE) | Azure Functions, Express, MCP SDK |
| Burger API ( burger-api ) | Business logic: burgers, toppings, orders lifecycle | Azure Functions, Cosmos DB |

Here's a simplified view of how they interact. There are also other supporting components like databases and storage not shown here for clarity. For this quickstart we'll only interact with the Agent Web App and the Burger MCP Server, as they are the main stars of the show here.

LangChain.js v1 agent features

The recent release of LangChain.js v1 is a huge milestone for the JavaScript AI community! It marks a significant shift from experimental tools to a production-ready framework. The new version doubles down on what's needed to build robust AI applications, with a strong focus on agents.
This includes first-class support for streaming not just the final output, but also intermediate steps like tool calls and agent reasoning. This makes building transparent and interactive agent experiences (like the one in this sample) much more straightforward.

Quickstart

Requirements:

* GitHub account
* Azure account (free signup, or if you're a student, get free credits here)
* Azure Developer CLI

Deploy and run the sample

We'll use GitHub Codespaces for a quick zero-install setup here, but if you prefer to run it locally, check the README. Click on the following link or open it in a new tab to launch a Codespace: Create Codespace. This will open a VS Code environment in your browser with the repo already cloned and all the tools installed and ready to go.

Provision and deploy to Azure

Open a terminal and run these commands:

```bash
# Install dependencies
npm install

# Login to Azure
azd auth login

# Provision and deploy all resources
azd up
```

Follow the prompts to select your Azure subscription and region. If you're unsure which one to pick, choose East US 2. The deployment will take about 15 minutes the first time, to create all the necessary resources (Functions, Static Web Apps, Cosmos DB, AI models). If you're curious about what happens under the hood, take a look at the main.bicep file in the infra folder, which defines the infrastructure as code for this sample.

Test the MCP server

While the deployment is running, you can run the MCP server and API locally (even in Codespaces) to see how it works. Open another terminal and run:

```bash
npm start
```

This will start all services locally, including the Burger API and the MCP server, which will be available at http://localhost:3000/mcp . This may take a few seconds; wait until you see this message in the terminal:

🚀 All services ready 🚀

When these services run without Azure resources provisioned, they use in-memory data instead of Cosmos DB, so you can experiment freely with the API and MCP server, though the agent won't be functional as it requires an LLM resource.

MCP tools

The MCP server exposes the following tools, which the agent can use to interact with the burger ordering system:

| Tool Name | Description |
|---|---|
| get_burgers | Get a list of all burgers in the menu |
| get_burger_by_id | Get a specific burger by its ID |
| get_toppings | Get a list of all toppings in the menu |
| get_topping_by_id | Get a specific topping by its ID |
| get_topping_categories | Get a list of all topping categories |
| get_orders | Get a list of all orders in the system |
| get_order_by_id | Get a specific order by its ID |
| place_order | Place a new order with burgers (requires userId , optional nickname ) |
| delete_order_by_id | Cancel an order if it has not yet been started (status must be pending , requires userId ) |

You can test these tools using the MCP Inspector. Open another terminal and run:

```bash
npx -y @modelcontextprotocol/inspector
```

Then open the URL printed in the terminal in your browser and connect using these settings:

* Transport: Streamable HTTP
* URL: http://localhost:3000/mcp
* Connection Type: Via Proxy (should be the default)

Click on Connect, then try listing the tools first, and run the get_burgers tool to get the menu info.

Test the Agent Web App

After the deployment is completed, you can run the command npm run env to print the URLs of the deployed services. Open the Agent Web App URL in your browser (it should look like https://<your-web-app>.azurestaticapps.net ).
You'll first be greeted by an authentication page; you can sign in with either your GitHub or Microsoft account, and then you should be able to access the chat interface. From there, you can start asking any question or use one of the suggested prompts; for example, try asking: Recommend me an extra spicy burger. As the agent processes your request, you'll see the response streaming in real time, along with the intermediate steps and tool calls. Once the response is complete, you can also unfold the debug panel to see the full reasoning chain and the tools that were invoked.

Tip: Our agent service also sends detailed tracing data using OpenTelemetry. You can explore these traces either in Azure Monitor for the deployed service, or locally using an OpenTelemetry collector. We'll cover this in more detail in a future post.

Wrap it up

Congratulations, you just finished spinning up a full-stack serverless AI agent using LangChain.js v1, MCP tools, and Azure's serverless platform. Now it's your turn to dive into the code and extend it for your use cases! 😎 And don't forget to run azd down once you're done to avoid any unwanted costs.

Going further

This was just a quick introduction to this sample, and you can expect more in-depth posts and tutorials soon. Since we're in the era of AI agents, we've also made sure that this sample can be explored and extended easily with code agents like GitHub Copilot. We even built a custom chat mode to help you discover and understand the codebase faster! Check out the Copilot setup guide in the repo to get started. You can quickly get up to speed with the MCP for Beginners course and the AI Agents for Beginners course.

If you like this sample, don't forget to star the repo ⭐️! You can also join us in the Azure AI community Discord to chat and ask any questions. Happy coding and burger ordering! 🍔
LangChain v1 is now generally available!

Today LangChain v1 officially launches and marks a new era for the popular AI agent library. The new version ushers in a more streamlined and extensible foundation for building agentic LLM applications. In this post we'll break down what's new, what changed, and what “general availability” means in practice. Join Microsoft Developer Advocates Marlene Mhangami and Yohan Lasorsa to see live demos of the new API and find out more about what JavaScript and Python developers need to know about v1. Register for this event here.

Why v1? The Motivation Behind the Redesign

The number of abstractions in LangChain had grown over the years to include chains, agents, tools, wrappers, prompt helpers and more, which, while powerful, introduced complexity and fragmentation. As model APIs evolve (multimodal inputs, richer structured output, tool-calling semantics), LangChain needed a cleaner, more consistent core to ensure production-ready stability. In v1:

* All existing chains and agent abstractions in the old LangChain are deprecated; they are replaced by a single high-level agent abstraction built on LangGraph internals.
* LangGraph becomes the foundational runtime for durable, stateful, orchestrated execution.
* LangChain now emphasizes being the “fast path to agents” that doesn't hide but builds upon LangGraph.
* The internal message format has been upgraded to support standard content blocks (e.g. text, reasoning, citations, tool calls) across model providers, decoupling “content” from raw strings.
* Namespace cleanup: the langchain package now focuses tightly on core abstractions (agents, models, messages, tools), while legacy patterns are moved into langchain-classic (or equivalents).

What's New & Noteworthy for Developers

Here are the key changes developers should pay attention to:

1. create_agent becomes the default API

The create_agent function is now the idiomatic way to spin up agents in v1. It replaces older constructs (e.g. create_react_agent) with a clearer, more modular API. You can also now compose middleware around model calls, tool calls, before/after hooks, error handling, and more.

2. Standard content blocks & normalized message model

One of LangChain's greatest strengths is its model agnosticism. Content blocks standardize all outputs, so developers know exactly what to expect regardless of the model they are using. Responses from models are no longer opaque strings; instead, they carry structured `content_blocks` which classify parts of the output (e.g. “text”, “reasoning”, “citation”, “tool_call”).

3. Multimodal and richer model inputs / outputs

LangChain continues to support more than just text-based interactions, but in a more comprehensive way in v1. Models can accept and return files, images, video, etc., and the message format reflects this flexibility. This upgrade prepares us well for the next generation of models with mixed modalities (vision, audio, etc.).

4. Middleware hooks

Because create_agent is designed as a pluggable pipeline, developers can now inject logic before/after model calls, before tool calls, and more. New middleware such as human-in-the-loop and summarization middleware have been added. This is the feature of the new package that I am most excited about! Even with the simplified agents API, this option provides more room to customize workflows. Developers can try pre-built middleware or make their own.
5. Simplified, leaner namespace

Many formerly top-level modules or helper classes have been removed or relocated to langchain-classic (or similarly stamped “legacy”) to declutter the main API surface. A migration guide is available to help projects transition from v0 to v1. While v1 is now the main line, the older v0 is still documented and maintained for compatibility.

What “General Availability” Means (and Doesn't)

* v1 is production-ready, after testing through the alpha releases.
* The stable v0 release line remains supported for those unwilling or unable to migrate immediately.
* Breaking changes in public APIs will be accompanied by version bumps (i.e. minor version increments) and deprecation notices.
* The roadmap anticipates minor versions every 2–3 months (with patch releases more frequently).
* Because the field of LLM applications is evolving rapidly, the team expects continued iteration in v1, even in GA mode, with users encouraged to surface feedback, file issues, and adopt the migration path. (This is in line with the philosophy stated in the docs.)

Developer Callouts & Suggested Steps

Some things we recommend developers do to get started with v1:

Try the new API now! LangChain Azure AI and Azure OpenAI have migrated to LangChain v1 and are ready to test! Learn more about using LangChain and Azure AI:

* Python: https://docs.langchain.com/oss/python/integrations/providers/azure_ai
* JavaScript: https://docs.langchain.com/oss/javascript/integrations/providers/microsoft

Join us for a live stream on Wednesday 22 October 2025

Join Microsoft Developer Advocates Marlene Mhangami and Yohan Lasorsa for a livestream this Wednesday to see live demos and find out more about what JavaScript and Python developers need to know about v1. Register for this event here.
Using Keycloak with Azure AD to Integrate the AKS Cluster Authentication Process

Integrating Azure Kubernetes Service (AKS) with Keycloak, using Azure Active Directory (Azure AD) as an intermediary, leverages Azure AD's support for OpenID Connect (OIDC) to handle authentication and authorization. This integration enhances security, streamlines user management, and simplifies the authentication process for users accessing the AKS cluster.
Postgres as a Distributed Cache Unlocks Speed and Simplicity for Modern .NET Workloads

In the world of high-performance, modern software engineering, developers often face a tough tradeoff: how to achieve lightning-fast data retrieval without adding complexity, sacrificing reliability, or getting locked into specialized, external caching products or platforms. What if you could harness the power and flexibility of your existing Postgres database to solve this challenge? Enter the Microsoft.Extensions.Caching.Postgres library, a new nuget.org package that brings distributed caching to Postgres, unlocking speed, simplicity, and seamless integration for modern .NET workloads. In this article, we take a closer look at the Postgres caching store, which introduces a new option for .NET developers planning to implement a distributed cache, such as HybridCache, paired with a Postgres database that provides the distributed backplane.

One data platform for multiple workloads

Postgres' reputation for reliability, extensibility, and standards compliance has long been respected, with Postgres databases driving some of today's largest and most popular platforms. Developers, data engineers, and entrepreneurs alike are increasingly rallying to apply these benefits. One of the most compelling aspects of Postgres is its adaptability: it's a data platform that can simultaneously handle everything from transactional workloads to analytical queries, JSON documents to geospatial data, and even time-series and vectorized AI search. In an era of specialized services, Postgres is proving that one platform can do it all, and do it well. Intrepid engineers have also discovered that Postgres is often just as proficient at handling workloads traditionally supported by very different technology solutions, such as lake-house, pub-sub, message queues, job schedulers, and session store caches. These roles are all now being powered by Postgres databases, while Postgres simultaneously continues to deliver the same scalable, battle-tested, and mission-critical ACID-compliant core relational database operations we've all come to expect.

When speed matters most

Database-backed cache stores are by no means a new concept; the first version of a database cache library for .NET (Microsoft.Extensions.Caching.SqlServer) was made available to developers exploring the nuget.org ecosystem in June 2016. This library included several impressive features, such as expiration policies, serialization, and dependency injection, making it ideal for multi-instance applications requiring shared cache functionality. It was especially useful in environments where Redis or other cache providers were not available. The convenience of using a transactional database as a distributed cache comes with some tradeoffs, especially when compared against services such as Redis or Memcached; in a word: speed. All the features that make your database data durable, reliable, and consistent require precious additional clock cycles and I/O operations, and this overhead results in a performance cost compared to the alternative memory stores and caching systems. What if it was possible to keep all those familiar and convenient interfaces for connecting to your database, while being able to configure specific tables to throw off the burden of crash consistency and replication logging? What if, for only the tables we selected, we could trade this durability for pure speed? Enter Postgres' UNLOGGED tables.
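To make the idea concrete, here is a minimal sketch (using the Npgsql driver) of what an unlogged cache table could look like. The connection string, table name, and columns are illustrative only; they are not the schema that the Microsoft.Extensions.Caching.Postgres package creates for you.

```csharp
using Npgsql;

// Placeholder connection string for illustration.
await using var conn = new NpgsqlConnection("Host=localhost;Database=app;Username=app;Password=secret");
await conn.OpenAsync();

// UNLOGGED skips write-ahead logging for this table only: writes are faster,
// but the table is truncated after a crash and is not replicated.
const string ddl = """
    CREATE UNLOGGED TABLE IF NOT EXISTS cache_entries (
        key        text PRIMARY KEY,
        value      bytea NOT NULL,
        expires_at timestamptz NOT NULL
    );
    """;

await using var cmd = new NpgsqlCommand(ddl, conn);
await cmd.ExecuteNonQueryAsync();
```

Everything else in the database keeps its normal durability guarantees; only the tables explicitly marked UNLOGGED trade crash consistency for speed.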
Postgres' adaptable performance

Another compelling aspect of Postgres is the ability to significantly speed up write performance by bypassing the Write-Ahead Log (WAL). The WAL is designed to ensure that data is crash-consistent (and replicable): writing to your database is a transparent two-step process in which your data is written to your database tables and the changes are also committed to a separate file to guarantee the data's persistence. In some circumstances, the tradeoff of increased performance is worth the sacrifice in crash consistency, especially for short-lived, temporary data such as a cache store. This configuration is scoped to individual tables, which allows combinations of logged and unlogged tables to operate side by side within the same database instance. The net result: Postgres can provide incredibly performant response times when used as a distributed cache, rivaling the performance of other popular cache stores, while also providing the simplicity, familiarity, and consistency that the Postgres engine naturally offers.

HybridCache for your .NET solutions

It was this capability*, combined with inspiration from the SQL Server library, that led to the creation of the nuget.org Microsoft.Extensions.Caching.Postgres package. As a longtime .NET developer, I have personally witnessed the incredible evolution of the .NET platform and the amazing growth, enhancements, and improvements to the languages, the tooling, the runtimes, and the incredible people behind each of these contributions. The recent addition of HybridCache is especially exciting to consider incorporating into your .NET solutions because it dramatically simplifies the steps required to add caching to your project, while simultaneously linking an in-memory cache with a second-level tiered cache service. This seamless integration gives your application the best of both worlds: blazing-fast in-memory retrieval paired with a resilient, similarly performant backplane in the event an application instance blinks, scales up or out, and so on.

Don't just take my word for it; let's look at some benchmarks comparing a Redis cache and a Postgres database. The tests comprise synchronous and async operations across three payload sizes (128, 1024, and 10240 bytes) for read/write, containing both single and concurrent messages, at fixed and random positions. The tests are further divided into two types of cache expiration windows: absolute/non-sliding and sliding/relative. Consider the output from the benchmark suite, keeping in mind that these results are measured in microseconds (1,000 microseconds equals 1 millisecond). What do these results reveal? In certain respects, there aren't many surprises. Bespoke memory-based key-value cache systems like Redis continue to outperform relational databases in terms of pure speed and low latency. What is really exciting to see is that Postgres comes very close to Redis performance for the more intensive operations!

Finding the right fit for your solution

I'm excited to make this Postgres package available to everyone considering distributed caching in their solution designs. The combination of HybridCache paired with your choice of backplane allows you to select the technologies and tools that are best suited for your solution.
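As a rough sketch of how the pieces could fit together in an ASP.NET Core app: HybridCache (Microsoft.Extensions.Caching.Hybrid) sits in front as the L1/L2 coordinator, and the Postgres package is registered as the IDistributedCache backplane. The AddDistributedPostgresCache call and its option names below are assumptions for illustration only; check the package README for the exact registration API. The route and loader function are likewise hypothetical.

```csharp
using Microsoft.Extensions.Caching.Hybrid;

var builder = WebApplication.CreateBuilder(args);

// Hypothetical registration of the Postgres-backed IDistributedCache;
// consult the Microsoft.Extensions.Caching.Postgres docs for the real API.
builder.Services.AddDistributedPostgresCache(options =>
{
    options.ConnectionString = builder.Configuration.GetConnectionString("Postgres");
});

// HybridCache layers an in-memory L1 cache over the distributed L2 cache above.
builder.Services.AddHybridCache();

var app = builder.Build();

// Example usage: fetch-or-populate with a single call.
app.MapGet("/products/{id}", async (string id, HybridCache cache) =>
{
    var product = await cache.GetOrCreateAsync(
        $"product:{id}",
        async token => await LoadProductFromDatabaseAsync(id, token), // placeholder loader
        new HybridCacheEntryOptions { Expiration = TimeSpan.FromMinutes(5) });

    return Results.Ok(product);
});

app.Run();

static async Task<string> LoadProductFromDatabaseAsync(string id, CancellationToken token)
{
    await Task.Delay(10, token); // simulate a database call
    return $"Product {id}";
}
```

Swapping the backplane (Redis, SQL Server, Postgres) is then a one-line registration change; the HybridCache call sites stay the same.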
Our GitHub repo also contains a variety of sample applications that demonstrate how to configure and use HybridCache together with the Postgres distributed cache library within a console app service, as well as an Aspire-based sample Web API. I encourage you to explore these examples and share your thoughts and ideas. I look forward to any feedback you may have about the Microsoft.Extensions.Caching.Postgres package. Keep advancing, keep improving, and keep contributing to be a “builder” and an active part of our incredible community!

* This package extension is highly configurable, and you can choose whether to enable or disable bypassing the WAL for your cache table, along with several other options that can be adjusted for your particular use case.
AMA: Azure AI Foundry Voice Live API: Build Smarter, Faster Voice Agents

Join us LIVE in the Azure AI Foundry Discord on 14 October 2025 at 10am PT to learn more about the Voice Live API.

Voice is no longer a novelty; it's the next-gen interface between humans and machines. From automotive assistants to educational tutors, voice-driven agents are reshaping how we interact with technology. But building seamless, real-time voice experiences has often meant stitching together a patchwork of services: STT, GenAI, TTS, avatars, and more. Until now.

Introducing Azure AI Foundry Voice Live API

Launched into general availability on October 1, 2025, the Azure AI Foundry Voice Live API is a game-changer for developers building voice-enabled agents. It unifies the entire voice stack — speech-to-text, generative AI, text-to-speech, avatars, and conversational enhancements — into a single, streamlined interface. That means:

* ⚡ Lower latency
* 🧠 Smarter interactions
* 🛠️ Simplified development
* 📈 Scalable deployment

Whether you're prototyping a voice bot for customer support or deploying a full-stack assistant in production, the Voice Live API accelerates your journey from idea to impact.

Ask Me Anything: Deep Dive with the CoreAI Speech Team

Join us for a live AMA session where you can engage directly with the engineers behind the API:

* 🗓️ Date: 14 October 2025
* 🕒 Time: 10am PT
* 📍 Location: https://aka.ms/foundry/discord (see the Events section)
* 🎤 Speakers: Qinying Liao, Principal Program Manager, CoreAI Speech; Jan Gorgen, Senior Program Manager, CoreAI Speech

They'll walk through real-world use cases, demo the API in action, and answer your toughest questions, from latency optimization to avatar integration.

Who Should Attend?

This AMA is designed for:

* AI engineers building multimodal agents
* Developers integrating voice into enterprise workflows
* Researchers exploring conversational UX
* Foundry users looking to scale voice prototypes

Why It Matters

The Voice Live API isn't just another endpoint; it's a foundation for building natural, responsive, and production-ready voice agents. With Azure AI Foundry's orchestration and deployment tools, you can:

* Skip the glue code
* Focus on experience design
* Deploy with confidence across platforms

Bring Your Questions

Curious about latency benchmarks? Want to know how avatars sync with TTS? Wondering how to integrate with your existing Foundry workflows? This is your chance to ask the team directly.
From Cloud to Chip: Building Smarter AI at the Edge with Windows AI PCs

As AI engineers, we've spent years optimizing models for the cloud: scaling inference, wrangling latency, and chasing compute across clusters. But the frontier is shifting. With the rise of Windows AI PCs and powerful local accelerators, the edge is no longer a constraint; it's a canvas. Whether you're deploying vision models to industrial cameras, optimizing speech interfaces for offline assistants, or building privacy-preserving apps for healthcare, Edge AI is where real-world intelligence meets real-time performance.

Why Edge AI, Why Now?

Edge AI isn't just about running models locally; it's about rethinking the entire lifecycle:

- Latency: decisions in milliseconds, not round trips to the cloud.
- Privacy: sensitive data stays on-device, enabling HIPAA/GDPR compliance.
- Resilience: offline-first apps that don't break when the network does.
- Cost: reduced cloud compute and bandwidth overhead.

With Windows AI PCs powered by Intel and Qualcomm NPUs, and tools like ONNX Runtime, DirectML, and Olive, developers can now optimize and deploy models with unprecedented efficiency.

What You'll Learn in Edge AI for Beginners

The Edge AI for Beginners curriculum is a hands-on, open-source guide designed for engineers ready to move from theory to deployment.

Multi-Language Support

This content is available in over 48 languages, so you can read and study in your native language.

What You'll Master

This course takes you from fundamental concepts to production-ready implementations, covering:

* Small Language Models (SLMs) optimized for edge deployment
* Hardware-aware optimization across diverse platforms
* Real-time inference with privacy-preserving capabilities
* Production deployment strategies for enterprise applications

Why EdgeAI Matters

Edge AI represents a paradigm shift that addresses critical modern challenges:

* Privacy & security: process sensitive data locally without cloud exposure
* Real-time performance: eliminate network latency for time-critical applications
* Cost efficiency: reduce bandwidth and cloud computing expenses
* Resilient operations: maintain functionality during network outages
* Regulatory compliance: meet data sovereignty requirements

Edge AI

Edge AI refers to running AI algorithms and language models locally on hardware, close to where data is generated, without relying on cloud resources for inference. It reduces latency, enhances privacy, and enables real-time decision-making. A minimal on-device inference sketch follows the SLM overview below.

Core principles:

* On-device inference: AI models run on edge devices (phones, routers, microcontrollers, industrial PCs)
* Offline capability: functions without persistent internet connectivity
* Low latency: immediate responses suited for real-time systems
* Data sovereignty: keeps sensitive data local, improving security and compliance

Small Language Models (SLMs)

SLMs like Phi-4, Mistral-7B, Qwen, and Gemma are optimized versions of larger LLMs, trained or distilled for:

* Reduced memory footprint: efficient use of limited edge device memory
* Lower compute demand: optimized for CPU and edge GPU performance
* Faster startup times: quick initialization for responsive applications

They unlock powerful NLP capabilities while meeting the constraints of:

* Embedded systems: IoT devices and industrial controllers
* Mobile devices: smartphones and tablets with offline capabilities
* IoT devices: sensors and smart devices with limited resources
* Edge servers: local processing units with limited GPU resources
* Personal computers: desktop and laptop deployment scenarios
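To ground the on-device inference idea, here is a minimal sketch using the ONNX Runtime C# API (Microsoft.ML.OnnxRuntime), one of the runtimes highlighted above. The model path, input name, and tensor shape are placeholders; they depend entirely on the model you export.

```csharp
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

// Load a locally stored ONNX model; no network access is required.
// "model.onnx", the input name, and the shape are placeholders for your model.
using var session = new InferenceSession("model.onnx");

var input = new DenseTensor<float>(new[] { 1, 3, 224, 224 }); // e.g. one 224x224 RGB image
// ...fill the tensor with preprocessed input data here...

var inputs = new[] { NamedOnnxValue.CreateFromTensor("input", input) };

// Runs entirely on-device (CPU by default; accelerators such as DirectML
// can be enabled through SessionOptions execution providers).
using var results = session.Run(inputs);
var output = results.First().AsEnumerable<float>().ToArray();

Console.WriteLine($"Model produced {output.Length} output values.");
```

The same pattern applies whether the model is a small vision network or an SLM exported to ONNX; only the preprocessing and tensor shapes change.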
Course Modules & Navigation

Course duration: 10 hours of content.

| Module | Topic | Focus Area | Key Content | Level | Duration |
|---|---|---|---|---|---|
| 📖 00 | Introduction to EdgeAI | Foundation & Context | EdgeAI Overview • Industry Applications • SLM Introduction • Learning Objectives | Beginner | 1-2 hrs |
| 📚 01 | EdgeAI Fundamentals | Cloud vs Edge AI comparison | EdgeAI Fundamentals • Real World Case Studies • Implementation Guide • Edge Deployment | Beginner | 3-4 hrs |
| 🧠 02 | SLM Model Foundations | Model families & architecture | Phi Family • Qwen Family • Gemma Family • BitNET • μModel • Phi-Silica | Beginner | 4-5 hrs |
| 🚀 03 | SLM Deployment Practice | Local & cloud deployment | Advanced Learning • Local Environment • Cloud Deployment | Intermediate | 4-5 hrs |
| ⚙️ 04 | Model Optimization Toolkit | Cross-platform optimization | Introduction • Llama.cpp • Microsoft Olive • OpenVINO • Apple MLX • Workflow Synthesis | Intermediate | 5-6 hrs |
| 🔧 05 | SLMOps Production | Production operations | SLMOps Introduction • Model Distillation • Fine-tuning • Production Deployment | Advanced | 5-6 hrs |
| 🤖 06 | AI Agents & Function Calling | Agent frameworks & MCP | Agent Introduction • Function Calling • Model Context Protocol | Advanced | 4-5 hrs |
| 💻 07 | Platform Implementation | Cross-platform samples | AI Toolkit • Foundry Local • Windows Development | Advanced | 3-4 hrs |
| 🏭 08 | Foundry Local Toolkit | Production-ready samples | Sample applications (see details below) | Expert | 8-10 hrs |

Each module includes Jupyter notebooks, code samples, and deployment walkthroughs, perfect for engineers who learn by doing.

Developer Highlights

- 🔧 Olive: Microsoft's optimization toolchain for quantization, pruning, and acceleration.
- 🧩 ONNX Runtime: cross-platform inference engine with support for CPU, GPU, and NPU.
- 🎮 DirectML: GPU-accelerated ML API for Windows, ideal for gaming and real-time apps.
- 🖥️ Windows AI PCs: devices with built-in NPUs for low-power, high-performance inference.

Local AI: Beyond the Edge

Local AI isn't just about inference; it's about autonomy. Imagine agents that:

- Learn from local context
- Adapt to user behavior
- Respect privacy by design

With tools like the Agent Framework, Azure AI Foundry, Windows Copilot Studio, and Foundry Local, developers can orchestrate local agents that blend LLMs, sensors, and user preferences, all without cloud dependency.

Try It Yourself

Ready to get started? Clone the Edge AI for Beginners GitHub repo, run the notebooks, and deploy your first model to a Windows AI PC or IoT device. Whether you're building smart kiosks, offline assistants, or industrial monitors, this curriculum gives you the scaffolding to go from prototype to production.