# Azure Databricks
## Securing a Multi-Agent AI Solution Focused on User Context & the Complexities of On-Behalf-Of
*How we built an enterprise-grade multi-agent system that preserves user identity across AI agents and Databricks*

### Introduction

When building AI-powered applications for the enterprise, a common challenge emerges: how do you maintain user identity and access controls when an AI agent queries backend services on behalf of a user? In many implementations, AI agents authenticate to backend systems using a shared service account or a Personal Access Token (PAT), effectively bypassing row-level security (RLS), column masking, and the other data governance policies that organizations carefully configure. This creates a security gap: users can potentially access data they shouldn't see simply by asking an AI agent.

In this post, I'll walk through how we solved this challenge for a current enterprise customer by implementing the Microsoft Entra ID On-Behalf-Of (OBO) flow in a custom multi-agent LangGraph solution. This lets both our Databricks Genie agent (which queries data) and our data agent (which modifies or updates Delta tables) act as the authenticated user, preserving all RBAC policies.

### The Architecture

Our system is built on several key components:

- **Chainlit**: Python-based web interface for LLM-driven conversational applications, integrated with OAuth 2.0-based authentication. Customizing the framework to satisfy customer UI requirements eliminated the need to develop and maintain a bespoke React front end; it fulfilled the majority of requirements while reducing maintenance overhead.
- **Azure App Service**: Managed hosting with built-in authentication support and autoscaling.
- **LangGraph**: Open-source multi-agent orchestration framework.
- **Azure Databricks Genie**: Natural-language-to-SQL agent.
- **Azure Cosmos DB**: Long-term memory and checkpoint storage.
- **Microsoft Entra ID**: Identity provider with OBO support.

This shows:

- **Genie**: Read-only natural language queries, per-user OBO
- **Task Agent**: Handles sensitive operations (SQL modifications, etc.)
  with HITL approval + OBO
- **Memory**: Shared agent, no per-user auth needed

### The Problem with Chainlit's OAuth Provider

Chainlit was integrated with Microsoft Entra ID for OAuth authentication; however, the default implementation assumes Microsoft Graph scopes and had to be extended to support custom resource scopes. This means:

- The access token you receive is scoped for the Microsoft Graph API.
- You can't use it for an OBO flow to downstream services like Databricks.
- The token's audience is graph.microsoft.com, not your application.

For OBO to work, you need an access token where:

- The audience is your application's client ID.
- The scope includes your custom API permission (e.g., api://{client_id}/access_as_user).

### Solution: Custom Entra ID OBO Provider

We created a custom OAuth provider that replaces Chainlit's built-in one. Key insight: by requesting api://{client_id}/access_as_user as the scope, the returned access token has the correct audience for the OBO exchange. Since we can't call the Graph API with this token (wrong audience), we extract user information from the ID token claims instead.

### The OBO Token Exchange

Once we have the user's access token (with the correct audience), we exchange it for a Databricks-scoped token using MSAL. The resulting token:

- Has audience = Databricks resource ID
- Contains the user's identity (UPN, OID)
- Can be used with the Databricks SDK/API
- Respects all Unity Catalog permissions configured for that user

### Per-User Agent Creation

A critical design decision: never cache user-specific agents globally. Each user needs their own Genie agent instance.

### Using the OBO Token with Databricks Genie

The key integration point is passing the OBO-acquired token to the Databricks SDK's WorkspaceClient, which the Genie agent uses internally for all API calls.
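Before attempting the exchange, it is worth verifying that the incoming token really was issued for your application. The following is an illustrative, stdlib-only sketch of that audience check; the helper names are invented for this example, and the commented MSAL call and Databricks scope value are assumptions rather than the post's actual code:

```python
# Illustrative sketch of the audience check that gates the OBO exchange.
# Helper names (token_audience, is_obo_ready) are invented for this example.
import base64
import json

def token_audience(access_token: str) -> str:
    """Decode the JWT payload to read its 'aud' claim.

    No signature verification is performed here; illustration only.
    """
    payload_b64 = access_token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64)).get("aud", "")

def is_obo_ready(access_token: str, client_id: str) -> bool:
    """True if the token was issued for our app, i.e. usable as an OBO assertion."""
    aud = token_audience(access_token)
    # Entra ID may emit the audience as the bare client ID or as api://{client_id}.
    return aud in (client_id, f"api://{client_id}")

# With a correctly-audienced token, the exchange itself is a single MSAL call,
# roughly (scope value assumed, not taken from the post):
#
#   result = msal.ConfidentialClientApplication(...).acquire_token_on_behalf_of(
#       user_assertion=access_token,
#       scopes=["2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"],  # Azure Databricks
#   )
```

A token whose audience is graph.microsoft.com fails this check, which is exactly the failure mode the default Graph-scoped provider produces.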
#### Initialize the Genie Agent with the User's Access Token

#### Wire It into LangGraph

The user_access_token flows from Chainlit's OAuth callback → session config → LangGraph config → agent creation, ensuring every Genie query runs with the authenticated user's permissions.

### Human-in-the-Loop for Destructive SQL Operations

While Databricks Genie handles natural language queries (read-only), our system also supports custom SQL execution for data modifications. Since these operations can DELETE or UPDATE data, we implement human-in-the-loop approval using LangGraph's interrupt feature. The OBO token ensures that even user-authored SQL runs with the user's permissions: they can only modify data they're authorized to change. The destructive-operation detector uses LLM-based intent analysis.

### Entra ID App Registration Requirements

Your Entra ID app registration needs:

- **API Permissions**: Azure Databricks → user_impersonation (admin consent required)
- **Expose an API**: Scope access_as_user on URI api://{client-id}
- **Redirect URI**: {your-app-url}/auth/oauth/azure-ad/callback

### Lessons Learned

- **Token audience matters**: OBO fails if your initial token has the wrong audience.
- **Don't cache user-specific clients**: it breaks user isolation.
- **ID tokens contain user info**: use claims when you can't call the Graph API.
- **HITL for destructive ops**: even with RBAC, require explicit user confirmation.

### Conclusion

By implementing the Entra ID OBO flow in our multi-agent system, we achieved:

- User identity preservation across AI agents
- RBAC enforcement at the Databricks/Unity Catalog level
- An audit trail showing the actual user making queries
- A zero-trust architecture: the AI agent never has more access than the user
- Human-in-the-loop for destructive SQL operations

This approach enables any organization building AI systems that support OAuth 2.0 to participate in an on-behalf-of (OBO) flow.
More importantly, it establishes a critical layer of AI governance for enterprise-grade, custom multi-agent solutions, aligning with Microsoft's Secure Future Initiative (SFI) and Zero Trust principles. As organizations accelerate toward multi-agent AI architectures and broader AI transformation, centralized services that standardize identity, authorization, and user delegation become foundational. Capabilities such as Microsoft Entra Agent ID and Azure AI Foundry are emerging precisely to address this need, enabling secure, scalable, and user-context-aware agent interactions.

In the next post, I'll shift the lens from architecture to outcomes, examining what this foundation means from a CXO perspective, and why identity-first AI governance is quickly becoming a board-level concern.

## Unlocking Advanced Data Analytics & AI with Azure NetApp Files object REST API
Azure NetApp Files object REST API enables object access to enterprise file data stored on Azure NetApp Files, without copying, moving, or restructuring that data. This capability allows analytics and AI platforms that expect object storage to work directly against existing NFS-based datasets, while preserving Azure NetApp Files' performance, security, and governance characteristics.

## How Azure NetApp Files Object REST API powers Azure and ISV Data and AI services – on YOUR data
This article introduces the Azure NetApp Files Object REST API, a transformative solution for enterprises seeking seamless, real-time integration between their data and Azure's advanced analytics and AI services. By enabling direct, secure access to enterprise data—without costly transfers or duplication—the Object REST API accelerates innovation, streamlines workflows, and enhances operational efficiency. With S3-compatible object storage support, it empowers organizations to make faster, data-driven decisions while maintaining compliance and data security. Discover how this new capability unlocks business potential and drives a new era of productivity in the cloud.

## How Great Engineers Make Architectural Decisions — ADRs, Trade-offs, and an ATAM-Lite Checklist
### Why Decision-Making Matters

Without a shared framework, context fades and teams re-debate old choices. ADRs solve that by recording the why behind design decisions — what problem we solved, what options we considered, and what trade-offs we accepted.

A good ADR:

- Lives next to the code in your repo.
- Explains reasoning in plain language.
- Survives personnel changes and version history.

Think of it as your team's engineering memory.

### The Five Pillars of Trade-offs

At Microsoft, we frame every major design discussion using the Azure Well-Architected pillars:

1. **Reliability** – Will the system recover gracefully from failures?
2. **Performance Efficiency** – Can it meet latency and throughput targets?
3. **Cost Optimization** – Are we using resources efficiently?
4. **Security** – Are we minimizing blast radius and exposure?
5. **Operational Excellence** – Can we deploy, monitor, and fix quickly?

No decision optimizes all five. Great engineers make conscious trade-offs — and document them.

### A Practical Decision Flow

| Step | What to Do | Output |
| --- | --- | --- |
| 1. Frame It | Clarify the problem, constraints, and quality goals (SLOs, cost caps). | Problem statement |
| 2. List Options | Identify 2–4 realistic approaches. | Options list |
| 3. Score Trade-offs | Use a Decision Matrix to rate options (1–5) against pillars. | Table of scores |
| 4. ATAM-Lite Review | List scenarios, identify sensitivity points (small changes with big impact) and risks. | Risk notes |
| 5. Record It as an ADR | Capture everything in one markdown doc beside the code. | ADR file |

### Example: Adding a Read-Through Cache

**Decision:** Add a Redis cache in front of Cosmos DB to reduce read latency.

**Context:** Average P95 latency from the DB is 80 ms; target is < 15 ms.

**Options:**

- A) Query the DB directly
- B) Add a read-through cache using Redis

**Trade-offs:**

- Performance: + Massive improvement in read speed.
- Cost: + Fewer RU/s on Cosmos DB.
- Reliability: − Risk of stale data if cache invalidation fails.
- Operational: − Added complexity for monitoring and TTLs.
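The weighted scoring from step 3 of the decision flow takes only a few lines to mechanize. This is a minimal sketch: the pillar weights and the 1–5 ratings for the two cache options are the illustrative values from this example, not measurements.

```python
# Illustrative weighted decision matrix for the read-through cache example.
PILLARS = {  # pillar -> weight
    "Reliability": 5,
    "Performance": 4,
    "Cost": 3,
    "Security": 4,
    "Operational Excellence": 3,
}

def weighted_total(scores: dict) -> int:
    """Weighted total = sum of (weight x score) across pillars; higher is better."""
    return sum(PILLARS[pillar] * score for pillar, score in scores.items())

option_a = {"Reliability": 3, "Performance": 2, "Cost": 4,
            "Security": 4, "Operational Excellence": 4}   # A) direct DB reads
option_b = {"Reliability": 4, "Performance": 5, "Cost": 5,
            "Security": 4, "Operational Excellence": 3}   # B) Redis read-through cache

print(weighted_total(option_a))  # 63
print(weighted_total(option_b))  # 80 -> option B wins
```

The point of writing it down, even this crudely, is that the weights become an explicit, reviewable artifact rather than an implicit bias in the discussion.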
### Templates You Can Re-use

**ADR Template**

```markdown
# ADR-001: Add Read-through Cache in Front of Cosmos DB

Status: Accepted
Date: 2025-10-21
Context: High read latency; P95 = 80ms, target <15ms
Options:
  A) Direct DB reads
  B) Redis cache for hot keys ✅
Decision: Adopt Redis cache for performance and cost optimization.
Consequences:
  - Improved read latency and reduced RU/s cost
  - Risk of data staleness during cache invalidation
  - Added operational complexity
Links: PR#3421, Design Doc #204, Azure Monitor dashboard
```

**Decision Matrix Example**

| Pillar | Weight | Option A | Option B | Notes |
| --- | --- | --- | --- | --- |
| Reliability | 5 | 3 | 4 | Redis clustering handles failover |
| Performance | 4 | 2 | 5 | In-memory reads |
| Cost | 3 | 4 | 5 | Reduced RU/s |
| Security | 4 | 4 | 4 | Same auth posture |
| Operational Excellence | 3 | 4 | 3 | More moving parts |

Weighted total = Σ(weight × score) → the best overall score wins.

### Team Guidelines

- Create a /docs/adr folder in each repo.
- One ADR per significant change; supersede old ones instead of editing history.
- Link ADRs in design reviews and PRs.
- Revisit when constraints change (incidents, new SLOs, cost shifts).
- Publish insights as follow-up blogs to grow shared knowledge.

### Why It Works

This practice connects the theory of trade-offs with Microsoft's engineering culture of reliability and transparency. It improves onboarding, enables faster design reviews, and builds a traceable record of engineering evolution.

### Join the Conversation

Have you tried ADRs or other decision frameworks in your projects? Share your experience in the comments or link to your own public templates — let's make architectural reasoning part of our shared language.