apps & devops
Centralizing Enterprise API Access for Agent-Based Architectures
Problem Statement

When building AI agents or automation solutions, calling enterprise APIs directly often means configuring individual HTTP actions within each agent for every API. While this works for simple scenarios, it quickly becomes repetitive and difficult to manage as complexity grows. The challenge becomes more pronounced when a single business domain exposes multiple APIs, or when the same APIs are consumed by multiple agents. This leads to duplicated configurations, higher maintenance effort, inconsistent behavior, and increased governance and security risks.

A more scalable approach is to centralize and reuse API access. By grouping APIs by business domain using an API management layer, shaping those APIs through a Model Context Protocol (MCP) server, and exposing the MCP server as a standardized tool or connector, agents can consume business capabilities in a consistent, reusable, and governable manner. This pattern not only reduces duplication and configuration overhead but also enables stronger versioning, security controls, observability, and domain-driven ownership—making agent-based systems easier to scale and operate in enterprise environments.

Designing Agent-Ready APIs with Azure API Management, an MCP Server, and Copilot Studio

As enterprises increasingly adopt AI-powered assistants and Copilots, API design must evolve to meet the needs of intelligent agents. Traditional APIs—often designed for user interfaces or backend integrations—can expose excessive data, lack intent-level abstraction, and increase security risk when consumed directly by AI systems. This document outlines a practical, enterprise-ready approach to organize APIs in Azure API Management (APIM), introduce a Model Context Protocol (MCP) server to shape and control context, and integrate the solution with Microsoft Copilot Studio. The goal is to make APIs truly agent-ready: secure, scalable, reusable, and easy to govern.

Architecture at a glance

- Back-end services expose domain APIs.
- Azure API Management (APIM) groups and governs those APIs (products, policies, authentication, throttling, versions).
- An MCP server calls APIM, orchestrates/filters responses, and returns concise, model-friendly outputs.
- Copilot Studio connects to the MCP server and invokes a small set of predictable operations to satisfy user intents.

Why Traditional API Designs Fall Short for AI Agents

Enterprise APIs have historically been built around CRUD operations and service-to-service integration patterns. While this works well for deterministic applications, AI agents work best with intent-driven operations and context-aware responses. When agents consume traditional APIs directly, common issues include overly verbose payloads, multiple calls to satisfy a single user intent, and insufficient guardrails for read vs. write operations. The result can be unpredictable agent behavior that is difficult to test, validate, and govern.

Structuring APIs Effectively in Azure API Management

Azure API Management (APIM) is the control plane between enterprise systems and AI agents. A well-structured APIM instance improves security, discoverability, and governance through products, policies, subscriptions, and analytics.

Key design principles for agent consumption

- Organize APIs by business capability (for example, Customer, Orders, Billing) rather than technical layers.
- Expose agent-facing APIs via dedicated APIM products to enable controlled access, throttling, versioning, and independent lifecycle management (see the provisioning sketch below).
- Prefer read-only operations where possible; scope write operations narrowly and protect them with explicit checks, approvals, and least-privilege identities.
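To illustrate the product-per-consumer approach, the sketch below uses Az PowerShell to create a dedicated, subscription-required APIM product for agent access and attach an existing API to it. The resource group, service, product, and API names are hypothetical placeholders, and exact parameter behavior can vary slightly across Az.ApiManagement versions; treat this as a minimal sketch rather than a production script.

    # Minimal sketch: create an agent-facing APIM product and add an existing API to it.
    # Names below (rg-apim, apim-contoso, orders-api) are hypothetical placeholders.
    Import-Module Az.ApiManagement

    $ctx = New-AzApiManagementContext -ResourceGroupName "rg-apim" -ServiceName "apim-contoso"

    # Dedicated product for agent consumption; subscription (key) required, approval enabled.
    New-AzApiManagementProduct -Context $ctx `
        -ProductId "agents-orders" `
        -Title "Agents - Orders domain" `
        -Description "Agent-facing access to the Orders APIs" `
        -SubscriptionRequired $true `
        -ApprovalRequired $true `
        -State "Published"

    # Attach the domain API to the product so agent subscriptions are scoped to it.
    Add-AzApiManagementApiToProduct -Context $ctx -ProductId "agents-orders" -ApiId "orders-api"

Keeping one product per agent scenario makes it straightforward to throttle, version, and revoke agent access independently of other API consumers.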
The Role of the MCP Server in Agent-Based Architectures

APIM provides governance and security, but agents also need an intent-level interface and model-friendly responses. A Model Context Protocol (MCP) server fills this gap by acting as a mediator between Copilot Studio and APIM-exposed APIs. Instead of exposing many back-end endpoints directly to the agent, the MCP server can orchestrate multiple API calls, filter irrelevant fields, enforce business rules, enrich results with additional business context, and emit concise, predictable, LLM-friendly JSON outputs. This makes agent behavior more reliable and easier to validate. By introducing this abstraction layer, Copilot interactions become simpler, safer, and more deterministic: the agent interacts with a small number of well-defined MCP operations that encapsulate enterprise logic without exposing internal complexity.

Designing an Effective MCP Server

An MCP server should have a focused and clearly scoped responsibility: shaping context for AI models. Its role is not to replace core back-end services, but to adapt enterprise capabilities and data for agent consumption. MCP does not orchestrate enterprise workflows or apply core business logic; it standardizes how agents discover and invoke external tools and APIs by exposing them through a structured protocol interface. Orchestration, intent resolution, and policy-driven execution are handled by the agent runtime or host framework.

What MCP should do

- Call APIM-managed APIs and orchestrate multi-step retrieval when needed.
- Apply security checks and business rules consistently.
- Filter and minimize payloads (return only fields needed for the intent).
- Normalize and reshape responses into stable, predictable JSON schemas.
- Handle errors and edge cases with safe, descriptive messages.

What MCP should not do

Complex transactional workflows, long-running processes, and UI-specific formatting do not belong in MCP and should remain in backend systems. Keeping MCP lightweight ensures it remains scalable, testable, and easy to maintain.

Step-by-step guide

1) Create an MCP server in Azure API Management (APIM)

1. Open the Azure portal (portal.azure.com) and go to your API Management instance.
2. In the left navigation, expand APIs.
3. Create (or select) an API group for the business domain you want to expose (for example, Orders or Customers).
4. Add the relevant APIs/operations to that API group.
5. Create or select an APIM product dedicated to agent usage, and ensure the product requires a subscription (subscription key).
6. Create an MCP server in APIM and map it to the API (or API group) you want to expose as MCP operations.
7. In the MCP server settings, ensure Subscription key required is enabled.
8. From the product's Subscriptions page, copy the subscription key you will use in Copilot Studio.

Screenshot placeholders: APIM API group, product configuration, MCP server mapping, subscription settings, subscription key location.

Note: Using an API Management subscription key to access MCP operations is one supported way to authenticate and consume enterprise APIs. However, this approach is best suited for initial setups, demos, or scenarios where key-based access is explicitly required. For production-grade enterprise solutions, Microsoft recommends using managed identity–based access control. Managed identities for Azure resources eliminate the need to manage secrets such as subscription keys or client secrets, integrate natively with Microsoft Entra ID, and support fine-grained role-based access control (RBAC). This approach improves security posture while significantly reducing operational and governance overhead for agent and service-to-service integrations. Wherever possible, agents and MCP servers should authenticate using managed identities to ensure secure, scalable, and compliant access to enterprise APIs.

2) Create a Copilot Studio agent and connect to the APIM MCP server using a subscription key

Copilot Studio natively supports Model Context Protocol (MCP) servers as tools. When an agent is connected to an MCP server, the tool metadata—including operation names, inputs, and outputs—is automatically discovered and kept in sync, reducing manual configuration and maintenance overhead.

1. Sign in to Copilot Studio.
2. Create a new agent and add clear instructions describing when to use the MCP tool and how to present results (for example, concise summaries plus key fields).
3. Open Tools > Add tool > Model Context Protocol, then choose Create.
4. Enter the MCP server details:
   - Server endpoint URL: copy this from your MCP server in APIM.
   - Authentication: select API Key.
   - Header name: use the subscription key header required by your APIM configuration.
5. Select Create new connection, paste the APIM subscription key, and save (a quick PowerShell check of the key appears after the guardrails below).
6. Test the tool in the agent by prompting for a domain-specific task (for example, "Get order status for 12345"). Validate that responses are concise and that errors are handled safely.

Screenshot placeholders: MCP tool creation screen, endpoint + auth configuration, connection creation, test prompt and response.

Operational best practices and guardrails

- Least privilege by default: create separate APIM products and identities for agent scenarios; avoid broad access to internal APIs.
- Prefer intent-level operations: expose fewer, higher-level MCP operations instead of many low-level endpoints.
- Protect write operations: require explicit parameters, validation, and (when appropriate) approval flows; keep "read" and "write" tools separate.
- Stable schemas: return predictable JSON shapes and limit optional fields to reduce prompt brittleness.
- Observability: log MCP requests/responses (with sensitive fields redacted), monitor APIM analytics, and set alerts for failures and throttling.
- Versioning: version MCP operations and APIM APIs; deprecate safely.
- Security hygiene: treat subscription keys as secrets, rotate regularly, and avoid exposing them in prompts or logs.
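Before wiring the connection into Copilot Studio, it can help to confirm that the subscription key actually grants access to the underlying APIM-managed API. The sketch below is a minimal PowerShell check; the gateway URL, API path, and key are placeholders, and Ocp-Apim-Subscription-Key is APIM's default subscription header (adjust it if your instance uses a custom header name).

    # Minimal sketch: verify an APIM subscription key against an API in the agent product.
    # The URL, path, and key below are hypothetical placeholders.
    $gatewayUrl = "https://apim-contoso.azure-api.net"
    $apiPath    = "/orders/v1/orders/12345"
    $headers    = @{ "Ocp-Apim-Subscription-Key" = "<your-subscription-key>" }

    try {
        $response = Invoke-RestMethod -Uri ($gatewayUrl + $apiPath) -Headers $headers -Method Get
        # Inspect the payload shape the MCP server (and agent) will ultimately see.
        $response | ConvertTo-Json -Depth 5
    }
    catch {
        # A 401 here usually means the key or product scope is wrong; fix that before touching Copilot Studio.
        Write-Error $_.Exception.Message
    }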
Summary

As organizations scale agent-based and Copilot-driven solutions, directly exposing enterprise APIs to AI agents quickly becomes complex and risky. Centralizing API access through Azure API Management, shaping agent-ready context via a Model Context Protocol (MCP) server, and consuming those capabilities through Copilot Studio establishes a clean and governable architecture. This pattern reduces duplication, enforces consistent security controls, and enables intent-driven API consumption without exposing unnecessary backend complexity. By combining domain-aligned API products, lightweight MCP operations, and least-privilege identity-based access, enterprises can confidently scale AI agents while maintaining strong governance, observability, and operational control.

References

- Azure API Management (APIM) – Overview
- Azure API Management – Key Concepts
- Azure MCP Server Documentation (Model Context Protocol)
- Extend your agent with Model Context Protocol
- Managed identities for Azure resources – Overview
Advancing to Agentic AI with Azure NetApp Files VS Code Extension v1.2.0

The Azure NetApp Files VS Code Extension v1.2.0 introduces a major leap toward agentic, AI-informed cloud operations with the debut of the autonomous Volume Scanner. Moving beyond traditional assistive AI, this release enables intelligent infrastructure analysis that can detect configuration risks, recommend remediations, and execute approved changes under user governance. Complemented by an expanded natural language interface, developers can now manage, optimize, and troubleshoot Azure NetApp Files resources through conversational commands - from performance monitoring to cross-region replication, backup orchestration, and ARM template generation. Version 1.2.0 establishes the foundation for a multi-agent system built to reduce operational toil and accelerate a shift toward self-managing enterprise storage in the cloud.
Designing Reliable Health Check Endpoints for IIS Behind Azure Application Gateway

Why Health Probes Matter in Azure Application Gateway

Azure Application Gateway relies entirely on health probes to determine whether backend instances should receive traffic. If a probe:

- Receives a non-200 response
- Times out
- Gets redirected
- Requires authentication

…the backend is marked Unhealthy, and traffic is stopped—resulting in user-facing errors. A healthy IIS application does not automatically mean a healthy Application Gateway backend.

Failure Flow: How a Misconfigured Health Probe Leads to 502 Errors

One of the most confusing scenarios teams encounter is when the IIS application is running correctly, yet users intermittently receive 502 Bad Gateway errors. This typically happens when health probes fail, causing Azure Application Gateway to mark backend instances as Unhealthy and stop routing traffic to them. The following diagram illustrates this failure flow.

Failure Flow Diagram (Probe Fails → Backend Unhealthy → 502)

Key takeaway: Most 502 errors behind Azure Application Gateway are not application failures—they are health probe failures.

What's Happening Here?

Azure Application Gateway periodically sends health probes to backend IIS instances. If the probe endpoint:

- Redirects to /login
- Requires authentication
- Returns 401 / 403 / 302
- Times out

the probe is considered failed. After consecutive failures, the backend instance is marked Unhealthy, and Application Gateway stops forwarding traffic to unhealthy backends. If all backend instances are unhealthy, every client request results in a 502 Bad Gateway—even though IIS itself may still be running. This is why a dedicated, lightweight, unauthenticated health endpoint is critical for production stability.

Common Health Probe Pitfalls with IIS

Before designing a solution, let's look at what commonly goes wrong.

1. Probing the Root Path (/)
Many IIS applications redirect / to /login, require authentication, or return 401 / 302 / 403. Application Gateway expects a clean 200 OK, not redirects or auth challenges.

2. Authentication-Enabled Endpoints
Health probes do not support authentication headers. If your app enforces Windows Authentication, OAuth / JWT, or client certificates, the probe will fail.

3. Slow or Heavy Endpoints
Probing a controller that calls a database, performs startup checks, or loads configuration can cause intermittent failures, especially under load.

4. Certificate and Host Header Mismatch
TLS-enabled backends may fail probes due to a missing Host header, incorrect SNI configuration, or a certificate CN mismatch.

Design Principles for a Reliable IIS Health Endpoint

A good health check endpoint should be:

- Lightweight
- Anonymous
- Fast (< 100 ms)
- Always returning HTTP 200
- Independent of business logic

    Client Browser
         |
         | HTTPS (Public DNS)
         v
    +-------------------------------------------------+
    | Azure Application Gateway (v2)                  |
    |  - HTTPS Listener                               |
    |  - SSL Certificate                              |
    |  - Custom Health Probe (/health)                |
    +-------------------------------------------------+
         |
         | HTTPS (SNI + Host Header)
         v
    +-------------------------------------------------+
    | IIS Backend VM                                  |
    |                                                 |
    |  Site Bindings:                                 |
    |   - HTTPS : app.domain.com                      |
    |                                                 |
    |  Endpoints:                                     |
    |   - /health (Anonymous, Static, 200 OK)         |
    |   - /login  (Authenticated)                     |
    +-------------------------------------------------+

Azure Application Gateway health probe architecture for IIS backends using a dedicated /health endpoint. Azure Application Gateway continuously probes a dedicated /health endpoint on each IIS backend instance.
The health endpoint is designed to return a fast, unauthenticated 200 OK response, allowing Application Gateway to reliably determine backend health while keeping application endpoints secure.

Step 1: Create a Dedicated Health Endpoint

Recommended path:

    /health

This endpoint should:

- Bypass authentication
- Avoid redirects
- Avoid database calls

Example: Simple IIS Health Page

Create a static file:

    C:\inetpub\wwwroot\website\health\index.html

- Static
- Fast
- Zero dependencies

Step 2: Exclude the Health Endpoint from Authentication

If your IIS site uses authentication, explicitly allow anonymous access to /health.

web.config example:

    <location path="health">
      <system.webServer>
        <security>
          <authentication>
            <anonymousAuthentication enabled="true" />
            <windowsAuthentication enabled="false" />
          </authentication>
        </security>
      </system.webServer>
    </location>

⚠️ This ensures probes succeed even if the rest of the site is secured.

Step 3: Configure Azure Application Gateway Health Probe

Recommended probe settings:

- Protocol: HTTPS
- Path: /health
- Interval: 30 seconds
- Timeout: 30 seconds
- Unhealthy threshold: 3
- Pick host name from backend: Enabled

(An Az PowerShell sketch that automates this configuration and validation appears at the end of this article.)

Why "Pick host name from backend" matters

This ensures:

- Correct Host header
- Proper certificate validation
- No avoidable TLS handshake failures

Step 4: Validate Health Probe Behavior

From Application Gateway:

- Navigate to Backend health
- Ensure status shows Healthy
- Confirm response code = 200

From the IIS VM:

    Invoke-WebRequest https://your-app-domain/health

Expected:

    StatusCode : 200

Troubleshooting Common Failures

Probe shows Unhealthy but app works
✔ Check authentication rules
✔ Verify /health does not redirect
✔ Confirm HTTP 200 response

TLS or certificate errors
✔ Ensure certificate CN matches backend domain
✔ Enable "Pick host name from backend"
✔ Validate certificate is bound in IIS

Intermittent failures
✔ Reduce probe complexity
✔ Avoid DB or service calls
✔ Use static content

Production Best Practices

- Use separate health endpoints per application
- Never reuse business endpoints for probes
- Monitor probe failures as early warning signs
- Test probes after every deployment
- Keep health endpoints simple and boring

Final Thoughts

A reliable health check endpoint is not optional when running IIS behind Azure Application Gateway—it is a core part of application availability. By designing a dedicated, authentication-free, lightweight health endpoint, you can eliminate a large class of false outages and significantly improve platform stability. If you're migrating IIS applications to Azure or troubleshooting unexplained Application Gateway failures, start with your health probe—it's often the silent culprit.
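If you prefer to script Steps 3 and 4 rather than click through the portal, the sketch below shows one way to add the /health probe to an existing Application Gateway and then read back the backend health with Az PowerShell. The resource names are hypothetical placeholders, and you still need to reference the new probe from the relevant backend HTTP settings (not shown here).

    # Minimal sketch: add a /health probe to an existing Application Gateway and check backend health.
    # Resource group and gateway names are hypothetical placeholders.
    Import-Module Az.Network

    $gw = Get-AzApplicationGateway -Name "appgw-prod" -ResourceGroupName "rg-appgw"

    # Custom HTTPS probe that picks the host name from the backend HTTP settings (SNI-safe).
    Add-AzApplicationGatewayProbeConfig -ApplicationGateway $gw `
        -Name "probe-iis-health" `
        -Protocol Https `
        -Path "/health" `
        -Interval 30 `
        -Timeout 30 `
        -UnhealthyThreshold 3 `
        -PickHostNameFromBackendHttpSettings

    Set-AzApplicationGateway -ApplicationGateway $gw | Out-Null

    # Read backend health after the change; expect "Healthy" for every server.
    Get-AzApplicationGatewayBackendHealth -Name "appgw-prod" -ResourceGroupName "rg-appgw" |
        Select-Object -ExpandProperty BackendAddressPools

Running the same check in a release pipeline is a simple way to honor the "test probes after every deployment" practice above.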
Granting Azure Resources Access to SharePoint Online Sites Using Managed Identity

When integrating Azure resources like Logic Apps, Function Apps, or Azure VMs with SharePoint Online, you often need secure and granular access control. Rather than handling credentials manually, Managed Identity is the recommended approach to securely authenticate to Microsoft Graph and access SharePoint resources.

High-level steps:

- Step 1: Enable Managed Identity (or App Registration)
- Step 2: Grant Sites.Selected Permission in Microsoft Entra ID
- Step 3: Assign SharePoint Site-Level Permission

Step 1: Enable Managed Identity (or App Registration)

For your Azure resource (e.g., Logic App):

1. Navigate to the Azure portal.
2. Go to the resource (e.g., Logic App).
3. Under Identity, enable System-assigned Managed Identity.
4. Note the Object ID and Client ID (you'll need the Client ID later).

Alternatively, use an App Registration if you prefer a multi-tenant or reusable identity. See: How to register an app in Microsoft Entra ID - Microsoft identity platform | Microsoft Learn

Step 2: Grant Sites.Selected Permission in Microsoft Entra ID

1. Open Microsoft Entra ID > App registrations.
2. Select your Logic App's managed identity or app registration.
3. Under API permissions, click Add a permission > Microsoft Graph.
4. Select Application permissions and add: Sites.Selected
5. Click Grant admin consent.

Note: Sites.Selected ensures least-privilege access — you must explicitly allow site-level access later.

Step 3: Assign SharePoint Site-Level Permission

SharePoint Online requires site-level consent for apps with Sites.Selected. Use the script below to assign access.

Note: You must be a SharePoint Administrator and have the Sites.FullControl.All permission when running this.

PowerShell script:

    # Replace with your values
    $application = @{
        id          = "{ApplicationID}"     # Client ID of the Managed Identity
        displayName = "{DisplayName}"       # Display name (optional but recommended)
    }
    $appRole   = "write"                    # Can be "read" or "write"
    $spoTenant = "contoso.sharepoint.com"   # SharePoint site host
    $spoSite   = "{Sitename}"               # SharePoint site name

    # Site ID format for Graph API
    $spoSiteId = $spoTenant + ":/sites/" + $spoSite + ":"

    # Load Microsoft Graph module
    Import-Module Microsoft.Graph.Sites

    # Connect with appropriate permissions
    Connect-MgGraph -Scopes Sites.FullControl.All

    # Grant site-level permission
    New-MgSitePermission -SiteId $spoSiteId -Roles $appRole -GrantedToIdentities @{ Application = $application }

That's it. Your Logic App or Azure resource can now call Microsoft Graph APIs to interact with that specific SharePoint site (e.g., list files, upload documents). You maintain centralized control and least-privilege access, complying with enterprise security standards. By following this approach, you ensure secure, auditable, and scalable access from Azure services to SharePoint Online — no secrets, no user credentials, just managed identity done right.
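As a quick end-to-end check of the grant, the sketch below connects to Microsoft Graph as the managed identity (for example, from an Azure VM or Automation runbook that has the identity assigned) and reads the site and its document libraries. The tenant host, site name, and user-assigned client ID are hypothetical placeholders, and the sketch assumes the Microsoft Graph PowerShell SDK's managed identity sign-in; Logic Apps and Functions would instead use their built-in connectors or HTTP actions with the same identity.

    # Minimal sketch: verify the Sites.Selected grant by reading the site as the managed identity.
    # Host, site name, and client ID values are hypothetical placeholders.
    Import-Module Microsoft.Graph.Sites
    Import-Module Microsoft.Graph.Files

    # System-assigned identity:  Connect-MgGraph -Identity
    # User-assigned identity:    pass its client ID explicitly.
    Connect-MgGraph -Identity -ClientId "00000000-0000-0000-0000-000000000000"

    $siteId = "contoso.sharepoint.com:/sites/ProjectX:"

    # Read site metadata and list its drives (document libraries).
    Get-MgSite -SiteId $siteId | Select-Object DisplayName, WebUrl
    Get-MgSiteDrive -SiteId $siteId | Select-Object Name, WebUrl

If these calls return data while the identity holds only Sites.Selected plus the site-level grant, the least-privilege configuration is working as intended.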
Building a Secure and Compliant Azure AI Landing Zone: Policy Framework & Best Practices

As organizations accelerate their AI adoption on Microsoft Azure, governance, compliance, and security become critical pillars for success. Deploying AI workloads without a structured compliance framework can expose enterprises to data privacy issues, misconfigurations, and regulatory risks. To address this challenge, the Azure AI Landing Zone provides a scalable and secure foundation — bringing together Azure Policy, Blueprints, and Infrastructure-as-Code (IaC) to ensure every resource aligns with organizational and regulatory standards. The Azure Policy & Compliance Framework acts as the governance backbone of this landing zone. It enforces consistency across environments by applying policy definitions, initiatives, and assignments that monitor and remediate non-compliant resources automatically.

This blog will guide you through:

- 🧭 The architecture and layers of an AI Landing Zone
- 🧩 How Azure Policy as Code enables automated governance
- ⚙️ Steps to implement and deploy policies using IaC pipelines
- 📈 Visualizing compliance flows for AI-specific resources

What is Azure AI Landing Zone (AI ALZ)?

AI ALZ is a foundational architecture that integrates core Azure services (ML, OpenAI, Cognitive Services) with best practices in identity, networking, governance, and operations. To ensure consistency, security, and responsibility, a robust policy framework is essential.

Policy & Compliance in AI ALZ

Azure Policy helps enforce standards across subscriptions and resource groups. You define policies (single rules), group them into initiatives (policy sets), and assign them at specific scopes, with exemptions where needed. Compliance reporting helps surface non-compliant resources for mitigation. AI workloads bring some unique considerations:

- Sensitive data (PII, models)
- Model accountability, logging, audit trails
- Cost & performance from heavy compute usage
- Preview features and frequent updates

Scope

This framework covers:

- Azure Machine Learning (AML)
- Azure API Management
- Azure AI Foundry
- Azure App Service
- Azure Cognitive Services
- Azure OpenAI
- Azure Storage Accounts
- Azure Databases (SQL, Cosmos DB, MySQL, PostgreSQL)
- Azure Key Vault
- Azure Kubernetes Service

Core Policy Categories

1. Networking & Access Control
- Restrict resource deployment to approved regions (e.g., Europe only).
- Enforce private link and private endpoint usage for all critical resources.
- Disable public network access for workspaces, storage, search, and key vaults.

2. Identity & Authentication
- Require user-assigned managed identities for resource access.
- Disable local authentication; enforce Microsoft Entra ID (Azure AD) authentication.

3. Data Protection
- Enforce encryption at rest with customer-managed keys (CMK).
- Restrict public access to storage accounts and databases.

4. Monitoring & Logging
- Deploy diagnostic settings to Log Analytics for all key resources.
- Ensure activity/resource logs are enabled and retained for at least one year.

5. Resource-Specific Guardrails
- Apply built-in and custom policy initiatives for OpenAI, Kubernetes, App Services, Databases, etc.

A detailed list of all policies is bundled and attached at the end of this blog. Be sure to check it out for a ready-to-use Excel file—perfect for customer workshops—which includes policy type (Standalone/Initiative), origin (Built-in/Custom), and more.

Implementation: Policy-as-Code using EPAC

To turn policies from Excel/JSON into operational governance, Enterprise Policy as Code (EPAC) is a powerful tool. EPAC transforms policy artifacts into a desired state repository and handles deployment, lifecycle, versioning, and CI/CD automation.

What is EPAC & Why Use It?

- EPAC is a set of PowerShell scripts/modules to deploy policy definitions, initiatives, assignments, role assignments, and exemptions (Enterprise Policy As Code (EPAC)).
- It supports CI/CD integration (GitHub Actions, Azure DevOps), so policy changes can be treated like code.
- It handles ordering, dependency resolution, and enforcement of a "desired state" — any policy resources not in your repo may be pruned (depending on configuration).
- It integrates with Azure Landing Zones (including the governance baseline) out of the box.

A minimal deployment flow is sketched below.
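For teams evaluating EPAC, the following sketch shows the typical three-step deployment flow using the EnterprisePolicyAsCode PowerShell module: build a plan from the definitions repository, deploy policy resources, then deploy the role assignments they require. The environment selector ("tenant") and folder paths are placeholders that depend on how your EPAC repository and global settings are configured; cmdlet and parameter names follow the EPAC documentation, so check the version you install.

    # Minimal sketch of an EPAC deployment run (typically executed from a CI/CD pipeline).
    # The environment selector and folder paths are hypothetical and depend on your repo layout.
    Install-Module EnterprisePolicyAsCode -Scope CurrentUser

    # 1. Compare the desired state in the repo with what is deployed and produce a plan.
    Build-DeploymentPlans -PacEnvironmentSelector "tenant" `
        -DefinitionsRootFolder "./Definitions" `
        -OutputFolder "./Output"

    # 2. Deploy policy definitions, policy set definitions, assignments, and exemptions.
    Deploy-PolicyPlan -PacEnvironmentSelector "tenant" -InputFolder "./Output"

    # 3. Deploy the role assignments needed by policies with deployIfNotExists or modify effects.
    Deploy-RolesPlan -PacEnvironmentSelector "tenant" -InputFolder "./Output"

In a pipeline, the plan step usually runs on pull requests for review, while the two deploy steps run only after approval, keeping policy changes auditable like any other code change.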
References & Further Reading

- EPAC GitHub Repository
- Advanced Azure Policy management – Microsoft Learn
- How to deploy Azure policies the DevOps way – Rabobank
Cross-Region Zero Trust: Connecting Power Platform to Azure PaaS across different regions

In the modern enterprise cloud landscape, data rarely sits in one place. You might face a scenario where your Power Platform environment (Dynamics 365, Power Apps, or Power Automate) is hosted in Region A for centralized management, while your sensitive SQL Databases or Storage Accounts must reside in Region B due to data sovereignty, latency requirements, or legacy infrastructure. Connecting these two worlds usually involves traversing the public internet - a major "red flag" for security teams.

The Missing Link in Cloud Security

When we talk about enterprise security, "Public Access: Disabled" is the holy grail. But for Power Platform architects, this setting is often followed by a headache. The challenge is simple but daunting: how can a Power Platform environment (e.g., in Region A) communicate with an Azure PaaS service (e.g., Storage or SQL in Region B) when that resource is completely locked down behind a Private Endpoint? Existing documentation usually covers single-region setups with no firewalls. This post details a "Zero Trust" architecture that bridges this gap: a walkthrough for setting up a Cross-Region Private Link that routes traffic from the Power Platform in Region A, through a secure Azure Hub, and down the Azure Global Backbone to a Private Endpoint in Region B, without a single packet ever touching the public internet.

1. Understanding the Foundation: VNet Support

Before we build, we must understand what moves: Power Platform VNet integration is an "Outbound" technology. It allows the platform to connect to data sources secured within an Azure Virtual Network by injecting its traffic into your Virtual Network, without needing to install or manage an on-premises data gateway. According to Microsoft's official documentation, this integration supports a wide range of services:

- Dataverse: plugins and Virtual Tables.
- Power Automate: cloud flows using standard connectors.
- Power Apps: canvas apps calling private APIs.

This means once the "tunnel" is built, your entire Power Platform ecosystem can reach your private Azure universe. (Virtual Network support overview – Power Platform | Microsoft Learn)

2. The Architecture: A Cross-Region Global Bridge

Based on the Hub-and-Spoke topology, this architecture relies on four key components working in unison:

- Source (Region A): The Power Platform environment utilizes VNet Injection. This injects the platform's outbound traffic into a dedicated, delegated subnet within your Region A Spoke VNet.
- The Hub: A central VNet containing an Azure Firewall. This acts as the regional traffic cop and DNS Proxy, inspecting traffic and resolving private names before allowing packets to traverse the global backbone.
- The Bridge (Global Backbone): We utilize Global VNet Peering to connect Region A to the Region B Spoke. This keeps traffic on Microsoft's private fiber backbone.
- Destination (Region B): The Azure PaaS service (e.g., Storage Account) is locked down with Public Access Disabled. It is only accessible via a Private Endpoint.

The Architecture: Visualizing the Flow

As illustrated in the diagram below, this solution separates the responsibilities into two distinct layers: the Network Admin (Azure Infrastructure) and the Power Platform Admin (Enterprise Policy).

3. The High Availability Constraint: Regional Pairs

A common pitfall of these deployments is configuring only a single region. Power Platform environments are inherently redundant.
In a geography like Europe, your environment is actually hosted across a Regional Pair (e.g., West Europe and North Europe). Why does this matter? If one Azure region in the pair experiences an outage, your Power Platform environment will fail over to the second region. If your VNet policy isn't already there, your private connectivity will break. To maintain High Availability (HA) for your private tunnel, your Azure footprint must mirror this:

- Two VNets: You must create a Virtual Network in each region of the pair.
- Two Delegated Subnets: Each VNet requires a subnet delegated specifically to Microsoft.PowerPlatform/enterprisePolicies.
- Two Network Policies: You must create an Enterprise Policy in each region and link both to your environment to ensure traffic flows even during a regional failover.

Ensure your Azure subscription is registered for the Microsoft.PowerPlatform resource provider by running the SetupSubscriptionForPowerPlatform.ps1 script.

4. Solving the DNS Riddle with Azure Firewall

In a Hub-and-Spoke model, peering the VNets is only half the battle. If your Power Platform environment in Region A asks for mystorage.blob.core.windows.net, it will receive a public IP by default, and your connection will be blocked. To fix this, we utilize the Azure Firewall as a DNS Proxy:

1. Link the Private DNS Zone: Ensure your Private DNS Zones (e.g., privatelink.blob.core.windows.net) are linked to the Hub VNet.
2. Enable DNS Proxy: Turn on the DNS Proxy feature on your Azure Firewall.
3. Configure Custom DNS: Set the DNS servers of your Spoke VNets (Region A) to the Firewall's internal IP.

Now the DNS query flows through the Firewall, which "sees" the Private DNS Zone and returns the private IP to the Power Platform.

5. Secretless Security with User-Assigned Managed Identity

Private networking secures the path, but identity secures the access. Instead of managing fragile client secrets, we use a User-Assigned Managed Identity (UAMI).

Phase A: The Azure Setup

1. Create the Identity: Generate a User-Assigned Managed Identity in your Azure subscription.
2. Assign RBAC Roles: Grant this identity specific permissions on your destination resource. For example, assign the Storage Blob Data Contributor role to allow the identity to manage files in your private storage account.

Phase B: The Power Platform Integration

To make the environment recognize this identity, you must register it as an Application User:

1. Navigate to the Power Platform Admin Center.
2. Go to Environments > [Your Environment] > Settings > Users + permissions > Application users.
3. Add a new app and select the Managed Identity you created in Azure.

6. Creating Enterprise Policies using PowerShell Scripts

One of the most important things to realize is that Enterprise Policies cannot be created manually in the Azure Portal UI. They must be deployed via PowerShell or CLI. While Microsoft provides a comprehensive official GitHub repository with all the necessary templates, it is designed to be highly modular and granular. This means that to achieve a High Availability (HA) setup, an admin usually needs to execute deployments for each region separately and then perform the linking step. To simplify this workflow, I have developed a Simplified Scripts Repository on my GitHub. These scripts use the official Microsoft templates as their foundation but add an orchestration layer specifically for the Regional Pair requirement:

- Regional Pair Automation: Instead of running separate deployments, my script handles the dual-VNet injection in a single flow. It automates the creation of policies in both regions and links them to your environment in one execution.
- Focused Scenarios: I've distilled the most essential scripts for Network Injection and Encryption (CMK), making it easier for admins to get up and running without navigating the entire modular library.
- The Goal: To provide a "Fast-Track" experience that follows Microsoft's best practices while reducing the manual steps required to achieve a resilient, multi-region architecture.

Owning the Keys with Encryption Policies (CMK)

While Microsoft encrypts Dataverse data by default, many enterprise compliance standards require Customer-Managed Keys (CMK). This ensures that you, not Microsoft, control the encryption keys for your environments. (Manage your customer-managed encryption key - Power Platform | Microsoft Learn)

Key requirements:

- Key Vault Configuration: Your Key Vault must have Purge Protection and Soft Delete enabled to prevent accidental data loss.
- The Identity Bridge: The Encryption Policy uses the User-Assigned Managed Identity (created in Step 5) to authenticate against the Key Vault.
- Permissions: You must grant the Managed Identity the Key Vault Crypto Service Encryption User role so it can wrap and unwrap the encryption keys.

7. The Final Handshake: Linking Policies to Your Environment

Creating the Enterprise Policy in Azure is only the first half of the process. You must now "inform" your Power Platform environment that it should use these policies for its outbound traffic and identity.

Linking the policies to your environment:

- For VNet Injection: In the Admin Center, go to Security > Data and privacy > Azure Virtual Network Policies. Select your environment and link it to the Network Injection policies you created.
- For Encryption (CMK): Go to Security > Data and privacy > Customer-managed encryption key. Select the Encryption Enterprise Policy > Edit Policy > Add Environment.
- Crucial step: You must first grant the Power Platform service "Get", "List", "Wrap", and "Unwrap" permissions on your specific key within Azure Key Vault before the environment can successfully validate the policy.

Verification: The "Smoking Gun" in Log Analytics

After successfully reaching a resource from one of the Power Platform services, you can check whether the connection was private. How do you prove it's private? Use KQL in Azure Log Analytics to verify the Network Security Perimeter (NSP) ID.

The proof: when you see a GUID in the NetworkPerimeter field, it is strong evidence that the resource accepted the request only because it arrived via your authorized private bridge.

In the Azure portal, navigate to your resource (for example, Key Vault) > Logs, and use the following KQL:

    AzureDiagnostics
    | where ResourceProvider == "MICROSOFT.KEYVAULT"
    | where OperationName == "KeyGet" or OperationName == "KeyUnwrap"
    | where ResultType == "Success"
    | project TimeGenerated, OperationName, VaultName = Resource, ResultType,
              CallerIP = CallerIPAddress,
              EnterprisePolicy = identity_claim_xms_mirid_s,
              NetworkPerimeter = identity_claim_xms_az_nwperimid_s
    | sort by TimeGenerated desc

Result: By implementing the Network and Encryption Enterprise Policies, you transition the Power Platform from a public SaaS tool into a fully governed, private extension of your Azure infrastructure. You no longer have to choose between the agility of low-code and the security of a private cloud.
To summarize the transformation from public endpoints to a complete Zero Trust architecture across regions, here is the end-to-end workflow:

PHASE 1: Azure Infrastructure Foundation
1. Create Network Fabric (HA): Deploy VNets and delegated subnets in both regions of the pair (see the PowerShell sketch after this workflow).
2. Deploy the Hub: Set up the central Hub VNet with Azure Firewall.
3. Connect Globally: Establish Global VNet Peering between all Spokes and the Hub.
4. Solve DNS: Enable DNS Proxy on the Firewall and link Private DNS Zones to the Hub VNet.
↓
PHASE 2: Identity & Security Prep
1. Create Identity: Generate a User-Assigned Managed Identity (UAMI).
2. Grant Access (RBAC): Give the UAMI permissions on the target PaaS resource (e.g., Storage Blob Data Contributor).
3. Prepare CMK: Configure Key Vault access policies for the UAMI (Wrap/Unwrap permissions).
↓
PHASE 3: Deploy Enterprise Policies (PowerShell/IaC)
1. Deploy Network Policies: Create "Network Injection" policies in Azure for both regions.
2. Deploy Encryption Policy: Create the "CMK" policy linking to your Key Vault and Identity.
↓
PHASE 4: Power Platform Final Link (Admin Center)
1. Link Network: Associate the environment with the two Network Policies.
2. Link Encryption: Activate the Customer-Managed Key on the environment.
3. Register User: Add the Managed Identity as an "Application User" in the environment.
↓
PHASE 5: Verification
1. Run Workload: Trigger a Flow or Plugin.
2. Audit Logs: Use KQL in Log Analytics to confirm the presence of the NetworkPerimeter ID.
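To make Phase 1 and Phase 3 more concrete, here is a minimal Az PowerShell sketch of the Azure-side prerequisites for VNet injection: registering the Microsoft.PowerPlatform resource provider and delegating a subnet in each VNet of the regional pair to enterprise policies. The VNet, subnet, and resource group names are hypothetical placeholders; the enterprise policies themselves are then created with the scripts referenced in section 6.

    # Minimal sketch: prerequisites for Power Platform VNet injection across a regional pair.
    # Resource names are hypothetical placeholders.
    Import-Module Az.Network

    Register-AzResourceProvider -ProviderNamespace "Microsoft.PowerPlatform"

    foreach ($region in @(
        @{ Rg = "rg-pp-weu"; Vnet = "vnet-spoke-weu"; Subnet = "snet-powerplatform" },
        @{ Rg = "rg-pp-neu"; Vnet = "vnet-spoke-neu"; Subnet = "snet-powerplatform" }
    )) {
        $vnet   = Get-AzVirtualNetwork -Name $region.Vnet -ResourceGroupName $region.Rg
        $subnet = Get-AzVirtualNetworkSubnetConfig -Name $region.Subnet -VirtualNetwork $vnet

        # Delegate the subnet to Power Platform enterprise policies (required for injection).
        Add-AzDelegation -Name "PowerPlatformDelegation" `
            -ServiceName "Microsoft.PowerPlatform/enterprisePolicies" `
            -Subnet $subnet | Out-Null

        Set-AzVirtualNetwork -VirtualNetwork $vnet | Out-Null
    }

With both subnets delegated, the enterprise policy deployment scripts can reference one subnet per region and the Admin Center linking step in Phase 4 will validate cleanly.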
Architecting an Azure AI Hub-and-Spoke Landing Zone for Multi-Tenant Enterprises

A large enterprise customer adopting AI at scale typically needs three non-negotiables in its AI foundation:

- End-to-end tenant isolation across network, identity, compute, and data
- Secure, governed traffic flow from users to AI services
- Transparent chargeback/showback for shared AI and platform services

At the same time, the platform must enable rapid onboarding of new tenants or applications and scale cleanly from proof-of-concept to production. This article proposes an Azure Landing Zone–aligned architecture using a Hub-and-Spoke model, where:

- The AI Hub centralizes shared services and governance
- AI Spokes host tenant-dedicated AI resources
- Application logic and AI agents run on AKS

The result is a secure, scalable, and operationally efficient enterprise AI foundation.

1. Architecture goals & design principles

Goals

- Host application logic and AI agents on Azure Kubernetes Service (AKS) as custom deployments instead of using agents under Azure AI Foundry
- Enforce strong tenant isolation across all layers
- Support chargeback and cost attribution
- Adopt a Hub-and-Spoke model with clear separation of shared vs. tenant-specific services

Design principles (Azure Landing Zone aligned)

Azure Landing Zone (ALZ) guidance emphasizes:

- Separation of platform and workload subscriptions
- Management groups and policy inheritance
- Centralized connectivity using hub-and-spoke networking
- Policy-driven governance and automation

For infrastructure as code, ALZ-aligned deployments typically use Bicep or Terraform, increasingly leveraging Azure Verified Modules (AVM) for consistency and long-term maintainability.

2. Subscription & management group model

A practical enterprise layout looks like this:

- Tenant Root Management Group
  - Platform Management Group
    - Connectivity subscription (Hub VNet, Firewall, DNS, ExpressRoute/VPN)
    - Management subscription (Log Analytics, Monitor)
    - Security subscription (Defender for Cloud, Sentinel if required)
  - AI Hub Management Group
    - AI Hub subscription (shared AI and governance services)
  - AI Spokes Management Group
    - One subscription per tenant, business unit, or regulated boundary

This structure supports enterprise-scale governance while allowing teams to operate independently within well-defined guardrails.

3. Logical architecture — AI Hub vs. AI Spoke

AI Hub (central/shared services)

The AI Hub acts as the governed control plane for AI consumption:

- Ingress & edge security: Azure Application Gateway with WAF (or Front Door for global scenarios)
- Central egress control: Azure Firewall with forced tunneling
- API governance: Azure API Management (private/internal mode)
- Shared AI services: Azure OpenAI (shared deployments where appropriate), safety controls
- Monitoring & observability: Azure Monitor, Log Analytics, centralized dashboards
- Governance: Azure Policy, RBAC, naming and tagging standards

All tenant traffic enters through the hub, ensuring consistent enforcement of security, identity, and usage policies.

AI Spoke (tenant-dedicated services)

Each AI Spoke provides a tenant-isolated data and execution plane:

- Tenant-dedicated storage accounts and databases
- Vector stores and retrieval systems (Azure AI Search with isolated indexes or services)
- AKS runtime for tenant-specific AI agents and backend services
- Tenant-scoped keys, secrets, and identities

4. Logical architecture diagram (Hub vs. Spoke)

5. Network architecture — Hub and Spoke
6. Tenant onboarding & isolation strategy

Tenant onboarding flow

Tenant onboarding is automated using a landing zone vending model:

1. Request new tenant or application
2. Provision a spoke subscription and baseline policies
3. Deploy spoke VNet and peer to hub
4. Configure private DNS and firewall routes
5. Deploy AKS tenancy and data services
6. Register identities and API subscriptions
7. Enable monitoring and cost attribution

This approach enables consistent, repeatable onboarding with minimal manual effort.

Isolation by design

- Network: Dedicated VNets, private endpoints, no public AI endpoints
- Identity: Microsoft Entra ID with tenant-aware claims and conditional access
- Compute: AKS isolation using namespaces, node pools, or dedicated clusters
- Data: Per-tenant storage, databases, and vector indexes

7. Identity & access management (Microsoft Entra ID)

Key IAM practices include:

- Central Microsoft Entra ID tenant for authentication and authorization
- Application and workload identities using managed identities
- Tenant context enforced at API Management and propagated downstream
- Conditional Access and least-privilege RBAC

This ensures zero-trust access while supporting both internal and partner scenarios.

8. Secure traffic flow (end-to-end)

1. User accesses the application via Application Gateway + WAF
2. Traffic is inspected and routed through Azure Firewall
3. API Management validates identity, quotas, and tenant context
4. AKS workloads invoke AI services over Private Link
5. Responses return through the same governed path

This pattern provides full auditability, threat protection, and policy enforcement.

9. AKS multitenancy options

- Namespace per tenant — when to use: default — characteristics: cost-efficient, logical isolation
- Dedicated node pools — when to use: medium isolation — characteristics: reduced noisy-neighbor risk
- Dedicated AKS cluster — when to use: high compliance — characteristics: maximum isolation, higher cost

Enterprises typically adopt a tiered approach, choosing the isolation level per tenant based on regulatory and risk requirements.

10. Cost management & chargeback model

Tagging strategy (mandatory)

- tenantId
- costCenter
- application
- environment
- owner

Enforced via Azure Policy across all subscriptions (a minimal assignment sketch appears at the end of this article).

Chargeback approach

- Dedicated spoke resources: direct attribution via subscription and tags
- Shared hub resources: allocated using usage telemetry
  - API calls and token usage from API Management
  - CPU/memory usage from AKS namespaces

Cost data is exported to Azure Cost Management and visualized using Power BI to support showback and chargeback.

11. Security controls checklist

- Private endpoints for AI services, storage, and search
- No public network access for sensitive services
- Azure Firewall for centralized egress and inspection
- WAF for OWASP protection
- Azure Policy for governance and compliance

12. Deployment & automation

- Foundation: Azure Landing Zone accelerators (Bicep or Terraform)
- Workloads: Modular IaC for hub and spokes
- AKS apps: GitOps (Flux or Argo CD)
- Observability: Policy-driven diagnostics and centralized logging

13. Final thoughts

This Azure AI Landing Zone design provides a repeatable, secure, and enterprise-ready foundation for any large customer adopting AI at scale. By combining:

- Hub-and-Spoke networking
- AKS-based AI agents
- Strong tenant isolation
- FinOps-ready chargeback
- Azure Landing Zone best practices

organizations can confidently move AI workloads from experimentation to production—without sacrificing security, governance, or cost transparency.
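As one way to enforce the mandatory tags from the chargeback section, the sketch below assigns the built-in "Require a tag on resources" policy for each required tag at a management group scope. The management group ID and tag list are hypothetical placeholders, cmdlet output shapes vary slightly across Az.Resources versions, and many teams prefer the "inherit a tag from the resource group" or "modify" built-ins instead of a hard deny; adjust to your governance model.

    # Minimal sketch: require the chargeback tags on resources under a management group.
    # The management group ID and tag names are hypothetical placeholders.
    Import-Module Az.Resources

    $scope = "/providers/Microsoft.Management/managementGroups/mg-ai-spokes"

    # Look up the built-in definition "Require a tag on resources" (denies resources missing the tag).
    # Older and newer Az.Resources versions expose the display name on different properties.
    $definition = Get-AzPolicyDefinition -Builtin |
        Where-Object { $_.Properties.DisplayName -eq "Require a tag on resources" -or
                       $_.DisplayName -eq "Require a tag on resources" }

    foreach ($tag in @("tenantId", "costCenter", "application", "environment", "owner")) {
        New-AzPolicyAssignment -Name ("require-tag-" + $tag) `
            -DisplayName ("Require tag: " + $tag) `
            -Scope $scope `
            -PolicyDefinition $definition `
            -PolicyParameterObject @{ tagName = $tag }
    }

Assigning one policy per tag keeps compliance reporting granular, so a missing costCenter tag shows up separately from a missing tenantId tag in Azure Policy compliance views.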
Disclaimer: While the above article discusses hosting custom agents on AKS alongside customer-developed application logic, the following sections focus on a baseline deployment model with no customizations. This approach uses Azure AI Foundry, where models and agents are fully managed by Azure, with centrally governed LLMs (AI Hub) hosted in Azure AI Foundry and agents deployed in a spoke environment.

🚀 Get Started: Building a Secure & Scalable Azure AI Platform

To help you accelerate your Azure AI journey, Microsoft and the community provide several reference architectures, solution accelerators, and best-practice guides. Together, these form a strong foundation for designing secure, governed, and cost-efficient GenAI and AI workloads at scale. Below is a recommended starting path.

1️⃣ AI Landing Zone (Foundation)

Purpose: Establish a secure, enterprise-ready foundation for AI workloads. The AI Landing Zone extends the standard Azure Landing Zone with AI-specific considerations such as:

- Network isolation and hub-spoke design
- Identity and access control for AI services
- Secure connectivity to data sources
- Alignment with enterprise governance and compliance

🔗 AI Landing Zone (GitHub): https://github.com/Azure/AI-Landing-Zones?tab=readme-ov-file
👉 Start here if you want a standardized baseline before onboarding any AI workloads.

2️⃣ AI Hub Gateway – Solution Accelerator

Purpose: Centralize and control access to AI services across multiple teams or customers. The AI Hub Gateway Solution Accelerator helps you:

- Expose AI capabilities (models, agents, APIs) via a centralized gateway
- Apply consistent security, routing, and traffic controls
- Support both Chat UI and API-based consumption
- Enable multi-team or multi-tenant AI usage patterns

🔗 AI Hub Gateway Solution Accelerator: https://github.com/mohamedsaif/ai-hub-gateway-landing-zone?tab=readme-ov-file
👉 Ideal when you want a shared AI platform with controlled access and visibility.

3️⃣ Citadel Governance Hub (Advanced Governance)

Purpose: Enforce strong governance, compliance, and guardrails for AI usage. The Citadel Governance Hub builds on top of the AI Hub Gateway and focuses on:

- Policy enforcement for AI usage
- Centralized governance controls
- Secure onboarding of teams and workloads
- Alignment with enterprise risk and compliance requirements

🔗 Citadel Governance Hub (README): https://github.com/Azure-Samples/ai-hub-gateway-solution-accelerator/blob/citadel-v1/README.md
👉 Recommended for regulated environments or large enterprises with strict governance needs.

4️⃣ AKS Cost Analysis (Operational Excellence)

Purpose: Understand and optimize the cost of running AI workloads on AKS. AI platforms often rely on AKS for agents, inference services, and gateways. This guide explains:

- How AKS costs are calculated
- How to analyze node, pod, and workload costs
- Techniques to optimize cluster spend

🔗 AKS Cost Analysis: https://learn.microsoft.com/en-us/azure/aks/cost-analysis
👉 Use this early to avoid unexpected cost overruns as AI usage scales.

5️⃣ AKS Multi-Tenancy & Cluster Isolation

Purpose: Safely run workloads for multiple teams or customers on AKS.
This guidance covers:

- Namespace vs. cluster isolation strategies
- Security and blast-radius considerations
- When to use shared clusters vs. dedicated clusters
- Best practices for multi-tenant AKS platforms

🔗 AKS Multi-Tenancy & Cluster Isolation: https://learn.microsoft.com/en-us/azure/aks/operator-best-practices-cluster-isolation
👉 Critical reading if your AI platform supports multiple teams, business units, or customers.

🧭 Suggested Learning Path

If you're new, follow this order:

1. AI Landing Zone → build the foundation
2. AI Hub Gateway → centralize AI access
3. Citadel Governance Hub → enforce guardrails
4. AKS Cost Analysis → control spend
5. AKS Multi-Tenancy → scale securely
Deploy PostgreSQL on Azure VMs with Azure NetApp Files: Production-Ready Infrastructure as Code

PostgreSQL is a popular open-source database for modern web applications and AI/ML workloads, and deploying it on Azure VMs with high-performance storage should be simple. In practice, however, using Azure NetApp Files requires many coordinated steps—from provisioning networking and storage to configuring NFS, installing and initializing PostgreSQL, and maintaining consistent, secure, and high-performance environments across development, test, and production. To address this complexity, we've built production-ready Infrastructure as Code templates that fully automate the deployment, from infrastructure setup to database initialization, ensuring PostgreSQL runs on high-performance Azure NetApp Files storage from day one.
What's New with Azure NetApp Files VS Code Extension

The latest update to the Azure NetApp Files (ANF) VS Code Extension introduces powerful enhancements designed to simplify cloud storage management for developers. From multi-tenant support to intuitive right-click mounting and AI-powered commands, this release focuses on improving productivity and streamlining workflows within Visual Studio Code. Explore the new features, learn how they accelerate development, and see why this extension is becoming an essential tool for cloud-native applications.
Streamline Azure NetApp Files Management—Right from Your IDE

The Azure NetApp Files VS Code Extension is designed to streamline storage provisioning and management directly within the developer's IDE. Traditional workflows often require extensive portal navigation, manual configuration, and policy management, leading to inefficiencies and context switching. The extension addresses these challenges by enabling AI-powered automation through natural language commands, reducing provisioning time from hours to minutes while minimizing errors and improving compliance. Key capabilities include generating production-ready ARM templates, validating resources, and delivering optimization insights—all without leaving the coding environment.