<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Azure Infrastructure Blog articles</title>
    <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/bg-p/AzureInfrastructureBlog</link>
    <description>Azure Infrastructure Blog articles</description>
    <pubDate>Fri, 17 Apr 2026 18:23:18 GMT</pubDate>
    <dc:creator>AzureInfrastructureBlog</dc:creator>
    <dc:date>2026-04-17T18:23:18Z</dc:date>
    <item>
      <title>Qurious About Quantum?</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/qurious-about-quantum/ba-p/4510963</link>
      <description>&lt;H1&gt;&lt;STRONG&gt;Why IT Should Care About Quantum Right Now – It’s Not Just A Research Problem&lt;/STRONG&gt;&lt;/H1&gt;
&lt;P&gt;Today, on April 14th, 2026, we celebrate World Quantum Day! There has been a noticeable rise in the topic of quantum computing appearing across technical conferences, business news, and strategic planning conversations. But what does this have to do with the IT pros and infrastructure architects driving today's technology decisions - isn't the focus on AI and agents? As it turns out, we may be in a state of &lt;A href="https://news.microsoft.com/source/features/innovation/quantum-computing-10-terms-to-know/" target="_blank" rel="noopener"&gt;superposition&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;The last few years have produced genuine quantum breakthroughs, &lt;A href="https://azure.microsoft.com/en-us/blog/quantum/2025/02/19/microsoft-unveils-majorana-1-the-worlds-first-quantum-processor-powered-by-topological-qubits/?msockid=0ca08ed6ee6b69aa37f89d00ef826849" target="_blank" rel="noopener"&gt;including our Majorana 1 topological quantum processor&lt;/A&gt; and a &lt;A href="https://qunorth.com/news/eifo-and-the-novo-nordisk-foundation-acquire-the-worlds-most-powerful-quantum-computer/" target="_blank" rel="noopener"&gt;partnership with QuNorth&lt;/A&gt;, whose Magne system will be the world’s first commercially available &lt;A href="https://quantum.microsoft.com/en-us/insights/education/concepts/quantum-computing-implementation-levels" target="_blank" rel="noopener"&gt;level 2 quantum computer&lt;/A&gt;. These milestones represent a meaningful shift from theoretical promise to engineering reality. But quantum computing will not exist in isolation – it will require deep integration with classical computing, traditional systems, and AI infrastructure. If you build or manage systems, the concepts you already work with will take you further toward understanding quantum than you might expect.&lt;/P&gt;
&lt;P&gt;This lecture series does not deal in hype (quantum has enough of that). It’s designed to give IT pros, developers, researchers, and anyone curious about quantum a grounded understanding of what it will take to architect, scale, and run quantum systems in the real world. It offers clarity on what quantum computers will actually be good at, how to size one for a real problem, and how performance optimization techniques you already know from classical HPC can translate to quantum. So, whether you can derive a Hamiltonian in your sleep or have quietly searched "what is a qubit" more than once, no physics PhD is required.&lt;/P&gt;
&lt;H2&gt;&lt;A class="lia-external-url" href="https://www.youtube.com/watch?v=AmK5RqfpAEE" target="_blank" rel="noopener"&gt;&lt;STRONG&gt;Lecture 1: Utility Scale Quantum Applications &lt;/STRONG&gt;&lt;/A&gt;&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Where quantum shines and where classical still rules&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Before diving into quantum computing, it helps to understand what it is actually useful for. The lecture opens with a deceptively simple question - when does a quantum computer beat a classical one, and by how much does it need to win to actually matter? The answer involves computational complexity, scaling laws, and a reality check that will reframe how you think about quantum advantage. Spoiler alert: a quadratic speedup sounds impressive until you run the numbers and discover the crossover time is somewhere in the neighborhood of the age of the universe (not useful!). The problems that do survive this filter - chemistry, materials science, and biochemistry - turn out to be the ones that matter most for the future of fields like manufacturing, energy, medicine, and climate.&lt;/P&gt;
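A back-of-envelope model makes that crossover argument concrete. Suppose a classical machine executes N operations at t_classical seconds each across many parallel cores, while a quantum computer with a quadratic speedup needs only sqrt(N) (much slower) logical operations at t_quantum each. All numbers below are assumptions chosen purely for illustration, not measured figures from the lecture:

```python
def quadratic_speedup_crossover(t_classical: float, t_quantum: float,
                                parallelism: float = 1.0):
    """Smallest problem size N where sqrt(N) quantum ops at t_quantum
    beat N classical ops at t_classical spread over `parallelism` cores,
    plus the wall-clock time at that crossover point."""
    # Solve sqrt(N) * t_quantum = (N / parallelism) * t_classical for N
    sqrt_n = t_quantum * parallelism / t_classical
    n_cross = sqrt_n ** 2
    wallclock = sqrt_n * t_quantum
    return n_cross, wallclock

# Assumed: 1 ns classical ops on a million cores vs 10 us quantum logical ops
n, seconds = quadratic_speedup_crossover(1e-9, 1e-5, parallelism=1e6)
print(f"crossover at N = {n:.1e}, runtime = {seconds:.1e} s")
```

Even with these generous toy numbers the quantum machine only starts winning after running for about a day; with more realistic error-correction overheads the crossover stretches out to absurd timescales, which is the lecture's point.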
&lt;P&gt;The lecture closes with an exciting vision still grounded in today's technology — using quantum computers to teach quantum physics to AI, combining the accuracy of quantum simulation with the speed of inference to unlock a new generation of materials and molecules. &lt;A class="lia-external-url" href="https://www.youtube.com/watch?v=AmK5RqfpAEE" target="_blank" rel="noopener"&gt;Watch Lecture 1&lt;/A&gt;&lt;/P&gt;
&lt;H2&gt;&lt;A class="lia-external-url" href="https://www.youtube.com/watch?v=SKMXsdCWpzY" target="_blank" rel="noopener"&gt;&lt;STRONG&gt;Lecture 2: Utility Scale Quantum Architecture&lt;/STRONG&gt;&lt;/A&gt;&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;You already know more than you think!&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;A fun fact about quantum computing architecture is that the abacus and today's fastest GPU operate on the same fundamental principle. Quantum computers are where that 4,500-year streak finally ends, but the architecture built around them will look surprisingly familiar. This lecture shows how a quantum computer can fit into a cloud data center - not as some exotic standalone machine, but as a complementary computational accelerator sitting in the stack alongside CPUs, GPUs, and FPGAs. From there the lecture covers the full software stack from high-level application code all the way down to the control pulses sent to physical qubits — and the parallels to classical compilation, optimization, and execution are striking at every layer.&lt;/P&gt;
&lt;P&gt;And a prediction worth sitting with if you aren’t a quantum developer - by the time utility-scale quantum computers exist, most of us will be programming them in natural language with tools like Copilot. As it turns out, vibe coding has a quantum future. &lt;A class="lia-external-url" href="https://www.youtube.com/watch?v=SKMXsdCWpzY" target="_blank" rel="noopener"&gt;Watch Lecture 2&lt;/A&gt;&lt;/P&gt;
&lt;H2&gt;&lt;A class="lia-external-url" href="https://www.youtube.com/watch?v=E75MV-vBIIM" target="_blank" rel="noopener"&gt;&lt;STRONG&gt;Lecture 3: Utility Scale Quantum Resource Estimation &lt;/STRONG&gt;&lt;/A&gt;&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Sizing a quantum computer for real problems&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Capacity planning is one of the most important decisions in building a quantum computer. Resource estimation is the quantum equivalent of sizing a classical HPC workload. The worked example - estimating the quantum resources required to simulate a ruthenium-based carbon fixation catalyst - highlights why the topic matters (no, we won’t spoil the answer). Along the way you will meet magic state distillation, a wonderfully named process for producing high-fidelity quantum gate operations from noisy physical qubits. And if that isn’t enough excitement, the lecture explores why the tradeoff between qubit quality and qubit quantity is one of the most consequential engineering decisions in the field.&lt;/P&gt;
&lt;P&gt;If you are a chip architect, an infrastructure designer, or anyone who loves going deep on system tradeoffs, this one is for you. &lt;A class="lia-external-url" href="https://www.youtube.com/watch?v=E75MV-vBIIM" target="_blank" rel="noopener"&gt;Watch Lecture 3&lt;/A&gt;&lt;/P&gt;
&lt;H2&gt;&lt;A class="lia-external-url" href="https://www.youtube.com/watch?v=Nv-VydKbnEU&amp;amp;t=1s" target="_blank" rel="noopener"&gt;&lt;STRONG&gt;Lecture 4: High Performance Quantum Computing&lt;/STRONG&gt;&lt;/A&gt;&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;BLAS, MPI, NUMA - meet your quantum counterparts&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;This is where HPC engineers will feel most at home, and maybe most surprised. The core challenges of high-performance quantum computing are not new problems. They are the same problems classical computing solved over decades, showing up again in a new context. Every one of these concepts - instruction-level parallelism, optimized kernel libraries, autotuning, message passing, non-uniform memory access - has a direct quantum analog covered with concrete examples. The quantum version of MPI even reuses nearly the entire standard. Just two new commands were added to handle something classical systems never had to worry about - you cannot copy a quantum state.&lt;/P&gt;
&lt;P&gt;If you have ever tuned a distributed workload or wrestled with NUMA topology, this lecture will show you exactly where your existing expertise carries directly into quantum and opens up a whole new context in which to apply it. &lt;A class="lia-external-url" href="https://www.youtube.com/watch?v=Nv-VydKbnEU&amp;amp;t=1s" target="_blank" rel="noopener"&gt;Watch Lecture 4&lt;/A&gt;&lt;/P&gt;
&lt;H2&gt;&lt;A class="lia-external-url" href="https://quantum.microsoft.com/en-us/insights/industry-insights/quantum-architecture-series" target="_blank" rel="noopener"&gt;&lt;STRONG&gt;Follow along for more lectures&lt;/STRONG&gt;&lt;/A&gt;&lt;/H2&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://www.youtube.com/watch?v=2glCGJRtGxc" target="_blank" rel="noopener"&gt;Lecture 5, "Trade‑offs on the Path to Utility Scale",&lt;/A&gt; was published this morning as part of World Quantum Day and covers multiple challenges that must be overcome to achieve utility-scale quantum computing. Dr. Troyer and the Microsoft Quantum team will continue expanding this series with new lectures on what it takes to build and operate at quantum scale. Future topics include scalable quantum architecture, balancing the cost of utility-scale quantum computing, quantum simulations of chemical reactions, and responsible quantum computing.&lt;/P&gt;
&lt;P&gt;Follow the series at &lt;A href="https://quantum.microsoft.com/en-us/insights/industry-insights/quantum-architecture-series" target="_blank" rel="noopener"&gt;https://quantum.microsoft.com/en-us/insights/industry-insights/quantum-architecture-series&lt;/A&gt;. We'd love to hear from you - leave a comment below with the quantum topics you'd like to see covered next!&lt;/P&gt;
      <pubDate>Tue, 14 Apr 2026 17:14:06 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/qurious-about-quantum/ba-p/4510963</guid>
      <dc:creator>JohnGruszczyk</dc:creator>
      <dc:date>2026-04-14T17:14:06Z</dc:date>
    </item>
    <item>
      <title>Service Mesh-Aware Request Tracing in AKS with Istio and Application Insights</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/service-mesh-aware-request-tracing-in-aks-with-istio-and/ba-p/4509928</link>
      <description>&lt;H1&gt;Introduction&lt;/H1&gt;
&lt;P&gt;As platforms evolve toward microservice‑based architectures, observability becomes more complex than ever. In Azure Kubernetes Service (AKS), teams often rely on Istio to manage service‑to‑service communication and Azure Application Insights for application‑level telemetry.&lt;/P&gt;
&lt;P&gt;While both are powerful, they operate at different layers, and without deliberate configuration, correlating a single request across the service mesh and the application layer is not straightforward.&lt;/P&gt;
&lt;P&gt;This blog walks through a practical, production‑ready solution to enable Istio (Envoy) access logging in AKS and correlate those logs with Application Insights telemetry, allowing engineers to trace a request end‑to‑end for faster troubleshooting and deeper visibility.&lt;/P&gt;
&lt;H2&gt;Platform Observability Context&lt;/H2&gt;
&lt;P&gt;The environment consists of:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;AKS with managed Istio enabled&lt;/LI&gt;
&lt;LI&gt;Envoy sidecars injected into application pods&lt;/LI&gt;
&lt;LI&gt;Azure Application Insights SDK running inside workloads&lt;/LI&gt;
&lt;LI&gt;Log Analytics as the centralized log store&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Istio is responsible for traffic management, while Application Insights captures application‑level telemetry. The goal was to &lt;STRONG&gt;align these layers using a common trace context&lt;/STRONG&gt;, without introducing additional tracing systems or custom agents.&lt;/P&gt;
&lt;H2&gt;Enabling Istio Access Logging at the Mesh Level&lt;/H2&gt;
&lt;P&gt;The first step is to ensure that Envoy access logs are emitted consistently across the service mesh. Istio provides the &lt;STRONG&gt;Telemetry API&lt;/STRONG&gt;, which allows access logging to be enabled centrally without modifying individual workloads.&lt;/P&gt;
&lt;P&gt;Apply a Telemetry resource in the Istio system namespace to enable Envoy access logging:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: mesh-access-logs
  namespace: aks-istio-system
spec:
  accessLogging:
  - providers:
    - name: envoy&lt;/LI-CODE&gt;
&lt;P&gt;This configuration ensures that:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;All Envoy sidecars emit access logs&lt;/LI&gt;
&lt;LI&gt;Logging behavior is uniform across the mesh&lt;/LI&gt;
&lt;LI&gt;The setup remains compatible with AKS managed Istio&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;Standardizing Envoy Logs Using EnvoyFilter&lt;/H4&gt;
&lt;P&gt;Access logs must be structured to be useful at scale. In AKS managed Istio, direct Envoy configuration is restricted, so &lt;STRONG&gt;EnvoyFilter&lt;/STRONG&gt; is used to customize logging behavior.&lt;/P&gt;
&lt;P&gt;EnvoyFilters are configured to:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Emit logs in &lt;STRONG&gt;structured JSON format&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Write logs to /dev/stdout&lt;/LI&gt;
&lt;LI&gt;Include trace and request correlation headers&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;To achieve full visibility, separate EnvoyFilters are applied for &lt;STRONG&gt;inbound&lt;/STRONG&gt; and &lt;STRONG&gt;outbound&lt;/STRONG&gt; sidecar traffic.&lt;/P&gt;
&lt;LI-CODE lang=""&gt;apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: json-access-logs
  namespace: aks-istio-system
spec:
  configPatches:
  - applyTo: NETWORK_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
    patch:
      operation: MERGE
      value:
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          access_log:
          - name: envoy.access_loggers.file
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
              path: /dev/stdout
              log_format:
                json_format:
                  timestamp: "%START_TIME%"
                  method: "%REQ(:METHOD)%"
                  path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
                  response_code: "%RESPONSE_CODE%"
                  response_flags: "%RESPONSE_FLAGS%"
                  duration_ms: "%DURATION%"
                  downstream_remote_address: "%DOWNSTREAM_REMOTE_ADDRESS%"
                  x_request_id: "%REQ(X-REQUEST-ID)%"
                  traceparent: "%REQ(TRACEPARENT)%"
                  tracestate: "%REQ(TRACESTATE)%"
                  x_b3_traceid: "%REQ(X-B3-TRACEID)%"&lt;/LI-CODE&gt;
&lt;P&gt;This configuration ensures inbound traffic logs contain both request metadata and correlation identifiers.&lt;/P&gt;
&lt;H4&gt;Configuring Outbound Envoy Access Logs&lt;/H4&gt;
&lt;P&gt;Outbound logging is required to observe downstream calls made by a service. Apply a second EnvoyFilter for outbound traffic:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: json-access-logs-outbound
  namespace: aks-istio-system
spec:
  configPatches:
  - applyTo: NETWORK_FILTER
    match:
      context: SIDECAR_OUTBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
    patch:
      operation: MERGE
      value:
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          access_log:
          - name: envoy.access_loggers.file
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
              path: /dev/stdout
              log_format:
                json_format:
                  timestamp: "%START_TIME%"
                  method: "%REQ(:METHOD)%"
                  path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
                  response_code: "%RESPONSE_CODE%"
                  response_flags: "%RESPONSE_FLAGS%"
                  duration_ms: "%DURATION%"
                  downstream_remote_address: "%DOWNSTREAM_REMOTE_ADDRESS%"
                  x_request_id: "%REQ(X-REQUEST-ID)%"
                  traceparent: "%REQ(TRACEPARENT)%"
                  tracestate: "%REQ(TRACESTATE)%"
                  x_b3_traceid: "%REQ(X-B3-TRACEID)%"&lt;/LI-CODE&gt;
&lt;P&gt;Inbound and outbound logs now follow the same schema, enabling consistent querying and analysis.&lt;/P&gt;
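With both filters in place, each istio-proxy container emits one JSON object per request. A single access log line might look like the following (all values are illustrative, not taken from a real cluster):

```json
{
  "timestamp": "2026-04-10T12:31:05.114Z",
  "method": "GET",
  "path": "/api/orders/42",
  "response_code": 200,
  "response_flags": "-",
  "duration_ms": 18,
  "downstream_remote_address": "10.244.1.23:49832",
  "x_request_id": "c0ffee00-1111-2222-3333-444455556666",
  "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
  "tracestate": "",
  "x_b3_traceid": "4bf92f3577b34da6a3ce929d0e0e4736"
}
```

The 32-hex-character segment of traceparent is the W3C trace-id, which Application Insights surfaces as operation_Id.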
&lt;H4&gt;Automating the Configuration with PowerShell&lt;/H4&gt;
&lt;P&gt;To standardize and repeat the setup across environments, wrap the configuration in a PowerShell script. The script should:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Validate the Istio system namespace&lt;/LI&gt;
&lt;LI&gt;Apply the Telemetry resource&lt;/LI&gt;
&lt;LI&gt;Apply inbound and outbound EnvoyFilters&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI-CODE lang=""&gt;$MeshRootNamespace = "aks-istio-system"
$TelemetryName    = "mesh-access-logs"
$EnvoyFilterName  = "json-access-logs"

kubectl get ns $MeshRootNamespace --ignore-not-found

$telemetryYaml | kubectl apply -f -
$envoyFilterYaml | kubectl apply -f -
$envoyFilterOutboundYaml | kubectl apply -f -&lt;/LI-CODE&gt;
&lt;H2&gt;Log Ingestion into Azure Monitor&lt;/H2&gt;
&lt;P&gt;Because Envoy access logs are written to standard output:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;AKS automatically collects them&lt;/LI&gt;
&lt;LI&gt;Logs are ingested into &lt;STRONG&gt;Log Analytics&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Data appears in the ContainerLogV2 table&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;No additional agents or custom log pipelines are required.&lt;/P&gt;
&lt;H2&gt;Aligning with Application Insights Telemetry&lt;/H2&gt;
&lt;P&gt;Application Insights uses &lt;STRONG&gt;W3C Trace Context&lt;/STRONG&gt;, where the operation_Id represents the trace identifier. Since Envoy access logs capture the traceparent header, both systems expose the same trace ID.&lt;/P&gt;
&lt;P&gt;This alignment allows service mesh logs and application telemetry to be correlated without changing application code.&lt;/P&gt;
&lt;H4&gt;Correlating Requests Using KQL&lt;/H4&gt;
&lt;P&gt;To analyze request flow:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Parse JSON access logs from ContainerLogV2&lt;/LI&gt;
&lt;LI&gt;Extract the trace ID from traceparent&lt;/LI&gt;
&lt;LI&gt;Join with Application Insights request telemetry&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;To validate end‑to‑end tracing, use &lt;STRONG&gt;Log Analytics&lt;/STRONG&gt; to query Istio access logs collected in the ContainerLogV2 table. Since Envoy access logs include the traceparent header, the trace‑id embedded in it directly maps to the &lt;STRONG&gt;Application Insights operation_Id&lt;/STRONG&gt;. By filtering istio-proxy logs on this trace‑id, it becomes possible to view the full Envoy request record for a specific application request and trace it across the service mesh and application layers.&lt;/P&gt;
&lt;P&gt;KQL (filter Istio access logs using an Application Insights operation_Id)&lt;/P&gt;
&lt;LI-CODE lang=""&gt;let operationId = "&amp;lt;OperationID&amp;gt;"; // Replace with your actual operation_Id
ContainerLogV2
| where TimeGenerated &amp;gt;= ago(24h)
| where ContainerName == "istio-proxy"
| where LogSource == "stdout"
| where LogMessage startswith "{"
| extend AccessLog = parse_json(LogMessage)
| extend ExtractedOperationId = extract(@"00-([a-f0-9]{32})-", 1, tostring(AccessLog.traceparent))
| where ExtractedOperationId == operationId
| project 
    TimeGenerated,
    PodName,
    Method = tostring(AccessLog.method),
    Path = tostring(AccessLog.path),
    ResponseCode = toint(AccessLog.response_code),
    RequestId = tostring(AccessLog.x_request_id),
    TraceParent = tostring(AccessLog.traceparent),
    TraceState = tostring(AccessLog.tracestate),
    Authority = tostring(AccessLog.authority),
    RawLogMessage = LogMessage
| order by TimeGenerated asc&lt;/LI-CODE&gt;
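The extract() call above pulls the 32-hex trace-id out of the traceparent header. The same check can be run locally in a few lines of Python, which is handy when debugging a single log line (the sample traceparent value below is hypothetical):

```python
import re

# W3C traceparent layout: version-traceid-spanid-flags,
# e.g. 00-<32 hex chars>-<16 hex chars>-<2 hex chars>
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def operation_id_from_traceparent(traceparent: str):
    """Return the 32-hex trace-id (the Application Insights operation_Id),
    or None if the header does not parse."""
    m = TRACEPARENT_RE.match(traceparent.strip().lower())
    return m.group(1) if m else None

print(operation_id_from_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"))
# → 4bf92f3577b34da6a3ce929d0e0e4736
```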
&lt;H2&gt;Closing Thoughts&lt;/H2&gt;
&lt;P&gt;End‑to‑end request tracing in AKS is achieved by aligning &lt;STRONG&gt;service mesh logging and application telemetry around shared standards&lt;/STRONG&gt;. By enabling structured Istio access logs and correlating them with Application Insights, platforms gain clear visibility into request flow across networking and application layers using Azure‑native tools.&lt;/P&gt;
&lt;P&gt;This process scales well in managed Istio environments and provides meaningful observability without adding platform complexity.&lt;/P&gt;</description>
      <pubDate>Fri, 10 Apr 2026 16:37:58 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/service-mesh-aware-request-tracing-in-aks-with-istio-and/ba-p/4509928</guid>
      <dc:creator>Siddhi_Singh</dc:creator>
      <dc:date>2026-04-10T16:37:58Z</dc:date>
    </item>
    <item>
      <title>Building an End-to-End MLOps Pipeline: From Training to Managed Endpoints on Azure</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/building-an-end-to-end-mlops-pipeline-from-training-to-managed/ba-p/4509852</link>
      <description>&lt;H2 data-line="4"&gt;Introduction&lt;/H2&gt;
&lt;P data-line="6"&gt;Machine learning models are only as valuable as the infrastructure that supports them. A model trained in a Jupyter notebook and saved to a shared folder creates a chain of problems: no versioning, no reproducibility, no clear ownership, and no automated path to production. When the data scientist who trained it goes on vacation, nobody knows how to retrain it or where the latest version lives.&lt;/P&gt;
&lt;P data-line="8"&gt;A well-designed MLOps pipeline solves all of this. It makes training repeatable, artifacts versioned, and deployment automated — so that the path from code change to live endpoint is a single merge to main.&lt;/P&gt;
&lt;P data-line="10"&gt;This post provides a&amp;nbsp;&lt;STRONG&gt;generic, end-to-end pattern&lt;/STRONG&gt;&amp;nbsp;covering the full lifecycle:&lt;/P&gt;
&lt;OL data-line="12"&gt;
&lt;LI data-line="12"&gt;&lt;STRONG&gt;Train&lt;/STRONG&gt;&amp;nbsp;a scikit-learn model against data in Azure Blob Storage&lt;/LI&gt;
&lt;LI data-line="13"&gt;&lt;STRONG&gt;Serialize&lt;/STRONG&gt;&amp;nbsp;the model as a self-contained pickle bundle&lt;/LI&gt;
&lt;LI data-line="14"&gt;&lt;STRONG&gt;Register&lt;/STRONG&gt;&amp;nbsp;it in an Azure ML Registry for cross-team discovery&lt;/LI&gt;
&lt;LI data-line="15"&gt;&lt;STRONG&gt;Deploy&lt;/STRONG&gt;&amp;nbsp;it to an Azure ML Managed Online Endpoint for real-time scoring&lt;/LI&gt;
&lt;/OL&gt;
&lt;P data-line="17"&gt;You can adapt this template for any scikit-learn model — classification, regression, clustering, or anomaly detection — by swapping in your own training and scoring scripts.&lt;/P&gt;
&lt;H2 data-line="19"&gt;When to Use This Pattern&lt;/H2&gt;
&lt;P data-line="21"&gt;This pipeline template is a good fit when:&lt;/P&gt;
&lt;UL data-line="23"&gt;
&lt;LI data-line="23"&gt;Your training data lives in Azure Blob Storage (Parquet, CSV, or similar)&lt;/LI&gt;
&lt;LI data-line="24"&gt;You use scikit-learn (or any Python ML framework) for model training&lt;/LI&gt;
&lt;LI data-line="25"&gt;You need versioned model artifacts in a central registry&lt;/LI&gt;
&lt;LI data-line="26"&gt;You want an automated deployment path to a live scoring endpoint&lt;/LI&gt;
&lt;LI data-line="27"&gt;Downstream consumers (scoring pipelines, APIs, dashboards) need a reliable handoff mechanism&lt;/LI&gt;
&lt;LI data-line="28"&gt;You want to eliminate ad-hoc notebook-based training with no versioning or reproducibility&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="30"&gt;It is&amp;nbsp;&lt;STRONG&gt;not&lt;/STRONG&gt; the right fit if you need distributed training (use Azure ML pipelines instead), or if your model requires GPU inference (managed endpoints support GPU, but the config differs from what's shown here).&lt;/P&gt;
&lt;H2 data-line="32"&gt;Architecture Overview&lt;/H2&gt;
&lt;P data-line="34"&gt;The pipeline follows a four-stage flow:&lt;/P&gt;
&lt;P data-line="30"&gt;DevOps Gate → Train &amp;amp; Publish Artifact → Register in ML Registry → Deploy to Managed Endpoint&lt;/P&gt;
&lt;OL data-line="44"&gt;
&lt;LI data-line="40"&gt;&lt;STRONG&gt;DevOps Stage&lt;/STRONG&gt;&amp;nbsp;— A required gate that logs the build number and validates the pipeline is running.&lt;/LI&gt;
&lt;LI data-line="41"&gt;&lt;STRONG&gt;Train Stage&lt;/STRONG&gt;&amp;nbsp;— Installs Python dependencies, runs the training script against data in Azure Blob Storage, and publishes the pickle bundle as a pipeline artifact.&lt;/LI&gt;
&lt;LI data-line="42"&gt;&lt;STRONG&gt;Register Stage&lt;/STRONG&gt;&amp;nbsp;— Downloads the artifact and registers it in an Azure ML Registry with automatic versioning.&lt;/LI&gt;
&lt;LI data-line="43"&gt;&lt;STRONG&gt;Deploy Stage&lt;/STRONG&gt;&amp;nbsp;— Creates (or updates) a Managed Online Endpoint and deploys the newly registered model version to it for real-time scoring.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P data-line="49"&gt;The first three stages run on every push to main. The Deploy stage can be gated with a manual approval if you want human review before going live.&lt;/P&gt;
&lt;H2 data-line="51"&gt;The Training Script&lt;/H2&gt;
&lt;P data-line="53"&gt;The training script is the core of this pipeline — everything else is orchestration around it. It's a standalone Python CLI that you should be able to run locally before it ever touches a pipeline.&lt;/P&gt;
&lt;P data-line="55"&gt;The general shape is:&lt;/P&gt;
&lt;OL data-line="57"&gt;
&lt;LI data-line="53"&gt;&lt;STRONG&gt;Load data&lt;/STRONG&gt;&amp;nbsp;from Azure Blob Storage (Parquet, CSV, etc.) using libraries like&amp;nbsp;adlfs&amp;nbsp;and&amp;nbsp;pyarrow.&lt;/LI&gt;
&lt;LI data-line="54"&gt;&lt;STRONG&gt;Validate the schema&lt;/STRONG&gt;&amp;nbsp;— check that expected columns exist, types are correct, and there are enough rows to train on. Fail fast with a clear error message if not.&lt;/LI&gt;
&lt;LI data-line="55"&gt;&lt;STRONG&gt;Engineer features&lt;/STRONG&gt; — compute derived columns, handle missing values, encode categorical. This is where most of the domain-specific logic lives.&lt;/LI&gt;
&lt;LI data-line="56"&gt;&lt;STRONG&gt;Train the model&lt;/STRONG&gt;&amp;nbsp;using scikit-learn (or your framework of choice).&lt;/LI&gt;
&lt;LI data-line="57"&gt;&lt;STRONG&gt;Apply preprocessing&lt;/STRONG&gt;&amp;nbsp;(e.g.,&amp;nbsp;StandardScaler) and save the preprocessor alongside the model so that scoring uses the exact same transformations.&lt;/LI&gt;
&lt;LI data-line="58"&gt;&lt;STRONG&gt;Serialize a bundle&lt;/STRONG&gt;&amp;nbsp;containing the model, preprocessor, feature column order, and training metadata into a single pickle file.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P data-line="64"&gt;The script reads storage credentials from environment variables, keeping secrets out of the codebase entirely. It accepts an --output-path argument and writes the serialized bundle to that location — which the pipeline later publishes as an artifact.&lt;/P&gt;
&lt;H3 data-line="66"&gt;What Goes in the Bundle&lt;/H3&gt;
&lt;P data-line="68"&gt;The pickle file isn't just the model — it's a&amp;nbsp;&lt;STRONG&gt;self-contained scoring contract&lt;/STRONG&gt;. Here's what's inside and why:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Key&lt;/th&gt;&lt;th&gt;Type&lt;/th&gt;&lt;th&gt;Purpose&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;model&lt;/td&gt;&lt;td&gt;scikit-learn estimator&lt;/td&gt;&lt;td&gt;The trained model (e.g.,&amp;nbsp;IsolationForest,&amp;nbsp;RandomForestClassifier)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;scaler&lt;/td&gt;&lt;td&gt;StandardScaler&amp;nbsp;(or similar)&lt;/td&gt;&lt;td&gt;The exact preprocessor fitted on training data — scoring must use the same transform&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;feature_order&lt;/td&gt;&lt;td&gt;list[str]&lt;/td&gt;&lt;td&gt;Column names in the exact order the model expects — prevents silent column reordering bugs&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;metadata.trained_at&lt;/td&gt;&lt;td&gt;ISO timestamp&lt;/td&gt;&lt;td&gt;When the model was trained — useful for debugging stale predictions&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;metadata.source_rows&lt;/td&gt;&lt;td&gt;int&lt;/td&gt;&lt;td&gt;How many rows were in the raw data — helps detect data pipeline issues&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;metadata.clean_rows&lt;/td&gt;&lt;td&gt;int&lt;/td&gt;&lt;td&gt;How many rows survived cleaning — a sudden drop signals a data quality problem&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;metadata.scikit_learn_version&lt;/td&gt;&lt;td&gt;str&lt;/td&gt;&lt;td&gt;The scikit-learn version used — pickle compatibility can break across major versions&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P data-line="80"&gt;This structure means any consumer can load the bundle, inspect what's in it, and score new data without knowing anything about how the model was trained.&lt;/P&gt;
&lt;H2 data-line="82"&gt;Choosing a Serialization Format&lt;/H2&gt;
&lt;P data-line="84"&gt;This template uses&amp;nbsp;&lt;STRONG&gt;pickle&lt;/STRONG&gt;, but you should choose based on your needs:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Format&lt;/th&gt;&lt;th&gt;Best For&lt;/th&gt;&lt;th&gt;Trade-off&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;pickle&lt;/td&gt;&lt;td&gt;Bundles with metadata (model + scaler + feature order + config)&lt;/td&gt;&lt;td&gt;Built-in, no extra deps. Not safe to load from untrusted sources.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;joblib&lt;/td&gt;&lt;td&gt;Large NumPy array-heavy models&lt;/td&gt;&lt;td&gt;Faster for large arrays, but adds a dependency.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;ONNX&lt;/td&gt;&lt;td&gt;Cross-framework interop (PyTorch ↔ scikit-learn)&lt;/td&gt;&lt;td&gt;Portable, but not all model types are supported.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P data-line="92"&gt;Pickle works well when your artifact is a&amp;nbsp;&lt;STRONG&gt;self-contained bundle&lt;/STRONG&gt; — model, preprocessor, feature column order, and training metadata in one file. Any consumer who loads it gets everything needed to score new data correctly.&lt;/P&gt;
&lt;P data-line="92"&gt;&lt;STRONG&gt;Security note:&lt;/STRONG&gt; Never load pickle files from untrusted sources — deserialization can execute arbitrary code. This is safe when the pickle is produced by your own pipeline and stored in an access-controlled registry, but always validate provenance.&lt;/P&gt;
&lt;H2 data-line="96"&gt;The Pipeline YAML&lt;/H2&gt;
&lt;P data-line="98"&gt;Here's the full pipeline template. Replace &amp;lt;your-...&amp;gt; placeholders with your values:&lt;/P&gt;
&lt;LI-CODE lang="yaml"&gt;trigger:
  branches:
    include:
      - main
  paths:
    include:
      - &amp;lt;your-model-source-path&amp;gt;/*       # e.g., src/models/anomaly-detection/*

stages:
  - stage: DevOps
    displayName: Required DevOps Stage
    jobs:
      - job: Echo
        steps:
          - script: echo build initiated - $(Build.BuildNumber)

  - stage: Train
    dependsOn: DevOps
    displayName: 'Train Model &amp;amp; Publish Artifact'
    jobs:
      - job: TrainModel
        steps:
          - checkout: self

          - task: UsePythonVersion@0
            inputs:
              versionSpec: '3.12'         # Use a supported Python version

          - script: |
              python -m pip install --upgrade pip
              pip install -r requirements.txt
            displayName: 'Install Python dependencies'

          - script: |
              python &amp;lt;your-training-script&amp;gt;.py \
                --output-path "$(Build.ArtifactStagingDirectory)/model_bundle.pkl"
            displayName: 'Train model'
            env:
              AZURE_STORAGE_ACCOUNT_NAME: $(AZURE_STORAGE_ACCOUNT_NAME)
              AZURE_STORAGE_ACCOUNT_KEY: $(AZURE_STORAGE_ACCOUNT_KEY)   # See note on Managed Identity below

          - task: PublishPipelineArtifact@1                              # Use the modern task
            inputs:
              artifactName: 'model-pkl'
              targetPath: '$(Build.ArtifactStagingDirectory)/model_bundle.pkl'

  - stage: Register
    dependsOn: Train
    displayName: 'Register Model in ML Registry'
    jobs:
      - job: RegisterModel
        steps:
          - task: DownloadPipelineArtifact@2                            # Use the modern task
            inputs:
              artifactName: 'model-pkl'
              targetPath: '$(System.ArtifactsDirectory)/model-pkl'

          - task: AzureCLI@2
            displayName: 'Register model in ML Registry'
            inputs:
              azureSubscription: '&amp;lt;your-service-connection&amp;gt;'
              scriptType: 'ps'
              scriptLocation: 'inlineScript'
              inlineScript: |
                az extension add -n ml --yes
                az ml model create `
                  --name &amp;lt;your-model-name&amp;gt; `
                  --path "$(System.ArtifactsDirectory)/model-pkl/model_bundle.pkl" `
                  --type custom_model `
                  --registry-name &amp;lt;your-ml-registry&amp;gt; `
                  --resource-group &amp;lt;your-resource-group&amp;gt;&lt;/LI-CODE&gt;
&lt;H3 data-line="174"&gt;Placeholder Reference&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Placeholder&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;&lt;th&gt;Example&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&amp;lt;your-model-source-path&amp;gt;&lt;/td&gt;&lt;td&gt;Path to your model code in the repo&lt;/td&gt;&lt;td&gt;src/models/anomaly-detection&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&amp;lt;your-training-script&amp;gt;&lt;/td&gt;&lt;td&gt;Your Python training script&lt;/td&gt;&lt;td&gt;train_model.py&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&amp;lt;your-service-connection&amp;gt;&lt;/td&gt;&lt;td&gt;Azure DevOps service connection name&lt;/td&gt;&lt;td&gt;prod-ml-connection&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&amp;lt;your-model-name&amp;gt;&lt;/td&gt;&lt;td&gt;Name for the model in the registry&lt;/td&gt;&lt;td&gt;sales-anomaly-detector&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&amp;lt;your-ml-registry&amp;gt;&lt;/td&gt;&lt;td&gt;Azure ML Registry name&lt;/td&gt;&lt;td&gt;contoso-ml-registry&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&amp;lt;your-resource-group&amp;gt;&lt;/td&gt;&lt;td&gt;Resource group containing the registry&lt;/td&gt;&lt;td&gt;rg-ml-prod&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3 data-line="185"&gt;Key Design Decisions&lt;/H3&gt;
&lt;P data-line="187"&gt;&lt;STRONG&gt;Credentials as environment variables&lt;/STRONG&gt;&amp;nbsp;— Storage credentials are stored in an Azure DevOps variable group and injected via the&amp;nbsp;env:&amp;nbsp;block. They never appear on the command line or in logs.&lt;/P&gt;
&lt;P data-line="189"&gt;&lt;STRONG&gt;Prefer Managed Identity over keys.&lt;/STRONG&gt;&amp;nbsp;The template above shows&amp;nbsp;AZURE_STORAGE_ACCOUNT_KEY&amp;nbsp;for simplicity, but the recommended approach is to authenticate using a User Managed Identity (UMI) with the&amp;nbsp;Storage Blob Data Reader&amp;nbsp;role. This eliminates key rotation and reduces the credential surface. If your agent supports Managed Identity (e.g., self-hosted on an Azure VM), use&amp;nbsp;DefaultAzureCredential&amp;nbsp;in your training script instead of account keys.&lt;/P&gt;
&lt;P data-line="191"&gt;&lt;STRONG&gt;Separate Train and Register stages&lt;/STRONG&gt;&amp;nbsp;— The training artifact is published as a pipeline artifact between stages. This means if registration fails, you don't have to retrain. It also gives you a downloadable artifact in Azure DevOps for debugging.&lt;/P&gt;
&lt;P data-line="193"&gt;&lt;STRONG&gt;az ml model create&amp;nbsp;with&amp;nbsp;--registry-name&lt;/STRONG&gt;&amp;nbsp;— This registers the model in an Azure ML Registry (not a workspace). Registries are shared across workspaces and teams, making the model accessible to anyone with the right permissions.&lt;/P&gt;
&lt;P data-line="195"&gt;&lt;STRONG&gt;Auto-versioning&lt;/STRONG&gt; — Each&amp;nbsp;az ml model create&amp;nbsp;call with the same&amp;nbsp;--name&amp;nbsp;automatically increments the version number in the registry. No manual version management needed.&lt;/P&gt;
&lt;H2 data-line="197"&gt;Permissions&lt;/H2&gt;
&lt;P data-line="199"&gt;The pipeline authenticates using a User Managed Identity (UMI) linked to an Azure DevOps service connection via workload identity federation. The UMI needs:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Role&lt;/th&gt;&lt;th&gt;Scope&lt;/th&gt;&lt;th&gt;Purpose&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Storage Blob Data Reader&lt;/td&gt;&lt;td&gt;Storage account or container&lt;/td&gt;&lt;td&gt;Read training data&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AzureML Registry User&lt;/td&gt;&lt;td&gt;ML Registry&lt;/td&gt;&lt;td&gt;Register model artifacts&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AzureML Data Scientist&lt;/td&gt;&lt;td&gt;ML Workspace&lt;/td&gt;&lt;td&gt;Create/update managed endpoints and deployments&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P data-line="207"&gt;No Contributor or Owner access at the subscription or resource group level is required. Least-privilege access keeps the blast radius small.&lt;/P&gt;
&lt;P data-line="207"&gt;&lt;STRONG&gt;Workload Identity Federation vs. secrets:&lt;/STRONG&gt; If your Azure DevOps service connection uses workload identity federation (recommended), the UMI authenticates without any stored secrets. If using a service principal with client secret instead, store the secret in an Azure DevOps variable group marked as secret, and rotate it regularly.&lt;/P&gt;
&lt;H2 data-line="211"&gt;Common Pitfalls&lt;/H2&gt;
&lt;P data-line="213"&gt;These are issues you'll likely hit when adapting this template:&lt;/P&gt;
&lt;P data-line="215"&gt;&lt;STRONG&gt;Column name mismatches.&lt;/STRONG&gt;&amp;nbsp;Parquet files may have column names like&amp;nbsp;periodid&amp;nbsp;while your script expects&amp;nbsp;Period ID. Add a case-insensitive column rename mapping in your training script and validate the data schema before training starts.&lt;/P&gt;
&lt;P data-line="217"&gt;&lt;STRONG&gt;Windows agents use cmd.exe, not bash.&lt;/STRONG&gt;&amp;nbsp;If your pipeline runs on self-hosted Windows agents, backslash line continuations and bash-style commands won't work. Use single-line commands or PowerShell syntax, and use Windows-style path separators.&lt;/P&gt;
&lt;P data-line="219"&gt;&lt;STRONG&gt;checkout: self&amp;nbsp;vs named repositories.&lt;/STRONG&gt;&amp;nbsp;When your pipeline YAML lives in the same repo as your training code, always use&amp;nbsp;checkout: self. A named repository checkout pulls the default branch, not the feature branch you're testing — leading to stale code running in your pipeline.&lt;/P&gt;
&lt;P data-line="221"&gt;&lt;STRONG&gt;Start with the training script, not the pipeline.&lt;/STRONG&gt;&amp;nbsp;Get your training script working locally first. The pipeline is just orchestration — if the script doesn't work on your machine, it won't work in the pipeline either.&lt;/P&gt;
&lt;P data-line="223"&gt;&lt;STRONG&gt;Pin your dependencies.&lt;/STRONG&gt; Use a&amp;nbsp;requirements.txt&amp;nbsp;with pinned versions rather than inline&amp;nbsp;pip install&amp;nbsp;with unpinned packages. A scikit-learn minor version bump can change model behavior silently.&lt;/P&gt;
&lt;H2 data-line="225"&gt;Deploying to a Managed Online Endpoint&lt;/H2&gt;
&lt;P data-line="227"&gt;Registering the model in the Azure ML Registry makes it discoverable. But for real-time scoring — where an API, dashboard, or another service sends data and gets predictions back — you need to&amp;nbsp;&lt;STRONG&gt;deploy the model to a Managed Online Endpoint&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P data-line="229"&gt;Azure ML Managed Online Endpoints handle the infrastructure: provisioning compute, load balancing, scaling, health probes, and rolling deployments. You provide the model and a scoring script.&lt;/P&gt;
&lt;P data-line="229"&gt;HTTP Request (JSON) → Managed Online Endpoint → Deployment (blue) → score.py [init() / run()] + model.pkl → JSON Response (predictions)&lt;/P&gt;
&lt;P data-line="247"&gt;Key concepts:&lt;/P&gt;
&lt;UL data-line="248"&gt;
&lt;LI data-line="234"&gt;An&amp;nbsp;&lt;STRONG&gt;endpoint&lt;/STRONG&gt;&amp;nbsp;is the HTTPS URL that clients call. It has auth (key or AAD token) and a DNS name.&lt;/LI&gt;
&lt;LI data-line="235"&gt;A&amp;nbsp;&lt;STRONG&gt;deployment&lt;/STRONG&gt;&amp;nbsp;sits behind the endpoint and runs your scoring code + model on provisioned compute.&lt;/LI&gt;
&lt;LI data-line="236"&gt;You can have multiple deployments (e.g.,&amp;nbsp;blue&amp;nbsp;and&amp;nbsp;green) behind one endpoint for A/B testing or canary rollouts, controlled by traffic splitting.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 data-line="252"&gt;The Scoring Script&lt;/H3&gt;
&lt;P data-line="254"&gt;The scoring script is the glue between the endpoint and your pickle bundle. Azure ML calls init() once when the container starts, and run() on every incoming request.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# score.py — deployed alongside the model
import json
import pickle
import os
import numpy as np
import pandas as pd

def init():
    """Called once when the endpoint container starts."""
    global model_bundle
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "model_bundle.pkl")
    with open(model_path, "rb") as f:
        model_bundle = pickle.load(f)
    print(f"Model loaded. Trained at: {model_bundle['metadata']['trained_at']}")
    print(f"Expected features: {model_bundle['feature_order']}")

def run(raw_data):
    """Called on every scoring request."""
    try:
        data = json.loads(raw_data)
        df = pd.DataFrame(data["input_data"])

        # Enforce feature order from the bundle
        df = df[model_bundle["feature_order"]]

        # Apply the same scaler used during training
        scaled = model_bundle["scaler"].transform(df)

        # Predict
        predictions = model_bundle["model"].predict(scaled)

        return json.dumps({
            "predictions": predictions.tolist(),
            "model_version": model_bundle["metadata"].get("scikit_learn_version", "unknown"),
        })
    except KeyError as e:
        return json.dumps({"error": f"Missing expected column: {e}"})
    except Exception as e:
        return json.dumps({"error": str(e)})&lt;/LI-CODE&gt;
&lt;P data-line="298"&gt;Key things to notice:&lt;/P&gt;
&lt;UL data-line="300"&gt;
&lt;LI data-line="286"&gt;&lt;STRONG&gt;AZUREML_MODEL_DIR&lt;/STRONG&gt;&amp;nbsp;— Azure ML automatically downloads the model artifact from the registry and sets this environment variable to the local path. You never deal with storage URLs in scoring code.&lt;/LI&gt;
&lt;LI data-line="287"&gt;&lt;STRONG&gt;Feature order enforcement&lt;/STRONG&gt;&amp;nbsp;—&amp;nbsp;df[model_bundle["feature_order"]]&amp;nbsp;ensures columns are in the exact order the model was trained on, even if the caller sends them in a different order.&lt;/LI&gt;
&lt;LI data-line="288"&gt;&lt;STRONG&gt;Same scaler&lt;/STRONG&gt; — The&amp;nbsp;StandardScaler&amp;nbsp;from the bundle is reused, so the numerical scaling matches training exactly. This is why we bundle the scaler with the model.&lt;/LI&gt;
&lt;/UL&gt;
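The request body that run() expects can be sketched as a small JSON document. The column names below are illustrative; saved as scoring/sample-request.json, this is the shape the pipeline's smoke test sends via az ml online-endpoint invoke.

```python
# Sketch: building the payload run() expects — an {"input_data": [...]}
# document whose records carry the model's feature columns (illustrative names).
import json

payload = {
    "input_data": [
        {"Period ID": 202601, "Amount": 1250.0},
        {"Period ID": 202602, "Amount": 980.5},
    ]
}

raw = json.dumps(payload)

# What run() does on the other side, minus the model call:
data = json.loads(raw)
rows = data["input_data"]
```

Because score.py reindexes by `feature_order`, callers may send the columns in any order, but every expected column must be present or the KeyError branch returns an error payload.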
&lt;H3 data-line="347"&gt;The Deploy Stage in the Pipeline&lt;/H3&gt;
&lt;P data-line="292"&gt;Add this stage after the Register stage. All endpoint and deployment configuration is done inline via az ml CLI parameters — no separate YAML config files needed:&lt;/P&gt;
&lt;LI-CODE lang="yaml"&gt;- stage: Deploy
    dependsOn: Register
    displayName: 'Deploy to Managed Endpoint'
    jobs:
      - job: DeployModel
        steps:
          - checkout: self                   # to access score.py

          - task: AzureCLI@2
            displayName: 'Create or update endpoint'
            inputs:
              azureSubscription: '&amp;lt;your-service-connection&amp;gt;'
              scriptType: 'ps'
              scriptLocation: 'inlineScript'
              inlineScript: |
                az extension add -n ml --yes

                # Create endpoint if it doesn't exist (idempotent)
                $exists = az ml online-endpoint show `
                  --name &amp;lt;your-endpoint-name&amp;gt; `
                  --resource-group &amp;lt;your-resource-group&amp;gt; `
                  --workspace-name &amp;lt;your-workspace&amp;gt; 2&amp;gt;$null

                if (-not $exists) {
                  az ml online-endpoint create `
                    --name &amp;lt;your-endpoint-name&amp;gt; `
                    --auth-mode key `
                    --resource-group &amp;lt;your-resource-group&amp;gt; `
                    --workspace-name &amp;lt;your-workspace&amp;gt;
                }

          - task: AzureCLI@2
            displayName: 'Deploy model to endpoint'
            inputs:
              azureSubscription: '&amp;lt;your-service-connection&amp;gt;'
              scriptType: 'ps'
              scriptLocation: 'inlineScript'
              inlineScript: |
                az extension add -n ml --yes

                az ml online-deployment create `
                  --name blue `
                  --endpoint-name &amp;lt;your-endpoint-name&amp;gt; `
                  --model azureml://registries/&amp;lt;your-ml-registry&amp;gt;/models/&amp;lt;your-model-name&amp;gt;/versions/&amp;lt;version-number&amp;gt; `
                  --code-path ./scoring `
                  --scoring-script score.py `
                  --environment-image mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04:latest `
                  --instance-type Standard_DS3_v2 `
                  --instance-count 1 `
                  --resource-group &amp;lt;your-resource-group&amp;gt; `
                  --workspace-name &amp;lt;your-workspace&amp;gt; `
                  --all-traffic

          - task: AzureCLI@2
            displayName: 'Smoke test the endpoint'
            inputs:
              azureSubscription: '&amp;lt;your-service-connection&amp;gt;'
              scriptType: 'ps'
              scriptLocation: 'inlineScript'
              inlineScript: |
                az extension add -n ml --yes

                # Send a test request to verify the deployment is healthy
                az ml online-endpoint invoke `
                  --name &amp;lt;your-endpoint-name&amp;gt; `
                  --resource-group &amp;lt;your-resource-group&amp;gt; `
                  --workspace-name &amp;lt;your-workspace&amp;gt; `
                  --request-file scoring/sample-request.json&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;Version pinning is critical.&lt;/STRONG&gt;&amp;nbsp;The scikit-learn version in your scoring environment must match the version used during training. Pickle deserialization can fail or produce wrong results if the versions differ.&lt;/P&gt;
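A cheap guard is to compare the version recorded in the bundle's metadata against the serving environment's version inside init() and refuse to serve on a mismatch. The helper below is a pure-Python sketch of that check; the version strings are illustrative.

```python
# Sketch: fail fast if the serving environment's scikit-learn differs
# from the version recorded at training time (compared at major.minor).
def check_version(trained_version, runtime_version):
    """Raise if major.minor differ — grounds to refuse to serve."""
    trained = tuple(trained_version.split(".")[:2])
    runtime = tuple(runtime_version.split(".")[:2])
    if trained != runtime:
        raise RuntimeError(
            f"scikit-learn mismatch: trained with {trained_version}, "
            f"serving with {runtime_version}"
        )

check_version("1.4.2", "1.4.1")   # same major.minor — accepted
```

In a real scoring script you would pass `model_bundle["metadata"]["scikit_learn_version"]` and `sklearn.__version__` as the two arguments.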
&lt;H3 data-line="424"&gt;Deploy Stage Placeholder Reference&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Placeholder&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;&lt;th&gt;Example&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&amp;lt;your-endpoint-name&amp;gt;&lt;/td&gt;&lt;td&gt;Unique endpoint name (DNS-safe)&lt;/td&gt;&lt;td&gt;anomaly-scoring-endpoint&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&amp;lt;your-workspace&amp;gt;&lt;/td&gt;&lt;td&gt;Azure ML Workspace name&lt;/td&gt;&lt;td&gt;ml-workspace-prod&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H2 data-line="462"&gt;Complete Pipeline — All Four Stages&lt;/H2&gt;
&lt;P data-line="464"&gt;Here's the full pipeline structure showing how Train, Register, and Deploy connect:&lt;/P&gt;
&lt;LI-CODE lang="yaml"&gt;stages:
  - stage: DevOps          # Gate
  - stage: Train            # Train model → publish pickle artifact
    dependsOn: DevOps
  - stage: Register         # Register pickle in Azure ML Registry
    dependsOn: Train
  - stage: Deploy           # Deploy to Managed Online Endpoint
    dependsOn: Register
    # Optional: restrict Deploy to main and/or require a manual approval (environment check)
    # condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))&lt;/LI-CODE&gt;
&lt;P&gt;Each stage is independently retriable. If Deploy fails, you don't retrain or re-register — you just redeploy.&lt;/P&gt;
&lt;H2 data-line="481"&gt;Extending This Template&lt;/H2&gt;
&lt;P data-line="483"&gt;Once the base pipeline is working, consider these additions:&lt;/P&gt;
&lt;UL data-line="485"&gt;
&lt;LI data-line="438"&gt;&lt;STRONG&gt;Model validation stage&lt;/STRONG&gt;&amp;nbsp;— Add a stage between Register and Deploy that runs the model against a holdout set and gates deployment on a minimum performance threshold.&lt;/LI&gt;
&lt;LI data-line="439"&gt;&lt;STRONG&gt;Batch scoring pipeline&lt;/STRONG&gt;&amp;nbsp;— A separate pipeline or Azure Function loads the model from the registry and scores large datasets on a schedule using Azure ML Batch Endpoints.&lt;/LI&gt;
&lt;LI data-line="440"&gt;&lt;STRONG&gt;Monitoring&lt;/STRONG&gt;&amp;nbsp;— Use Azure ML model monitoring to track data drift and prediction distributions over time. Trigger retraining automatically when drift exceeds a threshold.&lt;/LI&gt;
&lt;LI data-line="441"&gt;&lt;STRONG&gt;Multi-environment promotion&lt;/STRONG&gt;&amp;nbsp;— Register to a dev registry first, deploy to a staging endpoint, run integration tests, then promote to production.&lt;/LI&gt;
&lt;LI data-line="442"&gt;&lt;STRONG&gt;A/B testing&lt;/STRONG&gt;&amp;nbsp;— Use traffic splitting to evaluate a new model version against the current one on live traffic before committing.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 data-line="491"&gt;Conclusion&lt;/H2&gt;
&lt;P data-line="493"&gt;An end-to-end MLOps pipeline doesn't need to be complex. The core pattern is:&lt;/P&gt;
&lt;OL data-line="495"&gt;
&lt;LI data-line="448"&gt;&lt;STRONG&gt;Train&lt;/STRONG&gt;&amp;nbsp;— Run the training script, serialize the model bundle&lt;/LI&gt;
&lt;LI data-line="449"&gt;&lt;STRONG&gt;Register&lt;/STRONG&gt;&amp;nbsp;— Push to Azure ML Registry with automatic versioning&lt;/LI&gt;
&lt;LI data-line="450"&gt;&lt;STRONG&gt;Deploy&lt;/STRONG&gt;&amp;nbsp;— Create/update a Managed Online Endpoint with the new version&lt;/LI&gt;
&lt;LI data-line="451"&gt;&lt;STRONG&gt;Score&lt;/STRONG&gt;&amp;nbsp;— Clients call a standard HTTPS API, the endpoint handles scaling&lt;/LI&gt;
&lt;/OL&gt;
&lt;P data-line="500"&gt;The value comes from making this repeatable and removing manual steps. Every push to&amp;nbsp;main&amp;nbsp;trains a fresh model, registers it, and deploys it to a live endpoint — with a rollback path through blue-green deployments if anything goes wrong.&lt;/P&gt;
&lt;P data-line="502"&gt;Copy this template, replace the &amp;lt;your-...&amp;gt; placeholders, write your training script and scoring script, and you have a production-grade MLOps pipeline. The structure stays the same regardless of whether you're deploying an anomaly detector, a classifier, or a regression model.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Apr 2026 10:45:39 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/building-an-end-to-end-mlops-pipeline-from-training-to-managed/ba-p/4509852</guid>
      <dc:creator>Gapandey</dc:creator>
      <dc:date>2026-04-09T10:45:39Z</dc:date>
    </item>
    <item>
      <title>Enterprise UAMI Design in Azure: Trust Boundaries and Blast Radius</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/enterprise-uami-design-in-azure-trust-boundaries-and-blast/ba-p/4509614</link>
      <description>&lt;P&gt;As organizations move toward secretless authentication models in Azure, Managed Identity has become the preferred approach for enabling secure communication between services. User Assigned Managed Identity (UAMI) in particular offers flexibility that allows identity reuse across multiple compute resources such as:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Azure App Service&lt;/LI&gt;
&lt;LI&gt;Azure Function Apps&lt;/LI&gt;
&lt;LI&gt;Virtual Machines&lt;/LI&gt;
&lt;LI&gt;Azure Kubernetes Service&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;While this flexibility is beneficial from an operational perspective, it also introduces architectural considerations that are often overlooked during initial implementation.&lt;/P&gt;
&lt;P&gt;In enterprise environments where shared infrastructure patterns are common, the way UAMI is designed and assigned can directly influence the effective trust boundary of the deployment.&amp;nbsp;&lt;/P&gt;
&lt;H5&gt;&lt;STRONG&gt;Understanding Identity Scope in Azure&lt;/STRONG&gt;&lt;/H5&gt;
&lt;P&gt;Unlike System Assigned Managed Identity, a UAMI exists independently of the compute resource lifecycle and can be attached to multiple services across:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Resource Groups&lt;/LI&gt;
&lt;LI&gt;Subscriptions&lt;/LI&gt;
&lt;LI&gt;Environments&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This capability allows a single identity to be reused across development, testing, or production services when required.&lt;/P&gt;
&lt;P&gt;However, identity reuse across multiple logical environments can expand the operational trust boundary of that identity. Any permission granted to the identity is implicitly inherited by all services to which the identity is attached.&lt;/P&gt;
&lt;P&gt;From an architectural standpoint, this creates a shared authentication surface across isolated deployment environments.&lt;/P&gt;
&lt;H5&gt;&lt;STRONG&gt;High-Level Architecture: Shared Identity Pattern&lt;/STRONG&gt;&lt;/H5&gt;
&lt;P&gt;In many enterprise Azure deployments, it is common to observe patterns where:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;A single UAMI is assigned to multiple App Services&lt;/LI&gt;
&lt;LI&gt;The same identity is reused across automation workloads&lt;/LI&gt;
&lt;LI&gt;Identities are provisioned centrally and attached dynamically&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;While this simplifies management and avoids identity sprawl, it may also introduce unintended privilege propagation across services.&lt;/P&gt;
&lt;P&gt;For example:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;In this architecture:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Multiple App Services across environments share the same managed identity.&lt;/LI&gt;
&lt;LI&gt;Each compute instance requests an access token from Microsoft Entra ID using Azure Instance Metadata Service (IMDS).&lt;/LI&gt;
&lt;LI&gt;The issued token is then used to authenticate against downstream platform services such as:
&lt;UL&gt;
&lt;LI&gt;Azure SQL Database&lt;/LI&gt;
&lt;LI&gt;Azure Key Vault&lt;/LI&gt;
&lt;LI&gt;Azure Storage&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Because RBAC permissions are assigned to the shared identity rather than the compute instance itself, the effective authentication boundary becomes identity‑scoped instead of environment‑scoped.&lt;/P&gt;
&lt;P&gt;As a result, any compromised lower‑tier environment such as DEV may obtain an access token capable of accessing production‑level resources if those permissions are assigned to the shared identity.&lt;/P&gt;
&lt;P&gt;This expands the operational trust boundary across environments and increases the potential blast radius in the event of identity misuse.&lt;/P&gt;
&lt;H5&gt;&lt;STRONG&gt;Blast Radius Considerations&amp;nbsp;&lt;/STRONG&gt;&lt;/H5&gt;
&lt;P&gt;Blast radius refers to the potential impact scope of a security or configuration compromise.&lt;/P&gt;
&lt;P&gt;When a shared UAMI is used across multiple services, the following conditions may increase the blast radius:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;th&gt;Design Pattern&lt;/th&gt;&lt;th&gt;Potential Risk&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Single UAMI across environments&lt;/td&gt;&lt;td&gt;Cross‑environment access&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Subscription‑wide RBAC assignment&lt;/td&gt;&lt;td&gt;Broad privilege scope&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Identity used for automation pipelines&lt;/td&gt;&lt;td&gt;Lateral movement&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Shared identity across teams&lt;/td&gt;&lt;td&gt;Ownership ambiguity&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Because Managed Identity authentication relies on Azure Instance Metadata Service (IMDS), any compromised compute resource with access to IMDS may request an access token using the attached identity.&lt;/P&gt;
&lt;P&gt;This token can then be used to authenticate with downstream Azure services for which the identity has RBAC permissions.&lt;/P&gt;
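The token acquisition path described above can be sketched as constructing the standard IMDS request. The endpoint address, api-version, and Metadata header are the documented IMDS values; the client_id below is a made-up example used to select a specific user-assigned identity.

```python
# Sketch: the IMDS token request a compute resource issues for a UAMI.
# 169.254.169.254 and api-version 2018-02-01 are the documented IMDS
# values; the client_id is illustrative.
from urllib.parse import urlencode

IMDS_TOKEN_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"

def build_imds_token_request(resource, client_id=None):
    """Return (url, headers) for an IMDS managed-identity token request.
    Passing client_id selects a specific user-assigned identity when the
    compute resource has more than one attached."""
    params = {"api-version": "2018-02-01", "resource": resource}
    if client_id:
        params["client_id"] = client_id
    url = IMDS_TOKEN_ENDPOINT + "?" + urlencode(params)
    headers = {"Metadata": "true"}   # required header; defeats simple SSRF forwards
    return url, headers

url, headers = build_imds_token_request(
    "https://vault.azure.net",
    client_id="00000000-0000-0000-0000-000000000000",
)
```

The point for blast-radius analysis: nothing in this request proves which environment the caller lives in. Any process on any compute resource the identity is attached to can issue it, which is why the trust boundary is identity-scoped rather than environment-scoped.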
&lt;H5&gt;&lt;STRONG&gt;Enterprise Design Recommendations: Environment‑Isolated Identity Model&lt;/STRONG&gt;&lt;/H5&gt;
&lt;P&gt;To reduce identity blast radius in enterprise deployments, the following architectural principles may be considered:&lt;/P&gt;
&lt;H6&gt;&lt;STRONG&gt;Environment‑Scoped Identity&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P&gt;Provision separate UAMIs per environment:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;UAMI‑DEV&lt;/LI&gt;
&lt;LI&gt;UAMI‑UAT&lt;/LI&gt;
&lt;LI&gt;UAMI‑PROD&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Avoid reusing the same identity across isolated lifecycle stages.&lt;/P&gt;
&lt;H6&gt;&lt;STRONG&gt;Resource‑Level RBAC Assignment&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P&gt;Prefer assigning RBAC permissions at:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Resource&lt;/LI&gt;
&lt;LI&gt;Resource Group&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;instead of Subscription scope wherever feasible.&lt;/P&gt;
&lt;H6&gt;&lt;STRONG&gt;Identity Ownership Model&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P&gt;Ensure ownership clarity for identities assigned across shared workloads. Identity lifecycle should be aligned with:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Application ownership&lt;/LI&gt;
&lt;LI&gt;Service ownership&lt;/LI&gt;
&lt;LI&gt;Deployment boundary&lt;/LI&gt;
&lt;/UL&gt;
&lt;H6&gt;&lt;STRONG&gt;Least Privilege Assignment&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P&gt;Assign roles such as:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Key Vault Secrets User&lt;/LI&gt;
&lt;LI&gt;Storage Blob Data Reader&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;instead of broader roles such as:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Contributor&lt;/LI&gt;
&lt;LI&gt;Owner&lt;/LI&gt;
&lt;/UL&gt;
&lt;H5&gt;&lt;STRONG&gt;Recommended High‑Level Architecture&lt;/STRONG&gt;&lt;/H5&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;In this architecture:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Each App Service instance is attached to an environment‑specific managed identity.&lt;/LI&gt;
&lt;LI&gt;RBAC assignments are scoped at the resource or resource group level.&lt;/LI&gt;
&lt;LI&gt;Microsoft Entra ID issues tokens independently for each identity.&lt;/LI&gt;
&lt;LI&gt;Trust boundaries remain aligned with deployment environments.&lt;/LI&gt;
&lt;LI&gt;A compromised DEV compute instance can only obtain a token associated with UAMI‑DEV.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Because UAMI‑DEV does not have RBAC permissions for production resources, lateral access to PROD dependencies is prevented.&lt;/P&gt;
&lt;H5&gt;&lt;STRONG&gt;Blast Radius Containment:&amp;nbsp;&lt;/STRONG&gt;&lt;/H5&gt;
&lt;P&gt;This design significantly reduces the potential blast radius by ensuring that:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Identity compromise remains environment‑scoped.&lt;/LI&gt;
&lt;LI&gt;Token issuance does not grant unintended cross‑environment privileges.&lt;/LI&gt;
&lt;LI&gt;RBAC permissions align with application ownership boundaries.&lt;/LI&gt;
&lt;LI&gt;Authentication trust boundaries match deployment lifecycle boundaries.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H5&gt;&lt;STRONG&gt;Conclusion&lt;/STRONG&gt;&lt;/H5&gt;
&lt;P&gt;User Assigned Managed Identity offers significant advantages for secretless authentication in Azure environments. However, architectural considerations related to identity reuse and scope of assignment must be evaluated carefully in enterprise deployments.&lt;/P&gt;
&lt;P&gt;By aligning identity design with trust boundaries and minimizing the blast radius through scoped RBAC and environment isolation, organizations can implement Managed Identity in a way that balances operational efficiency with security governance.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Apr 2026 08:13:33 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/enterprise-uami-design-in-azure-trust-boundaries-and-blast/ba-p/4509614</guid>
      <dc:creator>AmitManchanda28</dc:creator>
      <dc:date>2026-04-09T08:13:33Z</dc:date>
    </item>
    <item>
      <title>Enabling AI-Driven Enterprise Intelligence Using SAP and Microsoft 3-IQ Layers</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/enabling-ai-driven-enterprise-intelligence-using-sap-and/ba-p/4509721</link>
      <description>&lt;P&gt;&lt;STRONG&gt;Architectural Context&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Enterprise SAP platforms such as SAP ECC, SAP S/4HANA, and SAP BW continue to function as authoritative transactional systems supporting financial accounting, treasury management, portfolio reporting, and regulatory compliance workflows. These environments are optimized for consistency in transactional processing and deterministic reporting. However, they are not designed to support real‑time inferencing workloads or cross‑domain contextual reasoning required for enterprise‑scale AI systems.&lt;/P&gt;
&lt;P&gt;In most enterprise architectures, SAP operational data remains logically separated from analytical platforms and collaboration ecosystems such as Microsoft 365. This separation results in fragmentation across three intelligence domains:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Transactional business data&lt;/LI&gt;
&lt;LI&gt;Analytical semantic models&lt;/LI&gt;
&lt;LI&gt;Organizational workflow signals&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;AI workloads deployed against isolated analytical environments therefore lack direct access to governed ERP data, enterprise policy frameworks, and user workflow context. This limits the ability of AI systems to generate role‑aware, policy‑aligned recommendations within operational decision processes.&lt;/P&gt;
&lt;P&gt;The integration of SAP Business Data Cloud with Microsoft Fabric introduces a unified data access model in which SAP business data products can be exposed directly into Microsoft Fabric’s OneLake environment through bi‑directional, zero‑copy sharing. This approach enables SAP data to be consumed by analytics and AI workloads without physical replication while preserving SAP‑defined semantics, lineage, and access controls. &lt;A href="https://news.sap.com/2025/11/sap-bdc-connect-for-microsoft-fabric-business-insights-ai-innovation/" target="_blank"&gt;[news.sap.com]&lt;/A&gt;, &lt;A href="https://windowsforum.com/threads/sap-bdc-connect-for-microsoft-fabric-zero-copy-bi-directional-data-for-ai.390599/" target="_blank"&gt;[windowsforum.com]&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;SAP Data Integration with Microsoft Fabric&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Microsoft Fabric provides a SaaS‑based unified analytics platform built on OneLake, consolidating data engineering, warehousing, analytics, and AI workloads within a single environment.&lt;/P&gt;
&lt;P&gt;SAP Business Data Cloud Connect integrates SAP datasets directly into OneLake without requiring traditional ETL‑driven staging layers. SAP data products are surfaced within Fabric in their native semantic form, allowing Fabric services to query operational ERP datasets in place while maintaining governance boundaries defined within SAP environments. &lt;A href="https://news.sap.com/2025/11/sap-bdc-connect-for-microsoft-fabric-business-insights-ai-innovation/" target="_blank"&gt;[news.sap.com]&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;This architecture eliminates batch‑oriented data extraction pipelines and reduces latency associated with data synchronization between transactional and analytical platforms.&lt;/P&gt;
&lt;P&gt;The integration model supports bidirectional data exchange. Analytical outputs generated within Fabric, such as aggregated financial metrics or predictive forecasts, can be made available to SAP systems to support downstream operational processes. This establishes a closed‑loop architecture in which transactional and analytical workloads continuously inform each other without requiring redundant data copies. &lt;A href="https://windowsforum.com/threads/sap-bdc-connect-for-microsoft-fabric-zero-copy-bi-directional-data-for-ai.390599/" target="_blank"&gt;[windowsforum.com]&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Semantic Modeling through Fabric Intelligence Layer&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Operational ERP datasets are not directly consumable by AI inferencing systems due to their structural complexity and absence of domain‑aligned semantics.&lt;/P&gt;
&lt;P&gt;Fabric introduces a semantic modeling layer that standardizes structured enterprise datasets into business‑aligned entities, relationships, and domain metrics. This layer maps SAP transactional data into enterprise constructs such as financial exposure, liquidity position, or compliance thresholds.&lt;/P&gt;
&lt;P&gt;By propagating standardized semantic definitions across analytical tools and AI workloads, the semantic layer ensures that all downstream consumers interpret ERP‑originated data consistently. This mitigates semantic divergence across departments and establishes a unified enterprise data model capable of supporting inferencing and automation.&lt;/P&gt;
&lt;P&gt;Within financial services environments, this enables modeling of constructs such as portfolio risk or regulatory exposure in a form that AI workloads can process without requiring interpretation of underlying transactional tables.&lt;/P&gt;
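&lt;P&gt;As a rough illustration of what a semantic layer does, the sketch below rolls row-level transactional data up into a business-aligned metric. All field names, values, and the threshold are invented for illustration; Fabric's semantic models are defined declaratively, not in Python.&lt;/P&gt;

```python
# Hypothetical sketch: deriving the semantic entity "financial exposure"
# from raw, table-shaped ERP rows. Field names and the limit are invented.
from collections import defaultdict

transactions = [
    {"counterparty": "ACME", "amount": 1_200_000},
    {"counterparty": "ACME", "amount": 800_000},
    {"counterparty": "Globex", "amount": 500_000},
]

EXPOSURE_LIMIT = 1_500_000  # hypothetical compliance threshold

def exposure_by_counterparty(rows):
    """Aggregate row-level amounts into per-counterparty exposure."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["counterparty"]] += row["amount"]
    return dict(totals)

exposure = exposure_by_counterparty(transactions)
breaches = {cp for cp, total in exposure.items() if total > EXPOSURE_LIMIT}
```

Downstream AI workloads would then consume `exposure` and `breaches` directly, without interpreting the underlying transactional tables.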
&lt;P&gt;&lt;STRONG&gt;Knowledge Grounding through Foundry Intelligence Layer&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;AI systems operating in regulated enterprise environments must operate within defined governance and audit frameworks.&lt;/P&gt;
&lt;P&gt;Foundry introduces a controlled knowledge access layer that connects AI workloads to enterprise knowledge repositories, including:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;SAP process logic&lt;/LI&gt;
&lt;LI&gt;Financial reporting procedures&lt;/LI&gt;
&lt;LI&gt;Internal governance policies&lt;/LI&gt;
&lt;LI&gt;Regulatory documentation&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Access to these knowledge sources is governed by identity‑driven access control and policy enforcement mechanisms, ensuring that AI outputs are grounded in approved enterprise content.&lt;/P&gt;
&lt;P&gt;This knowledge grounding layer enables AI workloads to retrieve contextual policy information relevant to operational decision scenarios while maintaining traceability between AI‑generated outputs and source documentation.&lt;/P&gt;
&lt;P&gt;From an architectural perspective, Foundry functions as the knowledge retrieval control plane across distributed enterprise data environments.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Contextual Intelligence through Work Intelligence Layer&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Enterprise decision processes require contextual awareness of organizational roles and workflow dependencies.&lt;/P&gt;
&lt;P&gt;The Work Intelligence layer derives contextual signals from Microsoft 365 collaboration environments, including communication patterns, document interactions, and meeting engagement data.&lt;/P&gt;
&lt;P&gt;These signals are used to model organizational workflows and operational dependencies across business units.&lt;/P&gt;
&lt;P&gt;This contextual layer enables AI workloads to tailor analytical outputs based on user role and decision responsibility. For example, identical financial datasets may produce different recommendations for a portfolio manager, risk analyst, or compliance officer depending on the operational context.&lt;/P&gt;
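&lt;P&gt;That role-dependent behaviour can be pictured as a dispatch over operational context. The roles, metric, and wording below are invented for illustration; they are not an actual Work Intelligence API.&lt;/P&gt;

```python
# Illustrative only: the same metric yields different guidance per role.
def recommend(role, liquidity_ratio):
    if role == "portfolio manager":
        return "hold" if liquidity_ratio >= 1.0 else "rebalance"
    if role == "risk analyst":
        return "within tolerance" if liquidity_ratio >= 1.2 else "flag for review"
    if role == "compliance officer":
        return "compliant" if liquidity_ratio >= 0.8 else "breach report required"
    raise ValueError(f"unknown role: {role}")

metric = 0.9  # identical input for every role
assert recommend("portfolio manager", metric) == "rebalance"
assert recommend("risk analyst", metric) == "flag for review"
assert recommend("compliance officer", metric) == "compliant"
```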
&lt;P&gt;Work Intelligence therefore introduces workflow‑specific contextualization into enterprise AI workloads.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;End‑to‑End Architectural Flow&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The architecture follows a layered intelligence model in which each component contributes a discrete capability: zero‑copy data access through SAP Business Data Cloud and OneLake, semantic modeling through Fabric IQ, governed knowledge retrieval through Foundry IQ, and workflow context through Work IQ.&lt;/P&gt;
&lt;P&gt;This architecture avoids data duplication, preserves governance boundaries, and supports scalable AI adoption across enterprise financial environments.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Financial Services Application Scenario&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Within financial services organizations, SAP environments manage general ledger processing, asset accounting, and risk calculations.&lt;/P&gt;
&lt;P&gt;Fabric consumes operational ERP datasets and applies semantic modeling to define enterprise financial indicators.&lt;/P&gt;
&lt;P&gt;AI workloads leverage structured data and governed knowledge sources to generate insights such as liquidity forecasts or compliance evaluations.&lt;/P&gt;
&lt;P&gt;The Work Intelligence layer ensures that these outputs are delivered within the operational context of specific roles and workflows.&lt;/P&gt;
&lt;P&gt;This enables automated reporting and decision support without disruption to existing SAP transactional environments.&lt;/P&gt;
&lt;P&gt;The integration of SAP Business Data Cloud with Microsoft Fabric, in conjunction with the Work IQ, Fabric IQ, and Foundry IQ intelligence layers, establishes a scalable architectural framework that enables organizations to evolve from traditional ERP‑centric reporting toward AI‑enabled enterprise intelligence. By facilitating governed access to SAP data, enabling semantic alignment of business models, supporting policy‑driven knowledge retrieval, and incorporating contextual operational insights, this architecture allows enterprises to operationalize AI‑driven financial decision‑making within regulated environments while maintaining data integrity, governance, and compliance.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Apr 2026 19:24:25 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/enabling-ai-driven-enterprise-intelligence-using-sap-and/ba-p/4509721</guid>
      <dc:creator>srhulsus</dc:creator>
      <dc:date>2026-04-08T19:24:25Z</dc:date>
    </item>
    <item>
      <title>Build Your AI Agent in 5 Minutes with AI Toolkit for VS Code</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/build-your-ai-agent-in-5-minutes-with-ai-toolkit-for-vs-code/ba-p/4509578</link>
      <description>&lt;P&gt;&lt;STRONG&gt;What if building an AI agent was as easy as filling out a form?&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;No frameworks to install. No boilerplate to copy-paste from GitHub. No YAML to debug at midnight. Just VS Code, one extension, and an idea.&lt;/P&gt;
&lt;P&gt;AI Toolkit for VS Code turns agent development into something anyone can do — whether you're a seasoned developer who wants full code control, or someone who's never touched an AI framework and just wants to see something work.&lt;/P&gt;
&lt;P&gt;Let's build an agent. Then let's explore what else this toolkit can do.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Getting Set Up&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;You need two things:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;A href="vscode-file://vscode-app/c:/Users/ishasahni/AppData/Local/Programs/Microsoft%20VS%20Code/e7fb5e96c0/resources/app/out/vs/code/electron-browser/workbench/workbench.html" target="_blank"&gt;VS Code&lt;/A&gt;&lt;/STRONG&gt;&amp;nbsp;— download and install if you haven't already&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;A href="vscode-file://vscode-app/c:/Users/ishasahni/AppData/Local/Programs/Microsoft%20VS%20Code/e7fb5e96c0/resources/app/out/vs/code/electron-browser/workbench/workbench.html" target="_blank"&gt;AI Toolkit extension&lt;/A&gt;&lt;/STRONG&gt;&amp;nbsp;— open VS Code, go to Extensions (Ctrl+Shift+X), search "AI Toolkit", and install it&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;That's it. No terminal commands. No dependencies to wrangle. When AI Toolkit installs, it brings everything it needs — including the Microsoft Foundry integration and GitHub Copilot skills for agent development.&lt;/P&gt;
&lt;P&gt;Once installed, you'll see a new&amp;nbsp;&lt;STRONG&gt;AI Toolkit icon&lt;/STRONG&gt;&amp;nbsp;in the left sidebar. Click it. That's your home base for everything we're about to do.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Build an Agent — No Code Required&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Open the Command Palette (Ctrl+Shift+P) and type&amp;nbsp;&lt;STRONG&gt;"Create Agent"&lt;/STRONG&gt;. You'll see a clean panel with two options side by side:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Design an Agent Without Code&lt;/STRONG&gt;&amp;nbsp;— visual builder, perfect for getting started&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Create in Code&lt;/STRONG&gt;&amp;nbsp;— full project scaffolding, for when you want complete control&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Click&amp;nbsp;&lt;STRONG&gt;"Design an Agent Without Code."&lt;/STRONG&gt;&amp;nbsp;Agent Builder opens up.&lt;/P&gt;
&lt;P&gt;Now fill in three things:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Give it a name&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Something descriptive. For this example: "Azure Advisor"&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;OL start="2"&gt;
&lt;LI&gt;&lt;STRONG&gt;Pick a model&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Click the model dropdown. You'll see a list of available models — GPT-4.1, Claude Opus 4.6, and others. Foundry models appear at the top as recommended options. Pick one.&lt;/P&gt;
&lt;P&gt;Here's a nice detail:&amp;nbsp;&lt;STRONG&gt;you don't need to know&lt;/STRONG&gt;&amp;nbsp;whether your model uses the Chat Completions API or the Responses API. AI Toolkit detects this automatically and handles the switch behind the scenes.&lt;/P&gt;
&lt;OL start="3"&gt;
&lt;LI&gt;&lt;STRONG&gt;Write your instructions&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;This is where you tell the agent&amp;nbsp;&lt;EM&gt;who it is&lt;/EM&gt;&amp;nbsp;and&amp;nbsp;&lt;EM&gt;how to behave&lt;/EM&gt;. Think of it as a personality brief.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Hit Run&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;That's it. Click&amp;nbsp;&lt;STRONG&gt;Run&lt;/STRONG&gt;&amp;nbsp;and start chatting with your agent in the built-in playground.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Want More Control? Build in Code&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The no-code path is great for prototyping and prompt engineering. But when you need custom tools, business logic, or multi-agent workflows — switch to code.&lt;/P&gt;
&lt;P&gt;From the Create Agent View, choose&amp;nbsp;&lt;STRONG&gt;"Create in Code with Full Control."&lt;/STRONG&gt;&amp;nbsp;You get two options:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Scaffold from a template&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Pick a pre-built project structure — single agent, multi-agent, or LangGraph workflow. AI Toolkit generates a complete project with proper folder structure, configuration files, and starter code. Open it, customize it, run it.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Generate with GitHub Copilot&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Describe your agent in plain English in Copilot Chat:&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;"Create a customer support agent that can look up order status, process returns, and escalate to a human when the customer is upset."&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Copilot generates a full project — agent logic, tool definitions, system prompts, and evaluation tests. It uses the&amp;nbsp;&lt;STRONG&gt;microsoft-foundry&amp;nbsp;skill&lt;/STRONG&gt;, the same open-source skill powering GitHub Copilot for Azure. AI Toolkit installs and keeps this skill updated automatically — you never configure it.&lt;/P&gt;
&lt;P&gt;The output is structured and production-ready. Real folder structure. Real separation of concerns. Not a single-file script.&lt;/P&gt;
&lt;P&gt;Either way, you get a project you can version-control, test, and deploy.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Cool Features You Should Know About&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Building the agent is just the beginning. Here's where AI Toolkit gets genuinely impressive.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;🔧 Add Real Tools with MCP&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Your agent can do more than just talk. Click&amp;nbsp;&lt;STRONG&gt;Add Tool&lt;/STRONG&gt;&amp;nbsp;in Agent Builder to connect&amp;nbsp;&lt;STRONG&gt;MCP (Model Context Protocol) servers&lt;/STRONG&gt;&amp;nbsp;— these give your agent real capabilities:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Search the web&lt;/LI&gt;
&lt;LI&gt;Query a database&lt;/LI&gt;
&lt;LI&gt;Read files&lt;/LI&gt;
&lt;LI&gt;Call external APIs&lt;/LI&gt;
&lt;LI&gt;Interact with any service that has an MCP server&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;You control how much freedom your agent gets. Set tool approval to&amp;nbsp;&lt;STRONG&gt;Auto&lt;/STRONG&gt;&amp;nbsp;(tool runs immediately) or&amp;nbsp;&lt;STRONG&gt;Manual&lt;/STRONG&gt;&amp;nbsp;(you approve each call). Perfect for when you trust a read-only search tool but want oversight on anything that takes action.&lt;/P&gt;
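&lt;P&gt;Conceptually, the approval setting acts as a gate in front of every tool call. Here is a minimal sketch of that idea; the policy names mirror the UI, but the function and its behaviour are invented for illustration, not AI Toolkit's actual API.&lt;/P&gt;

```python
# Minimal sketch of an Auto/Manual tool-approval gate (illustrative only).
def run_tool(tool_name, policy, ask_user=None):
    """Run a tool immediately under 'auto'; under 'manual', run it only if
    the approval callback says yes."""
    if policy == "auto":
        return f"ran {tool_name}"
    if policy == "manual":
        if ask_user is not None and ask_user(tool_name):
            return f"ran {tool_name}"
        return f"blocked {tool_name}"
    raise ValueError(f"unknown policy: {policy}")

# Trust a read-only search tool, but keep oversight on destructive actions:
assert run_tool("web_search", "auto") == "ran web_search"
assert run_tool("delete_file", "manual", ask_user=lambda t: False) == "blocked delete_file"
```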
&lt;P&gt;You can also&amp;nbsp;&lt;STRONG&gt;delete MCP servers&lt;/STRONG&gt;&amp;nbsp;directly from the Tool Catalog when you no longer need them — no config file editing required.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;🧠 Prompt Optimizer&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Not sure if your instructions are good enough? Click the&amp;nbsp;&lt;STRONG&gt;Improve&lt;/STRONG&gt;&amp;nbsp;button in Agent Builder. The Foundry Prompt Optimizer analyzes your prompt and rewrites it to be clearer, more structured, and more effective.&lt;/P&gt;
&lt;P&gt;It's like having a prompt engineering expert review your work — except it takes seconds.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;🕸️ Agent Inspector&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;When your agent runs, open&amp;nbsp;&lt;STRONG&gt;Agent Inspector&lt;/STRONG&gt;&amp;nbsp;to see what's happening under the hood. It visualizes the entire workflow in real time — which tools are called, in what order, and how the agent makes decisions.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;💬 Conversations View&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Agent Builder includes a&amp;nbsp;&lt;STRONG&gt;Conversations tab&lt;/STRONG&gt;&amp;nbsp;where you can review the full history of interactions with your agent. Scroll through past conversations, compare how your agent handled different scenarios, and spot patterns in where it succeeds or struggles.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;📁 Everything in One Sidebar&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;AI Toolkit puts everything in a single&amp;nbsp;&lt;STRONG&gt;My Resources&lt;/STRONG&gt;&amp;nbsp;panel:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Recent Agents&lt;/STRONG&gt;&amp;nbsp;— one-click access to agents you've been working on&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Local Resources&lt;/STRONG&gt;&amp;nbsp;— your local models, agents, and tools&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Foundry Resources&lt;/STRONG&gt;&amp;nbsp;— remote agents and models (if connected)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Why AI Toolkit?&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;There are other ways to build agents. What makes this different?&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Everything is in VS Code.&lt;/STRONG&gt;&amp;nbsp;You don't context-switch between a web UI, a CLI, and an IDE. Discovery, building, testing, debugging, and deployment all happen in one place.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;No-code and code-first aren't separate products.&lt;/STRONG&gt;&amp;nbsp;They're two views of the same agent. Start in Agent Builder, click View Code, and you have a full project. Or go the other way — build in code and test in the visual playground.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Copilot is deeply integrated.&lt;/STRONG&gt;&amp;nbsp;Not as a chatbot bolted on the side — as an actual development tool that understands agent architecture and generates production-quality scaffolding.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Wrapping Up:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;📥&amp;nbsp;&lt;STRONG&gt;Install:&lt;/STRONG&gt;&amp;nbsp;&lt;A href="https://marketplace.visualstudio.com/" target="_blank"&gt;AI Toolkit on the VS Code Marketplace&lt;/A&gt;&lt;BR /&gt;📖&amp;nbsp;&lt;STRONG&gt;Learn:&lt;/STRONG&gt;&amp;nbsp;&lt;A href="https://code.visualstudio.com/docs" target="_blank"&gt;AI Toolkit Documentation&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Open VS Code.&amp;nbsp;Ctrl+Shift+P. Type "Create Agent."&lt;/P&gt;
&lt;P&gt;Five minutes from now, you'll have an agent running. 🚀&lt;/P&gt;
</description>
      <pubDate>Wed, 08 Apr 2026 09:59:38 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/build-your-ai-agent-in-5-minutes-with-ai-toolkit-for-vs-code/ba-p/4509578</guid>
      <dc:creator>isha_sahni</dc:creator>
      <dc:date>2026-04-08T09:59:38Z</dc:date>
    </item>
    <item>
      <title>Private DNS and Hub–Spoke Networking for Enterprise AI Workloads on Azure</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/private-dns-and-hub-spoke-networking-for-enterprise-ai-workloads/ba-p/4508835</link>
      <description>&lt;H2&gt;&lt;U&gt;&lt;STRONG&gt;Introduction&lt;/STRONG&gt;&lt;/U&gt;&lt;/H2&gt;
&lt;P&gt;As organizations deploy enterprise AI platforms on Azure, security requirements increasingly drive the adoption of private-first architectures built on:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Private networking only&lt;/LI&gt;
&lt;LI&gt;Centralized firewalls or NVAs&lt;/LI&gt;
&lt;LI&gt;Hub–and–spoke virtual network architectures&lt;/LI&gt;
&lt;LI&gt;Private Endpoints for all PaaS services&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;While these patterns are well understood individually, &lt;STRONG&gt;their interaction often exposes hidden failure modes&lt;/STRONG&gt;, particularly around &lt;STRONG&gt;DNS and name resolution&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;During a recent production deployment of a &lt;STRONG&gt;private, enterprise-grade AI workload on Azure&lt;/STRONG&gt;, several issues surfaced that initially appeared to be platform or service instability. Closer analysis revealed the real cause: &lt;STRONG&gt;gaps in network and DNS design&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;This post shares a &lt;STRONG&gt;real-world technical walkthrough&lt;/STRONG&gt; of the problem, root causes, resolution steps, and key lessons that now form a &lt;STRONG&gt;reusable blueprint&lt;/STRONG&gt; for running AI workloads reliably in private Azure environments.&lt;/P&gt;
&lt;H2&gt;&lt;U&gt;&lt;STRONG&gt;Problem Statement&lt;/STRONG&gt;&lt;/U&gt;&lt;/H2&gt;
&lt;P&gt;The platform was deployed with the following characteristics:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Hub and spoke network topology&lt;/LI&gt;
&lt;LI&gt;Custom DNS servers running in the hub&lt;/LI&gt;
&lt;LI&gt;Firewall / NVA enforcing strict egress controls&lt;/LI&gt;
&lt;LI&gt;AI, data, and platform services exposed through Private Endpoints&lt;/LI&gt;
&lt;LI&gt;Azure Container Apps using internal load balancer mode&lt;/LI&gt;
&lt;LI&gt;Centralized monitoring, secrets, and identity services&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Despite successful infrastructure deployment, the environment exhibited &lt;STRONG&gt;non-deterministic production issues&lt;/STRONG&gt;, including:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Container Apps intermittently failing to start or scale&lt;/LI&gt;
&lt;LI&gt;AI platform endpoints becoming unreachable from workload subnets&lt;/LI&gt;
&lt;LI&gt;Authentication and secret access failures&lt;/LI&gt;
&lt;LI&gt;DNS resolution working in some environments but failing in others&lt;/LI&gt;
&lt;LI&gt;Terraform deployments stalling or failing unexpectedly&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Because the symptoms varied across subnets and environments, &lt;STRONG&gt;root cause identification was initially non-trivial&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2&gt;&lt;U&gt;&lt;STRONG&gt;Root Cause Analysis&lt;/STRONG&gt;&lt;/U&gt;&lt;/H2&gt;
&lt;P&gt;After end-to-end isolation, the issue turned out to involve neither the AI services, authentication, nor application logic. The core problem was &lt;STRONG&gt;DNS resolution in a private Azure environment&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H3&gt;1. Custom DNS servers were not Azure-aware&lt;/H3&gt;
&lt;P&gt;The hub DNS servers correctly resolved:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Corporate domains&lt;/LI&gt;
&lt;LI&gt;On‑premises records&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;However, they could &lt;STRONG&gt;not resolve Azure platform names or Private Endpoint FQDNs&lt;/STRONG&gt; by default.&lt;/P&gt;
&lt;P&gt;Azure relies on an internal recursive resolver (168.63.129.16) that &lt;STRONG&gt;must be explicitly integrated&lt;/STRONG&gt; when using custom DNS.&lt;/P&gt;
&lt;H3&gt;2. Missing conditional forwarders for private DNS zones&lt;/H3&gt;
&lt;P&gt;Many Azure services depend on service-specific private DNS zones, such as:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;privatelink.cognitiveservices.azure.com&lt;/LI&gt;
&lt;LI&gt;privatelink.openai.azure.com&lt;/LI&gt;
&lt;LI&gt;privatelink.vaultcore.azure.net&lt;/LI&gt;
&lt;LI&gt;privatelink.search.windows.net&lt;/LI&gt;
&lt;LI&gt;privatelink.blob.core.windows.net&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Without conditional forwarders pointing to Azure’s internal DNS, queries either:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Failed silently, or&lt;/LI&gt;
&lt;LI&gt;Resolved to public endpoints that were blocked by firewall rules&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;3. Container Apps internal DNS requirements were overlooked&lt;/H3&gt;
&lt;P&gt;When Azure Container Apps are deployed with:&lt;/P&gt;
&lt;P&gt;&lt;CODE&gt;internal_load_balancer_enabled = true&lt;/CODE&gt;&lt;/P&gt;
&lt;P&gt;Azure &lt;STRONG&gt;does not automatically create supporting DNS records&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;The environment generates:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;A default domain&lt;/LI&gt;
&lt;LI&gt;.internal subdomains for internal FQDNs&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Without explicitly creating:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;A private DNS zone matching the default domain&lt;/LI&gt;
&lt;LI&gt;*, @, and *.internal wildcard records&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;internal service-to-service communication fails&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H3&gt;4. Private DNS zones were not consistently linked&lt;/H3&gt;
&lt;P&gt;Even when DNS zones existed, they were:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Spread across multiple subscriptions&lt;/LI&gt;
&lt;LI&gt;Linked to some VNets but not others&lt;/LI&gt;
&lt;LI&gt;Missing links to DNS server VNets or shared services VNets&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;As a result, name resolution succeeded in one subnet and failed in another, depending on the lookup path.&lt;/P&gt;
&lt;H2&gt;&lt;U&gt;&lt;STRONG&gt;Resolution&lt;/STRONG&gt;&lt;/U&gt;&lt;/H2&gt;
&lt;P&gt;No application changes were required. Stability was achieved entirely through &lt;STRONG&gt;architectural corrections&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H3&gt;✅ Step 1: Make custom DNS Azure-aware&lt;/H3&gt;
&lt;P&gt;On all custom DNS servers (or NVAs acting as DNS proxies):&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Configure conditional forwarders for all Azure private DNS zones&lt;/LI&gt;
&lt;LI&gt;Forward those queries to: 168.63.129.16&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This IP is Azure’s internal recursive resolver and is &lt;STRONG&gt;mandatory for Private Endpoint resolution&lt;/STRONG&gt;.&lt;/P&gt;
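&lt;P&gt;The forwarding rule amounts to a suffix match on the query name: private-link zones go to Azure's resolver, everything else stays on the corporate path. The sketch below is a conceptual model of that routing decision, not actual DNS server configuration; the corporate resolver address and zone list are placeholders.&lt;/P&gt;

```python
# Conceptual model of conditional forwarding; not DNS server configuration.
AZURE_RESOLVER = "168.63.129.16"   # Azure's internal recursive resolver
CORPORATE_DNS = "10.0.0.4"         # hypothetical on-premises resolver

# Zones that must be forwarded to Azure for Private Endpoint resolution
FORWARDED_ZONES = (
    "privatelink.vaultcore.azure.net",
    "privatelink.openai.azure.com",
    "privatelink.blob.core.windows.net",
)

def resolver_for(query):
    """Route queries for Azure private-link zones to 168.63.129.16."""
    if any(query == z or query.endswith("." + z) for z in FORWARDED_ZONES):
        return AZURE_RESOLVER
    return CORPORATE_DNS

assert resolver_for("myvault.privatelink.vaultcore.azure.net") == "168.63.129.16"
assert resolver_for("intranet.corp.example.com") == "10.0.0.4"
```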
&lt;H3&gt;✅ Step 2: Centralize and link private DNS zones&lt;/H3&gt;
&lt;P&gt;A centralized private DNS model was adopted:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;All private DNS zones hosted in a shared subscription&lt;/LI&gt;
&lt;LI&gt;Linked to:
&lt;UL&gt;
&lt;LI&gt;Hub VNet&lt;/LI&gt;
&lt;LI&gt;All spoke VNets&lt;/LI&gt;
&lt;LI&gt;DNS server VNet&lt;/LI&gt;
&lt;LI&gt;Any operational or virtual desktop VNets&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This ensured &lt;STRONG&gt;consistent resolution regardless of workload location&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H3&gt;✅ Step 3: Explicitly handle Container Apps DNS&lt;/H3&gt;
&lt;P&gt;For Container Apps using internal ingress:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Create a private DNS zone matching the environment’s default domain&lt;/LI&gt;
&lt;LI&gt;Add:
&lt;UL&gt;
&lt;LI&gt;* wildcard record&lt;/LI&gt;
&lt;LI&gt;@ apex record&lt;/LI&gt;
&lt;LI&gt;*.internal wildcard record&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Point all records to the Container Apps Environment static IP&lt;/LI&gt;
&lt;LI&gt;Add a conditional forwarder for the default domain if using custom DNS&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This step alone resolved multiple internal connectivity issues.&lt;/P&gt;
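&lt;P&gt;Step 3 boils down to generating a small, fixed record set inside the zone that matches the environment's default domain. Sketched below; the static IP is a placeholder, and in practice these records are created in Azure Private DNS, not in Python.&lt;/P&gt;

```python
# Hypothetical sketch of the record set Step 3 creates inside the private DNS
# zone matching the Container Apps environment's default domain.
def container_apps_records(static_ip):
    """Record names required inside the zone: apex (@), wildcard (*),
    and the *.internal wildcard, all pointing at the environment's static IP."""
    return {"@": static_ip, "*": static_ip, "*.internal": static_ip}

records = container_apps_records("10.1.2.4")  # placeholder static IP
assert set(records) == {"@", "*", "*.internal"}
assert all(ip == "10.1.2.4" for ip in records.values())
```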
&lt;H3&gt;✅ Step 4: Align routing, NSGs, and service tags&lt;/H3&gt;
&lt;P&gt;Firewall, NSG, and route table rules were aligned to:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Allow DNS traffic (TCP/UDP 53)&lt;/LI&gt;
&lt;LI&gt;Allow Azure service tags such as:
&lt;UL&gt;
&lt;LI&gt;AzureCloud&lt;/LI&gt;
&lt;LI&gt;CognitiveServices&lt;/LI&gt;
&lt;LI&gt;AzureActiveDirectory&lt;/LI&gt;
&lt;LI&gt;Storage&lt;/LI&gt;
&lt;LI&gt;AzureMonitor&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Ensure certain subnets (e.g., Container Apps, Application Gateway) retained &lt;STRONG&gt;direct internet access where required&lt;/STRONG&gt; by Azure platform services&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;&lt;U&gt;&lt;STRONG&gt;Key Learnings&lt;/STRONG&gt;&lt;/U&gt;&lt;/H2&gt;
&lt;H3&gt;1. DNS is a Tier‑0 dependency for AI platforms&lt;/H3&gt;
&lt;P&gt;Many AI “service issues” are DNS failures in disguise. DNS must be treated as foundational platform infrastructure.&lt;/P&gt;
&lt;H3&gt;2. Private Endpoints require Azure DNS integration&lt;/H3&gt;
&lt;P&gt;If you use:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Custom DNS ✅&lt;/LI&gt;
&lt;LI&gt;Private Endpoints ✅&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Then forwarding to 168.63.129.16 is &lt;STRONG&gt;non‑negotiable&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H3&gt;3. Container Apps internal ingress has hidden DNS requirements&lt;/H3&gt;
&lt;P&gt;Internal Container Apps environments will not function correctly without manually created DNS zones and .internal records.&lt;/P&gt;
&lt;H3&gt;4. Centralized DNS prevents environment drift&lt;/H3&gt;
&lt;P&gt;Decentralized or subscription-local DNS zones lead to fragile, inconsistent environments. Centralization improves reliability and operability.&lt;/P&gt;
&lt;H3&gt;5. Validate networking first, then the platform&lt;/H3&gt;
&lt;P&gt;Before escalating issues to service teams:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Validate DNS resolution&lt;/LI&gt;
&lt;LI&gt;Verify routing&lt;/LI&gt;
&lt;LI&gt;Check Private Endpoint connectivity&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;In many cases, the perceived “platform issue” disappears.&lt;/P&gt;
&lt;H2&gt;&lt;U&gt;&lt;STRONG&gt;Quick Production Validation Checklist&lt;/STRONG&gt;&lt;/U&gt;&lt;/H2&gt;
&lt;P&gt;Before go-live, always validate:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;✅ Private FQDNs resolve to private IPs from all required VNets&lt;/LI&gt;
&lt;LI&gt;✅ UDR/NSG rules allow required Azure service traffic&lt;/LI&gt;
&lt;LI&gt;✅ Managed identities can access all dependent resources&lt;/LI&gt;
&lt;LI&gt;✅ AI portal user workflows succeed (evaluations, agents, etc.)&lt;/LI&gt;
&lt;LI&gt;✅ terraform plan shows &lt;EM&gt;only&lt;/EM&gt; intended changes&lt;/LI&gt;
&lt;/OL&gt;
&lt;H2&gt;&lt;U&gt;&lt;STRONG&gt;Conclusion&lt;/STRONG&gt;&lt;/U&gt;&lt;/H2&gt;
&lt;P&gt;Running private, enterprise-grade AI workloads on Azure is absolutely achievable—but it requires &lt;STRONG&gt;intentional DNS and networking design&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;By:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Making custom DNS Azure-aware&lt;/LI&gt;
&lt;LI&gt;Centralizing private DNS zones&lt;/LI&gt;
&lt;LI&gt;Explicitly handling Container Apps DNS&lt;/LI&gt;
&lt;LI&gt;Aligning routing and firewall rules&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;an unstable environment was transformed into a &lt;STRONG&gt;repeatable, production-ready platform pattern&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;If you are building AI solutions on Azure with Private Endpoints and hub–spoke networking, &lt;STRONG&gt;getting DNS right early will save weeks of troubleshooting later&lt;/STRONG&gt;.&lt;/P&gt;</description>
      <pubDate>Mon, 06 Apr 2026 07:56:39 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/private-dns-and-hub-spoke-networking-for-enterprise-ai-workloads/ba-p/4508835</guid>
      <dc:creator>deepthihr</dc:creator>
      <dc:date>2026-04-06T07:56:39Z</dc:date>
    </item>
    <item>
      <title>Building Cost-Aware Azure Infrastructure Pipelines: Estimate Costs Before You Deploy</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/building-cost-aware-azure-infrastructure-pipelines-estimate/ba-p/4508776</link>
      <description>&lt;H2 data-line="16"&gt;The Problem: Cost Is a Blind Spot in IaC Reviews&lt;/H2&gt;
&lt;P data-line="18"&gt;Code reviews for Bicep or Terraform templates typically focus on correctness, security, and compliance. But cost is rarely part of the review process because:&lt;/P&gt;
&lt;UL data-line="20"&gt;
&lt;LI data-line="20"&gt;Developers don't have easy access to pricing data at review time&lt;/LI&gt;
&lt;LI data-line="21"&gt;Azure pricing depends on region, tier, reservation status, and more&lt;/LI&gt;
&lt;LI data-line="22"&gt;There's no built-in "cost diff" in any IaC tool&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="24"&gt;This means cost regressions slip through the same way bugs do when there are no tests.&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;[Image: iac-review-gap]&lt;/EM&gt;&lt;/P&gt;
&lt;H2 data-line="30"&gt;Architecture Overview&lt;/H2&gt;
&lt;P&gt;Here's the pipeline we'll build:&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;[Image: architecture-overview]&lt;/EM&gt;&lt;/P&gt;
&lt;H3&gt;Step 1: Use Bicep What-If to Detect Changes&lt;/H3&gt;
&lt;P data-line="40"&gt;Azure's what-if deployment mode shows you exactly what resources will be created, modified, or deleted — without actually deploying anything.&lt;/P&gt;
&lt;LI-CODE lang=""&gt;az deployment group what-if --resource-group rg-myapp-prod --template-file main.bicep --parameters main.bicepparam --result-format ResourceIdOnly --out json &amp;gt; what-if-output.json&lt;/LI-CODE&gt;
&lt;P data-line="51"&gt;The JSON output contains a&amp;nbsp;changes&amp;nbsp;array where each entry has:&lt;/P&gt;
&lt;UL data-line="52"&gt;
&lt;LI data-line="52"&gt;resourceId&amp;nbsp;— the full ARM resource ID&lt;/LI&gt;
&lt;LI data-line="53"&gt;changeType&amp;nbsp;— one of&amp;nbsp;Create,&amp;nbsp;Modify,&amp;nbsp;Delete,&amp;nbsp;NoChange,&amp;nbsp;Deploy&lt;/LI&gt;
&lt;LI data-line="54"&gt;before&amp;nbsp;and&amp;nbsp;after&amp;nbsp;— the full resource properties for modifications&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="56"&gt;This is the foundation: the what-if output tells us&amp;nbsp;&lt;EM&gt;what&lt;/EM&gt;&amp;nbsp;is changing, and we can use that to look up&amp;nbsp;&lt;EM&gt;what it costs&lt;/EM&gt;.&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;[Image: what-if-cli-output]&lt;/EM&gt;&lt;/P&gt;
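&lt;P&gt;To get a feel for the structure, here is a minimal sketch that tallies the change types in a what-if payload (the sample below mirrors the shape described above; the resource IDs are made up):&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;from collections import Counter

# Hypothetical sample mirroring the "changes" array of a what-if result
what_if = {
    "changes": [
        {"resourceId": "/subscriptions/s1/resourceGroups/rg/providers/Microsoft.Compute/virtualMachines/vm-api", "changeType": "Modify"},
        {"resourceId": "/subscriptions/s1/resourceGroups/rg/providers/Microsoft.Compute/disks/disk-data-01", "changeType": "Create"},
        {"resourceId": "/subscriptions/s1/resourceGroups/rg/providers/Microsoft.Web/serverfarms/plan-webapp", "changeType": "NoChange"},
    ]
}

# Tally how many resources fall under each change type
counts = Counter(c["changeType"] for c in what_if["changes"])&lt;/LI-CODE&gt;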
&lt;H3 data-line="94"&gt;Step 2: Map Resources to Pricing with the Retail Prices API&lt;/H3&gt;
&lt;P data-line="64"&gt;The&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/rest/api/cost-management/retail-prices/azure-retail-prices" target="_blank" rel="noopener" data-href="https://learn.microsoft.com/en-us/rest/api/cost-management/retail-prices/azure-retail-prices"&gt;Azure Retail Prices API&lt;/A&gt;&amp;nbsp;is a free, unauthenticated REST API that returns pay-as-you-go pricing for any Azure service.&lt;/P&gt;
&lt;P data-line="66"&gt;Here's a Python script that takes a VM SKU and region and returns the monthly cost:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import requests

def get_vm_price(sku_name: str, region: str = "eastus") -&amp;gt; float | None:
    """Query the Azure Retail Prices API for a Linux VM's pay-as-you-go hourly rate."""
    api_url = "https://prices.azure.com/api/retail/prices"
    
    odata_filter = (
        f"armRegionName eq '{region}' "
        f"and armSkuName eq '{sku_name}' "
        f"and priceType eq 'Consumption' "
        f"and serviceName eq 'Virtual Machines' "
        f"and contains(meterName, 'Spot') eq false "
        f"and contains(productName, 'Windows') eq false"
    )
    
    response = requests.get(api_url, params={"$filter": odata_filter})
    response.raise_for_status()
    
    items = response.json().get("Items", [])
    if not items:
        return None
    
    hourly_rate = items[0]["retailPrice"]
    monthly_estimate = hourly_rate * 730  # avg hours per month
    return round(monthly_estimate, 2)


# Example usage (get_vm_price returns None when no matching meter is found)
before_cost = get_vm_price("Standard_D4s_v5")   # e.g., $140.16/mo
after_cost = get_vm_price("Standard_D8s_v5")    # e.g., $280.32/mo
if before_cost is not None and after_cost is not None:
    delta = after_cost - before_cost            # +$140.16/mo&lt;/LI-CODE&gt;
&lt;P&gt;You can extend this pattern for other resource types — App Service Plans, Azure SQL databases, managed disks, etc. — by adjusting the serviceName and meterName filters.&lt;/P&gt;
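&lt;P&gt;One way to keep those per-service variations manageable is to factor the filter construction into a small helper (a sketch; the field names match the queries shown above):&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;def build_price_filter(service_name, sku, region):
    """Compose the $filter string for a Retail Prices API query."""
    clauses = [
        f"armRegionName eq '{region}'",
        f"armSkuName eq '{sku}'",
        "priceType eq 'Consumption'",
        f"serviceName eq '{service_name}'",
    ]
    return " and ".join(clauses)

# e.g. build_price_filter("Azure App Service", "P1v3", "eastus")&lt;/LI-CODE&gt;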
&lt;img /&gt;
&lt;H3 data-line="140"&gt;Step 3: Build the GitHub Actions Workflow&lt;/H3&gt;
&lt;P&gt;Here's a complete GitHub Actions workflow that ties it all together:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;name: Cost Estimate on PR

on:
  pull_request:
    paths:
      - "infra/**"

permissions:
  id-token: write      # For Azure OIDC login
  contents: read
  pull-requests: write  # To post comments

jobs:
  cost-estimate:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Azure Login (OIDC)
        uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: Run Bicep What-If
        run: |
          az deployment group what-if \
            --resource-group ${{ vars.RESOURCE_GROUP }} \
            --template-file infra/main.bicep \
            --parameters infra/main.bicepparam \
            --result-format FullResourcePayloads \
            --out json &amp;gt; what-if-output.json

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: pip install requests

      - name: Estimate cost delta
        id: cost
        run: |
          python infra/scripts/estimate_costs.py \
            --what-if-file what-if-output.json \
            --output-format github &amp;gt;&amp;gt; "$GITHUB_OUTPUT"

      - name: Comment on PR
        uses: marocchino/sticky-pull-request-comment@v2
        with:
          header: cost-estimate
          message: |
            ## 💰 Infrastructure Cost Estimate

            | Resource | Change | Before ($/mo) | After ($/mo) | Delta |
            |----------|--------|---------------|--------------|-------|
            ${{ steps.cost.outputs.table_rows }}

            **Estimated monthly impact: ${{ steps.cost.outputs.total_delta }}**

            _Prices are pay-as-you-go estimates from the Azure Retail Prices API. 
            Actual costs may vary with reservations, savings plans, or hybrid benefit._

      - name: Gate on budget threshold
        if: ${{ steps.cost.outputs.delta_value &amp;gt; 500 }}
        run: |
          echo "::error::Monthly cost increase exceeds $500 threshold. Requires finance team approval."
          exit 1&lt;/LI-CODE&gt;
&lt;H3 data-line="187"&gt;Step 4: The Cost Estimation Script&lt;/H3&gt;
&lt;P&gt;Here's the core of infra/scripts/estimate_costs.py that parses the what-if output and queries prices:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;#!/usr/bin/env python3
"""Parse Bicep what-if output and estimate cost deltas using Azure Retail Prices API."""

import json
import argparse
import requests


PRICE_API = "https://prices.azure.com/api/retail/prices"

# Map ARM resource types to Retail API service names
RESOURCE_TYPE_MAP = {
    "Microsoft.Compute/virtualMachines": "Virtual Machines",
    "Microsoft.Compute/disks": "Storage",
    "Microsoft.Web/serverfarms": "Azure App Service",
    "Microsoft.Sql/servers/databases": "SQL Database",
}


def get_price(service_name: str, sku: str, region: str) -&amp;gt; float:
    """Query Azure Retail Prices API and return monthly cost estimate."""
    odata_filter = (
        f"armRegionName eq '{region}' "
        f"and armSkuName eq '{sku}' "
        f"and priceType eq 'Consumption' "
        f"and serviceName eq '{service_name}'"
    )
    resp = requests.get(PRICE_API, params={"$filter": odata_filter})
    resp.raise_for_status()
    items = resp.json().get("Items", [])
    if not items:
        return 0.0
    return items[0]["retailPrice"] * 730


def parse_what_if(filepath: str) -&amp;gt; list[dict]:
    """Extract resource changes from what-if JSON output."""
    with open(filepath) as f:
        data = json.load(f)

    results = []
    for change in data.get("changes", []):
        change_type = change.get("changeType", "")
        parts = change.get("resourceId", "").split("/providers/")[-1].split("/")
        # ARM IDs alternate type/name segments after the provider namespace, e.g.
        # Microsoft.Sql/servers/srv1/databases/db1 maps to Microsoft.Sql/servers/databases
        resource_type_str = "/".join([parts[0]] + parts[1::2]) if len(parts) &amp;gt;= 2 else ""

        if resource_type_str not in RESOURCE_TYPE_MAP:
            continue

        before_sku = (change.get("before") or {}).get("sku", {}).get("name", "")
        after_sku = (change.get("after") or {}).get("sku", {}).get("name", "")
        region = (change.get("after") or change.get("before") or {}).get("location", "eastus")

        service = RESOURCE_TYPE_MAP[resource_type_str]
        before_price = get_price(service, before_sku, region) if before_sku else 0.0
        after_price = get_price(service, after_sku, region) if after_sku else 0.0

        results.append({
            "resource": change.get("resourceId", "").split("/")[-1],
            "change_type": change_type,
            "before": round(before_price, 2),
            "after": round(after_price, 2),
            "delta": round(after_price - before_price, 2),
        })

    return results


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--what-if-file", required=True)
    parser.add_argument("--output-format", default="text", choices=["text", "github"])
    args = parser.parse_args()

    changes = parse_what_if(args.what_if_file)
    total_delta = sum(c["delta"] for c in changes)

    if args.output_format == "github":
        rows = []
        for c in changes:
            sign = "+" if c["delta"] &amp;gt;= 0 else ""
            rows.append(
                f"| {c['resource']} | {c['change_type']} "
                f"| ${c['before']:.2f} | ${c['after']:.2f} "
                f"| {sign}${c['delta']:.2f} |"
            )
        # Multi-line values must use GitHub's delimiter syntax in $GITHUB_OUTPUT
        print("table_rows&amp;lt;&amp;lt;EOF")
        print("\n".join(rows))
        print("EOF")
        sign = "+" if total_delta &amp;gt;= 0 else ""
        print(f"total_delta={sign}${total_delta:.2f}/mo")
        print(f"delta_value={total_delta}")
    else:
        for c in changes:
            print(f"{c['resource']}: {c['change_type']} "
                  f"${c['before']:.2f} → ${c['after']:.2f} "
                  f"(Δ ${c['delta']:+.2f})")
        print(f"\nTotal monthly delta: ${total_delta:+.2f}")


if __name__ == "__main__":
    main()&lt;/LI-CODE&gt;
&lt;H2 data-line="296"&gt;What the Developer Experience Looks Like&lt;/H2&gt;
&lt;P data-line="298"&gt;Once this pipeline is in place, every PR that touches infrastructure files gets an automatic cost comment:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Resource&lt;/th&gt;&lt;th&gt;Change&lt;/th&gt;&lt;th&gt;Before ($/mo)&lt;/th&gt;&lt;th&gt;After ($/mo)&lt;/th&gt;&lt;th&gt;Delta&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;vm-api-prod&lt;/td&gt;&lt;td&gt;Modify&lt;/td&gt;&lt;td&gt;$140.16&lt;/td&gt;&lt;td&gt;$280.32&lt;/td&gt;&lt;td&gt;+$140.16&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;disk-data-01&lt;/td&gt;&lt;td&gt;Create&lt;/td&gt;&lt;td&gt;$0.00&lt;/td&gt;&lt;td&gt;$73.22&lt;/td&gt;&lt;td&gt;+$73.22&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;plan-webapp&lt;/td&gt;&lt;td&gt;NoChange&lt;/td&gt;&lt;td&gt;$69.35&lt;/td&gt;&lt;td&gt;$69.35&lt;/td&gt;&lt;td&gt;+$0.00&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P data-line="306"&gt;&lt;STRONG&gt;Estimated monthly impact: +$213.38/mo&lt;/STRONG&gt;&lt;/P&gt;
&lt;P data-line="306"&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P data-line="306"&gt;If the delta exceeds a configurable threshold (e.g., $500/mo), the pipeline fails and requires explicit approval — just like a failing test.&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;H2 data-line="316"&gt;Extending This Further&lt;/H2&gt;
&lt;P data-line="318"&gt;Here are some ways to take this pipeline to the next level:&lt;/P&gt;
&lt;OL data-line="320"&gt;
&lt;LI data-line="320"&gt;&lt;STRONG&gt;Support Azure Savings Plans and Reservations&lt;/STRONG&gt;&amp;nbsp;— Query the Prices API with&amp;nbsp;priceType eq 'Reservation'&amp;nbsp;and show both pay-as-you-go and committed pricing&lt;/LI&gt;
&lt;LI data-line="321"&gt;&lt;STRONG&gt;Track cost trends over time&lt;/STRONG&gt;&amp;nbsp;— Store estimates in Azure Table Storage or a database and build a dashboard showing cost trajectory per environment&lt;/LI&gt;
&lt;LI data-line="322"&gt;&lt;STRONG&gt;Add Slack/Teams notifications&lt;/STRONG&gt;&amp;nbsp;— Alert the team channel when a PR exceeds the threshold&lt;/LI&gt;
&lt;LI data-line="323"&gt;&lt;STRONG&gt;Tag-based cost allocation&lt;/STRONG&gt;&amp;nbsp;— Parse resource tags from Bicep to attribute costs to teams or projects&lt;/LI&gt;
&lt;LI data-line="324"&gt;&lt;STRONG&gt;Multi-environment estimates&lt;/STRONG&gt;&amp;nbsp;— Run the pipeline against dev, staging, and prod parameter files to show total organizational impact&lt;/LI&gt;
&lt;/OL&gt;
&lt;H2 data-line="328"&gt;Key Takeaways&lt;/H2&gt;
&lt;UL data-line="330"&gt;
&lt;LI data-line="330"&gt;&lt;STRONG&gt;Azure's What-If API&lt;/STRONG&gt;&amp;nbsp;gives you a deployment preview without making changes — use it as the foundation for any pre-deployment validation&lt;/LI&gt;
&lt;LI data-line="331"&gt;&lt;STRONG&gt;The Azure Retail Prices API&lt;/STRONG&gt;&amp;nbsp;is free, requires no authentication, and returns granular pricing data you can query programmatically&lt;/LI&gt;
&lt;LI data-line="332"&gt;&lt;STRONG&gt;Cost gates in CI/CD&lt;/STRONG&gt;&amp;nbsp;treat budget overruns the same way you treat test failures — as merge blockers that require explicit action&lt;/LI&gt;
&lt;LI data-line="333"&gt;&lt;STRONG&gt;Shift cost left&lt;/STRONG&gt;&amp;nbsp;— just like security and testing, catching cost issues at PR time is 10x cheaper than catching them on the monthly bill&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="335"&gt;Infrastructure cost is infrastructure quality. By integrating cost estimation into your pull request workflow, you give every developer on the team visibility into the financial impact of their changes — before a single resource is deployed.&lt;/P&gt;</description>
      <pubDate>Mon, 06 Apr 2026 06:28:43 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/building-cost-aware-azure-infrastructure-pipelines-estimate/ba-p/4508776</guid>
      <dc:creator>whosocurious</dc:creator>
      <dc:date>2026-04-06T06:28:43Z</dc:date>
    </item>
    <item>
      <title>Demystifying On-Demand Capacity Reservations</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/demystifying-on-demand-capacity-reservations/ba-p/4504806</link>
      <description>&lt;H2&gt;About On-Demand Capacity Reservations&lt;/H2&gt;
&lt;H3&gt;Introducing the “parking garage” metaphor&lt;/H3&gt;
&lt;P&gt;There are dozens of VM types available in Azure which span multiple generations of CPU across vendors and architectures.&amp;nbsp; Within each Azure region are datacenters hosting pools of hardware that run Azure services, such as virtual machines of those types.&amp;nbsp; As VMs are started and stopped by customers there is a constant ebb and flow of available capacity to run each type of VM within the region.&amp;nbsp; Available capacity is driven by the rhythms of the business day, which creates variations in utilization on an hour-to-hour and even minute-to-minute basis. &amp;nbsp;Longer cycles of demand such as holiday seasons, school calendars and other real-world events are also a factor.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When you command an Azure Virtual Machine (VM) to start, the &lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/azure-resource-manager/management/overview" target="_blank" rel="noopener"&gt;Azure Resource Manager (ARM)&lt;/A&gt;, the “engine” that manages resources in the Microsoft cloud, needs to do a few things to make it happen.&amp;nbsp; The most important of these is that it needs to identify hardware within the target region with sufficient capacity to bring the desired type and size of VM online at that moment in time.&amp;nbsp; If ARM finds space for the desired VM size, the VM starts normally.&amp;nbsp; However, if there is no room to start the desired VM, you will see an error similar to this one:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This process of finding a place to start up an Azure VM has a lot of similarities to finding a place to park a vehicle.&amp;nbsp; Parking facilities are built to handle typical demand for their location.&amp;nbsp; If something nearby, such as a large sporting event, drives the need for parking much higher than normal, you might be out of luck when you try to find a spot because the garage is simply full.&lt;/P&gt;
&lt;P&gt;During periods of high demand in Azure this can result in VMs failing to start simply because there is nowhere to run them at that particular moment.&amp;nbsp; If this happens to a VM which needed to be stopped for a configuration change or other reasons this can cause impact to your environment which you certainly want to avoid.&lt;/P&gt;
&lt;H3&gt;On-Demand Capacity Reservations&lt;/H3&gt;
&lt;P&gt;Azure has a resource called an &lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/virtual-machines/capacity-reservation-overview" target="_blank" rel="noopener"&gt;&lt;STRONG&gt;On-Demand Capacity Reservation&lt;/STRONG&gt;&lt;/A&gt;, or ODCR, which allows you to reserve a spot for a VM in the appropriate hardware within a region for a specific VM size.&amp;nbsp; This is similar to “owning" a parking space: It’s a reserved place exclusively for the use of a specific VM.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;At a high level, the way this works is that you create an ODCR which matches the Azure region, availability zone and specific VM type, such as for a VM of type D16s_v6 in availability zone 2 of the Canada Central Azure region.&amp;nbsp; Once the reservation is created, an Azure VM that matches that configuration can be associated to it so the VM now “owns” that “parking space”.&amp;nbsp; This gives that VM priority over others of the same type when it needs to start because it already has a “parking space” assigned to it that can't be used by another one.&lt;/P&gt;
&lt;H3&gt;More detail about VM startup&lt;/H3&gt;
&lt;P&gt;Before we get further into what ODCRs are and how they work, it’s important to know a few more things about starting up a VM.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Azure does &lt;U&gt;not provide an explicit SLA for VM startup&lt;/U&gt; for virtual machines without an ODCR.&amp;nbsp; The process of finding a hypervisor slot to boot up a VM is purely a “best effort” action on Azure’s part.&lt;/P&gt;
&lt;P&gt;Having&amp;nbsp;&lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/virtual-machines/quotas" target="_blank" rel="noopener"&gt;quota&lt;/A&gt; headroom does not help with VM startup.&amp;nbsp; Quota in Azure is your "credit limit" for creating VMs.&amp;nbsp; Quota grants permission to create up to a certain number of cores’ worth of Virtual Machines from a particular family (like Ds_v6) but has no effect on whether you can actually start the machine once it’s created.&lt;/P&gt;
&lt;P&gt;Similarly, having a &lt;A class="lia-external-url" href="https://azure.microsoft.com/pricing/offers/reservations/vm-instances" target="_blank" rel="noopener"&gt;Reserved Instance&lt;/A&gt; purchase or a &lt;A class="lia-external-url" href="https://azure.microsoft.com/pricing/offers/savings-plans" target="_blank" rel="noopener"&gt;Savings Plan&lt;/A&gt; for a particular number of cores of a given VM family does not have any impact on the ability to start a VM either.&amp;nbsp; These mechanisms are a &lt;U&gt;discount mechanism only&lt;/U&gt; where the customer pre-pays for a certain amount of VM cores to be running 24x7 at a discounted rate.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Assigning an ODCR to a virtual machine applies a formal SLA on startup for it&lt;/STRONG&gt;.&amp;nbsp; VMs with ODCRs get &lt;U&gt;priority&lt;/U&gt; over ones that don’t so the likelihood of a successful startup is much higher for VMs that have one compared to those that do not, especially during times when Azure is experiencing a period of high demand for that particular VM type.&amp;nbsp; The actual language of the ODCR SLA can be found in Microsoft's &lt;A class="lia-external-url" href="https://www.microsoft.com/licensing/docs/view/service-level-agreements-sla-for-online-services" target="_blank" rel="noopener"&gt;Service Level Agreements for Online Services&lt;/A&gt; document which can be downloaded from the linked site.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;Cost Implications of ODCRs&lt;/H3&gt;
&lt;P&gt;These are the key points that you need to know about how billing works for ODCRs:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;The compute cost for the &lt;S&gt;parking space&lt;/S&gt; capacity reservation for a VM is &lt;U&gt;exactly the same as a running VM of the same size&lt;/U&gt;. There is no “double billing” for a VM to have an ODCR associated with it.&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;LI&gt;Billing for the ODCR starts immediately if the quantity of reserved "parking spaces" is greater than zero.&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;LI&gt;Stopping a VM that has an ODCR associated with it does not impact cost. This is because the ODCR is holding the reserved hypervisor slot even if the VM is not running.&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;LI&gt;Having a Reserved Instance purchase or Savings Plan &lt;U&gt;which covers the same scope as the ODCR&lt;/U&gt; means that the VM will be billed at the discounted rate.&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;Are there any cases where using ODCRs results in paying more for a VM?&lt;/H4&gt;
&lt;P&gt;There are two cases that I’ve identified where you effectively pay twice for the same VM.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;First, if you are using&amp;nbsp;&lt;A class="lia-external-url" href="https://azure.microsoft.com/products/site-recovery" target="_blank" rel="noopener"&gt;Azure Site Recovery&lt;/A&gt; to protect a VM in Azure by replicating it to another location, you have the option to associate the remote replica of the VM with a capacity reservation.&amp;nbsp; This helps ensure that the replica will start when it’s called upon because it has a pre-allocated spot reserved for it.&amp;nbsp; In this situation, if the original VM also is associated with an ODCR you are paying for both the original (running) VM and also for the reservation being held for its replica.&lt;/P&gt;
&lt;P&gt;Second, and similarly, when setting up replication for a VM that is preparing for migration into Azure via Azure Migrate, you can associate a capacity reservation with the replica for similar reasons to the above ASR example -- to ensure that the VM will start when its migrated replica is activated.&amp;nbsp; If the source machine is also in Azure then you are again paying twice for the same machine.&lt;/P&gt;
&lt;H3&gt;When should I use them?&lt;/H3&gt;
&lt;P&gt;Capacity Reservations are an important element when designing for resiliency.&amp;nbsp; They help ensure that VMs will be online when needed, even if they have to be shut down for some reason.&amp;nbsp; For example, there was an incident where a customer had to shut down a VM that was serving as a firewall appliance to make an adjustment to its configuration and it failed to start up afterwards because of a capacity-related failure.&amp;nbsp; This resulted in significant impact due to the loss of connectivity for dependent systems until the customer was able to bring the firewall back online.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Based on field experience and resiliency assessments, applying ODCRs to VMs that must be available 24x7 is strongly recommended.&lt;/STRONG&gt; Examples of this include key functions like AD domain controllers, application servers and database servers.&amp;nbsp; Also, any VM-based appliances that may be running as firewalls, load balancers or other infrastructure-support services should be considered as well.&lt;/P&gt;
&lt;P&gt;Microsoft offers assessments which review a workload for gaps that impact resiliency in many dimensions including outages in Azure.&amp;nbsp; These assessments include checks for the presence of capacity reservations and will report any VMs that do not have them as a high-risk finding.&lt;/P&gt;
&lt;H3&gt;Not all VM stops in Azure are voluntary&lt;/H3&gt;
&lt;P&gt;Even if you are careful to never stop a VM yourself, it can sometimes happen.&amp;nbsp; Not every shutdown of a VM in Azure is user-initiated.&amp;nbsp; Involuntary shutdowns are rare, but they can occur due to predictive hardware failures or other events which ARM will respond to by stopping the VM in order to move it out of harm's way.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;Creating On-Demand Capacity Reservations&lt;/H2&gt;
&lt;P&gt;This section covers the components of an ODCR, the process of creating them and why creating them can fail.&lt;/P&gt;
&lt;H3&gt;Components of an ODCR:&lt;/H3&gt;
&lt;P&gt;An ODCR has two components to it.&amp;nbsp; The first part is a &lt;STRONG&gt;Capacity Reservation Group&lt;/STRONG&gt; (CRG) which is simply a "bucket" for any number of capacity reservations.&amp;nbsp; To create a CRG you only need to provide its name, the region that it will be used for and which availability zones within that region it will have access to.&lt;/P&gt;
&lt;P&gt;The second -- and more important -- component is the actual &lt;STRONG&gt;Capacity Reservation&lt;/STRONG&gt; which is created within a CRG.&amp;nbsp; The capacity reservation requires:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;The name of the reservation&lt;/STRONG&gt;. Including the VM size and other details in the name is useful to reduce ambiguity.&amp;nbsp; An example could be “Zone1_D16s_v5”&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;The specific VM size&lt;/STRONG&gt; the reservation is for, such as “D16s_v5”&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;The availability zone&lt;/STRONG&gt; of the reservation. You can also create a &lt;STRONG&gt;regional&lt;/STRONG&gt; reservation, where the VM is “zoneless”.&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;The number of &lt;/STRONG&gt;&lt;S&gt;parking spaces&lt;/S&gt; &lt;STRONG&gt;instances &lt;/STRONG&gt;that the reservation holds.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;ODCRs can be created via the Azure portal, from the command line using PowerShell or the Azure CLI, or deployed through IaC tools such as Bicep or Terraform.&amp;nbsp; CRGs can also be &lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/virtual-machines/capacity-reservation-group-share" target="_blank" rel="noopener"&gt;shared across subscriptions&lt;/A&gt;, which allows a CRG created and managed in one subscription to be utilized by VMs in a different subscription.&lt;/P&gt;
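&lt;P&gt;From the Azure CLI, the flow is a two-step create (a sketch with made-up names; confirm the exact parameters against the &lt;EM&gt;az capacity reservation&lt;/EM&gt; reference for your CLI version):&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Create the Capacity Reservation Group (the "bucket")
az capacity reservation group create \
  --name crg-prod-canadacentral \
  --resource-group rg-capacity \
  --location canadacentral \
  --zones 2

# Create a reservation for two D16s_v6 instances in zone 2 within that group
az capacity reservation create \
  --capacity-reservation-group crg-prod-canadacentral \
  --resource-group rg-capacity \
  --name Zone2_D16s_v6 \
  --sku Standard_D16s_v6 \
  --capacity 2 \
  --zone 2&lt;/LI-CODE&gt;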
&lt;P&gt;When the ODCR is created, if the number of instances it contains is higher than zero then ARM will attempt to allocate the desired number of instances of the specified VM type in the target region/zone.&amp;nbsp; If there is capacity available for this then the creation succeeds and you can move on to associating machines with it to give them the protection of the ODCR.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If creating the ODCR is unsuccessful, the cause can be a variety of things, including:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;U&gt;No open hypervisor slots&lt;/U&gt; for the desired VM in the target location – the “parking lot” was full at the moment the request was submitted. This can result from outages within Azure that reduce capacity as well as demand pressure.&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;LI&gt;There is &lt;U&gt;insufficient quota in the subscription&lt;/U&gt; to claim the necessary number of VM cores for the reservation in the region.&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;LI&gt;The &lt;U&gt;VM type is simply not available&lt;/U&gt; in the target region or AZ.&amp;nbsp; Since not all Azure regions are provisioned with identical hardware this can be the cause, especially for VM types other than the popular D, E and F series machines.&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;LI&gt;A &lt;U&gt;restriction&lt;/U&gt; is applied to the subscription, zone or region that blocks creation of the reservation for some reason.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;What you can do if creating an ODCR fails&lt;/H3&gt;
&lt;P&gt;Some things that may help if creating a capacity reservation fails and you know that quota or other restrictions are not a factor are below.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Not coincidentally, these are the same recommendations that you should try when a VM fails to start because the same ARM action – finding and allocating hardware with free capacity to start the VM – is taking place.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;In general, creating an ODCR outside of business hours has a higher probability of success.&amp;nbsp; Demand for Azure services typically drops off at the end of the business day in the region’s local time zone.&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;LI&gt;Consider using a different VM type, availability zone or a different Azure region.&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;LI&gt;A script or other automation that retries at intervals until the reservation succeeds in claiming the desired number of spots can help, though it can take an unknown amount of time before this works.&amp;nbsp; It may need to run for days or even weeks before it succeeds.&lt;/LI&gt;
&lt;/OL&gt;
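&lt;P&gt;Option 3 can be implemented with a retry wrapper along these lines. This is a minimal sketch: the wrapper is generic, and the commented-out invocation assumes the Azure CLI's capacity reservation commands together with illustrative resource names and an illustrative SKU.&lt;/P&gt;

```shell
#!/usr/bin/env sh
# Sketch of a retry wrapper for ODCR creation. Resource names and the SKU
# in the commented example are illustrative; the az invocation assumes the
# Azure CLI is installed and logged in.

# retry_until_success MAX_ATTEMPTS DELAY_SECONDS CMD [ARGS...]
# Runs CMD until it exits 0, waiting DELAY_SECONDS between attempts.
retry_until_success() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1  # gave up without success
}

# Example: retry every 5 minutes, up to 288 times (roughly 24 hours):
# retry_until_success 288 300 az capacity reservation create \
#   --resource-group rg-prod --capacity-reservation-group crg-prod \
#   --name odcr-d4s-v5 --sku Standard_D4s_v5 --capacity 10
```

&lt;P&gt;Because each attempt is an ordinary CLI call, the same wrapper can also retry a failed VM start, which involves the same ARM allocation action.&lt;/P&gt;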
&lt;P&gt;Submitting a support ticket will give Microsoft visibility into your situation. &amp;nbsp;If the root cause is something other than capacity, support can identify that cause and provide guidance on how to resolve it. &amp;nbsp;If the issue truly is a capacity squeeze, support's ability to get the reservation created is extremely limited because the support teams, while helpful, cannot create capacity where none exists.&amp;nbsp; In this case they will usually refer you to the three options above.&lt;/P&gt;
&lt;H3&gt;Protecting a VM with an ODCR&lt;/H3&gt;
&lt;P&gt;Once you have the ODCR created, applying it to a VM is straightforward.&amp;nbsp; To do this from the portal, open the configuration tab on the VM’s screen.&amp;nbsp; Then scroll to the bottom of the panel that appears to find the “Capacity reservations” section.&amp;nbsp; Select “Capacity reservation group” from the list.&amp;nbsp; The list of capacity reservation groups that match the VM will appear in a drop-down menu below.&amp;nbsp; Select the CRG that the VM should use and click “Apply”.&lt;/P&gt;
&lt;P&gt;If you are using an Infrastructure-as-Code approach such as Bicep or Terraform, an Azure VM is linked to a CRG by specifying the resource ID of the CRG in the appropriate property on the VM definition.&lt;/P&gt;
&lt;H3&gt;Impact of associating a virtual machine with an ODCR&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;If the VM is &lt;U&gt;not running&lt;/U&gt; then the change takes effect immediately.&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;LI&gt;If the VM is &lt;U&gt;running and has &lt;STRONG&gt;no zone&lt;/STRONG&gt; assignment&lt;/U&gt; (a “regional” VM) then it &lt;EM&gt;must be stopped and restarted&lt;/EM&gt; for the protection of the ODCR to apply.&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;LI&gt;If the VM is &lt;U&gt;running and &lt;STRONG&gt;has&lt;/STRONG&gt; a zone assignment&lt;/U&gt; then the change is immediate and there is no disruption to the VM.&lt;/LI&gt;
&lt;/UL&gt;
&lt;BLOCKQUOTE&gt;
&lt;P class=""&gt;&lt;EM&gt;&lt;STRONG&gt;&lt;SPAN class="lia-text-color-8"&gt;Important note for Terraform users:&lt;/SPAN&gt;&amp;nbsp;&lt;/STRONG&gt; There appears to be a critical behavior difference between how the&amp;nbsp;&lt;A class="lia-external-url" href="https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs" target="_blank"&gt;AzureRM&lt;/A&gt; provider and the &lt;A class="lia-external-url" href="https://registry.terraform.io/providers/Azure/azapi/latest" target="_blank"&gt;Azapi&lt;/A&gt; provider handle this change.&amp;nbsp; If you use the &lt;STRONG&gt;AzureRM&lt;/STRONG&gt; provider,&amp;nbsp;&lt;STRONG&gt;Terraform will always perform an immediate stop/deallocate of the VM&lt;/STRONG&gt;, apply the change and then start the VM again.&amp;nbsp; The Azapi provider works as documented above. I believe this is a result of how HashiCorp coded the AzureRM provider to manage Azure resources.&lt;/EM&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H3&gt;Where an ODCR is not the right answer&lt;/H3&gt;
&lt;P&gt;ODCRs are most effective when they are used to protect VMs that need to always be running because they are providing essential services.&amp;nbsp; Examples include AD domain controllers, firewall or load balancer appliances, database servers, integration servers that support workflows and the like.&lt;/P&gt;
&lt;P&gt;The primary thing to keep in mind is the cost impact of the ODCRs and whether they are necessary for the service to be functioning.&amp;nbsp; Environments where machines come and go frequently, such as scale in/out setups used to minimize cost, are not ideal for ODCRs. &amp;nbsp;For example, if you have a pool of app servers configured for scale-out, using ODCRs to cover the entire size of the pool means you would be paying for all machines, whether they are actually online or not.&amp;nbsp; A possible approach in a scale-out environment is to determine the &lt;U&gt;minimum&lt;/U&gt; number of VMs necessary for the service to be available -- even in a degraded state -- and use an ODCR to protect that number of instances.&amp;nbsp; This way you can have confidence that at least that number of machines in the pool will always be running even if an attempt to scale out fails.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;Working with On-Demand Capacity Reservations&lt;BR /&gt;(and three interesting behaviors that you should know about)&lt;/H2&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This section discusses some ins and outs of working with ODCRs in your environment, especially if you need to apply them to existing machines.&amp;nbsp; This is a common scenario when you are attempting to improve the resiliency of a set of VMs against impacts from maintenance, outages or other situations that may cause VMs to restart.&lt;/P&gt;
&lt;H3&gt;“Associated” vs “Allocated”&lt;/H3&gt;
&lt;P&gt;A capacity reservation group will always have ownership of some number of "parking spots" within a region.&amp;nbsp; The number that it holds is referred to as the reservation's &lt;STRONG&gt;capacity&lt;/STRONG&gt; which is expressed as a number of &lt;STRONG&gt;allocated instances&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;When you link a VM to a CRG, the VM becomes &lt;STRONG&gt;associated&lt;/STRONG&gt; with the CRG and can take advantage of the protection that it offers from matching reservations that it contains.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It is possible to associate &lt;U&gt;more&lt;/U&gt; VMs to a CRG than it has allocated capacity for.&amp;nbsp; This is called &lt;STRONG&gt;overallocation&lt;/STRONG&gt;.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When a CRG is overallocated, the VMs associated with it are protected on a&amp;nbsp;&lt;U&gt;first-come-first-served basis based on when they were started&lt;/U&gt;.&amp;nbsp; If, for example, there are four VMs associated with a CRG but the CRG only has an allocated capacity of two, the first two associated machines which were started will receive protection but the others will not.&lt;/P&gt;
&lt;H3&gt;“Interesting” On-Demand Capacity Reservation behavior #1:&lt;/H3&gt;
&lt;P&gt;Here is the first of three interesting behaviors that you can use to your advantage when working with ODCRs.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;You can add a running VM to a capacity reservation group.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;As mentioned previously, if the VM is zonal then the change is immediate and nondisruptive.&amp;nbsp; If the VM is regional then the VM must be stopped and restarted for the change to take effect.&lt;/P&gt;
&lt;P&gt;This is conceptually different from other Azure mechanisms used for resiliency such as &lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/virtual-machines/availability-set-overview" target="_blank" rel="noopener"&gt;Availability Sets&lt;/A&gt;.&amp;nbsp; You can only add a VM to an availability set at the time the VM is created but you can add or remove a VM from a Capacity Reservation Group at any time whether the VM is running or not.&lt;/P&gt;
&lt;H3&gt;“Interesting” On-Demand Capacity Reservation behavior #2&lt;/H3&gt;
&lt;P&gt;Interesting behavior #2 is deceptively simple.&amp;nbsp; &lt;STRONG&gt;When creating a reservation, you can specify a capacity (number of allocated instances) of zero.&lt;/STRONG&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This should always succeed because Azure needs to take no action to fulfill it -- this is just a metadata adjustment for the reservation within the CRG.&lt;/P&gt;
&lt;P&gt;This may not seem terribly useful at first glance, but keep reading.&lt;/P&gt;
&lt;H3&gt;“Interesting” On-Demand Capacity Reservation behavior #3&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;If the number of associated VMs is higher than the allocated capacity of the reservation, you can increase the capacity of the reservation to cover the running VMs&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;Why does this work?&amp;nbsp; Because running VMs, by definition, already have a &lt;S&gt;parking spot&lt;/S&gt; hypervisor allocation, Azure doesn’t need to find one -- it can simply link the capacity reservation to the hypervisor slot that each running VM is using.&lt;/P&gt;
&lt;H3&gt;The payoff!&amp;nbsp; Or, using these three behaviors to your advantage&lt;/H3&gt;
&lt;P&gt;Because ODCRs are relatively new and have not yet been adopted widely, a common finding to emerge from field resiliency assessments of running workloads is that the VMs that support the workload need to have ODCRs applied to them.&amp;nbsp; In large environments there may be dozens or even hundreds of VMs that need to be protected.&amp;nbsp; The process for doing this can seem daunting to a technical team that is not familiar with ODCRs.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thankfully, these three behaviors make it possible to easily protect &lt;U&gt;any number of running machines&lt;/U&gt; with a very high probability of success -- and &lt;U&gt;zero disruption&lt;/U&gt; if they are zonal VMs -- by proceeding in this order:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Create a CRG&lt;/STRONG&gt; with a reservation for the region, AZ and VM type for the machine(s) that need to be covered &lt;STRONG&gt;with a quantity of zero&lt;/STRONG&gt;. &lt;EM&gt;(Interesting behavior #2)&lt;BR /&gt;&lt;BR /&gt;&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Associate the VMs to the capacity reservation group&lt;/STRONG&gt;. At this point the CRG is overallocated so the machines are not yet protected.&amp;nbsp; Remember that if the VMs are regional, a restart is required to finalize the ODCR assignment. &amp;nbsp;&lt;EM&gt;(Interesting behavior #1)&lt;BR /&gt;&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Update the reservation&lt;/STRONG&gt; within the CRG to increase the number of allocated instances to match the number of &lt;STRONG&gt;running &lt;/STRONG&gt;VMs. &lt;EM&gt;(Interesting behavior #3&lt;/EM&gt;)&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;When the number of instances on the reservation is equal to or higher than the number of VMs associated with it, all of the associated VMs are protected and you’re done!&lt;/P&gt;
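&lt;P&gt;As a sketch, the three steps above map to Azure CLI commands like the following. All resource names, the location, zone and SKU are illustrative, and the commands assume a recent Azure CLI with the capacity reservation and VM command groups available.&lt;/P&gt;

```shell
# Step 1: create a CRG and a reservation with a capacity of ZERO
# (interesting behavior #2). Names, location, zone and SKU are illustrative.
az capacity reservation group create \
  --resource-group rg-prod --name crg-prod --location eastus2 --zones 1
az capacity reservation create \
  --resource-group rg-prod --capacity-reservation-group crg-prod \
  --name odcr-d4s-v5 --sku Standard_D4s_v5 --capacity 0

# Step 2: associate each running VM with the CRG (interesting behavior #1).
# Zonal VMs pick this up without disruption; regional VMs need a restart.
az vm update \
  --resource-group rg-prod --name vm-app-01 \
  --capacity-reservation-group crg-prod

# Step 3: raise the reservation's capacity to cover the running VMs
# (interesting behavior #3).
az capacity reservation update \
  --resource-group rg-prod --capacity-reservation-group crg-prod \
  --name odcr-d4s-v5 --capacity 2
```

&lt;P&gt;Step 2 can be repeated, or looped over a list of VM names, for as many machines as need protection before the capacity is raised in step 3.&lt;/P&gt;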
&lt;H2&gt;Final thoughts&lt;/H2&gt;
&lt;P&gt;This leads to a final piece of advice about working with ODCRs, especially when you know that capacity is a challenge in the target region:&amp;nbsp; As a field CSA, I recommend that you &lt;STRONG&gt;bring VMs online first, then apply a capacity reservation to them.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Why?&amp;nbsp; If you already have a set of running VMs that need to be protected, then following what seems like the obvious process (creating a CRG, creating reservations within it for the correct number of instances, and then associating the VMs with the reservation) risks failure at the step of creating the ODCR, because Azure needs to find and allocate additional hypervisor slots for the reservation to own.&amp;nbsp; This can be challenging when there is a lot of demand for the VM type.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As the example in the previous section showed, &lt;STRONG&gt;it’s much easier to protect VMs that are already online&lt;/STRONG&gt; by associating them with an existing capacity reservation, even if it doesn’t have enough instances allocated to it, and then increasing the capacity of the ODCR to cover the running machines.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;References:&lt;/STRONG&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/virtual-machines/capacity-reservation-overview" target="_blank" rel="noopener"&gt;On-Demand Capacity Reservations Overview&lt;/A&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Monitor the &lt;A href="https://learn.microsoft.com/azure/virtual-machines/capacity-reservation-overview" target="_blank" rel="noopener"&gt;list of restrictions&lt;/A&gt; on VM eligibility because it changes frequently&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/virtual-machines/capacity-reservation-overview" target="_blank" rel="noopener"&gt;SLA Details for On-Demand Capacity Reservations&lt;/A&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Legal fine print is in the &lt;A href="https://aka.ms/CapacityReservationSLAForVM" target="_blank" rel="noopener"&gt;consolidated SLA for Online Services (.docx)&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Some details about &lt;A href="https://learn.microsoft.com/en-us/azure/virtual-machines/capacity-reservation-overallocate" target="_blank" rel="noopener"&gt;Overallocating capacity reservations&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Information on creating a &lt;A href="https://learn.microsoft.com/azure/templates/microsoft.compute/capacityreservationgroups" target="_blank" rel="noopener"&gt;Capacity Reservation Group&lt;/A&gt; via Bicep, Terraform or ARM template.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Apr 2026 17:21:11 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/demystifying-on-demand-capacity-reservations/ba-p/4504806</guid>
      <dc:creator>KenHooverMSFT</dc:creator>
      <dc:date>2026-04-14T17:21:11Z</dc:date>
    </item>
    <item>
      <title>DevSecOps on AKS: Governance Gates That Actually Prevent Incidents</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/devsecops-on-aks-governance-gates-that-actually-prevent/ba-p/4508415</link>
      <description>&lt;P&gt;This article is for&amp;nbsp;&lt;STRONG&gt;AKS platform/infra engineers&lt;/STRONG&gt;, &lt;STRONG&gt;SREs&lt;/STRONG&gt;, and &lt;STRONG&gt;security teams&lt;/STRONG&gt; who want a practical, enforceable model for stopping common Kubernetes misconfigurations &lt;EM&gt;before&lt;/EM&gt; they become incidents—without turning delivery into bureaucracy.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;H2&gt;Why incidents still happen after “adding security to the pipeline”&lt;/H2&gt;
&lt;P&gt;Most teams do &lt;EM&gt;some&lt;/EM&gt; of these:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;container image scanning&lt;/LI&gt;
&lt;LI&gt;secret scanning&lt;/LI&gt;
&lt;LI&gt;IaC checks&lt;/LI&gt;
&lt;LI&gt;PR approvals&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Those are useful, but they don’t help when:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;someone deploys directly with kubectl&lt;/LI&gt;
&lt;LI&gt;a pipeline is misconfigured or bypassed&lt;/LI&gt;
&lt;LI&gt;drift accumulates over time&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The AKS baseline guidance emphasizes &lt;STRONG&gt;governance through policy and admission control&lt;/STRONG&gt; as a core way to manage and secure clusters.&lt;/P&gt;
&lt;H2&gt;The AKS DevSecOps model (where the gates belong)&lt;/H2&gt;
&lt;P&gt;A workable DevSecOps model in AKS applies controls across four layers:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Pre‑deployment (CI)&lt;/STRONG&gt; – early feedback and guardrails&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Admission control (Governance gates)&lt;/STRONG&gt; – hard enforcement, prevents bad configs&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Runtime protection&lt;/STRONG&gt; – detection/response if something slips through&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Continuous compliance&lt;/STRONG&gt; – drift detection and auditing&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;This aligns with Microsoft’s AKS security guidance, which emphasizes upgrades, policy governance, and operational monitoring as core best practices.&lt;/P&gt;
&lt;H1&gt;The governance gates that prevent incidents&lt;/H1&gt;
&lt;H2&gt;Gate 1 — Azure Policy for AKS (OPA Gatekeeper at admission)&lt;/H2&gt;
&lt;P&gt;Azure Policy extends &lt;STRONG&gt;Gatekeeper (OPA)&lt;/STRONG&gt; to provide centralized, consistent enforcement of Kubernetes policies across AKS clusters. It installs as an add‑on/extension and can &lt;STRONG&gt;block non‑compliant resources at creation time&lt;/STRONG&gt;, while also reporting compliance back to Azure Policy.&lt;/P&gt;
&lt;H3&gt;How it works (in plain terms)&lt;/H3&gt;
&lt;P&gt;Azure Policy:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;checks assignments for the cluster&lt;/LI&gt;
&lt;LI&gt;deploys policy definitions into the cluster as Gatekeeper resources&lt;/LI&gt;
&lt;LI&gt;reports audit/compliance results to Azure Policy service&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Call‑out: Why this prevents incidents&lt;/STRONG&gt;&lt;BR /&gt;CI scans can be skipped. &lt;STRONG&gt;Admission control cannot be skipped&lt;/STRONG&gt; (unless the cluster is misconfigured). Azure Policy for AKS enforces rules even when workloads are deployed outside your pipeline.&lt;/P&gt;
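&lt;P&gt;As a sketch, enabling this gate on an existing cluster is a single add-on operation. The resource group and cluster names below are illustrative, and the commands assume the Azure CLI and kubectl are installed and authenticated.&lt;/P&gt;

```shell
# Enable the Azure Policy add-on on an existing AKS cluster. This installs
# Gatekeeper (OPA) and connects it to the Azure Policy service.
# Resource names are illustrative.
az aks enable-addons \
  --resource-group rg-aks --name aks-prod \
  --addons azure-policy

# Verify that the Gatekeeper admission components are running:
kubectl get pods -n gatekeeper-system
```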
&lt;H3&gt;What to enforce first (high-impact controls)&lt;/H3&gt;
&lt;P&gt;The AKS baseline architecture highlights policy management as a key tool and explicitly calls out governance for container images and security validation.&lt;BR /&gt;Start with gates that stop the most common blast-radius problems:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;image governance&lt;/STRONG&gt; (trusted registries / approved images)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;pod security baseline&lt;/STRONG&gt; (privilege escalation controls)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;public exposure controls&lt;/STRONG&gt; (restrict risky patterns)&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Gate 2 — Pod Security Standards (Baseline → Restricted) via Azure Policy&lt;/H2&gt;
&lt;P&gt;Azure Policy can apply built‑in Kubernetes initiatives such as &lt;STRONG&gt;pod security baseline standards&lt;/STRONG&gt; and you can set the effect from &lt;STRONG&gt;audit&lt;/STRONG&gt; to &lt;STRONG&gt;deny&lt;/STRONG&gt; to block violations.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Call‑out: Practical rollout strategy&lt;/STRONG&gt;&lt;BR /&gt;Start in &lt;STRONG&gt;Audit&lt;/STRONG&gt;, fix violations, then move to &lt;STRONG&gt;Deny&lt;/STRONG&gt; for production namespaces. Azure Policy supports staged enforcement and reporting, making rollout safer.&lt;/P&gt;
&lt;H2&gt;Gate 3 — Network policy enforcement (stop lateral movement)&lt;/H2&gt;
&lt;P&gt;The AKS DevSecOps guidance recommends securing and governing clusters with Azure Policy and using &lt;STRONG&gt;network policies&lt;/STRONG&gt; to control pod-to-pod flows.&lt;BR /&gt;The AKS baseline architecture also centers on securing in‑cluster traffic and aligning networking/security/identity teams around a consistent baseline.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Call‑out: Incident prevention lens&lt;/STRONG&gt;&lt;BR /&gt;Network isolation gates reduce “one compromised pod → entire cluster compromised” scenarios by limiting east‑west connectivity.&lt;/P&gt;
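&lt;P&gt;A minimal starting point is a default-deny ingress policy per namespace, such as the sketch below (the namespace and policy names are illustrative). It assumes the cluster runs a network-policy-capable dataplane, for example Azure CNI with network policy enabled; apply it with &lt;STRONG&gt;kubectl apply -n team-a -f default-deny.yaml&lt;/STRONG&gt;.&lt;/P&gt;

```yaml
# default-deny.yaml -- blocks all ingress to pods in the target namespace.
# Traffic must then be re-allowed explicitly with additional NetworkPolicies.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}    # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress        # no ingress rules listed, so all ingress is denied
```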
&lt;H2&gt;Gate 4 — Supply chain guardrails (image + deployment governance)&lt;/H2&gt;
&lt;P&gt;The AKS baseline architecture specifically highlights container images as a frequent vulnerability source and recommends governance using Azure Policy + Gatekeeper to ensure only approved images are deployed.&lt;/P&gt;
&lt;P&gt;This is where many “quiet” incidents start: images pulled from unknown registries, unsigned builds, or non-standard tags. A governance gate stops that at admission time.&lt;/P&gt;
&lt;H1&gt;Runtime safety net (because prevention isn’t perfect)&lt;/H1&gt;
&lt;H2&gt;Defender for Containers on AKS (runtime detection + posture)&lt;/H2&gt;
&lt;P&gt;Microsoft Defender for Containers provides runtime security monitoring and ongoing security operations workflows. The enablement guidance highlights:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;enabling protection broadly or selectively&lt;/LI&gt;
&lt;LI&gt;network/egress requirements for the Defender sensor&lt;/LI&gt;
&lt;LI&gt;ongoing operations (review vulnerabilities, recommendations, investigate alerts)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Call‑out: Don’t skip egress planning&lt;/STRONG&gt;&lt;BR /&gt;For restricted egress clusters, Defender requires outbound access to specific endpoints/FQDNs to send security data and events.&lt;/P&gt;
&lt;H3&gt;Operational knobs you’ll actually use&lt;/H3&gt;
&lt;P&gt;Defender configuration supports enabling/disabling components like:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;agentless discovery&lt;/LI&gt;
&lt;LI&gt;vulnerability assessment&lt;/LI&gt;
&lt;LI&gt;Defender DaemonSet (sensor)&lt;/LI&gt;
&lt;LI&gt;Azure Policy for Kubernetes (integration point)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The enablement documentation also provides CLI examples for enabling Defender and adding the Azure Policy add‑on, which is useful for repeatable automation.&lt;/P&gt;
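&lt;P&gt;As a sketch, Defender for Containers can be switched on for an existing cluster with a single Azure CLI call; the resource names are illustrative and the command assumes an authenticated Azure CLI with sufficient permissions on the subscription.&lt;/P&gt;

```shell
# Enable the Defender for Containers profile (sensor DaemonSet plus
# security data collection) on an existing AKS cluster.
# Resource names are illustrative.
az aks update \
  --resource-group rg-aks --name aks-prod \
  --enable-defender
```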
&lt;H1&gt;Continuous compliance (drift is inevitable)&lt;/H1&gt;
&lt;P&gt;Azure Policy for Kubernetes is designed to report compliance state back to Azure and centralize governance. That’s what helps you detect drift and keep controls enforced across clusters over time.&lt;/P&gt;
&lt;H2&gt;Mapping “common incident patterns” to gates (actionable cheat sheet)&lt;/H2&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;th&gt;Incident pattern you want to avoid&lt;/th&gt;&lt;th&gt;Gate that prevents it&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Privileged containers / risky pod specs&lt;/td&gt;&lt;td&gt;Pod security standards enforced via Azure Policy (Audit → Deny)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Untrusted images running in prod&lt;/td&gt;&lt;td&gt;Image governance enforced by Azure Policy + Gatekeeper&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Flat east‑west network → lateral movement&lt;/td&gt;&lt;td&gt;Network policy guidance (DevSecOps on AKS + baseline)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Threat activity at runtime (post‑deploy)&lt;/td&gt;&lt;td&gt;Defender for Containers runtime protection + alerts&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Silent drift &amp;amp; inconsistent posture across clusters&lt;/td&gt;&lt;td&gt;Azure Policy compliance reporting for Kubernetes&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 03 Apr 2026 10:46:12 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/devsecops-on-aks-governance-gates-that-actually-prevent/ba-p/4508415</guid>
      <dc:creator>lakshaymalik</dc:creator>
      <dc:date>2026-04-03T10:46:12Z</dc:date>
    </item>
    <item>
      <title>Azure SQL Managed Instance as an AI‑Enabled PaaS Platform</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/azure-sql-managed-instance-as-an-ai-enabled-paas-platform/ba-p/4508380</link>
      <description>&lt;H2&gt;AI Capabilities Built into Azure SQL Managed Instance&lt;/H2&gt;
&lt;P&gt;Azure SQL MI includes multiple intelligence layers by default:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Intelligent Insights&lt;/STRONG&gt; for anomaly detection&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Automatic tuning (recommend mode)&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Copilot‑assisted diagnostics&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Native vector data types&lt;/STRONG&gt; for AI workloads&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;These capabilities work together without requiring external services or agents.&lt;/P&gt;
&lt;H2&gt;Why Azure SQL MI Is a Natural Fit for AI Workloads&lt;/H2&gt;
&lt;P&gt;Azure SQL MI already sits at the center of many mission‑critical platforms:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Fully managed SQL Server–compatible PaaS&lt;/LI&gt;
&lt;LI&gt;Private networking with VNet isolation&lt;/LI&gt;
&lt;LI&gt;Native HA/DR, automated patching, and backups&lt;/LI&gt;
&lt;LI&gt;Enterprise governance, compliance, and security&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;What makes it ideal for AI adoption is &lt;STRONG&gt;proximity&lt;/STRONG&gt;—your data, metadata, performance history, and operational context are already there.&lt;/P&gt;
&lt;P&gt;AI works best &lt;STRONG&gt;where data does not need to move&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2&gt;Built‑In AI for Operations: Intelligent Insights&lt;/H2&gt;
&lt;P&gt;Intelligent Insights continuously analyzes workload behavior and:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Detects blocking patterns&lt;/LI&gt;
&lt;LI&gt;Identifies query plan regressions&lt;/LI&gt;
&lt;LI&gt;Flags unusual performance deviations&lt;/LI&gt;
&lt;LI&gt;Compares current behavior to historical baselines&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Instead of manually searching for issues, DBAs receive &lt;STRONG&gt;actionable signals early&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2&gt;Native Vector Support: Running AI Workloads on SQL MI&lt;/H2&gt;
&lt;P&gt;Azure SQL Managed Instance now supports &lt;STRONG&gt;native vector data types&lt;/STRONG&gt;, enabling AI scenarios directly within the database boundary.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Example: Vector Search Query&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang=""&gt;SELECT *
FROM Products
ORDER BY VECTOR_DISTANCE(embedding, @queryEmbedding);&lt;/LI-CODE&gt;
&lt;P&gt;This enables:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Semantic search&lt;/LI&gt;
&lt;LI&gt;Retrieval‑augmented generation (RAG)&lt;/LI&gt;
&lt;LI&gt;AI‑powered recommendations&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;In‑Database Machine Learning with Python and R&lt;/H2&gt;
&lt;P&gt;Azure SQL Managed Instance includes &lt;STRONG&gt;Machine Learning Services&lt;/STRONG&gt;, allowing you to run Python and R scripts &lt;EM&gt;inside&lt;/EM&gt; the database engine itself.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This enables:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Data preparation and feature engineering in‑place&lt;/LI&gt;
&lt;LI&gt;Model training directly against full relational datasets&lt;/LI&gt;
&lt;LI&gt;Real‑time scoring using stored procedures or the native PREDICT() function&lt;/LI&gt;
&lt;LI&gt;Use of open‑source libraries such as scikit‑learn, TensorFlow, and PyTorch&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Why this matters for Infra and DBAs:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;No data exfiltration to external services&lt;/LI&gt;
&lt;LI&gt;Lower latency and reduced ETL pipelines&lt;/LI&gt;
&lt;LI&gt;Security boundaries remain intact&lt;/LI&gt;
&lt;LI&gt;Models become part of the database deployment lifecycle&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This shifts ML from being &lt;EM&gt;adjacent&lt;/EM&gt; to the platform to being &lt;STRONG&gt;embedded within it&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2&gt;Copilot in Azure SQL: AI‑Assisted Operations&lt;/H2&gt;
&lt;P&gt;Microsoft Copilot is integrated with Azure SQL to provide&amp;nbsp;&lt;STRONG&gt;context‑aware operational insights&lt;/STRONG&gt; using Query Store, DMVs, and platform telemetry.&lt;/P&gt;
&lt;P&gt;Instead of manually inspecting metrics, teams can ask Copilot direct questions.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Example – Performance Investigation&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang=""&gt;Why did query performance degrade on this database in the last 24 hours?&lt;/LI-CODE&gt;
&lt;P&gt;Copilot leverages:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Dynamic Management Views (DMVs)&lt;/LI&gt;
&lt;LI&gt;Query Store data&lt;/LI&gt;
&lt;LI&gt;Azure diagnostics&lt;/LI&gt;
&lt;LI&gt;SQL metadata and schema context&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Copilot in SSMS: Natural Language Meets T‑SQL&lt;/H2&gt;
&lt;P&gt;Copilot is also available in &lt;STRONG&gt;SQL Server Management Studio (SSMS)&lt;/STRONG&gt;, supporting Azure SQL Managed Instance connections.&lt;/P&gt;
&lt;P&gt;Capabilities include:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Natural language → T‑SQL query generation&lt;/LI&gt;
&lt;LI&gt;Query explanation and optimization suggestions&lt;/LI&gt;
&lt;LI&gt;Schema‑aware code assistance&lt;/LI&gt;
&lt;LI&gt;Faster troubleshooting of legacy queries&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Crucially, Copilot &lt;STRONG&gt;respects permissions&lt;/STRONG&gt;—it cannot access tables or data that your login cannot see.&lt;/P&gt;
&lt;P&gt;Example – Query Generation&lt;/P&gt;
&lt;LI-CODE lang=""&gt;Show top 10 customers by total order value in the last 30 days.&lt;/LI-CODE&gt;
&lt;P&gt;Generated SQL (example):&lt;/P&gt;
&lt;LI-CODE lang=""&gt;SELECT TOP 10 CustomerId, SUM(OrderAmount) AS TotalOrderValue
FROM Orders
WHERE OrderDate &amp;gt;= DATEADD(DAY, -30, GETUTCDATE())
GROUP BY CustomerId
ORDER BY TotalOrderValue DESC;&lt;/LI-CODE&gt;
&lt;P&gt;This makes it safe for production environments while accelerating both DBA and developer workflows.&lt;/P&gt;
&lt;H2&gt;Azure SQL MI as a Knowledge Source for AI Agents&lt;/H2&gt;
&lt;P&gt;Azure SQL can now act as a &lt;STRONG&gt;knowledge source for Copilot Studio agents&lt;/STRONG&gt;, enabling conversational access to enterprise data powered by large language models.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;With this approach:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Azure SQL MI provides structured truth&lt;/LI&gt;
&lt;LI&gt;Copilot Studio provides conversational intelligence&lt;/LI&gt;
&lt;LI&gt;The database becomes queryable via natural language APIs&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This unlocks scenarios like:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Operational dashboards backed by live SQL data&lt;/LI&gt;
&lt;LI&gt;AI‑powered support assistants querying ticket or telemetry tables&lt;/LI&gt;
&lt;LI&gt;Governance‑controlled enterprise chatbots grounded in SQL data&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Example – Copilot Studio Prompt&lt;/P&gt;
&lt;LI-CODE lang=""&gt;What were the top database performance issues last week?&lt;/LI-CODE&gt;
&lt;P&gt;Behind the scenes, Copilot queries Azure SQL MI, processes results via Azure OpenAI, and returns a response grounded in real data.&lt;/P&gt;
&lt;H2&gt;Operational Intelligence: AI for Platform Management&lt;/H2&gt;
&lt;P&gt;Beyond queries and data science, AI in Azure SQL MI improves &lt;STRONG&gt;platform operations&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Performance insights built on historical Query Store data&lt;/LI&gt;
&lt;LI&gt;Intelligent recommendations surfaced via Azure Monitor and Copilot&lt;/LI&gt;
&lt;LI&gt;Reduced dependency on manual runbooks during incidents&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Instead of reacting to alerts in isolation, teams can &lt;STRONG&gt;ask the platform why something happened&lt;/STRONG&gt;—and receive contextual answers grounded in real telemetry.&lt;/P&gt;
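&lt;P&gt;The same Query Store history that powers these insights can also be inspected directly. A sketch using the standard Query Store catalog views to surface the heaviest queries over the captured interval:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;-- Top 5 queries by total duration, from Query Store runtime statistics
SELECT TOP 5
    qt.query_sql_text,
    SUM(rs.avg_duration * rs.count_executions) AS total_duration
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q  ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p   ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats AS rs ON p.plan_id = rs.plan_id
GROUP BY qt.query_sql_text
ORDER BY total_duration DESC;&lt;/LI-CODE&gt;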
&lt;H2&gt;Security, Privacy, and Responsible AI&lt;/H2&gt;
&lt;P&gt;Microsoft emphasizes &lt;STRONG&gt;responsible AI boundaries&lt;/STRONG&gt; across Azure SQL integrations:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Prompts and responses are not used to train foundation models&lt;/LI&gt;
&lt;LI&gt;Data remains tenant‑isolated&lt;/LI&gt;
&lt;LI&gt;Access controls and RBAC are always enforced&lt;/LI&gt;
&lt;LI&gt;Azure OpenAI principles apply to Copilot integrations&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This allows enterprises to adopt AI &lt;STRONG&gt;without compromising compliance or data governance&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2&gt;When Azure SQL Managed Instance Makes Sense for AI Adoption&lt;/H2&gt;
&lt;P&gt;Azure SQL MI is a strong fit when:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Enterprise security and compliance are mandatory&lt;/LI&gt;
&lt;LI&gt;You already operate a sizable SQL Server estate&lt;/LI&gt;
&lt;LI&gt;AI adoption must be platform‑led&lt;/LI&gt;
&lt;LI&gt;Operational safety is a priority&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Final Thoughts: SQL MI Is No Longer “Just a Database”&lt;/H2&gt;
&lt;P&gt;Azure SQL Managed Instance is transitioning:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Migration target → Intelligent platform&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;For infrastructure and platform teams, this means:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Fewer external dependencies for analytics&lt;/LI&gt;
&lt;LI&gt;AI assistance embedded into daily operations&lt;/LI&gt;
&lt;LI&gt;Data‑centric AI architectures with clear ownership boundaries&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;As AI adoption accelerates, platforms that already combine &lt;STRONG&gt;data, security, and operations&lt;/STRONG&gt; will lead the way. Azure SQL MI sits firmly in that category.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Apr 2026 09:42:50 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/azure-sql-managed-instance-as-an-ai-enabled-paas-platform/ba-p/4508380</guid>
      <dc:creator>ShivaniThadiyan</dc:creator>
      <dc:date>2026-04-03T09:42:50Z</dc:date>
    </item>
    <item>
      <title>AI‑Assisted Azure Infrastructure Validation and Drift Detection</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/ai-assisted-azure-infrastructure-validation-and-drift-detection/ba-p/4508346</link>
      <description>&lt;H2&gt;Why Traditional Drift Detection Isn’t Enough&lt;/H2&gt;
&lt;P&gt;Most teams already rely on:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Terraform plan reviews&lt;/LI&gt;
&lt;LI&gt;Azure Policy compliance dashboards&lt;/LI&gt;
&lt;LI&gt;Azure Resource Graph queries&lt;/LI&gt;
&lt;LI&gt;Manual scripts and audits&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The problem isn’t missing data—it’s &lt;STRONG&gt;interpretation at scale&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;Validation outputs are:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Verbose and noisy&lt;/LI&gt;
&lt;LI&gt;Spread across multiple tools&lt;/LI&gt;
&lt;LI&gt;Difficult to prioritize&lt;/LI&gt;
&lt;LI&gt;Dependent on human context&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This is where &lt;STRONG&gt;AI as an assistive layer&lt;/STRONG&gt; adds value.&lt;/P&gt;
&lt;H2&gt;Where AI Fits (And Where It Does Not)&lt;/H2&gt;
&lt;P&gt;AI should &lt;STRONG&gt;not&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Auto‑approve infrastructure changes&lt;/LI&gt;
&lt;LI&gt;Apply remediation directly&lt;/LI&gt;
&lt;LI&gt;Replace Terraform, Policy, or RBAC&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;AI &lt;STRONG&gt;should&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Summarize large outputs&lt;/LI&gt;
&lt;LI&gt;Highlight risky or unexpected changes&lt;/LI&gt;
&lt;LI&gt;Detect drift patterns&lt;/LI&gt;
&lt;LI&gt;Assist human decision‑making&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The goal is &lt;STRONG&gt;decision support&lt;/STRONG&gt;, not autonomous enforcement.&lt;/P&gt;
&lt;H2&gt;Shift‑Left Terraform: Catch Issues Early&lt;/H2&gt;
&lt;P&gt;AI‑assisted validation works best when combined with &lt;STRONG&gt;shift‑left practices&lt;/STRONG&gt;—detecting problems &lt;EM&gt;before&lt;/EM&gt; infrastructure is deployed.&lt;/P&gt;
&lt;P&gt;Shift‑left moves failure detection:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;From production → pipelines&lt;/LI&gt;
&lt;LI&gt;From pipelines → pull requests&lt;/LI&gt;
&lt;LI&gt;From pull requests → developer machines&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Step‑by‑Step: Shift‑Left Terraform Lifecycle&lt;/H2&gt;
&lt;LI-CODE lang=""&gt;Code Commit
   ↓
Local Validation
   ↓
Static Analysis &amp;amp; Security
   ↓
Terraform Plan Review
   ↓
Drift Gate
   ↓
Approval
   ↓
Apply&lt;/LI-CODE&gt;
&lt;H2&gt;Step 1: Local Terraform Validation&lt;/H2&gt;
&lt;P&gt;Start at the developer workstation.&lt;/P&gt;
&lt;LI-CODE lang=""&gt;terraform init
terraform validate&lt;/LI-CODE&gt;
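&lt;P&gt;A lightweight way to enforce these checks before every commit is a Git pre‑commit hook (a minimal sketch; adapt paths and checks to your repository):&lt;/P&gt;
&lt;LI-CODE lang=""&gt;#!/bin/sh
# .git/hooks/pre-commit — block commits that fail basic Terraform hygiene
terraform fmt -check -recursive || exit 1
terraform validate || exit 1&lt;/LI-CODE&gt;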
&lt;H2&gt;Step 2: PR‑Level Static Validation&lt;/H2&gt;
&lt;P&gt;Run automated checks on pull requests:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;terraform fmt&lt;/LI&gt;
&lt;LI&gt;Linting (TFLint)&lt;/LI&gt;
&lt;LI&gt;IaC security scanning (tfsec, Checkov, etc.)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This enforces standards &lt;EM&gt;before&lt;/EM&gt; merge—and reduces review friction.&lt;/P&gt;
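&lt;P&gt;These checks wire naturally into a pull‑request pipeline. A minimal GitHub Actions sketch (workflow name and action versions are illustrative; add TFLint and a security scanner per your standards):&lt;/P&gt;
&lt;LI-CODE lang=""&gt;# .github/workflows/terraform-pr-checks.yml (illustrative sketch)
name: terraform-pr-checks
on: pull_request
jobs:
  static-checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform fmt -check -recursive
      - run: terraform init -backend=false
      - run: terraform validate&lt;/LI-CODE&gt;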
&lt;H2&gt;Step 3: Generate a Deterministic Terraform Plan&lt;/H2&gt;
&lt;P&gt;Separate &lt;STRONG&gt;planning&lt;/STRONG&gt; from &lt;STRONG&gt;execution&lt;/STRONG&gt;.&lt;/P&gt;
&lt;LI-CODE lang=""&gt;terraform plan -out=tfplan&lt;/LI-CODE&gt;
&lt;P&gt;This gives full visibility with &lt;STRONG&gt;zero impact&lt;/STRONG&gt; to Azure.&lt;/P&gt;
&lt;H2&gt;Step 4: AI‑Assisted Terraform Plan Review&lt;/H2&gt;
&lt;P&gt;Large Terraform plans are accurate—but exhausting to review.&lt;/P&gt;
&lt;P&gt;GitHub Copilot can summarize the impact.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Example Copilot prompt:&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang=""&gt;Summarize this Terraform plan:
1) Security, network, or identity-impacting changes
2) Potential downtime risks
3) Unexpected changes outside standard modules
Provide a concise approval-ready summary.
&lt;/LI-CODE&gt;
&lt;H2&gt;Step 5: Drift‑Only Detection Gate (Critical Shift‑Left Control)&lt;/H2&gt;
&lt;P&gt;Before applying changes, confirm Terraform state still matches Azure.&lt;/P&gt;
&lt;LI-CODE lang=""&gt;terraform plan -refresh-only -detailed-exitcode&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;Exit codes:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;0 → No drift&lt;/LI&gt;
&lt;LI&gt;2 → Drift detected&lt;/LI&gt;
&lt;LI&gt;1 → Error&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This gate catches:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Manual Portal edits&lt;/LI&gt;
&lt;LI&gt;Emergency fixes not back‑ported to IaC&lt;/LI&gt;
&lt;LI&gt;External automation interference&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Step 6: Human Approval (Governance Intact)&lt;/H2&gt;
&lt;P&gt;Shift‑left doesn’t remove humans.&lt;/P&gt;
&lt;P&gt;Approvals validate:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Terraform plan&lt;/LI&gt;
&lt;LI&gt;Drift results&lt;/LI&gt;
&lt;LI&gt;AI summaries&lt;/LI&gt;
&lt;LI&gt;Policy implications&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This keeps &lt;STRONG&gt;governance strong without slowing delivery&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2&gt;Step 7: Apply Exactly What Was Reviewed&lt;/H2&gt;
&lt;LI-CODE lang=""&gt;terraform apply tfplan
&lt;/LI-CODE&gt;
&lt;P&gt;No re‑calculation.&lt;BR /&gt;No surprises.&lt;BR /&gt;No uncontrolled changes.&lt;/P&gt;
&lt;H2&gt;Azure Resource Graph: Drift in the Real World&lt;/H2&gt;
&lt;P&gt;Terraform shows &lt;EM&gt;intended state&lt;/EM&gt;.&lt;BR /&gt;Azure Resource Graph shows &lt;EM&gt;actual state at scale&lt;/EM&gt;.&lt;/P&gt;
&lt;H3&gt;Who Changed What? (Change Analysis)&lt;/H3&gt;
&lt;LI-CODE lang=""&gt;resourcechanges
| extend changeTime = todatetime(properties.changeAttributes.timestamp)
| extend targetResourceId = tostring(properties.targetResourceId)
| extend changeType = tostring(properties.changeType)
| extend changedBy = tostring(properties.changeAttributes.changedBy)
| extend clientType = tostring(properties.changeAttributes.clientType)
| extend operation = tostring(properties.changeAttributes.operation)
| where changeTime &amp;gt; ago(7d)
| project changeTime, targetResourceId, changeType, changedBy, clientType, operation
| sort by changeTime desc&lt;/LI-CODE&gt;
&lt;P&gt;This reveals:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Portal vs automation changes&lt;/LI&gt;
&lt;LI&gt;Actor identity&lt;/LI&gt;
&lt;LI&gt;Operation type&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;AI can then &lt;STRONG&gt;flag suspicious patterns&lt;/STRONG&gt; instead of manual scanning.&lt;/P&gt;
&lt;H3&gt;Detecting Tag Drift&lt;/H3&gt;
&lt;LI-CODE lang=""&gt;ResourceContainers
| where type =~ 'microsoft.resources/subscriptions/resourcegroups'
| where isnull(tags['Owner']) or isempty(tostring(tags['Owner']))
| project subscriptionId, resourceGroup=name, location, tags&lt;/LI-CODE&gt;
&lt;P&gt;Tag drift is often the&amp;nbsp;&lt;STRONG&gt;earliest sign of governance decay&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2&gt;Azure Policy: From Compliance to Action&lt;/H2&gt;
&lt;P&gt;Azure Policy tells you what’s non‑compliant—but not what to fix first.&lt;/P&gt;
&lt;LI-CODE lang=""&gt;PolicyResources
| where type =~ 'Microsoft.PolicyInsights/PolicyStates'
| extend complianceState = tostring(properties.complianceState)
| extend policyAssignmentName = tostring(properties.policyAssignmentName)
| summarize count() by policyAssignmentName, complianceState&lt;/LI-CODE&gt;
&lt;P&gt;AI helps here by&amp;nbsp;&lt;STRONG&gt;grouping violations, ranking risk, and suggesting remediation paths&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2&gt;A Reusable Azure Infrastructure Prompt Library&lt;/H2&gt;
&lt;P&gt;Instead of ad‑hoc prompting, teams can standardize infra‑specific Copilot prompts.&lt;/P&gt;
&lt;H3&gt;Terraform Plan Review&lt;/H3&gt;
&lt;LI-CODE lang=""&gt;Summarize this Terraform plan:
- High-risk changes
- Downtime risks
- Unexpected modifications
&lt;/LI-CODE&gt;
&lt;H3&gt;Drift Interpretation&lt;/H3&gt;
&lt;LI-CODE lang=""&gt;Analyze this terraform plan -refresh-only output.
Explain drift cause and recommend revert, backport, or accept.&lt;/LI-CODE&gt;
&lt;H3&gt;Resource Graph Drift Triage&lt;/H3&gt;
&lt;LI-CODE lang=""&gt;Group these Azure resource changes by actor and clientType.
Highlight suspicious patterns and suggest guardrails.&lt;/LI-CODE&gt;
&lt;H3&gt;Policy Compliance Prioritization&lt;/H3&gt;
&lt;LI-CODE lang=""&gt;Group policy violations by root cause.
Rank by risk and suggest remediation approaches.&lt;/LI-CODE&gt;
&lt;H2&gt;Key Takeaways&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;Drift is inevitable; unmanaged drift is optional&lt;/LI&gt;
&lt;LI&gt;Shift‑left Terraform reduces risk &lt;EM&gt;before&lt;/EM&gt; Azure is touched&lt;/LI&gt;
&lt;LI&gt;AI excels at analysis, not enforcement&lt;/LI&gt;
&lt;LI&gt;Terraform, KQL, Policy, and AI work best &lt;STRONG&gt;together&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Governance becomes clearer—not weaker&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;AI doesn’t replace infrastructure engineers. It helps them think faster and safer—earlier.&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 03 Apr 2026 08:48:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/ai-assisted-azure-infrastructure-validation-and-drift-detection/ba-p/4508346</guid>
      <dc:creator>ShivaniThadiyan</dc:creator>
      <dc:date>2026-04-03T08:48:00Z</dc:date>
    </item>
    <item>
      <title>🚀 General Availability of Private Application Gateway on Azure Application Gateway v2</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/general-availability-of-private-application-gateway-on-azure/ba-p/4508294</link>
      <description>&lt;H2&gt;🔍 What Is Private Application Gateway?&lt;/H2&gt;
&lt;P&gt;Historically, &lt;STRONG&gt;Application Gateway v2 required a public IP address&lt;/STRONG&gt; to communicate with the Azure control plane (GatewayManager). This requirement imposed several constraints:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Mandatory public IP exposure&lt;/LI&gt;
&lt;LI&gt;Restricted Network Security Group (NSG) rules&lt;/LI&gt;
&lt;LI&gt;Limited route table flexibility&lt;/LI&gt;
&lt;LI&gt;No support for forced tunneling&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Private Application Gateway removes these limitations&lt;/STRONG&gt; by introducing &lt;STRONG&gt;Application Gateway Network Isolation&lt;/STRONG&gt;, enabling:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Private IP‑only frontend&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;No public IP requirement&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Full NSG and route table control&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Forced tunneling support&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Controlled outbound connectivity&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;All management and data traffic remains on the &lt;STRONG&gt;Azure backbone network&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2&gt;✅ Key Capabilities (Now Generally Available)&lt;/H2&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;th&gt;Capability&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th&gt;&lt;STRONG&gt;Private IP‑only frontend&lt;/STRONG&gt;&lt;/th&gt;&lt;td&gt;Application Gateway can be deployed without any public IP&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th&gt;&lt;STRONG&gt;Network Isolation&lt;/STRONG&gt;&lt;/th&gt;&lt;td&gt;Removes dependency on GatewayManager service tag&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th&gt;&lt;STRONG&gt;Custom NSG rules&lt;/STRONG&gt;&lt;/th&gt;&lt;td&gt;Full control of inbound and outbound rules&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th&gt;&lt;STRONG&gt;Deny All outbound support&lt;/STRONG&gt;&lt;/th&gt;&lt;td&gt;Prevent unintended internet egress&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th&gt;&lt;STRONG&gt;Route table flexibility&lt;/STRONG&gt;&lt;/th&gt;&lt;td&gt;Support for 0.0.0.0/0 to virtual appliances&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th&gt;&lt;STRONG&gt;Forced tunneling&lt;/STRONG&gt;&lt;/th&gt;&lt;td&gt;Works with on‑premises or hub firewalls&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H2&gt;🧩 Architecture Overview&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Private Application Gateway Architecture&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;✅ No public IP&lt;BR /&gt;✅ No internet dependency&lt;BR /&gt;✅ Fully private traffic flow&lt;/P&gt;
&lt;H2&gt;🛠️ How to Enable Application Gateway Network Isolation&lt;/H2&gt;
&lt;P&gt;&lt;EM&gt;(Required for Private Application Gateway)&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;The &lt;STRONG&gt;Network Isolation feature&lt;/STRONG&gt; must be enabled at deployment time.&lt;/P&gt;
&lt;H3&gt;✅ Option 1: Azure Portal (Recommended)&lt;/H3&gt;
&lt;OL&gt;
&lt;LI&gt;Go to &lt;STRONG&gt;Create Application Gateway&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Select &lt;STRONG&gt;SKU: Standard_v2 or WAF_v2&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;During &lt;STRONG&gt;Advanced configuration&lt;/STRONG&gt;:
&lt;UL&gt;
&lt;LI&gt;Enable &lt;STRONG&gt;Network isolation&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Configure:
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Private frontend IP only&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;No public IP&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Deploy the gateway&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Once enabled, the gateway no longer requires inbound GatewayManager access or unrestricted outbound internet access.&lt;/P&gt;
&lt;H3&gt;✅ Option 2: Azure CLI / PowerShell / ARM&lt;/H3&gt;
&lt;P&gt;When deploying via automation:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Enable the &lt;STRONG&gt;private deployment / network isolation&lt;/STRONG&gt; capability during creation&lt;/LI&gt;
&lt;LI&gt;Apply:
&lt;UL&gt;
&lt;LI&gt;Custom NSG rules&lt;/LI&gt;
&lt;LI&gt;Custom route tables&lt;/LI&gt;
&lt;LI&gt;Private DNS resolution&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Existing gateways &lt;STRONG&gt;cannot be retrofitted&lt;/STRONG&gt;—network isolation must be enabled at creation time.&lt;/P&gt;
&lt;P&gt;📘 Reference:&lt;BR /&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/application-gateway/application-gateway-private-deployment" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/application-gateway/application-gateway-private-deployment&lt;/A&gt;&lt;/P&gt;
&lt;H2&gt;🔐 Recommended NSG &amp;amp; Routing Model&lt;/H2&gt;
&lt;H3&gt;NSG&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;Allow &lt;STRONG&gt;only required inbound ports&lt;/STRONG&gt; (for application traffic)&lt;/LI&gt;
&lt;LI&gt;Explicit outbound allow rules for:
&lt;UL&gt;
&lt;LI&gt;Azure Monitor&lt;/LI&gt;
&lt;LI&gt;Key Vault (if used)&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Final rule: &lt;STRONG&gt;Deny All outbound&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
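&lt;P&gt;The outbound model above can be sketched with Azure CLI (resource names and priorities are examples; extend the allow rules to match your dependencies):&lt;/P&gt;
&lt;LI-CODE lang=""&gt;# Allow Azure Monitor egress explicitly
az network nsg rule create --resource-group rg-appgw --nsg-name nsg-appgw \
  --name AllowAzureMonitorOut --priority 200 --direction Outbound \
  --access Allow --destination-address-prefixes AzureMonitor \
  --destination-port-ranges 443 --protocol Tcp

# Final catch-all: deny everything else outbound
az network nsg rule create --resource-group rg-appgw --nsg-name nsg-appgw \
  --name DenyAllOut --priority 4096 --direction Outbound \
  --access Deny --destination-address-prefixes '*' \
  --destination-port-ranges '*' --protocol '*'&lt;/LI-CODE&gt;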
&lt;H3&gt;Route Table&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;0.0.0.0/0 → Virtual Appliance (Firewall / NVA)&lt;/LI&gt;
&lt;LI&gt;Supports forced tunneling and traffic inspection&lt;/LI&gt;
&lt;/UL&gt;
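&lt;P&gt;The corresponding default route might look like this (route table name and firewall IP are examples):&lt;/P&gt;
&lt;LI-CODE lang=""&gt;# Send all traffic through the hub firewall / NVA
az network route-table route create --resource-group rg-appgw \
  --route-table-name rt-appgw --name default-to-fw \
  --address-prefix 0.0.0.0/0 \
  --next-hop-type VirtualAppliance \
  --next-hop-ip-address 10.0.0.4&lt;/LI-CODE&gt;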
&lt;H2&gt;🌍 Real‑World Scenarios&lt;/H2&gt;
&lt;H3&gt;✅ Scenario 1: Financial Services – Regulatory Compliance&lt;/H3&gt;
&lt;P&gt;Banks deploy Application Gateway privately behind a hub firewall, ensuring:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;No public IP exposure&lt;/LI&gt;
&lt;LI&gt;All traffic inspected&lt;/LI&gt;
&lt;LI&gt;Full audit control&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;✅ Scenario 2: Enterprise Landing Zones&lt;/H3&gt;
&lt;P&gt;Platform teams deploy &lt;STRONG&gt;standardized, policy‑compliant gateways&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Azure Policy blocks public IP creation&lt;/LI&gt;
&lt;LI&gt;Private Application Gateway fully supported&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;✅ Scenario 3: Hybrid Connectivity with Forced Tunneling&lt;/H3&gt;
&lt;P&gt;Traffic from Application Gateway flows through:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Azure Firewall&lt;/LI&gt;
&lt;LI&gt;On‑premises inspection devices&lt;/LI&gt;
&lt;LI&gt;Central logging systems&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;✅ Scenario 4: Internal Line‑of‑Business Apps&lt;/H3&gt;
&lt;P&gt;HR, Finance, and internal portals:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Accessible only from corporate networks&lt;/LI&gt;
&lt;LI&gt;No internet attack surface&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;⚠️ Important Considerations&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;Network Isolation &lt;STRONG&gt;must be enabled at creation&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Requires &lt;STRONG&gt;Standard_v2 or WAF_v2&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Private DNS planning is critical&lt;/LI&gt;
&lt;LI&gt;Monitoring endpoints must be explicitly allowed&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;📌 When Should You Use Private Application Gateway?&lt;/H2&gt;
&lt;P&gt;✅ You want &lt;STRONG&gt;zero public exposure&lt;/STRONG&gt;&lt;BR /&gt;✅ You require &lt;STRONG&gt;forced tunneling&lt;/STRONG&gt;&lt;BR /&gt;✅ You enforce &lt;STRONG&gt;Deny All outbound&lt;/STRONG&gt;&lt;BR /&gt;✅ You operate in &lt;STRONG&gt;regulated environments&lt;/STRONG&gt;&lt;BR /&gt;✅ You follow &lt;STRONG&gt;Enterprise Landing Zone patterns&lt;/STRONG&gt;&lt;/P&gt;
&lt;H2&gt;🎯 Final Thoughts&lt;/H2&gt;
&lt;P&gt;Private Application Gateway fundamentally changes how Application Gateway fits into &lt;STRONG&gt;secure Azure architectures&lt;/STRONG&gt;. With Network Isolation now generally available, customers can finally deploy Application Gateway in &lt;STRONG&gt;fully private, firewall‑controlled, enterprise‑grade environments&lt;/STRONG&gt;—without workarounds.&lt;/P&gt;
&lt;P&gt;This feature unlocks new design patterns for:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Landing Zones&lt;/LI&gt;
&lt;LI&gt;Hub‑and‑spoke networks&lt;/LI&gt;
&lt;LI&gt;Regulated workloads&lt;/LI&gt;
&lt;LI&gt;Hybrid connectivity&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;🔗 Learn More&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/application-gateway/application-gateway-private-deployment" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/application-gateway/application-gateway-private-deployment&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/application-gateway/overview" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/application-gateway/overview&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Fri, 03 Apr 2026 07:24:09 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/general-availability-of-private-application-gateway-on-azure/ba-p/4508294</guid>
      <dc:creator>kumaramit1</dc:creator>
      <dc:date>2026-04-03T07:24:09Z</dc:date>
    </item>
    <item>
      <title>Enterprise‑Scale Azure Subscription Vending Using Azure Verified Modules (AVM)</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/enterprise-scale-azure-subscription-vending-using-azure-verified/ba-p/4507751</link>
      <description>&lt;H2&gt;Why Subscription Vending Is Critical at Scale&lt;/H2&gt;
&lt;P&gt;Azure subscriptions define the &lt;STRONG&gt;security, governance, and billing boundary&lt;/STRONG&gt; for workloads. In large organizations, manual subscription creation often leads to:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Inconsistent management group placement&lt;/LI&gt;
&lt;LI&gt;Delayed or missing policy enforcement&lt;/LI&gt;
&lt;LI&gt;Incorrect RBAC assignments&lt;/LI&gt;
&lt;LI&gt;Lack of cost controls&lt;/LI&gt;
&lt;LI&gt;Platform teams becoming a bottleneck&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Subscription vending&lt;/STRONG&gt; standardizes this process by allowing application teams to request subscriptions while platform teams enforce governance through automation. Microsoft formally recommends this approach as part of Azure Landing Zones.&lt;/P&gt;
&lt;H2&gt;Azure Verified Modules (AVM) – The Foundation&lt;/H2&gt;
&lt;P&gt;Azure Verified Modules (AVM) are &lt;STRONG&gt;Microsoft‑owned and Microsoft‑supported&lt;/STRONG&gt; Infrastructure as Code modules that codify Azure Well‑Architected Framework guidance.&lt;/P&gt;
&lt;P&gt;AVM provides:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Resource modules&lt;/STRONG&gt; – deploy individual Azure resources&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Pattern modules&lt;/STRONG&gt; – deploy opinionated architectural patterns&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Subscription vending is delivered as an &lt;STRONG&gt;AVM pattern module&lt;/STRONG&gt;, making it the preferred and supported approach for enterprise landing zones.&lt;/P&gt;
&lt;P&gt;Key benefits:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Microsoft supported&lt;/LI&gt;
&lt;LI&gt;Terraform and Bicep support&lt;/LI&gt;
&lt;LI&gt;Built‑in governance&lt;/LI&gt;
&lt;LI&gt;Used by Azure Landing Zone accelerators&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;AVM Subscription Vending Pattern Overview&lt;/H2&gt;
&lt;P&gt;The Terraform module used for subscription vending is:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Azure/avm-ptn-alz-sub-vending&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;This module enables:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Azure subscription creation&lt;/LI&gt;
&lt;LI&gt;Management group association&lt;/LI&gt;
&lt;LI&gt;Resource provider registration&lt;/LI&gt;
&lt;LI&gt;RBAC assignments&lt;/LI&gt;
&lt;LI&gt;Budget enforcement&lt;/LI&gt;
&lt;LI&gt;Optional networking scaffolding&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The module uses the &lt;STRONG&gt;AzAPI provider&lt;/STRONG&gt;, allowing subscription creation and governance configuration in a &lt;STRONG&gt;single Terraform apply&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2&gt;Reference Architecture&lt;/H2&gt;
&lt;P&gt;High‑level flow:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Subscription request captured (YAML/JSON or pipeline input)&lt;/LI&gt;
&lt;LI&gt;CI/CD pipeline triggers Terraform&lt;/LI&gt;
&lt;LI&gt;AVM module creates the subscription&lt;/LI&gt;
&lt;LI&gt;Subscription is placed in the correct management group&lt;/LI&gt;
&lt;LI&gt;Governance (RBAC, policies, budgets) is applied automatically&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;This model aligns with Microsoft’s Landing Zone architecture.&lt;/P&gt;
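&lt;P&gt;A subscription request from step 1 could be captured as a version‑controlled file like the following (the field names are an example schema, not part of the AVM module):&lt;/P&gt;
&lt;LI-CODE lang=""&gt;# Illustrative subscription request (reviewed via pull request)
subscription:
  displayName: corp-finance-prod
  workload: Production
  managementGroup: Corp
  budgetAmount: 5000
  owners:
    - finance-platform-team@contoso.com&lt;/LI-CODE&gt;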
&lt;H2&gt;Prerequisites&lt;/H2&gt;
&lt;H3&gt;1. Azure Billing and Tenant&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;Enterprise Agreement (EA) or Microsoft Customer Agreement (MCA)&lt;/LI&gt;
&lt;LI&gt;Billing Scope ID, for example:&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI-CODE lang="json"&gt;/providers/Microsoft.Billing/billingAccounts/&amp;lt;id&amp;gt;/enrollmentAccounts/&amp;lt;id&amp;gt;&lt;/LI-CODE&gt;
&lt;H3&gt;2. Management Group Hierarchy&lt;/H3&gt;
&lt;P&gt;A predefined management group hierarchy must exist:&lt;/P&gt;
&lt;LI-CODE lang="shell"&gt;Tenant Root
 ├── Platform
 ├── LandingZones
 │    ├── Corp
 │    └── Online
 └── Sandbox&lt;/LI-CODE&gt;
&lt;H3&gt;3. Tooling&lt;/H3&gt;
&lt;LI-CODE lang="shell"&gt;az version
terraform version&lt;/LI-CODE&gt;
&lt;H2&gt;Identity &amp;amp; Permissions Model (Critical Section)&lt;/H2&gt;
&lt;P&gt;Automated subscription vending &lt;STRONG&gt;requires a non‑interactive identity&lt;/STRONG&gt; (Service Principal or Managed Identity).&lt;BR /&gt;The key permission is the &lt;STRONG&gt;Subscription Creator&lt;/STRONG&gt; role, which is a &lt;STRONG&gt;billing‑scope role&lt;/STRONG&gt;, not a standard Azure RBAC role.&lt;/P&gt;
&lt;P&gt;⚠️ Important&lt;BR /&gt;The Subscription Creator role &lt;STRONG&gt;cannot be assigned using the Azure Portal&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2&gt;Assigning the Subscription Creator Role (Enterprise Agreement)&lt;/H2&gt;
&lt;H3&gt;Role Scope Summary&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;th&gt;Agreement&lt;/th&gt;&lt;th&gt;Role Scope&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;EA&lt;/td&gt;&lt;td&gt;Enrollment Account&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;MCA&lt;/td&gt;&lt;td&gt;Billing Profile / Invoice Section&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;This guide covers &lt;STRONG&gt;Enterprise Agreement (EA)&lt;/STRONG&gt;, which is most common in large landing zones.&lt;/P&gt;
&lt;H3&gt;Step 1: Create a Service Principal&lt;/H3&gt;
&lt;LI-CODE lang="shell"&gt;az ad sp create-for-rbac \
--name avm-subscription-vending-sp \
--skip-assignment&lt;/LI-CODE&gt;
&lt;P&gt;Store securely:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;appId&lt;/LI&gt;
&lt;LI&gt;tenant&lt;/LI&gt;
&lt;LI&gt;password (or certificate)&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Step 2: Get Service Principal Object ID&lt;/H3&gt;
&lt;LI-CODE lang="shell"&gt;az ad sp show \
--id &amp;lt;appId&amp;gt; \
--query id \
--output tsv&lt;/LI-CODE&gt;
&lt;P&gt;⚠️ Use the &lt;STRONG&gt;Object ID&lt;/STRONG&gt;, not the Application ID.&lt;/P&gt;
&lt;H3&gt;Step 3: Identify the Enrollment Account&lt;/H3&gt;
&lt;LI-CODE lang="shell"&gt;az rest \
--method get \
--url "https://management.azure.com/providers/Microsoft.Billing/enrollmentAccounts?api-version=2019-10-01-preview"
&lt;/LI-CODE&gt;
&lt;P&gt;Capture:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Enrollment Account ID&lt;/LI&gt;
&lt;LI&gt;Full resource ID&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Step 4: Assign Subscription Creator Role (REST API)&lt;/H3&gt;
&lt;LI-CODE lang="shell"&gt;az rest \
--method put \
--url "https://management.azure.com/providers/Microsoft.Billing/enrollmentAccounts/&amp;lt;ENROLLMENT_ACCOUNT_ID&amp;gt;/providers/Microsoft.Authorization/roleAssignments/&amp;lt;GUID&amp;gt;?api-version=2019-10-01-preview" \
--body '{
"properties": {
"principalId": "&amp;lt;SERVICE_PRINCIPAL_OBJECT_ID&amp;gt;",
"roleDefinitionId": "/providers/Microsoft.Authorization/roleDefinitions/4f8fab4f-1852-4a58-9a5b-5f5f75a2f8a8"
}
}'&lt;/LI-CODE&gt;
&lt;P&gt;Notes:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&amp;lt;GUID&amp;gt; → any new GUID&lt;/LI&gt;
&lt;LI&gt;Role definition ID corresponds to &lt;STRONG&gt;Subscription Creator&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Step 5: Verify Assignment&lt;/H3&gt;
&lt;LI-CODE lang="shell"&gt;az rest \
--method get \
--url "https://management.azure.com/providers/Microsoft.Billing/enrollmentAccounts/&amp;lt;ENROLLMENT_ACCOUNT_ID&amp;gt;/providers/Microsoft.Authorization/roleAssignments?api-version=2019-10-01-preview"&lt;/LI-CODE&gt;
&lt;H3&gt;Important Constraints&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;Service Principal &lt;STRONG&gt;must be in the same tenant&lt;/STRONG&gt; as the EA billing account&lt;/LI&gt;
&lt;LI&gt;Cross‑tenant assignment is &lt;STRONG&gt;not supported&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Subscription Creator will &lt;STRONG&gt;not appear&lt;/STRONG&gt; in Azure Portal IAM&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Terraform Implementation Using AVM&lt;/H2&gt;
&lt;H3&gt;Provider Configuration&lt;/H3&gt;
&lt;LI-CODE lang="shell"&gt;
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~&amp;gt; 3.0"
    }
    azapi = {
      source = "Azure/azapi"
    }
  }
}

provider "azurerm" {
  features {}
}
&lt;/LI-CODE&gt;
&lt;H3&gt;AVM Subscription Vending Module&lt;/H3&gt;
&lt;LI-CODE lang="shell"&gt;
module "subscription_vending" {
  source  = "Azure/avm-ptn-alz-sub-vending/azure"
  version = "0.1.1"

  location = "southeastasia"

  subscription_alias_enabled = true
  subscription_display_name  = "corp-finance-prod"
  subscription_alias_name    = "corp-finance-prod"
  subscription_workload      = "Production"
  subscription_billing_scope = var.billing_scope

  subscription_management_group_association_enabled = true
  subscription_management_group_id = "Corp"
}
&lt;/LI-CODE&gt;
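&lt;P&gt;The module references &lt;STRONG&gt;var.billing_scope&lt;/STRONG&gt;; a matching variable declaration might look like this (a sketch, using the EA scope format shown earlier):&lt;/P&gt;
&lt;LI-CODE lang="shell"&gt;variable "billing_scope" {
  description = "EA enrollment account billing scope used for subscription creation"
  type        = string
  # Example: /providers/Microsoft.Billing/billingAccounts/&amp;lt;id&amp;gt;/enrollmentAccounts/&amp;lt;id&amp;gt;
}&lt;/LI-CODE&gt;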
&lt;H3&gt;Optional Governance&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Budgets&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="shell"&gt;budget_enabled = true
budget_amount  = 5000
&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;RBAC&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="shell"&gt;role_assignments = {
  ops = {
    principal_id         = var.ops_group_id
    role_definition_name = "Contributor"
  }
}&lt;/LI-CODE&gt;
&lt;H3&gt;Deploy&lt;/H3&gt;
&lt;LI-CODE lang="shell"&gt;terraform init
terraform plan
terraform apply
&lt;/LI-CODE&gt;
&lt;H3&gt;Validate&lt;/H3&gt;
&lt;LI-CODE lang="shell"&gt;az account list --all --query "[?name=='corp-finance-prod']"
az managementgroup subscription list --name Corp
&lt;/LI-CODE&gt;
&lt;H2&gt;Operational Best Practices&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;Use GitOps‑based subscription requests&lt;/LI&gt;
&lt;LI&gt;Store parameters in version‑controlled YAML/JSON&lt;/LI&gt;
&lt;LI&gt;Enforce PR approvals for new subscriptions&lt;/LI&gt;
&lt;LI&gt;Treat subscriptions as immutable infrastructure&lt;/LI&gt;
&lt;LI&gt;Regularly update AVM module versions&lt;/LI&gt;
&lt;LI&gt;Avoid tenant‑wide Owner permissions&lt;/LI&gt;
&lt;/UL&gt;
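&lt;P&gt;To make the GitOps flow concrete, a version-controlled request file might look like the following. The schema is hypothetical: the field names mirror the module inputs shown earlier, but the file format itself is a team convention, not something defined by the AVM module.&lt;/P&gt;

```yaml
# subscription-requests/corp-finance-prod.yaml (hypothetical schema)
subscription_display_name: corp-finance-prod
subscription_alias_name: corp-finance-prod
subscription_workload: Production
management_group_id: Corp
budget_enabled: true
budget_amount: 5000
# PR approvers for this request (illustrative)
owners:
  - finance-platform-team@contoso.com
```

&lt;P&gt;A pipeline can then read this file, pass the values into the Terraform module, and reject any change that lacks the required approvals.&lt;/P&gt;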
&lt;H2&gt;Conclusion&lt;/H2&gt;
&lt;P&gt;Subscription vending using Azure Verified Modules enables &lt;STRONG&gt;secure, scalable, and repeatable&lt;/STRONG&gt; Azure subscription management.&lt;BR /&gt;By combining AVM, Terraform, and correctly scoped billing permissions, platform teams can fully automate subscription creation while enforcing governance from day one.&lt;/P&gt;
&lt;P&gt;For any organization adopting Azure Landing Zones, &lt;STRONG&gt;AVM‑based subscription vending should be considered a foundational capability&lt;/STRONG&gt;, not an optional enhancement.&lt;/P&gt;
&lt;H2&gt;References&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;Azure Verified Modules&lt;BR /&gt;&lt;A class="lia-external-url" href="https://azure.github.io/Azure-Verified-Modules/" target="_blank"&gt;https://azure.github.io/Azure-Verified-Modules/&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;AVM Subscription Vending Module&lt;BR /&gt;&lt;A class="lia-external-url" href="https://registry.terraform.io/modules/Azure/avm-ptn-alz-sub-vending/azure/latest" target="_blank"&gt;https://registry.terraform.io/modules/Azure/avm-ptn-alz-sub-vending/azure/latest&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Subscription Vending – Azure Architecture Center&lt;BR /&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/architecture/landing-zones/subscription-vending" target="_blank"&gt;https://learn.microsoft.com/azure/architecture/landing-zones/subscription-vending&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Assign EA roles to service principals&lt;BR /&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/cost-management-billing/manage/assign-roles-azure-service-principals" target="_blank"&gt;https://learn.microsoft.com/azure/cost-management-billing/manage/assign-roles-azure-service-principals&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Fri, 03 Apr 2026 06:45:24 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/enterprise-scale-azure-subscription-vending-using-azure-verified/ba-p/4507751</guid>
      <dc:creator>kumaramit1</dc:creator>
      <dc:date>2026-04-03T06:45:24Z</dc:date>
    </item>
    <item>
      <title>VS Code Custom Agents: AI-Powered Terraform Security Scanning in the IDE</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/vs-code-custom-agents-ai-powered-terraform-security-scanning-in/ba-p/4507903</link>
      <description>&lt;P data-line="4"&gt;GitHub Copilot is already a powerful coding assistant, but out of the box it knows nothing specific about your project's conventions, security requirements, or operational processes. Custom agents change that. They let you define specialized AI assistants that live inside your repository, carry deep domain expertise, and behave consistently for every developer on your team.&lt;/P&gt;
&lt;P data-line="6"&gt;This blog explains what VS Code custom agents are, what they can do, and how to build one from scratch. While the concepts apply broadly to any development workflow, this post focuses specifically on Azure infrastructure teams using Terraform and demonstrates the approach through a practical example: An AI-powered security scanner for Terraform IaC modules.&lt;/P&gt;
&lt;H4 data-line="9"&gt;&lt;STRONG&gt;What are VS Code custom agents?&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P data-line="11"&gt;Starting with VS Code 1.99+, GitHub Copilot supports&amp;nbsp;&lt;STRONG&gt;custom agents&lt;/STRONG&gt; markdown files stored in your repository under .github/agents/. Each file defines a specialized AI assistant with its own:&lt;/P&gt;
&lt;UL data-line="13"&gt;
&lt;LI data-line="13"&gt;&lt;STRONG&gt;Name and description:&lt;/STRONG&gt;&amp;nbsp;who this agent is and when to invoke it&lt;/LI&gt;
&lt;LI data-line="14"&gt;&lt;STRONG&gt;Model selection: &lt;/STRONG&gt;which AI model powers it&lt;/LI&gt;
&lt;LI data-line="15"&gt;&lt;STRONG&gt;Tool permissions: &lt;/STRONG&gt;what actions it can take (read files, search, run commands)&lt;/LI&gt;
&lt;LI data-line="16"&gt;&lt;STRONG&gt;Instructions: &lt;/STRONG&gt;a system prompt that defines its expertise, behavior, and constraints&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="18"&gt;When you open a workspace containing these files, the agents appear as selectable options in the Copilot Chat panel. You can invoke them by selecting from the agent picker or typing @AgentName in chat.&lt;/P&gt;
&lt;P data-line="20"&gt;Think of custom agents as&amp;nbsp;&lt;STRONG&gt;specialized team members&lt;/STRONG&gt; you define once and every developer gets automatically when they clone the repository - a security reviewer, a code quality enforcer, a documentation generator, a deployment helper each with deep knowledge of their specific domain.&lt;/P&gt;
&lt;H4 data-line="23"&gt;&lt;STRONG&gt;How custom agents differ from regular Copilot chat?&lt;/STRONG&gt;&lt;/H4&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Aspect&lt;/th&gt;&lt;th&gt;Regular Copilot Chat&lt;/th&gt;&lt;th&gt;Custom Agent&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Knowledge scope&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;General programming knowledge&lt;/td&gt;&lt;td&gt;Domain-specific expertise you define&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Consistency&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Varies by prompt phrasing&lt;/td&gt;&lt;td&gt;Consistent behavior across all users&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Tool access&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Context-dependent&lt;/td&gt;&lt;td&gt;Explicitly defined per agent&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Invocation&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Open chat&lt;/td&gt;&lt;td&gt;Named agent with focused scope&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Portability&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Per-user&lt;/td&gt;&lt;td&gt;Shared via repository&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Constraints&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;None by default&lt;/td&gt;&lt;td&gt;You define guardrails (e.g., no file edits)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P data-line="34"&gt;A regular Copilot chat session might give different answers about security best practices depending on how you phrase the question. A custom security agent gives consistent, structured findings every time because its behavior is defined in code you control.&lt;/P&gt;
&lt;H4 data-line="37"&gt;&lt;STRONG&gt;Anatomy of a custom agent file:&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P data-line="39"&gt;A custom agent is a single markdown file with two parts:&lt;/P&gt;
&lt;H6 data-line="41"&gt;&lt;STRONG&gt;Part 1: YAML frontmatter (metadata):&lt;/STRONG&gt;&lt;/H6&gt;
&lt;UL&gt;
&lt;LI data-line="45"&gt;&lt;STRONG&gt;name&lt;/STRONG&gt;: MyAgent description: "What this agent does and when to invoke it use keywords that match how users would naturally ask for help"&lt;/LI&gt;
&lt;LI data-line="45"&gt;&lt;STRONG&gt;model&lt;/STRONG&gt;: Claude Sonnet 4.5 (copilot)&lt;/LI&gt;
&lt;LI data-line="45"&gt;&lt;STRONG&gt;tools&lt;/STRONG&gt;: [read, search, execute]&lt;/LI&gt;
&lt;LI data-line="45"&gt;&lt;STRONG&gt;argument-hint&lt;/STRONG&gt;: "Hint text shown in the chat input when this agent is selected"&lt;/LI&gt;
&lt;/UL&gt;
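&lt;P&gt;Put together, the frontmatter fields above might look like this at the top of an agent file; the values are illustrative, not required:&lt;/P&gt;

```markdown
---
name: MyAgent
description: "Reviews Terraform changes for naming and security issues. Use when the user asks to review, scan, or validate infrastructure code."
model: Claude Sonnet 4.5 (copilot)
tools: [read, search]
argument-hint: "Specify a folder to review or 'all' for the entire workspace"
---

You are a Terraform reviewer for this repository. Report findings with file
paths and line numbers; do not modify files unless explicitly asked.
```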
&lt;H6 data-line="54"&gt;&lt;STRONG&gt;Part 2: Markdown body (instructions):&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P data-line="56"&gt;Everything after the frontmatter is the&amp;nbsp;&lt;STRONG&gt;system prompt -&lt;/STRONG&gt;&amp;nbsp;the instructions that shape every response. This is where you define:&lt;/P&gt;
&lt;UL data-line="58"&gt;
&lt;LI data-line="58"&gt;The agent's role and expertise&lt;/LI&gt;
&lt;LI data-line="59"&gt;What it should and should not do&lt;/LI&gt;
&lt;LI data-line="60"&gt;How it should structure its output&lt;/LI&gt;
&lt;LI data-line="61"&gt;Domain-specific knowledge it should apply&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="63"&gt;The instructions can be as detailed as needed. Unlike a one-off prompt, these instructions are permanent and version-controlled alongside your code.&lt;/P&gt;
&lt;H6 data-line="66"&gt;&lt;STRONG&gt;Frontmatter fields explained:&lt;BR /&gt;&lt;/STRONG&gt;&lt;/H6&gt;
&lt;UL&gt;
&lt;LI data-line="66"&gt;&lt;STRONG style="color: rgb(30, 30, 30);"&gt;Name:&lt;/STRONG&gt;&lt;SPAN style="color: rgb(30, 30, 30);"&gt;&lt;BR /&gt;&lt;/SPAN&gt;The agent's identifier. Appears in the agent picker dropdown and in @mentions. Use a clear, descriptive name without spaces.&lt;/LI&gt;
&lt;LI data-line="66"&gt;&lt;STRONG style="color: rgb(30, 30, 30);"&gt;Description:&lt;BR /&gt;&lt;/STRONG&gt;This is more than a label; Copilot uses the description to determine when to suggest this agent. Include keywords that match natural language users would type: "security", "scan", "review", "deploy", "validate". The more specific, the better.&lt;/LI&gt;
&lt;LI data-line="66"&gt;&lt;STRONG style="color: rgb(30, 30, 30);"&gt;Model:&lt;BR /&gt;&lt;/STRONG&gt;Which AI model powers the agent. Different models have different strengths:&lt;/LI&gt;
&lt;/UL&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN lia-indent-padding-left-60px"&gt;&lt;table border="1" style="width: 64.8148%; height: 139.2px; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr style="height: 34.8px;"&gt;&lt;th style="height: 34.8px;"&gt;Model&lt;/th&gt;&lt;th style="height: 34.8px;"&gt;Best For&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr style="height: 34.8px;"&gt;&lt;td style="height: 34.8px;"&gt;Claude Sonnet 4.5&lt;/td&gt;&lt;td style="height: 34.8px;"&gt;Code analysis, security review, structured output&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 34.8px;"&gt;&lt;td style="height: 34.8px;"&gt;GPT-4o&lt;/td&gt;&lt;td style="height: 34.8px;"&gt;General reasoning, broad knowledge&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 34.8px;"&gt;&lt;td style="height: 34.8px;"&gt;o3-mini&lt;/td&gt;&lt;td style="height: 34.8px;"&gt;Fast responses, simple tasks&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P class="lia-indent-padding-left-60px" data-line="83"&gt;You choose the model that best fits the agent's job.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Tools:&lt;BR /&gt;&lt;/STRONG&gt;What the agent can do. Tool selection is a&amp;nbsp;&lt;STRONG style="color: rgb(30, 30, 30);"&gt;security and capability decision&lt;/STRONG&gt;&lt;SPAN style="color: rgb(30, 30, 30);"&gt;:&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN lia-indent-padding-left-60px"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Tool&lt;/th&gt;&lt;th&gt;Capability&lt;/th&gt;&lt;th&gt;Use When&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;ead&lt;/td&gt;&lt;td&gt;Read files in the workspace&lt;/td&gt;&lt;td&gt;Agent needs to analyze code&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;search&lt;/td&gt;&lt;td&gt;Search across workspace files&lt;/td&gt;&lt;td&gt;Agent needs to find files by name or content&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;execute&lt;/td&gt;&lt;td&gt;Run terminal commands&lt;/td&gt;&lt;td&gt;Agent needs to run scripts or tools&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;editFiles&lt;/td&gt;&lt;td&gt;Create or modify files&lt;/td&gt;&lt;td&gt;Agent should write or change code&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P class="lia-indent-padding-left-60px" data-line="96"&gt;Grant only what the agent needs. A read-only reviewer agent should never have editFiles. An agent that only answers questions needs only ead.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Argument-hint:&lt;BR /&gt;&lt;/STRONG&gt;The placeholder text in the chat input when this agent is selected. Helps users understand what to type: "Specify a folder to scan or 'all' for entire workspace".&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4 data-line="103"&gt;&lt;STRONG&gt;What can custom agents do?&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P data-line="105"&gt;Custom agents work well for any repetitive expert judgment task, some of the common examples include:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Use Case&lt;/th&gt;&lt;th&gt;What the Agent Does&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Code Review&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Reviews code for quality issues, anti-patterns, and naming violations with line-level findings&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Security Scanning&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Checks infrastructure or application code against security baselines (CIS, NIST) with remediation guidance&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Documentation&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Reads source code and generates API references, runbooks, or architecture summaries in your team's format&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Onboarding&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Answers questions about codebase conventions and patterns grounded in the actual repository&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Deployment / Ops&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Guides engineers through deployment or incident response using your actual infrastructure config&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Testing&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Reviews test coverage and suggests missing cases based on code changes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Release Management&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Prepares release notes and version decisions from changelogs and git history&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H4 data-line="120"&gt;&lt;STRONG&gt;Prerequisites to get started:&lt;/STRONG&gt;&lt;/H4&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Requirement&lt;/th&gt;&lt;th&gt;Details&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;VS Code&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Version 1.99 or later&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;GitHub Copilot&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Active subscription (Individual, Business, or Enterprise)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Copilot Chat extension&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Installed and signed in to GitHub&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Agent mode enabled&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;VS Code Settings &amp;gt; search "chat agent"&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;A repository&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Agents live in .github/agents/ that is any local folder works&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P data-line="130"&gt;No additional extensions, frameworks, or infrastructure required. Agents are just markdown files.&lt;/P&gt;
&lt;H4 data-line="147"&gt;&lt;STRONG&gt;Building the IaC security scanner: A step-by-step guide&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P data-line="149"&gt;In General, the teams writing Terraform modules for Azure infrastructure need to ensure:&lt;/P&gt;
&lt;UL data-line="150"&gt;
&lt;LI data-line="150"&gt;RBAC roles follow least privilege (no Owner/Contributor assigned broadly)&lt;/LI&gt;
&lt;LI data-line="151"&gt;Network rules do not allow unrestricted inbound traffic&lt;/LI&gt;
&lt;LI data-line="152"&gt;Encryption is enforced with TLS 1.2 minimum&lt;/LI&gt;
&lt;LI data-line="153"&gt;Diagnostic logging is configured for audit trails&lt;/LI&gt;
&lt;LI data-line="154"&gt;Resource locks protect production resources from accidental deletion&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="156"&gt;These checks are typically done in CI/CD pipelines but that creates a slow feedback loop. A custom Copilot agent brings these checks into the IDE, giving developers security feedback while they write code.&lt;/P&gt;
&lt;H6 data-line="158"&gt;&lt;STRONG&gt;Step 1: Create the directory:&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P data-line="160"&gt;Create&amp;nbsp;.github/agents/&amp;nbsp;in your repository root if it does not already exist.&lt;/P&gt;
&lt;H6 data-line="162"&gt;&lt;STRONG&gt;Step 2: Create the agent file:&lt;/STRONG&gt;&lt;/H6&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-line="166"&gt;&lt;STRONG&gt;name&lt;/STRONG&gt;: IaCSecurityAgent description: "Scan Terraform and IaC files for security misconfigurations, insecure defaults, and compliance violations. Detects public endpoints, weak IAM, missing encryption, network exposure, and logging gaps. Use when user asks to check security, find misconfigurations, security review, or harden infrastructure"&lt;/P&gt;
&lt;P data-line="166"&gt;&lt;STRONG&gt;model&lt;/STRONG&gt;: Claude Sonnet 4.5 (copilot)&lt;/P&gt;
&lt;P data-line="166"&gt;&lt;STRONG&gt;tools&lt;/STRONG&gt;: [read, search, execute]&amp;nbsp;&lt;/P&gt;
&lt;P data-line="166"&gt;&lt;STRONG&gt;argument-hint&lt;/STRONG&gt;: "Specify directory to scan (e.g., 'resource-groups'), multiple directories (e.g., 'resource-groups, nsg'), or 'all' for entire workspace"&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;UL&gt;
&lt;LI data-line="179"&gt;&lt;STRONG&gt;Why Claude Sonnet 4.5?&lt;/STRONG&gt;&amp;nbsp;&lt;BR /&gt;This model was chosen for its strong code analysis, ability to reason about security context (not just pattern-match), and consistent structured output.&lt;/LI&gt;
&lt;LI data-line="181"&gt;&lt;STRONG&gt;Why execute?&lt;/STRONG&gt; &lt;BR /&gt;The agent saves reports by calling a helper PowerShell script. This eliminates a separate user-triggered step.&lt;/LI&gt;
&lt;LI data-line="183"&gt;&lt;STRONG&gt;Why not editFiles?&lt;/STRONG&gt;&amp;nbsp;&lt;BR /&gt;When the agent reports findings, it does not fix them unless the user explicitly asks. This keeps the agent in an advisory role and prevents unintended changes.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H6 data-line="185"&gt;&lt;STRONG&gt;Step 3: Open VS Code and test:&lt;/STRONG&gt;&lt;/H6&gt;
&lt;OL data-line="187"&gt;
&lt;LI data-line="187"&gt;Open the Copilot Chat panel (Ctrl+Alt+I)&lt;/LI&gt;
&lt;LI data-line="188"&gt;Click the agent picker (the @ icon or agent name area)&lt;/LI&gt;
&lt;LI data-line="189"&gt;Your new agent should appear in the list&lt;/LI&gt;
&lt;LI data-line="190"&gt;Select it and type:&amp;nbsp;scan resource-groups&lt;/LI&gt;
&lt;/OL&gt;
&lt;H6 data-line="192"&gt;&lt;STRONG&gt;Step 4: Iterate on the instructions:&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P data-line="194"&gt;The instructions are just text and so anyone can easily edit them, commit the changes, and the agent behavior updates immediately for everyone on the team. Treat agent instructions like code: review, create versions and improve them over time.&lt;/P&gt;
&lt;H4 data-line="196"&gt;&lt;STRONG&gt;What the agent checks:&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P data-line="198"&gt;The above agent's instructions in the &lt;STRONG&gt;name &lt;/STRONG&gt;field define six security domains it checks against every Terraform resource:&lt;/P&gt;
&lt;OL&gt;
&lt;LI data-line="200"&gt;&lt;STRONG&gt;&amp;nbsp;Identity and Access Management (IAM):&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;UL data-line="201"&gt;
&lt;LI&gt;Overly permissive RBAC roles (Owner, Contributor at broad scope)&lt;/LI&gt;
&lt;LI data-line="202"&gt;Missing managed identity configuration (using keys instead)&lt;/LI&gt;
&lt;LI data-line="203"&gt;Hardcoded credentials or secrets&lt;/LI&gt;
&lt;LI data-line="204"&gt;Missing validation on role assignment variables&lt;/LI&gt;
&lt;/UL&gt;
&lt;OL start="2"&gt;
&lt;LI data-line="206"&gt;&lt;STRONG&gt;&amp;nbsp;Network Security:&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;UL data-line="207"&gt;
&lt;LI&gt;Public endpoints on databases, storage, Key Vaults&lt;/LI&gt;
&lt;LI data-line="208"&gt;Admin ports (22, 3389) open to 0.0.0.0/0&lt;/LI&gt;
&lt;LI data-line="209"&gt;Missing private endpoints for PaaS services&lt;/LI&gt;
&lt;LI data-line="210"&gt;NSG rules allowing wildcard source addresses&lt;/LI&gt;
&lt;/UL&gt;
&lt;OL start="3"&gt;
&lt;LI data-line="212"&gt;&lt;STRONG&gt;&amp;nbsp;Data Protection and Encryption:&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;UL data-line="213"&gt;
&lt;LI&gt;Encryption at rest disabled&lt;/LI&gt;
&lt;LI data-line="214"&gt;TLS version below 1.2&lt;/LI&gt;
&lt;LI data-line="215"&gt;HTTPS not enforced&lt;/LI&gt;
&lt;LI data-line="216"&gt;Secrets stored in plain text in variables&lt;/LI&gt;
&lt;/UL&gt;
&lt;OL start="4"&gt;
&lt;LI data-line="218"&gt;&lt;STRONG&gt;&amp;nbsp;Logging and Monitoring:&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;UL data-line="219"&gt;
&lt;LI&gt;Missing azurerm_monitor_diagnostic_setting resources&lt;/LI&gt;
&lt;LI data-line="220"&gt;Log retention below 90 days&lt;/LI&gt;
&lt;LI data-line="221"&gt;No audit logging on Key Vault, SQL, or AKS&lt;/LI&gt;
&lt;/UL&gt;
&lt;OL start="5"&gt;
&lt;LI data-line="223"&gt;&lt;STRONG&gt;&amp;nbsp;Container and Workload Security:&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;UL data-line="224"&gt;
&lt;LI&gt;AKS without RBAC enabled&lt;/LI&gt;
&lt;LI data-line="225"&gt;Local accounts not disabled&lt;/LI&gt;
&lt;LI data-line="226"&gt;Missing network policy configuration&lt;/LI&gt;
&lt;/UL&gt;
&lt;OL start="6"&gt;
&lt;LI data-line="228"&gt;&lt;STRONG&gt;&amp;nbsp;Backup and Disaster Recovery:&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;UL data-line="229"&gt;
&lt;LI&gt;Key Vault without purge protection&lt;/LI&gt;
&lt;LI data-line="230"&gt;Missing soft delete configuration&lt;/LI&gt;
&lt;LI data-line="231"&gt;No geo-redundancy for critical data&lt;/LI&gt;
&lt;/UL&gt;
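&lt;P&gt;To illustrate the kind of network finding targeted above, here is a sketch of an NSG rule the agent would flag. The resource references and CIDR range are invented for the example; it is not taken from the scanned modules.&lt;/P&gt;

```hcl
# FLAGGED [Network Security]: admin port 22 open to any source address
resource "azurerm_network_security_rule" "ssh_open" {
  name                        = "allow-ssh"
  priority                    = 100
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "22"
  source_address_prefix       = "*" # wildcard source, equivalent to 0.0.0.0/0
  destination_address_prefix  = "*"
  resource_group_name         = azurerm_resource_group.example.name
  network_security_group_name = azurerm_network_security_group.example.name
}

# Remediation: restrict the source to a known management range, e.g.
#   source_address_prefix = "10.0.0.0/24"
```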
&lt;H6 data-line="252"&gt;&lt;STRONG&gt;Compliance framework alignment:&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P data-line="252"&gt;Findings are mapped to Azure-relevant controls:&lt;/P&gt;
&lt;UL data-line="253"&gt;
&lt;LI data-line="253"&gt;&lt;STRONG&gt;CIS Azure Foundations Benchmark&lt;/STRONG&gt;&amp;nbsp;(e.g., CIS 3.7 for storage public access, CIS 6.1 for NSG rules)&lt;/LI&gt;
&lt;LI data-line="254"&gt;&lt;STRONG&gt;Azure Security Benchmark v3&lt;/STRONG&gt;&amp;nbsp;(e.g., NS-1 for network segmentation, PA-7 for privileged access, DP-4 for encryption)&lt;/LI&gt;
&lt;LI data-line="255"&gt;&lt;STRONG&gt;NIST 800-53&lt;/STRONG&gt; (e.g., SC-7 for boundary protection, AC-6 for least privilege)&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4 data-line="257"&gt;&lt;STRONG&gt;Choosing the right scanning scope:&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P data-line="235"&gt;The agent supports flexible scope: single folder, multiple folders, or entire workspace auto-discovery. When a user says "scan all", the agent searches for every .tf file, groups them by directory, and scans each independently.&lt;/P&gt;
&lt;H4 data-line="261"&gt;&lt;STRONG&gt;The structured security scan output:&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P data-line="239"&gt;Every finding follows a consistent format. Here is an example of a security scan result:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;H6 data-line="42"&gt;[MEDIUM] IAM-002: Missing principal_type default recommendation&lt;/H6&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;File:&lt;/STRONG&gt;&amp;nbsp;user-assigned-identity/variables.tf (Line 45)&lt;/LI&gt;
&lt;LI data-line="44"&gt;&lt;STRONG&gt;Resource:&lt;/STRONG&gt;&amp;nbsp;var.rg_role_assignments.principal_type&lt;/LI&gt;
&lt;LI data-line="45"&gt;&lt;STRONG&gt;Issue:&lt;/STRONG&gt;&amp;nbsp;principal_type is optional with null default. In environments with ABAC policies, role assignments may fail if this is not explicitly set.&lt;/LI&gt;
&lt;LI data-line="46"&gt;&lt;STRONG&gt;Impact:&lt;/STRONG&gt;&amp;nbsp;Role assignments could fail silently or be mis-scoped in ABAC-constrained environments.&lt;/LI&gt;
&lt;LI data-line="47"&gt;&lt;STRONG&gt;Compliance:&lt;/STRONG&gt; Azure Security Benchmark PA-7&lt;/LI&gt;
&lt;/UL&gt;
&lt;/BLOCKQUOTE&gt;
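&lt;P&gt;A remediation for this finding could add a variable-level validation. The variable shape below is inferred from the finding and is an assumption about the module, not its actual code; it requires Terraform 1.3+ for &lt;STRONG&gt;optional()&lt;/STRONG&gt; object attributes.&lt;/P&gt;

```hcl
variable "rg_role_assignments" {
  type = map(object({
    principal_id         = string
    role_definition_name = string
    principal_type       = optional(string) # "User", "Group", or "ServicePrincipal"
  }))
  default = {}

  validation {
    condition = alltrue([
      for ra in var.rg_role_assignments :
      ra.principal_type == null ||
      contains(["User", "Group", "ServicePrincipal"], ra.principal_type)
    ])
    error_message = "principal_type must be User, Group, or ServicePrincipal; set it explicitly in ABAC-constrained environments."
  }
}
```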
&lt;H4 data-line="261"&gt;&lt;STRONG&gt;Results from real scans:&lt;/STRONG&gt;&lt;/H4&gt;
&lt;H6&gt;&lt;STRONG&gt;Security Scan:&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P data-line="263"&gt;Scanning three Azure Terraform modules with custom agent produced the following results:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Module&lt;/th&gt;&lt;th&gt;CRITICAL&lt;/th&gt;&lt;th&gt;HIGH&lt;/th&gt;&lt;th&gt;MEDIUM&lt;/th&gt;&lt;th&gt;LOW&lt;/th&gt;&lt;th&gt;Key Finding&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;resource-groups&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;3&lt;/td&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt;Role assignments allow Owner/Contributor&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;nsg&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;3&lt;/td&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt;Wildcard source addresses and ports not blocked&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;user-assigned-identity&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;3&lt;/td&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt;Managed identity lacks role_assignments field — permissions must be set manually post-creation&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H6 data-line="271"&gt;&lt;STRONG&gt;Generated Security scan report:&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P data-line="271"&gt;All findings included exact file paths, line numbers, and Terraform code fixes.&lt;/P&gt;
&lt;H4 data-line="273"&gt;&lt;STRONG&gt;The companion quality scanner:&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P data-line="275"&gt;Alongside the security agent, the workspace includes a second agent: a &lt;STRONG&gt;Super-Linter Scanner&lt;/STRONG&gt; that runs native static analysis tools:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Tool&lt;/th&gt;&lt;th&gt;Version&lt;/th&gt;&lt;th&gt;Purpose&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;TFLint&lt;/td&gt;&lt;td&gt;v0.53.0&lt;/td&gt;&lt;td&gt;Naming conventions, unused declarations, provider pinning&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;terraform fmt&lt;/td&gt;&lt;td&gt;v1.9.8&lt;/td&gt;&lt;td&gt;Code formatting validation&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;yamllint&lt;/td&gt;&lt;td&gt;latest&lt;/td&gt;&lt;td&gt;YAML syntax and style&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;PSScriptAnalyzer&lt;/td&gt;&lt;td&gt;latest&lt;/td&gt;&lt;td&gt;PowerShell best practices&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P data-line="284"&gt;This agent calls a PowerShell script that produces SARIF output (viewable inline in VS Code via the SARIF Viewer extension) and an HTML report. Tool versions are pinned to match the CI/CD pipeline's super-linter commit, so local results are consistent with what CI would produce.&lt;/P&gt;
&lt;H4 data-line="297"&gt;&lt;STRONG&gt;Why agent-based scanning goes beyond traditional tools?&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P data-line="299"&gt;Traditional static analysis tools like tfsec, Checkov, or tflint work by matching code patterns against a database of rules. They catch what they know about. The AI agent adds a layer of&amp;nbsp;&lt;STRONG&gt;reasoning&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL data-line="301"&gt;
&lt;LI data-line="301"&gt;It can recognize that a variable accepting any role name is dangerous&amp;nbsp;&lt;STRONG&gt;even when no bad value is currently assigned&lt;/STRONG&gt; the vulnerability is the missing validation, not an existing misconfiguration.&lt;/LI&gt;
&lt;LI data-line="302"&gt;It can correlate findings across files (a storage account in one file, its network rules in another).&lt;/LI&gt;
&lt;LI data-line="303"&gt;It maps findings to compliance frameworks without you maintaining a rule-to-control mapping table.&lt;/LI&gt;
&lt;LI data-line="304"&gt;It produces natural language explanations of why something is a problem, not just a rule ID.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="306"&gt;This does not replace deterministic tools, but it complements them. Use both.&lt;/P&gt;
&lt;H4 data-line="309"&gt;&lt;STRONG&gt;Key takeaways:&lt;/STRONG&gt;&lt;/H4&gt;
&lt;UL data-line="311"&gt;
&lt;LI data-line="311"&gt;Custom VS Code Copilot agents are markdown files in .github/agents/ with no extension development, no deployment, no infrastructure required.&lt;/LI&gt;
&lt;LI data-line="312"&gt;The YAML frontmatter controls model selection, tool permissions, and how Copilot decides when to suggest the agent.&lt;/LI&gt;
&lt;LI data-line="313"&gt;The markdown body is your system prompt, treat it like code: version, review and iterate on it.&lt;/LI&gt;
&lt;LI data-line="314"&gt;Tool permissions are a security decision: grant only what the agent needs.&lt;/LI&gt;
&lt;LI data-line="315"&gt;Custom agents are portable that means anyone who clones the repository gets the agents automatically.&lt;/LI&gt;
&lt;LI data-line="316"&gt;Combining AI reasoning with deterministic tools (tflint, terraform fmt) provides coverage neither can achieve alone.&lt;/LI&gt;
&lt;LI data-line="317"&gt;The agent pattern applies far beyond security scanning such as documentation, onboarding, deployment, testing, compliance.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4 data-line="320"&gt;&lt;STRONG&gt;Useful resources:&lt;/STRONG&gt;&lt;/H4&gt;
&lt;UL data-line="322"&gt;
&lt;LI data-line="322"&gt;&lt;A class="lia-external-url" href="https://code.visualstudio.com/docs/copilot/chat/chat-agent-mode" target="_blank" rel="noopener" data-href="https://code.visualstudio.com/docs/copilot/chat/chat-agent-mode"&gt;VS Code Custom Agents documentation&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-line="323"&gt;&lt;A class="lia-external-url" href="https://docs.github.com/en/copilot" target="_blank" rel="noopener" data-href="https://docs.github.com/en/copilot"&gt;GitHub Copilot documentation&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-line="324"&gt;&lt;A class="lia-external-url" href="https://www.cisecurity.org/benchmark/azure" target="_blank" rel="noopener" data-href="https://www.cisecurity.org/benchmark/azure"&gt;CIS Azure Foundations Benchmark&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-line="325"&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/security/benchmarks/" target="_blank" rel="noopener" data-href="https://learn.microsoft.com/azure/security/benchmarks/"&gt;Azure Security Benchmark&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-line="326"&gt;&lt;A class="lia-external-url" href="https://csrc.nist.gov/publications/detail/sp/800-53/rev-5/final" target="_blank" rel="noopener" data-href="https://csrc.nist.gov/publications/detail/sp/800-53/rev-5/final"&gt;NIST 800-53 Rev 5&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-line="327"&gt;&lt;A class="lia-external-url" href="https://github.com/terraform-linters/tflint" target="_blank" rel="noopener" data-href="https://github.com/terraform-linters/tflint"&gt;TFLint documentation&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-line="328"&gt;&lt;A class="lia-external-url" href="https://marketplace.visualstudio.com/items?itemName=MS-SarifVSCode.sarif-viewer" target="_blank" rel="noopener" data-href="https://marketplace.visualstudio.com/items?itemName=MS-SarifVSCode.sarif-viewer"&gt;SARIF Viewer for VS Code&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Thu, 02 Apr 2026 09:30:08 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/vs-code-custom-agents-ai-powered-terraform-security-scanning-in/ba-p/4507903</guid>
      <dc:creator>SundarBalajiA</dc:creator>
      <dc:date>2026-04-02T09:30:08Z</dc:date>
    </item>
    <item>
      <title>Migrating Azure SQL Database Across Tenants Using Subscription Transfer</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/migrating-azure-sql-database-across-tenants-using-subscription/ba-p/4507002</link>
      <description>&lt;P&gt;Moving an Azure subscription between Microsoft Entra ID (formerly Azure Active Directory) tenants is a common requirement during:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Mergers and acquisitions&lt;/LI&gt;
&lt;LI&gt;Organizational restructuring&lt;/LI&gt;
&lt;LI&gt;Enterprise Agreement (EA) enrollment realignment&lt;/LI&gt;
&lt;LI&gt;Lighthouse or multi‑tenant operating model changes&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;However, when that subscription hosts &lt;STRONG&gt;Azure SQL databases secured using Microsoft Entra ID authentication&lt;/STRONG&gt;, the migration becomes identity‑dependent and must be executed carefully.&lt;/P&gt;
&lt;P&gt;Microsoft Entra ID identities are &lt;STRONG&gt;tenant‑scoped&lt;/STRONG&gt;. During a cross‑tenant subscription transfer, identities from the source tenant are no longer trusted by the target tenant. This directly impacts:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Azure RBAC role assignments&lt;/LI&gt;
&lt;LI&gt;Azure SQL Entra ID logins&lt;/LI&gt;
&lt;LI&gt;Managed identities&lt;/LI&gt;
&lt;LI&gt;Application authentication tokens&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;If authentication sequencing is not handled correctly, administrators and applications can lose access to Azure SQL immediately after the migration.&lt;/P&gt;
&lt;P&gt;This article provides a proven &lt;STRONG&gt;enterprise‑tested 8‑step migration playbook&lt;/STRONG&gt; to safely migrate Azure SQL workloads across Microsoft Entra ID tenants without administrative lockout or application downtime.&lt;/P&gt;
&lt;H5&gt;&lt;STRONG&gt;Migration Overview&lt;/STRONG&gt;&lt;/H5&gt;
&lt;P&gt;The migration follows a three‑stage operating model:&lt;/P&gt;
&lt;H6&gt;&lt;STRONG&gt;Before Subscription Transfer (Source Tenant)&lt;/STRONG&gt;&lt;/H6&gt;
&lt;UL&gt;
&lt;LI&gt;Export RBAC assignments and SQL identity configuration&lt;/LI&gt;
&lt;LI&gt;Perform and validate a SQL backup&lt;/LI&gt;
&lt;/UL&gt;
&lt;H6&gt;&lt;STRONG&gt;During Migration&lt;/STRONG&gt;&lt;/H6&gt;
&lt;UL&gt;
&lt;LI&gt;Temporarily enable SQL Authentication&lt;/LI&gt;
&lt;LI&gt;Ensure administrative access independent of Entra ID&lt;/LI&gt;
&lt;/UL&gt;
&lt;H6&gt;&lt;STRONG&gt;After Subscription Transfer (Target Tenant)&lt;/STRONG&gt;&lt;/H6&gt;
&lt;UL&gt;
&lt;LI&gt;Configure new Entra ID administrator&lt;/LI&gt;
&lt;LI&gt;Recreate RBAC assignments and database principals&lt;/LI&gt;
&lt;LI&gt;Validate application connectivity&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The diagram represents the &lt;STRONG&gt;distinct phases&lt;/STRONG&gt; of the migration, which are explained below exactly as they appear in the flow.&lt;/P&gt;
&lt;img /&gt;
&lt;H5&gt;&lt;STRONG&gt;Phase 1 – Pre‑Migration Preparation&lt;/STRONG&gt;&lt;/H5&gt;
&lt;H6&gt;&lt;STRONG&gt;Step 1 – Export SQL Server Logins, Database Users, Roles, and Permissions&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P&gt;Before initiating the subscription transfer, export all identity‑related configuration from the source Azure SQL Server.&lt;/P&gt;
&lt;P&gt;This export serves as:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;A rollback reference&lt;/LI&gt;
&lt;LI&gt;A rebuild blueprint for the target tenant&lt;/LI&gt;
&lt;LI&gt;An audit artifact for change tracking&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Capture the following:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Server Level&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Microsoft Entra ID users&lt;/LI&gt;
&lt;LI&gt;Microsoft Entra ID groups&lt;/LI&gt;
&lt;LI&gt;SQL Authentication logins&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Database Level&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Contained database users&lt;/LI&gt;
&lt;LI&gt;Database role memberships
&lt;UL&gt;
&lt;LI&gt;db_owner&lt;/LI&gt;
&lt;LI&gt;db_datareader&lt;/LI&gt;
&lt;LI&gt;custom roles&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Azure RBAC Assignments On&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;SQL Server&lt;/LI&gt;
&lt;LI&gt;Resource Group&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Database Permissions&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;GRANT&lt;/LI&gt;
&lt;LI&gt;DENY&lt;/LI&gt;
&lt;LI&gt;EXECUTE&lt;/LI&gt;
&lt;LI&gt;Object‑level access&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;✅ Subscription transfers always remove Microsoft Entra ID–based role assignments. Treat this export as mandatory.&lt;/P&gt;
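&lt;P&gt;As an illustrative sketch (the server, database, and resource group names below are placeholders, not from a real environment), the export can be scripted with the Azure CLI and sqlcmd:&lt;/P&gt;
&lt;LI-CODE lang="shell"&gt;# Save current RBAC assignments on the resource group as a rebuild blueprint
az role assignment list \
  --resource-group rg-sql-prod \
  --include-inherited \
  --output json &gt; rbac-assignments-backup.json

# List database principals (Entra ID users/groups and SQL users) per database
sqlcmd -S sqlsrv-prod.database.windows.net -d appdb -G \
  -Q "SELECT name, type_desc, authentication_type_desc FROM sys.database_principals WHERE type IN ('E','X','S');"
&lt;/LI-CODE&gt;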
&lt;H6&gt;&lt;STRONG&gt;Step 2 –&lt;/STRONG&gt; &lt;STRONG&gt;Perform SQL Database Backup&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P&gt;Take a fresh backup immediately before migration.&lt;/P&gt;
&lt;P&gt;Recommended enterprise options include:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;BACPAC export to Azure Blob Storage&lt;/LI&gt;
&lt;LI&gt;Native BACKUP TO URL&lt;/LI&gt;
&lt;LI&gt;Long‑Term Retention (LTR) snapshot&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;⚠️ A backup that has not been test‑restored is not a valid rollback strategy. Always validate restore operations in a non‑production environment before migration day.&lt;/P&gt;
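&lt;P&gt;A minimal sketch of a BACPAC export with the Azure CLI; all names, credentials, and the storage URI are placeholders:&lt;/P&gt;
&lt;LI-CODE lang="shell"&gt;# Export the database to a BACPAC file in Azure Blob Storage
az sql db export \
  --resource-group rg-sql-prod \
  --server sqlsrv-prod \
  --name appdb \
  --admin-user sqladmin \
  --admin-password "$SQL_ADMIN_PASSWORD" \
  --storage-key-type StorageAccessKey \
  --storage-key "$STORAGE_KEY" \
  --storage-uri "https://stbackup.blob.core.windows.net/backups/appdb.bacpac"
&lt;/LI-CODE&gt;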
&lt;H5&gt;&lt;STRONG&gt;Phase 2 – Migration Day Execution&lt;/STRONG&gt;&lt;/H5&gt;
&lt;H6&gt;&lt;STRONG&gt;Step 3 –&lt;/STRONG&gt; &lt;STRONG&gt;Switch Authentication Mode (Entra ID ➜ SQL Authentication)&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P&gt;This is the most critical step in the entire migration workflow.&lt;/P&gt;
&lt;P&gt;Before initiating the EA subscription transfer:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Disable Entra ID–only authentication&lt;/LI&gt;
&lt;LI&gt;Enable SQL Authentication&lt;/LI&gt;
&lt;LI&gt;Validate SQL Administrator login&lt;/LI&gt;
&lt;/UL&gt;
&lt;H6&gt;&lt;STRONG&gt;Why This Is Required&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P&gt;Microsoft Entra ID identities exist only within their original tenant.&lt;BR /&gt;Once the subscription is transferred:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Source tenant identities become invalid&lt;/LI&gt;
&lt;LI&gt;Azure SQL logins mapped to those identities stop functioning&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;SQL Authentication provides a:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Tenant‑independent authentication method&lt;/LI&gt;
&lt;LI&gt;Temporary administrative access path&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Skipping this step may result in &lt;STRONG&gt;complete administrative lockout&lt;/STRONG&gt;.&lt;/P&gt;
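&lt;P&gt;Assuming the server currently enforces Entra ID–only authentication, the switch can be performed with the Azure CLI (server and resource group names are placeholders):&lt;/P&gt;
&lt;LI-CODE lang="shell"&gt;# Check whether Entra ID-only authentication is currently enforced
az sql server ad-only-auth get --resource-group rg-sql-prod --name sqlsrv-prod

# Disable it so SQL Authentication remains available during the transfer
az sql server ad-only-auth disable --resource-group rg-sql-prod --name sqlsrv-prod
&lt;/LI-CODE&gt;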
&lt;H6&gt;&lt;STRONG&gt;Step 4 –&lt;/STRONG&gt; &lt;STRONG&gt;Migrate Subscription via EA Transfer&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P&gt;The EA transfer moves the subscription to the target tenant directory.&lt;/P&gt;
&lt;H6&gt;Preserved During Migration&lt;/H6&gt;
&lt;UL&gt;
&lt;LI&gt;Azure resources&lt;/LI&gt;
&lt;LI&gt;SQL databases&lt;/LI&gt;
&lt;LI&gt;Database data&lt;/LI&gt;
&lt;LI&gt;SQL Authentication logins&lt;/LI&gt;
&lt;/UL&gt;
&lt;H6&gt;Removed During Migration&lt;/H6&gt;
&lt;UL&gt;
&lt;LI&gt;Azure RBAC assignments&lt;/LI&gt;
&lt;LI&gt;Entra ID SQL logins&lt;/LI&gt;
&lt;LI&gt;Managed Identity bindings&lt;/LI&gt;
&lt;/UL&gt;
&lt;H5&gt;&lt;STRONG&gt;Phase 3 – Post‑Migration Configuration&lt;/STRONG&gt;&lt;/H5&gt;
&lt;H6&gt;&lt;STRONG&gt;Step 5 –&lt;/STRONG&gt; &lt;STRONG&gt;Configure Microsoft Entra ID Administrator (Target Tenant)&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P&gt;After migration:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Assign a new Microsoft Entra ID Administrator to the Azure SQL Server&lt;/LI&gt;
&lt;LI&gt;Use an Entra ID &lt;STRONG&gt;Group&lt;/STRONG&gt; instead of an individual user&lt;/LI&gt;
&lt;LI&gt;Optionally re‑enable Entra ID‑only authentication after validation&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Using groups ensures identity continuity if administrators change or accounts are disabled.&lt;/P&gt;
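&lt;P&gt;A hedged example of assigning a group as the Entra ID administrator; the group display name and object ID below are placeholders for values from the target tenant:&lt;/P&gt;
&lt;LI-CODE lang="shell"&gt;# Set an Entra ID group as the SQL server administrator in the target tenant
az sql server ad-admin create \
  --resource-group rg-sql-prod \
  --server-name sqlsrv-prod \
  --display-name "sql-admins-group" \
  --object-id 00000000-0000-0000-0000-000000000000
&lt;/LI-CODE&gt;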
&lt;H6&gt;&lt;STRONG&gt;Step 6 – Re‑Create RBAC Assignments and Database Identity Configuration&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P&gt;Using the exported configuration:&lt;/P&gt;
&lt;P&gt;Reapply:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Azure RBAC role assignments&lt;/LI&gt;
&lt;LI&gt;Microsoft Entra ID SQL logins&lt;/LI&gt;
&lt;LI&gt;Database users&lt;/LI&gt;
&lt;LI&gt;Role memberships&lt;/LI&gt;
&lt;LI&gt;Object‑level permissions&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;✅ Automation is strongly recommended for environments hosting multiple databases or elastic pools.&lt;/P&gt;
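&lt;P&gt;One way to script the re‑application; the principal IDs, scope, and user names are placeholders that must come from the target tenant and your exported blueprint:&lt;/P&gt;
&lt;LI-CODE lang="shell"&gt;# Reapply an RBAC role assignment from the exported blueprint
az role assignment create \
  --assignee "$TARGET_TENANT_OBJECT_ID" \
  --role "SQL DB Contributor" \
  --scope "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/rg-sql-prod"

# Recreate an Entra ID database user and its role membership
sqlcmd -S sqlsrv-prod.database.windows.net -d appdb -G -Q "
  CREATE USER [app-identity@contoso.com] FROM EXTERNAL PROVIDER;
  ALTER ROLE db_datareader ADD MEMBER [app-identity@contoso.com];"
&lt;/LI-CODE&gt;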
&lt;H5&gt;&lt;STRONG&gt;Phase 4 – Validation and Testing&lt;/STRONG&gt;&lt;/H5&gt;
&lt;H6&gt;&lt;STRONG&gt;Step 7 –&lt;/STRONG&gt; &lt;STRONG&gt;Database Integration Testing&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P&gt;Validate:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Entra ID administrator connectivity&lt;/LI&gt;
&lt;LI&gt;Application identity access&lt;/LI&gt;
&lt;LI&gt;Read and write operations&lt;/LI&gt;
&lt;LI&gt;Stored procedures&lt;/LI&gt;
&lt;LI&gt;Backup and restore functions&lt;/LI&gt;
&lt;LI&gt;Firewall rules&lt;/LI&gt;
&lt;LI&gt;Private endpoint connectivity&lt;/LI&gt;
&lt;/UL&gt;
&lt;H6&gt;&lt;STRONG&gt;Step 8 –&lt;/STRONG&gt; &lt;STRONG&gt;End‑to‑End Application Testing&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P&gt;Perform full application validation including:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Authentication via Entra ID tokens&lt;/LI&gt;
&lt;LI&gt;Business transaction workflows&lt;/LI&gt;
&lt;LI&gt;Scheduled jobs and background services&lt;/LI&gt;
&lt;LI&gt;Performance baselines&lt;/LI&gt;
&lt;LI&gt;Error telemetry&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Migration should be considered complete &lt;STRONG&gt;only after application owner sign‑off&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H5&gt;Common Migration Pitfalls&lt;/H5&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;th&gt;Pitfall&lt;/th&gt;&lt;th&gt;Impact&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;SQL Authentication not enabled before transfer&lt;/td&gt;&lt;td&gt;Administrative lockout&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Using individual Entra ID user as SQL admin&lt;/td&gt;&lt;td&gt;Risk of orphaned access&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Assuming Azure RBAC survives migration&lt;/td&gt;&lt;td&gt;RBAC must be recreated&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Hard‑coded tenant IDs in applications&lt;/td&gt;&lt;td&gt;Authentication failures&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Untested backups&lt;/td&gt;&lt;td&gt;No reliable rollback&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H5&gt;&lt;STRONG&gt;Migration Checklist&lt;/STRONG&gt;&lt;/H5&gt;
&lt;H6&gt;&lt;STRONG&gt;Pre‑Migration&lt;/STRONG&gt;&lt;/H6&gt;
&lt;UL&gt;
&lt;LI&gt;Export SQL logins, users, roles, permissions&lt;/LI&gt;
&lt;LI&gt;Export Azure RBAC assignments&lt;/LI&gt;
&lt;LI&gt;Perform full database backup&lt;/LI&gt;
&lt;LI&gt;Test restore in lower environment&lt;/LI&gt;
&lt;LI&gt;Enable SQL Authentication&lt;/LI&gt;
&lt;LI&gt;Validate SQL Admin login&lt;/LI&gt;
&lt;/UL&gt;
&lt;H6&gt;&lt;STRONG&gt;Migration Day&lt;/STRONG&gt;&lt;/H6&gt;
&lt;UL&gt;
&lt;LI&gt;Switch authentication mode&lt;/LI&gt;
&lt;LI&gt;Store SQL admin credentials securely&lt;/LI&gt;
&lt;LI&gt;Initiate EA transfer&lt;/LI&gt;
&lt;LI&gt;Accept transfer in target enrollment&lt;/LI&gt;
&lt;LI&gt;Confirm subscription visibility&lt;/LI&gt;
&lt;/UL&gt;
&lt;H6&gt;&lt;STRONG&gt;Post‑Migration&lt;/STRONG&gt;&lt;/H6&gt;
&lt;UL&gt;
&lt;LI&gt;Configure Microsoft Entra ID Admin&lt;/LI&gt;
&lt;LI&gt;Reapply RBAC assignments&lt;/LI&gt;
&lt;LI&gt;Recreate SQL Entra ID logins&lt;/LI&gt;
&lt;LI&gt;Recreate database users&lt;/LI&gt;
&lt;LI&gt;Reassign roles and permissions&lt;/LI&gt;
&lt;/UL&gt;
&lt;H6&gt;&lt;STRONG&gt;Validation and Sign‑Off&lt;/STRONG&gt;&lt;/H6&gt;
&lt;UL&gt;
&lt;LI&gt;Database validation complete&lt;/LI&gt;
&lt;LI&gt;Application validation complete&lt;/LI&gt;
&lt;LI&gt;Performance baseline verified&lt;/LI&gt;
&lt;LI&gt;Security review approved&lt;/LI&gt;
&lt;LI&gt;CAB approval obtained&lt;/LI&gt;
&lt;LI&gt;Migration documentation completed&lt;/LI&gt;
&lt;/UL&gt;
&lt;H5&gt;&lt;STRONG&gt;Conclusion&lt;/STRONG&gt;&lt;/H5&gt;
&lt;P&gt;Cross‑tenant Azure SQL migrations succeed or fail based on authentication sequencing.&lt;/P&gt;
&lt;P&gt;Temporarily switching from Microsoft Entra ID authentication to SQL Authentication prior to subscription transfer, combined with disciplined RBAC export and re‑application, provides a safe, repeatable, and auditable migration strategy.&lt;/P&gt;
&lt;P&gt;This approach scales from single‑database migrations to enterprise‑wide SQL estates and is particularly suited for regulated environments where downtime and access risk must be tightly controlled.&lt;/P&gt;</description>
      <pubDate>Wed, 01 Apr 2026 16:01:18 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/migrating-azure-sql-database-across-tenants-using-subscription/ba-p/4507002</guid>
      <dc:creator>princy_rajpoot</dc:creator>
      <dc:date>2026-04-01T16:01:18Z</dc:date>
    </item>
    <item>
      <title>Subscription Vending in Azure: An Implementation Overview</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/subscription-vending-in-azure-an-implementation-overview/ba-p/4506350</link>
      <description>&lt;P&gt;Subscription vending is a process that enables the creation of multiple Azure subscriptions using code, based on organizational segregation or workload-specific requirements. Rather than relying on resource groups as the primary boundary, this approach treats &lt;STRONG&gt;subscriptions as the fundamental unit for workload management&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ready/landing-zone/design-area/media/subscription-vending-high-res.png#lightbox" target="_blank" rel="noopener"&gt;&lt;EM&gt;Diagram 1: Subscription Vending&lt;/EM&gt;&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Subscription vending follows the concept of&amp;nbsp;&lt;STRONG&gt;subscription democratization&lt;/STRONG&gt; and applies it within the Azure Landing Zone (ALZ) model. With this approach, subscriptions act as the foundational boundary for the organization. This makes it easier to scale environments while also enabling stronger regulation, governance, and security controls.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Subscription democratization&lt;/STRONG&gt; is a scalable approach that helps accelerate application migration or new application deployment. It enables teams to work independently and deliver results faster, while still maintaining proper governance and security. Through subscription vending, multiple subscriptions can be deployed based on individual workload requirements.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;&lt;STRONG&gt;Subscription Vending Implementation Guidance&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P&gt;Subscription vending is achieved through automation and typically involves the following tasks:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Collecting subscription request data&lt;/LI&gt;
&lt;LI&gt;Initiating platform automation&lt;/LI&gt;
&lt;LI&gt;Creating subscriptions using Infrastructure as Code (IaC)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;There are multiple ways to implement subscription vending automation to complete these tasks. One example approach is &lt;STRONG&gt;GitFlow&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;In this model, subscription request data is captured through a data collection tool and stored in a JSON or YAML parameter file. Once the request is approved, platform automation is triggered using a request pipeline, source control, and a deployment pipeline. IaC modules are then used to create the required subscription.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ready/landing-zone/design-area/media/subscription-vending-high-res.png#lightbox" target="_blank" rel="noopener"&gt;&lt;EM&gt;Diagram 2: Example of Subscription Vending GitFlow&lt;/EM&gt;&lt;/A&gt;&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;Implementation Steps&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;The following steps describe the implementation flow shown in the diagram:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;A data collection tool is used to gather subscription request information.&lt;/LI&gt;
&lt;LI&gt;Once the subscription request is approved, platform automation is initiated through the request pipeline, source control, and deployment pipeline.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;To standardize and regulate the foundational structure across environments, automation is implemented using Infrastructure as Code. This approach also enables new subscriptions to be deployed with minimal effort.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;Resources Deployed During Subscription Creation&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;As a best practice, the following resources are deployed during subscription creation:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Management Group:&lt;/STRONG&gt; Management groups are created based on the organizational design and structure.&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;Subscription:&lt;/STRONG&gt; Subscriptions are created using code according to design requirements. During creation, billing account details are configured to align with the billing scope. A subscription alias is also added at this stage. Once the subscription is created, it is associated with the appropriate management group. Capabilities such as renaming or cancelling subscriptions can also be managed. Cancelling a subscription through Terraform can deactivate it; the subscription can be reenabled within 90 days. After 90 days, the subscription is permanently deleted.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;Budget:&lt;/STRONG&gt; Subscription budgets can be defined based on required thresholds.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Resource Provider Registration: &lt;/STRONG&gt;Required resource providers are enabled by default, allowing the necessary REST operations for resource deployment.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Identity Management:&lt;/STRONG&gt; Required role assignments, including custom roles, can be applied at the subscription level or a narrower scope. If prebuilt roles do not meet requirements, custom RBAC roles can be created and assigned at the subscription level.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H4&gt;Additional Notes&lt;/H4&gt;
&lt;P&gt;A&amp;nbsp;&lt;STRONG&gt;subscription alias&lt;/STRONG&gt; in Azure is a resource type used to create a new subscription, typically under an Enterprise Agreement (EA) billing model. An alias enables the creation of new subscriptions but cannot be used to update existing ones.&lt;/P&gt;
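&lt;P&gt;As a sketch, an alias‑based subscription creation looks like the following with the Azure CLI; the billing scope IDs and names are placeholders for your own enrollment:&lt;/P&gt;
&lt;LI-CODE lang="shell"&gt;# Create a new EA subscription via a subscription alias
az account alias create \
  --name "workload-prod-01" \
  --billing-scope "/providers/Microsoft.Billing/billingAccounts/1234567/enrollmentAccounts/7654321" \
  --display-name "Workload Prod 01" \
  --workload "Production"
&lt;/LI-CODE&gt;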
&lt;P&gt;Azure provides &lt;A class="lia-external-url" href="https://azure.github.io/Azure-Verified-Modules/" target="_blank" rel="noopener"&gt;&lt;STRONG&gt;Azure Verified Modules (AVM)&lt;/STRONG&gt;&lt;/A&gt; for all the resources mentioned above. These modules help standardize implementation and follow best practices. The reference implementation is available through the AVM pattern for subscription vending.&lt;/P&gt;</description>
      <pubDate>Tue, 31 Mar 2026 07:48:15 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/subscription-vending-in-azure-an-implementation-overview/ba-p/4506350</guid>
      <dc:creator>abhilashasr</dc:creator>
      <dc:date>2026-03-31T07:48:15Z</dc:date>
    </item>
    <item>
      <title>VS Code Extension</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/vs-code-extension/ba-p/4500803</link>
      <description>&lt;H2&gt;&lt;STRONG&gt;What is a VS Code Extension?&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P&gt;A VS Code Extension is a small program that adds new features or enhances existing functionality in Visual Studio Code. Extensions allow developers to tailor the editor to their needs by adding support for new languages, tools, themes, debuggers, commands, and integrations.&lt;/P&gt;
&lt;P&gt;VS Code extensions can help you:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Add language support (syntax highlighting, IntelliSense)&lt;/LI&gt;
&lt;LI&gt;Create custom commands in the Command Palette&lt;/LI&gt;
&lt;LI&gt;Automate repetitive development tasks&lt;/LI&gt;
&lt;LI&gt;Integrate external tools and services&lt;/LI&gt;
&lt;LI&gt;Improve productivity with formatting, linting, and debugging tools&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Extensions are published and distributed through the Visual Studio Marketplace, where users can easily install and update them directly from VS Code.&lt;/P&gt;
&lt;P&gt;At a high level, a VS Code extension:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Runs on Node.js&lt;/LI&gt;
&lt;LI&gt;Is written in JavaScript or TypeScript&lt;/LI&gt;
&lt;LI&gt;Uses the VS Code Extension API to interact with the editor.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;&lt;STRONG&gt;How to Create a VS Code Extension (Step-by-Step)&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P&gt;Let’s walk through creating a simple Hello World VS Code extension using the official tooling provided by Microsoft.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;Step 1: Prerequisites&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;Before you begin, make sure you have the following installed:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Node.js (required to run and build extensions)&lt;/LI&gt;
&lt;LI&gt;npm (comes with Node.js)&lt;/LI&gt;
&lt;LI&gt;Git (recommended for source control)&lt;/LI&gt;
&lt;LI&gt;Visual Studio Code&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;VS Code extensions are built on Node.js and TypeScript/JavaScript, so Node.js is mandatory.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;Step 2: Install Yeoman and the VS Code Extension Generator&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;Microsoft provides an official Yeoman generator to scaffold VS Code extensions quickly.&lt;/P&gt;
&lt;P&gt;Run the following command in your terminal:&lt;/P&gt;
&lt;LI-CODE lang="shell"&gt;npm install -g yo generator-code
&lt;/LI-CODE&gt;
&lt;P&gt;This installs:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Yeoman (yo) – a project scaffolding tool&lt;/LI&gt;
&lt;LI&gt;generator-code – the VS Code extension generator.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;&lt;STRONG&gt;Step 3: Scaffold a New Extension&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;Create a new extension project by running:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;yo code
&lt;/LI-CODE&gt;
&lt;P&gt;You’ll be prompted with a few questions:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Type of extension → New Extension (TypeScript or JavaScript)&lt;/LI&gt;
&lt;LI&gt;Extension name&lt;/LI&gt;
&lt;LI&gt;Identifier&lt;/LI&gt;
&lt;LI&gt;Description&lt;/LI&gt;
&lt;LI&gt;Package manager (npm recommended)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;After answering these, Yeoman will generate a ready-to-use extension project structure.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;Step 4: Understand the Project Structure&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;A typical generated extension contains:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;package.json – Extension metadata, commands, and contributions&lt;/LI&gt;
&lt;LI&gt;src/extension.ts or extension.js – Main extension logic&lt;/LI&gt;
&lt;LI&gt;.vscode/launch.json – Debug configuration&lt;/LI&gt;
&lt;LI&gt;README.md – Documentation&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The package.json file tells VS Code:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;What your extension contributes (commands, menus, settings)&lt;/LI&gt;
&lt;LI&gt;Which VS Code versions it supports&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The extension.ts file contains the code that runs when your extension is activated.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;Step 5: Run and Test the Extension&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;To test your extension:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Open the extension project in VS Code&lt;/LI&gt;
&lt;LI&gt;Press F5 (Start Debugging)&lt;/LI&gt;
&lt;LI&gt;A new Extension Development Host window opens&lt;/LI&gt;
&lt;LI&gt;Open the Command Palette (Ctrl+Shift+P)&lt;/LI&gt;
&lt;LI&gt;Run your extension command (e.g., Hello World)&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;You should see a notification message confirming your extension is working.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;Step 6: Modify the Extension&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;You can now customize the extension behavior by editing extension.ts:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Change the message shown to users&lt;/LI&gt;
&lt;LI&gt;Register new commands&lt;/LI&gt;
&lt;LI&gt;Use VS Code APIs (notifications, input boxes, file access)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;After making changes:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Reload the Extension Development Host&lt;/LI&gt;
&lt;LI&gt;Re-run the command to see updates&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;VS Code provides built-in debugging tools to set breakpoints and inspect variables during extension execution.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;Step 7: Package and Publish (Optional)&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;To publish your extension to the VS Code Marketplace:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Install the VS Code Extension CLI: &amp;nbsp;&lt;LI-CODE lang="shell"&gt;npm install -g @vscode/vsce
&lt;/LI-CODE&gt;&lt;/LI&gt;
&lt;LI&gt;Create a publisher account&lt;/LI&gt;
&lt;LI&gt;Package and publish the extension:&lt;/LI&gt;
&lt;/OL&gt;
&lt;LI-CODE lang="shell"&gt;vsce package
vsce publish
&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;&lt;STRONG&gt;Use Case&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P&gt;In a recent enterprise infrastructure initiative, there was a recurring need to generate Terraform code for CPF modules in strict alignment with project reference guidelines and approved module templates. Although these modules were centrally maintained in a project repository, manual consumption required engineers to repeatedly search across repositories, interpret module definitions, and assemble boilerplate code. To reduce this overhead and improve consistency, a custom Visual Studio Code extension was implemented to automate Terraform scaffolding while ensuring the generated output remained compliant with project standards.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A key constraint for this solution was that MCP (or any managed AI orchestration platform) could not be used. Accordingly, the design was implemented entirely within the VS Code extension boundary, with deterministic control points to preserve auditability. The extension integrates with the repository through its APIs to fetch the latest module templates and then extracts relevant module metadata, such as variables, outputs, and structural requirements, to drive generation decisions. This ensured the extension was not dependent on static local templates and could remain aligned with repository-driven evolution of CPF modules.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For the Terraform code generator portion specifically, GitHub Copilot was incorporated to assist with producing the final Terraform configuration content efficiently. In this model, Copilot supports rapid iteration and contextual code generation within the developer workflow, while the VS Code extension continues to act as the governing layer that constrains and validates what gets generated (for example, enforcing module selection rules, naming conventions, and approved file structure). This mirrors the broader pattern where Copilot enhances developer experience and productivity in the IDE without replacing deterministic guardrails.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The overall result is an editor-native workflow that balances productivity and compliance: repository APIs provide the authoritative source of module templates; deterministic parsing and guideline enforcement provide consistency and repeatability; and GitHub Copilot accelerates the code authoring experience for Terraform files. This demonstrates that meaningful Infrastructure-as-Code automation can be delivered under strict platform constraints, while still leveraging AI-assisted development responsibly within controlled boundaries.&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;End-to-End Flow&lt;/STRONG&gt;&lt;/H3&gt;
&lt;OL&gt;
&lt;LI&gt;Start
&lt;UL&gt;
&lt;LI&gt;User launches the VS Code extension.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Resource &amp;amp; Source Selection
&lt;UL&gt;
&lt;LI&gt;User selects resource types and chooses a source (GitLab or JFrog).&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Source Choice Branching
&lt;UL&gt;
&lt;LI&gt;GitLab path: fetch projects, then filter and rank modules.&lt;/LI&gt;
&lt;LI&gt;JFrog path: fetch artifacts, then filter and rank modules.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Ranked Module List
&lt;UL&gt;
&lt;LI&gt;A consolidated, ranked list of modules is displayed to the user.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;User Selection
&lt;UL&gt;
&lt;LI&gt;User selects the modules to deploy.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Download/Clone Modules
&lt;UL&gt;
&lt;LI&gt;Clone Git repositories or download artifacts.&lt;/LI&gt;
&lt;LI&gt;Extract to the workspace.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Terraform Parser
&lt;UL&gt;
&lt;LI&gt;Parse .tf files for: locals, module calls, required_providers, provider blocks, and backend configuration.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Metadata Assembly
&lt;UL&gt;
&lt;LI&gt;Aggregate module_info and example_values (locals, module_calls, providers, backend).&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Output Generation
&lt;UL&gt;
&lt;LI&gt;Save module JSON: separate files for GitLab and JFrog.&lt;/LI&gt;
&lt;LI&gt;Generate prompt: initialize prompt-for-Iac.md and append module metadata.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;AI-Assisted IaC Generation
&lt;UL&gt;
&lt;LI&gt;Use the prompt to generate Terraform files: main.tf, providers.tf, variables.tf, and outputs.tf.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Deployable Terraform Code
&lt;UL&gt;
&lt;LI&gt;Ready for terraform init, terraform plan, and terraform apply.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;End
&lt;UL&gt;
&lt;LI&gt;User reviews the generated code and replaces values if required.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;
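&lt;P&gt;To make the metadata-assembly step concrete, the aggregated output might be shaped like this (a hypothetical sketch: only the top-level field names come from the flow above, and the actual schema may differ):&lt;/P&gt;

```json
{
  "module_info": {
    "name": "example-module",
    "source": "gitlab"
  },
  "example_values": {
    "locals": {},
    "module_calls": [],
    "providers": [],
    "backend": {}
  }
}
```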
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 31 Mar 2026 03:00:06 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/vs-code-extension/ba-p/4500803</guid>
      <dc:creator>Shikhaghildiyal</dc:creator>
      <dc:date>2026-03-31T03:00:06Z</dc:date>
    </item>
    <item>
      <title>CI/CD as a Platform: Shipping Microservices and AI Agents with Reusable GitHub Actions Workflows</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/ci-cd-as-a-platform-shipping-microservices-and-ai-agents-with/ba-p/4504550</link>
      <description>&lt;H2 data-streamdown="heading-2"&gt;The First Shift — Treating CI/CD as a Platform&lt;/H2&gt;
&lt;P&gt;The first insight is straightforward but underused:&lt;/P&gt;
&lt;P&gt;Your CI/CD logic is infrastructure. It deserves the same design discipline as your application code.&lt;/P&gt;
&lt;P&gt;That means centralizing it. Versioning it. Exposing it as reusable, callable workflows — not copy-pasted YAML scattered across dozens of repos.&lt;/P&gt;
&lt;P&gt;In Part 1 of this series, we build exactly that. A &lt;SPAN data-streamdown="strong"&gt;platform repository&lt;/SPAN&gt; that defines reusable GitHub Actions workflows for testing, building, and deploying containerized services to Azure. Application repos stay thin — they simply call the platform, like invoking an API.&lt;/P&gt;
&lt;P&gt;Build once. Deploy anywhere. Fix once. Every team benefits.&lt;/P&gt;
&lt;H2 data-streamdown="heading-2"&gt;The Second Shift — Governing AI Behavior&lt;/H2&gt;
&lt;P&gt;But software is changing.&lt;/P&gt;
&lt;P&gt;We are no longer just shipping APIs and microservices. We are shipping &lt;SPAN data-streamdown="strong"&gt;AI agents&lt;/SPAN&gt; — systems that reason, respond, and make decisions. And these systems break the assumptions that traditional CI/CD was built on.&lt;/P&gt;
&lt;P&gt;A unit test can tell you whether your code is &lt;EM&gt;correct&lt;/EM&gt;. It cannot tell you whether your AI agent is &lt;EM&gt;trustworthy&lt;/EM&gt;. Prompts behave like code but drift differently. Model outputs are probabilistic. Quality degrades silently, without a failed test to catch it.&lt;/P&gt;
&lt;P&gt;This creates a new engineering challenge:&lt;/P&gt;
&lt;P&gt;How do you build a delivery pipeline for something that does not have a deterministic right answer?&lt;/P&gt;
&lt;P&gt;In Part 2, we extend the platform to answer that question. We introduce &lt;SPAN data-streamdown="strong"&gt;evaluation as a deployment gate&lt;/SPAN&gt; — a reusable workflow that scores agent behavior before any deployment is allowed. We integrate with &lt;SPAN data-streamdown="strong"&gt;Microsoft Foundry&lt;/SPAN&gt; for agent runtime and observability. And we show how the same platform-thinking from Part 1 applies directly to AI systems.&lt;/P&gt;
&lt;H2 data-streamdown="heading-2"&gt;What This Series Is Really About&lt;/H2&gt;
&lt;P&gt;This is not a tutorial on GitHub Actions syntax.&lt;/P&gt;
&lt;P&gt;It is about &lt;SPAN data-streamdown="strong"&gt;maturity&lt;/SPAN&gt; — the difference between a team that writes pipelines and a team that &lt;EM&gt;designs delivery systems&lt;/EM&gt;. Between an organization that ships code and one that &lt;SPAN data-streamdown="strong"&gt;governs behavior&lt;/SPAN&gt;.&lt;/P&gt;
&lt;P&gt;By the end of both parts, you will have:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;A reusable CI/CD platform that scales across any number of services&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;An evaluation-driven delivery pipeline for AI agents&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;A mental model for treating both code and AI as governed, versioned artifacts&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The tools are GitHub Actions and Azure. The principle is platform thinking.&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;Let's build it.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H2 data-streamdown="heading-2"&gt;The Problem — Why CI/CD Pipelines Don't Scale&lt;/H2&gt;
&lt;P&gt;Every pipeline starts simple.&lt;/P&gt;
&lt;P&gt;You create a repository, add a workflow file, and within minutes your code is building and deploying automatically. It feels like a solved problem.&lt;/P&gt;
&lt;P&gt;It isn't.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;The Reality of Growth&lt;/H3&gt;
&lt;P&gt;The first pipeline is straightforward. The second is a copy of the first. The third is a copy of the second — with one small adjustment. By the time you have ten services, you have ten slightly different pipelines, each one drifting quietly away from the others.&lt;/P&gt;
&lt;P&gt;This is &lt;SPAN data-streamdown="strong"&gt;pipeline sprawl&lt;/SPAN&gt; — and it is far more costly than it appears.&lt;/P&gt;
&lt;P&gt;Consider what happens in practice:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;One team upgrades their Python version. Others don't.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;A security fix gets applied to three pipelines. The other seven are missed.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;A new compliance requirement means updating every workflow file — manually, one repo at a time.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;A new engineer onboards using an old workflow and ships a pattern that was deprecated months ago.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;None of this feels critical in the moment. But over time, your CI/CD layer becomes the most &lt;SPAN data-streamdown="strong"&gt;inconsistent, unmaintainable, and ungoverned&lt;/SPAN&gt; part of your infrastructure — even though it controls everything that ships to production.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;The Deeper Problem — No Separation of Concerns&lt;/H3&gt;
&lt;P&gt;The root cause is not a tooling limitation. It is a &lt;SPAN data-streamdown="strong"&gt;design problem.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;Most teams treat CI/CD as something that lives &lt;EM&gt;inside&lt;/EM&gt; an application repo — a secondary concern, not a first-class system. That model works at small scale. It breaks at org scale.&lt;/P&gt;
&lt;P&gt;When CI/CD logic is distributed across every application repo:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;There is no single source of truth for how deployments work&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Platform teams cannot enforce standards without touching every repo individually&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Security and compliance teams have no centralized control plane&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Onboarding a new service means rebuilding from scratch — or copying from an outdated reference&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 data-streamdown="heading-3"&gt;The Cost You Don't See&lt;/H3&gt;
&lt;P&gt;The real cost of this pattern is not the duplicated YAML. It is the &lt;SPAN data-streamdown="strong"&gt;compounding overhead&lt;/SPAN&gt;:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Problem&lt;/th&gt;&lt;th&gt;Visible Cost&lt;/th&gt;&lt;th&gt;Hidden Cost&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Duplicated pipelines&lt;/td&gt;&lt;td&gt;Time to replicate&lt;/td&gt;&lt;td&gt;Drift and inconsistency over time&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;No centralized logic&lt;/td&gt;&lt;td&gt;Minor friction&lt;/td&gt;&lt;td&gt;Security gaps across repos&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Manual updates&lt;/td&gt;&lt;td&gt;One-time effort per change&lt;/td&gt;&lt;td&gt;Multiplied across every service&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;No versioning&lt;/td&gt;&lt;td&gt;Manageable today&lt;/td&gt;&lt;td&gt;Breaking changes with no rollback path&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3 data-streamdown="heading-3"&gt;What the Solution Looks Like&lt;/H3&gt;
&lt;P&gt;The answer is not a better YAML template.&lt;/P&gt;
&lt;P&gt;It is a &lt;SPAN data-streamdown="strong"&gt;platform.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;Specifically — a centralized repository that owns CI/CD logic, exposes it as reusable versioned workflows, and lets every application team consume it without duplicating a single line of pipeline code.&lt;/P&gt;
&lt;P&gt;This is the same principle that drives every mature engineering organization:&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;Don't repeat infrastructure. Abstract it. Version it. Share it.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;That is exactly what we are going to build.&lt;/P&gt;
&lt;H2 data-streamdown="heading-2"&gt;The Architecture — What You're Building&lt;/H2&gt;
&lt;P&gt;Before writing a single line of code, it is worth understanding the system as a whole.&lt;/P&gt;
&lt;P&gt;The architecture is intentionally simple. Two repositories. One cloud infrastructure. One clear separation of responsibilities.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;The Two-Repo Model&lt;/H3&gt;
&lt;P&gt;&lt;EM&gt;[diagram: the two-repo model, a platform repo and an application repo]&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;This separation is the core design decision. Everything else follows from it.&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;The platform repo&lt;/SPAN&gt; is not an application. It does not ship features. It ships &lt;SPAN data-streamdown="strong"&gt;workflow infrastructure&lt;/SPAN&gt; — reusable, versioned, callable by any application team in your organization.&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;The application repo&lt;/SPAN&gt; is deliberately thin on CI/CD. It contains a single workflow file that calls the platform. Nothing more.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;How They Connect&lt;/H3&gt;
&lt;P&gt;The connection happens through GitHub's workflow_call trigger — a mechanism that allows one workflow to invoke another across repositories.&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;[diagram: an application workflow calling a platform workflow via workflow_call]&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;The application repo does not care &lt;EM&gt;how&lt;/EM&gt; the build works. It only cares about the &lt;EM&gt;contract&lt;/EM&gt; — inputs it needs to provide, outputs it can expect back.&lt;/P&gt;
&lt;P&gt;This is the same mental model as an API:&lt;/P&gt;
&lt;P&gt;The caller knows the interface. The platform owns the implementation.&lt;/P&gt;
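&lt;P&gt;In practice, that contract is a single uses reference in the application repo (a hedged sketch; the organization and repository names are assumptions, not from this article):&lt;/P&gt;

```yaml
# .github/workflows/ci-cd.yml in an application repo (illustrative names)
name: ci-cd

on:
  push:
    branches: [main]

jobs:
  test:
    # Invoke the platform's reusable workflow; the caller only knows the interface
    uses: your-org/platform-workflows/.github/workflows/test-python.yml@v1
    secrets: inherit
```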
&lt;H3 data-streamdown="heading-3"&gt;The Deployment Flow&lt;/H3&gt;
&lt;P&gt;Once triggered, the pipeline moves through four clearly defined stages:&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;[diagram: the four stages, test, build, deploy to staging, deploy to production]&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;A few things to note about this flow:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;The image is built exactly once.&lt;/SPAN&gt; The same artifact moves through every environment — no rebuilds, no drift.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;The Git SHA is the image tag.&lt;/SPAN&gt; Every deployment is fully traceable back to a specific commit.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;GitHub Environments control approvals.&lt;/SPAN&gt; Staging and production are separate environments with configurable protection rules — no custom approval logic needed.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 data-streamdown="heading-3"&gt;The Azure Infrastructure&lt;/H3&gt;
&lt;P&gt;On the cloud side, the system uses two Azure services:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Service&lt;/th&gt;&lt;th&gt;Role&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-streamdown="strong"&gt;Azure Container Registry (ACR)&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;Stores Docker images&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-streamdown="strong"&gt;Azure Container Apps&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;Runs the application in staging and production&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Both are provisioned using &lt;SPAN data-streamdown="strong"&gt;Bicep&lt;/SPAN&gt; — Azure's infrastructure-as-code language — so the infrastructure is versioned and repeatable alongside the workflows.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;Responsibility Map&lt;/H3&gt;
&lt;P&gt;Here is how responsibilities are distributed across the system:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Layer&lt;/th&gt;&lt;th&gt;Owns&lt;/th&gt;&lt;th&gt;Does Not Own&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-streamdown="strong"&gt;Platform Repo&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;Test logic, build logic, deploy logic&lt;/td&gt;&lt;td&gt;Application code&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-streamdown="strong"&gt;Application Repo&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;Business logic, Dockerfile, requirements&lt;/td&gt;&lt;td&gt;Pipeline implementation&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-streamdown="strong"&gt;Azure&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;Runtime, registry, networking&lt;/td&gt;&lt;td&gt;Deployment decisions&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;This clean separation means:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Platform teams can update CI/CD logic without touching application code&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Application teams can ship features without understanding pipeline internals&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Infrastructure changes are isolated to the Bicep layer&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 data-streamdown="heading-3"&gt;Why This Scales&lt;/H3&gt;
&lt;P&gt;The real power of this architecture becomes clear at scale.&lt;/P&gt;
&lt;P&gt;With fifty microservices:&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;[diagram: many application repos all calling the same platform workflows]&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;One change to deploy.yml in the platform repo propagates to every service on the next run. No manual updates. No drift. No inconsistency.&lt;/P&gt;
&lt;P&gt;This is what &lt;SPAN data-streamdown="strong"&gt;CI/CD as a platform&lt;/SPAN&gt; means in practice.&lt;/P&gt;
&lt;H2 data-streamdown="heading-2"&gt;Platform Repo — Structure and Reusable Workflows&lt;/H2&gt;
&lt;P&gt;The platform repo is the heart of this system. Everything it contains is designed to be &lt;SPAN data-streamdown="strong"&gt;reusable, versioned, and consumed by any application team&lt;/SPAN&gt; in your organization.&lt;/P&gt;
&lt;P&gt;Let's walk through it in full.&lt;/P&gt;
&lt;H2&gt;Repository Structure&lt;/H2&gt;
&lt;P&gt;&lt;EM&gt;[listing: .github/workflows/test-python.yml, build.yml, deploy.yml and infra/main.bicep]&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Three workflows. One infrastructure file. That is the entire platform.&lt;/P&gt;
&lt;P&gt;Each workflow has a single, well-defined responsibility:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Workflow&lt;/th&gt;&lt;th&gt;Responsibility&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;test-python.yml&lt;/td&gt;&lt;td&gt;Install dependencies and run tests&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;build.yml&lt;/td&gt;&lt;td&gt;Build Docker image and push to ACR&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;deploy.yml&lt;/td&gt;&lt;td&gt;Deploy a specific image to a specific environment&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3 data-streamdown="heading-3"&gt;Workflow 1 — test-python.yml&lt;/H3&gt;
&lt;P&gt;This workflow handles dependency installation and test execution for any Python-based service.&lt;/P&gt;
&lt;LI-CODE lang="yaml"&gt;name: test-python

on:
  workflow_call:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11.9"

      - run: pip install -r requirements.txt
      - run: pytest
&lt;/LI-CODE&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;What to note:&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;The on: workflow_call trigger is what makes this reusable. It cannot be triggered directly — it must be called by another workflow.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;The Python version is &lt;SPAN data-streamdown="strong"&gt;pinned to 3.11.9&lt;/SPAN&gt; — not a floating version like 3.11. This ensures every service tests against the exact same runtime, eliminating environment-specific failures.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Any application repo that calls this workflow gets consistent, centrally maintained test execution — without defining any of this logic themselves.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 data-streamdown="heading-3"&gt;Workflow 2 — build.yml&lt;/H3&gt;
&lt;P&gt;This workflow builds the Docker image, tags it with the Git SHA, and pushes it to Azure Container Registry.&lt;/P&gt;
&lt;LI-CODE lang="yaml"&gt;name: build

on:
  workflow_call:
    outputs:
      image_tag:
        value: ${{ jobs.build.outputs.image_tag }}

jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      image_tag: ${{ steps.meta.outputs.tag }}

    permissions:
      id-token: write
      contents: read

    steps:
      - uses: actions/checkout@v4

      - id: meta
        run: echo "tag=${GITHUB_SHA}" &amp;gt;&amp;gt; $GITHUB_OUTPUT

      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - run: az acr login --name ${{ secrets.ACR_NAME }}

      - run: |
          docker build -t ${{ secrets.ACR_LOGIN_SERVER }}/app:${{ github.sha }} .
          docker push ${{ secrets.ACR_LOGIN_SERVER }}/app:${{ github.sha }}
&lt;/LI-CODE&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;What to note:&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;outputs&lt;/SPAN&gt; — This workflow exposes image_tag as an output. The calling workflow captures this value and passes it downstream to the deploy workflow. This is how the same image tag flows from build → staging → production without being hardcoded anywhere.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;id-token: write&lt;/SPAN&gt; — This permission enables &lt;SPAN data-streamdown="strong"&gt;OIDC-based authentication&lt;/SPAN&gt; with Azure. No long-lived credentials are stored as secrets. GitHub generates a short-lived token at runtime, which Azure trusts via a federated identity configuration. This is the recommended authentication pattern for production workloads.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;${GITHUB_SHA}&lt;/SPAN&gt; — Using the commit SHA as the image tag makes every build fully traceable. Given any running container, you can identify the exact commit it was built from.&lt;/LI&gt;
&lt;/UL&gt;
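&lt;P&gt;To see how the output travels, here is a hedged sketch of the calling side: the orchestrating workflow captures image_tag from the build job and forwards it downstream (job, repository, and app names are assumptions):&lt;/P&gt;

```yaml
jobs:
  build:
    uses: your-org/platform-workflows/.github/workflows/build.yml@v1
    secrets: inherit

  deploy-staging:
    needs: build
    uses: your-org/platform-workflows/.github/workflows/deploy.yml@v1
    with:
      environment: staging
      app_name: my-app-staging
      # The tag produced by build.yml flows through as a job output
      image_tag: ${{ needs.build.outputs.image_tag }}
    secrets: inherit
```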
&lt;H3 data-streamdown="heading-3"&gt;Workflow 3 — deploy.yml&lt;/H3&gt;
&lt;P&gt;This workflow deploys a given image to a given environment in Azure Container Apps.&lt;/P&gt;
&lt;LI-CODE lang="yaml"&gt;name: deploy

on:
  workflow_call:
    inputs:
      environment:
        required: true
        type: string
      image_tag:
        required: true
        type: string
      app_name:
        required: true
        type: string

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: ${{ inputs.environment }}

    steps:
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - run: |
          az containerapp update \
            --name ${{ inputs.app_name }} \
            --resource-group ${{ secrets.AZURE_RESOURCE_GROUP }} \
            --image ${{ secrets.ACR_LOGIN_SERVER }}/app:${{ inputs.image_tag }}
&lt;/LI-CODE&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;What to note:&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Three inputs&lt;/SPAN&gt; — environment, image_tag, and app_name. This single workflow handles &lt;EM&gt;every&lt;/EM&gt; environment. The caller decides where to deploy by passing inputs — the workflow itself has no hardcoded environment logic.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;environment: ${{ inputs.environment }}&lt;/SPAN&gt; — This line is deceptively powerful. By mapping the job's environment to the input value, GitHub automatically applies whatever protection rules are configured for that environment — required reviewers, wait timers, deployment policies. Approval gates come for free.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;secrets: inherit&lt;/SPAN&gt; — When the calling workflow passes secrets: inherit, Azure credentials flow through automatically without being re-declared. Secrets are managed once, at the org or repo level.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 data-streamdown="heading-3"&gt;The Versioning Contract&lt;/H3&gt;
&lt;P&gt;One detail that makes this system production-ready is &lt;SPAN data-streamdown="strong"&gt;workflow versioning&lt;/SPAN&gt;.&lt;/P&gt;
&lt;P&gt;When an application repo calls a platform workflow, it references a specific version:&lt;/P&gt;
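&lt;P&gt;A versioned reference looks like this (organization and repository names are illustrative):&lt;/P&gt;

```yaml
jobs:
  deploy:
    # The @v1 suffix pins the call to a tagged release of the platform repo
    uses: your-org/platform-workflows/.github/workflows/deploy.yml@v1
```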
&lt;P&gt;The @v1 tag means:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Application teams are insulated from breaking changes in the platform&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Platform teams can ship improvements without forcing immediate upgrades&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;You can run &lt;a href="javascript:void(0)" data-lia-user-mentions="" data-lia-user-uid="3013107" data-lia-user-login="v1" class="lia-mention lia-mention-user"&gt;v1​&lt;/a&gt; and @v2 side by side during migrations&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Every deployment is traceable to a specific platform version&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This versioning model is what separates a &lt;SPAN data-streamdown="strong"&gt;platform&lt;/SPAN&gt; from a shared folder of YAML files.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;What Application Teams See&lt;/H3&gt;
&lt;P&gt;From an application team's perspective, the entire platform surface looks like this:&lt;/P&gt;
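&lt;P&gt;A hedged sketch of that single workflow file (organization, repository, app, and job names are assumptions):&lt;/P&gt;

```yaml
# .github/workflows/ci-cd.yml — the application repo's entire pipeline (illustrative)
name: ci-cd

on:
  push:
    branches: [main]

jobs:
  test:
    uses: your-org/platform-workflows/.github/workflows/test-python.yml@v1
    secrets: inherit

  build:
    needs: test
    uses: your-org/platform-workflows/.github/workflows/build.yml@v1
    secrets: inherit

  deploy-staging:
    needs: build
    uses: your-org/platform-workflows/.github/workflows/deploy.yml@v1
    with:
      environment: staging
      image_tag: ${{ needs.build.outputs.image_tag }}
      app_name: my-app-staging
    secrets: inherit

  deploy-prod:
    needs: [build, deploy-staging]
    uses: your-org/platform-workflows/.github/workflows/deploy.yml@v1
    with:
      environment: production
      image_tag: ${{ needs.build.outputs.image_tag }}
      app_name: my-app-production
    secrets: inherit
```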
&lt;P&gt;Three uses statements. That is the entire CI/CD surface an application team needs to understand.&lt;/P&gt;
&lt;P&gt;Everything else — authentication, image tagging, registry login, container update commands — is abstracted away inside the platform.&lt;/P&gt;
&lt;H2 data-streamdown="heading-2"&gt;Azure Infrastructure&lt;/H2&gt;
&lt;P&gt;The platform workflows handle CI/CD logic. The Azure infrastructure handles the runtime — where your containers live, how they are stored, and how they are served to the outside world.&lt;/P&gt;
&lt;P&gt;All infrastructure is defined in&amp;nbsp;&lt;SPAN data-streamdown="strong"&gt;Bicep&lt;/SPAN&gt;&amp;nbsp;— Azure's native infrastructure-as-code language. This means your infrastructure is versioned, repeatable, and deployable from a single command.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;Why Bicep&lt;/H3&gt;
&lt;P&gt;Before diving into the code, it is worth briefly explaining the choice.&lt;/P&gt;
&lt;P&gt;Bicep compiles down to ARM templates but is significantly more readable. It integrates natively with Azure's resource model, requires no external state management, and fits naturally alongside GitHub Actions workflows.&lt;/P&gt;
&lt;P&gt;For teams already working within the Azure ecosystem, it is the most straightforward path to infrastructure-as-code without introducing additional tooling dependencies.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;Infrastructure Structure&lt;/H3&gt;
&lt;P&gt;&lt;EM&gt;[listing: infra/main.bicep]&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;The entire infrastructure is defined in a single file. For this architecture, you need two resources:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Resource&lt;/th&gt;&lt;th&gt;Purpose&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-streamdown="strong"&gt;Azure Container Registry (ACR)&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;Stores and serves Docker images&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-streamdown="strong"&gt;Azure Container Apps&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;Runs containers in a managed serverless environment&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3 data-streamdown="heading-3"&gt;main.bicep&lt;/H3&gt;
&lt;LI-CODE lang="bicep"&gt;param location string = resourceGroup().location

// Azure Container Registry
resource acr 'Microsoft.ContainerRegistry/registries@2023-01-01-preview' = {
  name: 'myregistry'
  location: location
  sku: { name: 'Basic' }
}

// Azure Container App (Staging + Production)
resource containerApp 'Microsoft.App/containerApps@2023-05-01' = {
  name: 'my-app'
  location: location
  properties: {
    configuration: {
      ingress: {
        external: true
        targetPort: 8000
      }
    }
  }
}
&lt;/LI-CODE&gt;
&lt;H3 data-streamdown="heading-3"&gt;Breaking It Down&lt;/H3&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;Container Registry&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="bicep"&gt;resource acr 'Microsoft.ContainerRegistry/registries@2023-01-01-preview' = {
  name: 'myregistry'
  location: location
  sku: { name: 'Basic' }
}&lt;/LI-CODE&gt;
&lt;P&gt;The ACR is the central image store for your entire platform. Every image built by build.yml is pushed here, tagged with its Git SHA. Both staging and production pull from this registry — ensuring the exact same artifact runs in both environments.&lt;/P&gt;
&lt;P&gt;The Basic SKU is sufficient for most team-scale workloads. For larger organizations with higher throughput requirements, Standard or Premium SKUs offer geo-replication and increased storage limits.&lt;/P&gt;
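&lt;P&gt;One way to make that choice configurable per environment is a Bicep parameter (a sketch; the parameter name is an assumption):&lt;/P&gt;

```bicep
param location string = resourceGroup().location

@allowed(['Basic', 'Standard', 'Premium'])
param acrSku string = 'Basic'

// Same registry as above, with the SKU supplied at deployment time
resource acr 'Microsoft.ContainerRegistry/registries@2023-01-01-preview' = {
  name: 'myregistry'
  location: location
  sku: { name: acrSku }
}
```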
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;Container App&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="bicep"&gt;resource containerApp 'Microsoft.App/containerApps@2023-05-01' = {
  name: 'my-app'
  location: location
  properties: {
    configuration: {
      ingress: {
        external: true
        targetPort: 8000
      }
    }
  }
}&lt;/LI-CODE&gt;
&lt;P&gt;Azure Container Apps provides a fully managed serverless container runtime. You define what runs — it handles scaling, networking, and availability.&lt;/P&gt;
&lt;P&gt;Two things to note here:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;external: true&lt;/SPAN&gt;&amp;nbsp;— Makes the application publicly accessible over HTTPS. Azure Container Apps automatically provisions a fully qualified domain name and TLS certificate.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;targetPort: 8000&lt;/SPAN&gt;&amp;nbsp;— Maps to the port exposed by the FastAPI application inside the container. This must match the&amp;nbsp;--port&amp;nbsp;argument in your&amp;nbsp;CMD&amp;nbsp;instruction in the Dockerfile.&lt;/LI&gt;
&lt;/UL&gt;
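&lt;P&gt;For example, a FastAPI service Dockerfile whose server listens on that port might look like this (a generic sketch, not from this article; module and file names are assumptions):&lt;/P&gt;

```dockerfile
FROM python:3.11.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# The --port value must match targetPort in main.bicep
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```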
&lt;H3 data-streamdown="heading-3"&gt;Staging vs. Production&lt;/H3&gt;
&lt;P&gt;You will deploy this infrastructure twice — once for staging, once for production — with different resource names:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Deploy staging
az deployment group create \
-- resource-group rg-ciplatform-staging \
-- template-file infra/main.bicep
# Deploy production
az deployment group create
-- resource-group rg-ciplatform-production \
-- template-file infra/main.bicep&lt;/LI-CODE&gt;
&lt;P&gt;The deploy.yml workflow then targets the correct app by name via the app_name input:&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;[snippet: the caller passes environment-specific app_name values]&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;This keeps staging and production fully isolated at the infrastructure level while sharing the same workflow logic.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;GitHub Environments and Approval Gates&lt;/H3&gt;
&lt;P&gt;On the GitHub side, you configure two&amp;nbsp;&lt;SPAN data-streamdown="strong"&gt;Environments&lt;/SPAN&gt;&amp;nbsp;—&amp;nbsp;staging&amp;nbsp;and&amp;nbsp;production&amp;nbsp;— inside your repository settings.&lt;/P&gt;
&lt;P&gt;For production, add a&amp;nbsp;&lt;SPAN data-streamdown="strong"&gt;required reviewer&lt;/SPAN&gt; protection rule:&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;[screenshot: required reviewers rule on the production environment]&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;When the pipeline reaches the&amp;nbsp;deploy-prod&amp;nbsp;job, GitHub will pause and wait for a designated reviewer to approve before proceeding. This approval gate costs nothing extra — it is built into GitHub's environment model and wired automatically through the&amp;nbsp;environment:&amp;nbsp;field in&amp;nbsp;deploy.yml.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;Setting Up Azure Authentication&lt;/H3&gt;
&lt;P&gt;The workflows authenticate to Azure using&amp;nbsp;&lt;SPAN data-streamdown="strong"&gt;OpenID Connect (OIDC)&lt;/SPAN&gt;&amp;nbsp;— a keyless authentication method that eliminates the need for long-lived service principal secrets.&lt;/P&gt;
&lt;P&gt;Set up the federated identity once:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Create a service principal
az ad app create -- display-name "github-actions-platform"

# Add federated credential for your repo
az ad app federated-credential create \
-- id &amp;lt;app-id&amp;gt; \
-- parameters '{
"name": "github-actions",
"issuer": "https://token.actions.githubusercontent.com",
"subject": "repo:your-org/fastapi-app:ref:refs/heads/main",
"audiences": ["api://AzureADTokenExchange"]
}'&lt;/LI-CODE&gt;
&lt;P&gt;Then add these six secrets to your GitHub repository:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Secret&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;AZURE_CLIENT_ID&lt;/td&gt;&lt;td&gt;Application (client) ID&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AZURE_TENANT_ID&lt;/td&gt;&lt;td&gt;Directory (tenant) ID&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AZURE_SUBSCRIPTION_ID&lt;/td&gt;&lt;td&gt;Azure subscription ID&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AZURE_RESOURCE_GROUP&lt;/td&gt;&lt;td&gt;Target resource group name&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;ACR_NAME&lt;/td&gt;&lt;td&gt;Container registry name&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;ACR_LOGIN_SERVER&lt;/td&gt;&lt;td&gt;Registry login server (e.g.&amp;nbsp;myregistry.azurecr.io)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;With these in place, every workflow that calls azure/login@v2 authenticates automatically — no passwords, no rotation, no expiry management.&lt;/P&gt;
&lt;H2 data-streamdown="heading-2"&gt;Application Repo — Structure, Code, and Release Workflow&lt;/H2&gt;
&lt;P&gt;With the platform repo in place, the application repo becomes remarkably simple. Its only CI/CD responsibility is to call the platform — everything else is focused purely on application logic.&lt;/P&gt;
&lt;P&gt;This is the goal: &lt;SPAN data-streamdown="strong"&gt;application teams ship features, not pipelines.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;Repository Structure&lt;/H3&gt;
&lt;img /&gt;
&lt;P&gt;This is the entire CI/CD footprint of the application repo.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;The Application — src/main.py&lt;/H3&gt;
&lt;P&gt;The application is a minimal FastAPI service with a single endpoint that returns the current deployed version and environment.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;from fastapi import FastAPI
import os

app = FastAPI()

@app.get("/version")
def version():
    return {
        "version": os.getenv("GITHUB_SHA", "dev"),
        "environment": os.getenv("APP_ENV", "local")
    }
&lt;/LI-CODE&gt;
&lt;P&gt;This endpoint serves a practical purpose beyond demonstration. In a real system, a /version or /health endpoint like this allows you to:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Verify which commit is running in each environment&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Confirm a deployment succeeded without inspecting container logs&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Detect environment mismatches between staging and production&lt;/LI&gt;
&lt;/UL&gt;
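&lt;P&gt;The checks above are easy to script against the endpoint. A minimal sketch, assuming the payload shape of the /version handler shown earlier (the helper names and URLs are illustrative, not part of the original setup):&lt;/P&gt;

```python
import json
import urllib.request

def fetch_version(base_url: str) -> dict:
    """Fetch the JSON payload of an environment's /version endpoint."""
    with urllib.request.urlopen(f"{base_url}/version", timeout=10) as resp:
        return json.load(resp)

def same_commit(staging: dict, prod: dict) -> bool:
    """True when both environments report the same deployed commit SHA."""
    return staging.get("version") == prod.get("version")
```

&lt;P&gt;Running same_commit against the two /version payloads after a promotion confirms that the identical image is live in both environments.&lt;/P&gt;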
&lt;P&gt;requirements.txt&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;All dependencies are &lt;SPAN data-streamdown="strong"&gt;pinned to exact versions&lt;/SPAN&gt;. This ensures the same packages install in every environment — local development, CI, staging, and production — eliminating version drift as a source of failures.&lt;/P&gt;
&lt;P&gt;Dockerfile&lt;/P&gt;
&lt;LI-CODE lang="docker"&gt;FROM python:3.11.9-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY src ./src

CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
&lt;/LI-CODE&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;What to note:&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;python:3.11.9-slim&lt;/SPAN&gt; — The base image uses the same Python version as the platform's test-python.yml workflow. Consistency between the test environment and the container runtime eliminates an entire class of environment-specific bugs.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Dependency layer first&lt;/SPAN&gt; — requirements.txt is copied and installed before application source code. This is a deliberate layer ordering decision — Docker caches the dependency layer independently, so subsequent builds only reinstall packages when requirements.txt changes, not on every code change.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;0.0.0.0&lt;/SPAN&gt; — Binds the server to all network interfaces inside the container, making it reachable from outside. Combined with targetPort: 8000 in the Bicep configuration, this completes the network path from Azure Container Apps to the application.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 data-streamdown="heading-3"&gt;The Release Workflow — release.yml&lt;/H3&gt;
&lt;P&gt;This is the most important file in the application repo. It is also the simplest.&lt;/P&gt;
&lt;LI-CODE lang="yaml"&gt;name: release

on:
  push:
    branches: [main]

permissions:
  id-token: write
  contents: read

jobs:
  test:
    uses: ns-github-design/ci-platform/.github/workflows/test-python.yml@v1

  build:
    needs: test
    uses: ns-github-design/ci-platform/.github/workflows/build.yml@v1
    secrets: inherit

  deploy-staging:
    needs: build
    uses: ns-github-design/ci-platform/.github/workflows/deploy.yml@v1
    with:
      environment: staging
      image_tag: ${{ needs.build.outputs.image_tag }}
      app_name: my-app-staging
    secrets: inherit

  deploy-prod:
    needs: [build, deploy-staging]
    uses: ns-github-design/ci-platform/.github/workflows/deploy.yml@v1
    with:
      environment: production
      image_tag: ${{ needs.build.outputs.image_tag }}
      app_name: my-app-prod
    secrets: inherit
&lt;/LI-CODE&gt;
&lt;H3 data-streamdown="heading-3"&gt;Walking Through the Pipeline&lt;/H3&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;Trigger&lt;/SPAN&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;Every merge to main triggers a full release. This reflects a &lt;SPAN data-streamdown="strong"&gt;trunk-based delivery model&lt;/SPAN&gt; — main is always releasable, and every commit to it initiates the path to production.&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;Test Job&lt;/SPAN&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;The first job calls the platform's test workflow. No configuration required — the platform handles Python setup, dependency installation, and test execution. The application team owns the test files; the platform owns the execution environment.&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;Build Job&lt;/SPAN&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;The build job runs only after tests pass. It calls the platform's build workflow and inherits all secrets automatically — Azure credentials, ACR login server, registry name — without re-declaring them.&lt;/P&gt;
&lt;P&gt;The critical output here is image_tag — the Git SHA of the current commit. This value is captured and passed downstream to both deploy jobs.&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;Deploy to Staging&lt;/SPAN&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;The staging deployment runs immediately after a successful build. It passes three inputs to the deploy workflow:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;environment: staging — triggers GitHub's staging environment rules&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;image_tag — the exact SHA built in the previous job&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;app_name: my-app-staging — the target Container App in Azure&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;Deploy to Production&lt;/SPAN&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;Production deployment runs only after staging succeeds. It uses the &lt;SPAN data-streamdown="strong"&gt;same image_tag&lt;/SPAN&gt; — the identical image that just ran successfully in staging is what gets promoted to production. No rebuild. No repackaging. The artifact is immutable.&lt;/P&gt;
&lt;P&gt;If a required reviewer is configured on the production GitHub Environment, the pipeline pauses here until approval is granted.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;The Complete Pipeline at a Glance&lt;/H3&gt;
&lt;img /&gt;
&lt;H3 data-streamdown="heading-3"&gt;What the Application Team Never Has to Think About&lt;/H3&gt;
&lt;P&gt;It is worth being explicit about what this model abstracts away from application engineers:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Concern&lt;/th&gt;&lt;th&gt;Handled By&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Azure authentication&lt;/td&gt;&lt;td&gt;Platform (build.yml, deploy.yml)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Docker build and push&lt;/td&gt;&lt;td&gt;Platform (build.yml)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Image tagging strategy&lt;/td&gt;&lt;td&gt;Platform (build.yml)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Container App update command&lt;/td&gt;&lt;td&gt;Platform (deploy.yml)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Approval gate mechanics&lt;/td&gt;&lt;td&gt;GitHub Environments&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Python version consistency&lt;/td&gt;&lt;td&gt;Platform (test-python.yml)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;The application team's CI/CD knowledge requirement is reduced to understanding four uses statements (referencing three platform workflows) and two with input blocks. Everything else is the platform's responsibility.&lt;/P&gt;
&lt;H2 data-streamdown="heading-2"&gt;&lt;SPAN data-streamdown="strong"&gt;Demo — Proving It Works&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;Your pipeline is now live and connected across three layers:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;GitHub&amp;nbsp;Actions (Reusable&amp;nbsp;Workflows)&lt;/SPAN&gt;&amp;nbsp;– powering CI/CD logic&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;FastAPI Application Repo&lt;/SPAN&gt;&amp;nbsp;– consuming those workflows&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Azure&amp;nbsp;Container&amp;nbsp;Apps&lt;/SPAN&gt; – running staging and production&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Step&amp;nbsp;1&amp;nbsp;–&amp;nbsp;Trigger the CI/CD Pipeline&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;Push any commit to the main branch:&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;Then open:&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;You’ll see the workflow&amp;nbsp;&lt;SPAN data-streamdown="strong"&gt;release&lt;/SPAN&gt; start automatically.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Step&amp;nbsp;2&amp;nbsp;–&amp;nbsp;Observe the&amp;nbsp;Pipeline Run&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;The jobs execute in sequence:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Stage&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-streamdown="strong"&gt;test&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;Runs&amp;nbsp;pytest&amp;nbsp;inside GitHub Actions using the reusable workflow&amp;nbsp;test-python.yml&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-streamdown="strong"&gt;build&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;Builds and tags a Docker&amp;nbsp;image with the current Git&amp;nbsp;SHA, then pushes to ACR&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-streamdown="strong"&gt;deploy‑staging&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;Deploys that same image to your Container&amp;nbsp;App&amp;nbsp;my-app-staging&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-streamdown="strong"&gt;approval gate&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;Waits for approval of the&amp;nbsp;production&amp;nbsp;environment&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-streamdown="strong"&gt;deploy‑prod&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;On approval, promotes the identical image to my-app-prod&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Your final dependency chain looks like this:&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&lt;EM&gt;(The&amp;nbsp;needs: [build, deploy-staging]&amp;nbsp;dependency on the production job enforces this ordering.)&lt;/EM&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Step 3 – Review the Logs&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;Every job’s output is visible inside GitHub&amp;nbsp;Actions:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;test&lt;/SPAN&gt;&amp;nbsp;– confirms tests collected successfully&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;build&lt;/SPAN&gt;&amp;nbsp;– shows&amp;nbsp;docker push&amp;nbsp;...&amp;nbsp;to ACR&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;deploy‑staging&lt;/SPAN&gt;&amp;nbsp;– displays Azure&amp;nbsp;CLI output updating the Container&amp;nbsp;App&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;deploy‑prod&lt;/SPAN&gt;&amp;nbsp;– mirrors those steps after manual approval&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This transparency is part of what makes reusable workflows auditable and supports enterprise compliance.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Step&amp;nbsp;4&amp;nbsp;–&amp;nbsp;Verify Running&amp;nbsp;Apps&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;After both deployments succeed, confirm each environment is live.&lt;/P&gt;
&lt;H4 data-streamdown="heading-4"&gt;Staging&lt;/H4&gt;
&lt;img /&gt;
&lt;H4 data-streamdown="heading-4"&gt;Production&lt;/H4&gt;
&lt;img /&gt;
&lt;P&gt;Expected response:&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;(The exact commit SHA replaces&amp;nbsp;"abc1234".)&lt;/P&gt;
&lt;P&gt;This proves:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;The same container image was promoted unchanged.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Both environments are consistent.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;The platform’s reusable workflows handled the full delivery flow.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2 data-streamdown="heading-2"&gt;&lt;SPAN data-streamdown="strong"&gt;The Bridge: Why AI Changes Everything&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;Your CI/CD platform now runs like a product: build once, test once, deploy anywhere.&lt;BR /&gt;But software itself is shifting.&lt;BR /&gt;The next generation of systems doesn’t just &lt;EM&gt;serve requests&lt;/EM&gt;&amp;nbsp;— it &lt;EM&gt;reasons&lt;/EM&gt;.&lt;/P&gt;
&lt;P&gt;We are no longer only shipping code.&lt;BR /&gt;We are shipping &lt;SPAN data-streamdown="strong"&gt;AI agents&lt;/SPAN&gt; that evolve, learn, and behave based on prompts, data, and context.&lt;/P&gt;
&lt;P&gt;And that introduces a new set of engineering realities.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;The Old Contract&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;Traditional CI/CD pipelines assume:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Code is deterministic&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Tests define correctness&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Deployments promote immutable artifacts&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Those assumptions hold for APIs and microservices.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;The New Reality with AI Systems&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;AI systems violate the core idea of “deterministic correctness.”&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Characteristic&lt;/th&gt;&lt;th&gt;Traditional Software&lt;/th&gt;&lt;th&gt;AI / Agent Systems&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Behavior&lt;/td&gt;&lt;td&gt;Deterministic&lt;/td&gt;&lt;td&gt;Probabilistic&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Definition of success&lt;/td&gt;&lt;td&gt;Binary pass/fail&lt;/td&gt;&lt;td&gt;Continuous score&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Changes&lt;/td&gt;&lt;td&gt;Source code edits&lt;/td&gt;&lt;td&gt;Prompt/model/data changes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Validation method&lt;/td&gt;&lt;td&gt;Unit tests&lt;/td&gt;&lt;td&gt;Semantic evaluation&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Risks&lt;/td&gt;&lt;td&gt;Bugs&lt;/td&gt;&lt;td&gt;Hallucination / drift / bias&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Prompts, fine‑tuned models, retraining data, and external tool integrations become active code paths — yet they can’t be meaningfully validated with unit tests alone.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Why This Breaks Standard CI/CD&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;Your current CI/CD system answers only one question:&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;“Did the code pass its tests?”&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;But for an AI agent, that’s not enough.&lt;BR /&gt;You also need to know:&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;“Did the model behave acceptably across metrics that matter?”&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Without that gate, an AI update that produces worse responses could still deploy perfectly — because the pipeline has no concept of semantic quality.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;The Missing Layer&amp;nbsp;— Evaluation&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;What testing is to code, &lt;SPAN data-streamdown="strong"&gt;evaluation&lt;/SPAN&gt; is to AI.&lt;BR /&gt;It separates experimental prompts from production‑ready agents.&lt;/P&gt;
&lt;P&gt;This leads to the next maturity step:&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;Extend your CI/CD platform into an AI Delivery Platform&lt;/SPAN&gt; — one that can evaluate, score, and gate agent behavior before deployment.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;What Changes Technically&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;You don’t replace the CI/CD you built.&lt;BR /&gt;You &lt;SPAN data-streamdown="strong"&gt;add a new reusable workflow&lt;/SPAN&gt; to the same platform:&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;This new workflow introduces a stage that:&lt;/P&gt;
&lt;OL data-streamdown="ordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Runs offline or dataset‑based evaluation scripts&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Computes a confidence / quality score&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Blocks deployment if performance falls below threshold&lt;/LI&gt;
&lt;/OL&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;What This Means Philosophically&lt;/SPAN&gt;&lt;/H3&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Build pipelines&lt;/SPAN&gt; become &lt;SPAN data-streamdown="strong"&gt;governance systems&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Platform teams now own &lt;EM&gt;evaluation&lt;/EM&gt; as much as &lt;EM&gt;deployment&lt;/EM&gt;&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Reusable workflows become policies for AI reliability&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The same architecture — reusable calls, versioned workflows, staged promotions — continues serving you, but with a new function: safeguarding machine behavior.&lt;/P&gt;
&lt;H2 data-streamdown="heading-2"&gt;&lt;SPAN data-streamdown="strong"&gt;Evaluation as a Gate&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;Your reusable CI/CD system already enforces two things:&lt;/P&gt;
&lt;OL data-streamdown="ordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Code quality&lt;/SPAN&gt; → through tests&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Deployment consistency&lt;/SPAN&gt; → through shared workflows&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;The next maturity layer is enforcing &lt;SPAN data-streamdown="strong"&gt;behavioral quality&lt;/SPAN&gt; — ensuring an AI agent performs to a defined standard &lt;EM&gt;before&lt;/EM&gt; it goes live.&lt;/P&gt;
&lt;P&gt;That’s where &lt;SPAN data-streamdown="strong"&gt;evaluation pipelines&lt;/SPAN&gt; come in.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;The Big Shift&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;In conventional systems:&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;For AI systems:&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Instead of pass/fail assertions, you now gate deployments on &lt;EM&gt;scores&lt;/EM&gt; — accuracy, relevance, factuality, safety, or any quantitative prompt‑response metric.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Reusable Workflow — evaluate-agent.yml&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;Add this new file to your platform repository:&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;File content:&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Example Evaluation Script — eval.py&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;This script executes semantic evaluation logic for your agent.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;As a proof‑of‑concept, this produces a random score.&lt;BR /&gt;In real use, this could compute accuracy against a dataset, compare responses to a gold standard, or call an LLM‑based judge service.&lt;/P&gt;
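&lt;P&gt;A minimal sketch of what such a proof-of-concept eval.py could look like (the score range and the score.json output file are assumptions for illustration, not the article's exact script):&lt;/P&gt;

```python
import json
import random

def run_evaluation() -> float:
    """Placeholder evaluation returning a random score in [0.5, 1.0].

    In real use, this would score agent responses against a labelled
    dataset or call an LLM-based judge service.
    """
    return round(random.uniform(0.5, 1.0), 2)

if __name__ == "__main__":
    score = run_evaluation()
    # Persist the score so a downstream pipeline step can gate on it.
    with open("score.json", "w") as f:
        json.dump({"score": score}, f)
    print(f"evaluation score: {score}")
```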
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Integrating the New Stage&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;In your AI app repo (for example, agent-app or fastapi-app once it evolves into an agent):&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;This creates a simple but powerful control flow:&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;If eval.py writes a score below 0.8, the pipeline stops immediately — deployment blocked, logs recorded, everything traceable.&lt;/P&gt;
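&lt;P&gt;The gate itself can be a short script run between the evaluation and deploy jobs. A sketch, assuming eval.py has persisted its result to a score.json file (the file name and 0.8 default are taken from the flow described above):&lt;/P&gt;

```python
import json

THRESHOLD = 0.8  # minimum acceptable evaluation score

def gate(score: float, threshold: float = THRESHOLD) -> bool:
    """Return True when the score clears the deployment threshold."""
    return score >= threshold

def main(path: str = "score.json") -> int:
    """Read the evaluation score and return a process exit code."""
    with open(path) as f:
        score = json.load(f)["score"]
    if not gate(score):
        print(f"score {score} below threshold {THRESHOLD}; blocking deployment")
        return 1  # a non-zero exit fails the job and halts the pipeline
    print(f"score {score} passed; deployment may proceed")
    return 0
```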
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Key Takeaways&lt;/SPAN&gt;&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Concept&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-streamdown="strong"&gt;Reusable&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;Same evaluate-agent workflow can gate hundreds of models&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-streamdown="strong"&gt;Configurable&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;Each use can override thresholds or metrics&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-streamdown="strong"&gt;Auditable&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;Evaluation scores logged as build artifacts&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-streamdown="strong"&gt;Safe&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;Prevents low-performing or biased agents from promotion&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Beyond Thresholds&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;Later, you can evolve this into:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Adaptive thresholds&lt;/SPAN&gt;&amp;nbsp;per metric&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Human‑in‑the‑loop approvals&lt;/SPAN&gt; for borderline scores&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Trend tracking&lt;/SPAN&gt; – scores over time via GitHub Checks or dashboards&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Integration with observability platforms&lt;/SPAN&gt; (Azure App Insights, Foundry evaluations, etc.)&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 data-streamdown="heading-2"&gt;&lt;SPAN data-streamdown="strong"&gt;AI Delivery Pipeline + Foundry Integration&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;So far, you have:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;A unified CI/CD platform powered by reusable GitHub Actions&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Evaluation pipelines that gate AI deployments&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Now we expand that architecture into a complete &lt;SPAN data-streamdown="strong"&gt;AI Delivery Platform&lt;/SPAN&gt; by integrating with &lt;SPAN data-streamdown="strong"&gt;Microsoft Foundry&lt;/SPAN&gt;.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;The Goal&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;Combine:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;GitHub Actions ↔ Foundry&lt;/SPAN&gt; for seamless build‑evaluate‑deploy cycles&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Reusable workflows&lt;/SPAN&gt; for policies + governance&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Foundry runtime&lt;/SPAN&gt; for execution, scaling, and observability of agents&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This transforms your CI/CD system into a &lt;EM&gt;behavior‑driven deployment layer&lt;/EM&gt; for AI.&lt;/P&gt;
&lt;P&gt;Conceptual Flow&lt;/P&gt;
&lt;img /&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Reusable&amp;nbsp;CI/CD Workflows + Foundry&amp;nbsp;Runtime&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;Your existing ci-platform repo now gains a fourth reusable workflow:&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;Each of these maps to a Foundry capability:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Workflow&lt;/th&gt;&lt;th&gt;Foundry&amp;nbsp;Capability&lt;/th&gt;&lt;th&gt;Role&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;build.yml&lt;/td&gt;&lt;td&gt;Model&amp;nbsp;packaging&amp;nbsp;&amp;amp; versioning&lt;/td&gt;&lt;td&gt;Creates deployable image&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;evaluate-agent.yml&lt;/td&gt;&lt;td&gt;Evaluation&amp;nbsp;service&lt;/td&gt;&lt;td&gt;Runs offline or dataset‑based checks&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;deploy.yml&lt;/td&gt;&lt;td&gt;Agent deployment&lt;/td&gt;&lt;td&gt;Publishes agent to Foundry runtime&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;(&lt;EM&gt;Additional&lt;/EM&gt;) monitor.yml&lt;/td&gt;&lt;td&gt;Telemetry&lt;/td&gt;&lt;td&gt;Pulls evaluation metrics post‑deploy&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Example Foundry‑Aware&amp;nbsp;Pipeline&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;In an AI repository (e.g., agent-app):&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;This sequence guarantees that only successfully evaluated agent versions are deployed to Foundry.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;How Foundry Fits In&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;Microsoft Foundry&lt;/SPAN&gt; provides:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Agent runtime&lt;/SPAN&gt; — scalable, managed environment for composable agents&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Evaluation tools&lt;/SPAN&gt; — integrate LLM‑as‑judge, dataset scoring, or automatic benchmarks&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Observability layers&lt;/SPAN&gt; — performance metrics, feedback loops, and telemetry&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Orchestration frameworks&lt;/SPAN&gt; — connect multiple tools or sub‑agents into an ecosystem&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;GitHub Actions handles &lt;EM&gt;delivery logic.&lt;/EM&gt;&lt;BR /&gt;Foundry handles &lt;EM&gt;AI execution and lifecycle.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Together, they form a modular operations stack for AI systems.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Benefits of Integration&lt;/SPAN&gt;&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Benefit&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-streamdown="strong"&gt;Governed Deployments&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;Only evaluated and approved agent versions reach Foundry&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-streamdown="strong"&gt;Traceability&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;Every deployed agent is linked to a Git commit and eval score&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-streamdown="strong"&gt;Reproducibility&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;Re‑running pipeline with the same commit reproduces identical behavior&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-streamdown="strong"&gt;Observability&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;Foundry telemetry pushes real‑world feedback back into the platform repo&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Architecture View&lt;/P&gt;
&lt;img /&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Governance in Practice&lt;/SPAN&gt;&lt;/H3&gt;
&lt;OL data-streamdown="ordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Every deployment is evaluated&lt;/SPAN&gt;&amp;nbsp;before release.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Every evaluation is logged&lt;/SPAN&gt;&amp;nbsp;as metadata in the Actions run.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Foundry stores live metrics&lt;/SPAN&gt;&amp;nbsp;that can trigger automated re‑evaluation workflows downstream.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;This unifies the DevOps and MLOps worlds under one pipeline.&lt;/P&gt;
&lt;H2 data-streamdown="heading-2"&gt;&lt;SPAN data-streamdown="strong"&gt;Advanced Practices&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;Integrating evaluation and Foundry is the foundation. True enterprise reliability comes from how you &lt;SPAN data-streamdown="strong"&gt;operate and evolve&lt;/SPAN&gt; those pipelines over time. Below are the main practices that transform this setup from “it works” to “it scales safely.”&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;1. Prompt Versioning&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;In AI systems, &lt;EM&gt;prompts are code.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;A single word change in a prompt can shift an agent’s behavior as much as a logic rewrite does in software. Treat them accordingly:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Store prompts and configurations in git (/prompts/prompt_v1.txt, prompt_v2.txt).&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Use clear change history — commits = versions.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Reference prompt versions explicitly in deployment metadata:&lt;/LI&gt;
&lt;/UL&gt;
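&lt;P&gt;A minimal sketch of such metadata in a deploy step (the step name, script, and variable names here are illustrative, not part of any specific workflow):&lt;/P&gt;
&lt;P data-language="yaml" data-streamdown="code-block-header"&gt;yaml&lt;/P&gt;
&lt;P data-language="yaml" data-streamdown="code-block"&gt;# Record which prompt version ships with this release&lt;BR /&gt;- name: Deploy agent&lt;BR /&gt;&amp;nbsp;&amp;nbsp;run: ./deploy.sh&lt;BR /&gt;&amp;nbsp;&amp;nbsp;env:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;PROMPT_VERSION: prompts/prompt_v2.txt&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;GIT_COMMIT: ${{ github.sha }}&lt;/P&gt;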
&lt;UL&gt;
&lt;LI&gt;Re-runs of an old version must reproduce identical responses; versioned prompts make that possible.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;2. Experiment Tracking&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;Track every experiment like you track every deployment.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Item&lt;/th&gt;&lt;th&gt;Example Format&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Commit SHA&lt;/td&gt;&lt;td&gt;f9a3c2a&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Prompt version&lt;/td&gt;&lt;td&gt;prompt_v3&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Model checkpoint&lt;/td&gt;&lt;td&gt;gpt‑35‑turbo 2024‑06‑01&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Dataset revision&lt;/td&gt;&lt;td&gt;dataset_v2&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Evaluation score&lt;/td&gt;&lt;td&gt;0.87&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Implementation tips:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Write a short artifact file (experiment.json) in each pipeline run.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Store it as a workflow artifact or upload it to an experiment tracker (MLflow, Azure ML Experiments, Foundry History).&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;You can later analyze how prompt or model changes affect score trends.&lt;/LI&gt;
&lt;/UL&gt;
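&lt;P&gt;Pulling those items together, an experiment.json artifact might look like this (field names are illustrative; the values match the table above):&lt;/P&gt;
&lt;P data-language="json" data-streamdown="code-block-header"&gt;json&lt;/P&gt;
&lt;P data-language="json" data-streamdown="code-block"&gt;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;"commit_sha": "f9a3c2a",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;"prompt_version": "prompt_v3",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;"model_checkpoint": "gpt-35-turbo-2024-06-01",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;"dataset_revision": "dataset_v2",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;"evaluation_score": 0.87&lt;BR /&gt;}&lt;/P&gt;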
&lt;P&gt;This allows data‑driven improvement cycles: &lt;EM&gt;evaluate → compare → promote → monitor.&lt;/EM&gt;&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;3. Rollback Strategies&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;For deterministic software, rollback is simple: redeploy the previous container.&lt;/P&gt;
&lt;P&gt;For AI systems you may need to roll back along &lt;SPAN data-streamdown="strong"&gt;three dimensions&lt;/SPAN&gt;:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Dimension&lt;/th&gt;&lt;th&gt;Example Rollback&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-streamdown="strong"&gt;Code&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;Checkout previous commit&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-streamdown="strong"&gt;Prompt&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;Revert to earlier prompt file&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-streamdown="strong"&gt;Model&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;Reuse prior checkpoint or model ID&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;Best practice:&lt;/SPAN&gt; treat each version triple (code, prompt, model) as one immutable &lt;EM&gt;release unit&lt;/EM&gt; in the pipeline.&lt;BR /&gt;GitHub tags + evaluation artifacts = auditable rollback point.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;4. Continuous Evaluation&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;Evaluation shouldn’t stop at deployment.&lt;BR /&gt;Integrate post‑deployment monitoring jobs to detect drift:&lt;/P&gt;
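&lt;P&gt;One way to schedule such a re‑evaluation job in GitHub Actions (the eval script, its arguments, and the cadence are placeholders to adapt):&lt;/P&gt;
&lt;P data-language="yaml" data-streamdown="code-block-header"&gt;yaml&lt;/P&gt;
&lt;P data-language="yaml" data-streamdown="code-block"&gt;on:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;schedule:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;- cron: "0 6 * * *" # daily drift check&lt;BR /&gt;jobs:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;re-evaluate:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;runs-on: ubuntu-latest&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;steps:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;- uses: actions/checkout@v4&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;- name: Re-run evaluation against current data&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;run: python eval.py --threshold 0.85&lt;/P&gt;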
&lt;P&gt;Benefits:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Detects silent performance drops caused by new data or model API changes.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Keeps models aligned with their initial standards.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Creates long‑term confidence for compliance audits.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;5. Fail Fast, Fail Safe&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;Configure pipelines such that &lt;EM&gt;failure to evaluate = failure to deploy.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;When in doubt, err on the side of protection.&lt;BR /&gt;Failures should be logged, retriable, and transparent — never silent.&lt;/P&gt;
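&lt;P&gt;Concretely, the gate can be a job‑level condition, so a missing or failed evaluation blocks deployment by default (the job and output names are illustrative):&lt;/P&gt;
&lt;P data-language="yaml" data-streamdown="code-block-header"&gt;yaml&lt;/P&gt;
&lt;P data-language="yaml" data-streamdown="code-block"&gt;deploy:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;needs: evaluate&lt;BR /&gt;&amp;nbsp;&amp;nbsp;# Runs only if the evaluate job succeeded AND passed its threshold&lt;BR /&gt;&amp;nbsp;&amp;nbsp;if: needs.evaluate.outputs.passed == 'true'&lt;/P&gt;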
&lt;P&gt;This approach builds institutional trust in AI releases the same way software regression testing built trust in traditional CI/CD.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;6. Governance by Design&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;Use GitHub’s native features (branch protections, required reviews, environment rules) as declarative governance.&lt;BR /&gt;Combine them with Foundry’s policy hooks:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;restrict which teams can promote evaluated agents;&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;enforce minimum score thresholds;&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;auto‑disable underperforming models.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Governance embedded in code scales better than manual review boards.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;7. Platform Observability&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;Push run data into dashboards. Correlate:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;GitHub Actions runs&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Evaluation scores&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Production telemetry from Foundry&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Visualization options: Azure Monitor, Power BI, Grafana.&lt;BR /&gt;Aim for a &lt;SPAN data-streamdown="strong"&gt;CI/CD + AI Ops Console&lt;/SPAN&gt; view — one pane to observe quality, reliability, and speed.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Outcome of These Practices&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;Your organization achieves:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Consistency&lt;/SPAN&gt; across microservices and AI systems&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Accountability&lt;/SPAN&gt; through versioned artifacts&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Safety&lt;/SPAN&gt; via evaluation gates and drift monitors&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Agility&lt;/SPAN&gt; because updates remain fast, but protected&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 data-streamdown="heading-2"&gt;&lt;SPAN data-streamdown="strong"&gt;Enterprise Scenarios&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;By this point, you’ve built an end‑to‑end platform:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;standardized CI/CD for apps and agents,&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;reusable GitHub Actions workflows,&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Azure runtime for reliable deployments,&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Foundry‑integrated evaluation gates.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Now let’s see how this architecture performs in the wild.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Scenario 1 — Fifty Microservices, One Consistent Pipeline&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;Problem Statement&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;At scale, each microservice team usually maintains a slightly different workflow — fragmented test tools, drift in Python or Node versions, duplicated YAML.&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;What Goes Wrong&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Compliance updates require 50 PRs.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Each team solves build problems differently.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Security teams can’t easily prove consistency.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;Platform Solution&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;The&amp;nbsp;ci-platform&amp;nbsp;repo defines all workflows once (test‑python.yml, build.yml, deploy.yml).&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Every service just calls them through&amp;nbsp;uses:.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Upgrading the base image or CI version happens once and propagates to all services.&lt;/LI&gt;
&lt;/UL&gt;
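&lt;P&gt;A consumer repository's workflow then shrinks to a few lines (the repository, file, and tag names follow the examples used in this series; the exact path depends on where the reusable workflow lives):&lt;/P&gt;
&lt;P data-language="yaml" data-streamdown="code-block-header"&gt;yaml&lt;/P&gt;
&lt;P data-language="yaml" data-streamdown="code-block"&gt;jobs:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;test:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# All pipeline logic is delegated to the shared platform repo&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;uses: ns-github-design/ci-platform/.github/workflows/test-python.yml@v1.0&lt;/P&gt;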
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;Result&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Full organization upgrade from Python 3.10→3.11 in minutes.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Consistent quality gates, policies, and artifact naming.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Reduced cycle time, increased deployment confidence.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Scenario 2 — Regulated Enterprises (Compliance + Audit)&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;Problem Statement&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;Financial, healthcare, and government projects require strict controls:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Auditable promotion paths&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Approval workflows&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Traceability of versions and changes&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;What Goes Wrong&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Manual change reviews are error‑prone.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Different CI/CD definitions per team produce inconsistent logs.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Compliance reports take weeks.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;Platform Solution&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL data-streamdown="ordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;GitHub Environments provide built‑in approvals and reviewer rules.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;The same reusable workflows ensure identical build signatures.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Foundry integration logs evaluation scores and deployment metadata automatically.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;Result&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Reviewers approve through GitHub’s Environment gate — zero custom UI needed.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Each release carries an immutable commit ID + evaluation score + approvers record.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Audit reports generate directly from pipeline history.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Scenario 3 — AI‑Driven Customer Support Platform&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;Problem Statement&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;A company running GPT‑powered customer support agents wants to continuously improve responses without risking quality drops in production.&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;What Goes Wrong&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Prompt changes can silently worsen accuracy.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Model updates impact intent coverage.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Hard to correlate user feedback with deployment versions.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;Platform Solution&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Add&amp;nbsp;evaluate-agent.yml into the same CI/CD chain.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Feed evaluation datasets that cover FAQs and tone guidelines.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Require minimum score ≥ 0.85 for promotion.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Deploy via Foundry to production clusters once threshold met.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Stream Foundry telemetry → GitHub → Power BI for quality dashboards.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;Result&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Continuous prompt experimentation without sacrificing quality.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Regressed builds automatically blocked.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Business stakeholders track AI accuracy&amp;nbsp;as a live metric.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Bonus Scenario — Enterprise AI R&amp;amp;D Platform&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;Multiple research teams train models on‑prem or in Azure ML. The central engineering platform exposes build, evaluate, and deploy steps as reusable workflows.&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Data scientists → run “evaluate‑agent” without touching infra.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Platform engineers → control policies, thresholds, approvals.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Leadership → gets consistent reporting on AI performance and cost.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This creates a&amp;nbsp;&lt;SPAN data-streamdown="strong"&gt;single standard for AI lifecycle governance&lt;/SPAN&gt;&amp;nbsp;across business units.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Summary&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;Your platform now supports:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Area&lt;/th&gt;&lt;th&gt;Traditional Dev&lt;/th&gt;&lt;th&gt;AI Adaptation&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Build &amp;amp; Test&lt;/td&gt;&lt;td&gt; Reusable workflows (Services)&lt;/td&gt;&lt;td&gt; Evaluation gate (Agents)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Deploy&lt;/td&gt;&lt;td&gt; Container Apps / GitHub Environments&lt;/td&gt;&lt;td&gt; Foundry + Telemetry Feedback&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Governance&lt;/td&gt;&lt;td&gt; Environment approval rules&lt;/td&gt;&lt;td&gt; Evaluation threshold + human review&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Scaling&lt;/td&gt;&lt;td&gt; One repo per service&lt;/td&gt;&lt;td&gt; One platform per organization&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Across these cases, the core pattern holds:&lt;BR /&gt;&lt;SPAN data-streamdown="strong"&gt;Centralize workflow logic, decentralize application logic, unify governance.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H2 data-streamdown="heading-2"&gt;&lt;SPAN data-streamdown="strong"&gt;14 — Conclusion&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;What began as a simple effort to clean up a few duplicated YAML files evolved into a complete&amp;nbsp;&lt;SPAN data-streamdown="strong"&gt;delivery platform architecture&lt;/SPAN&gt;&amp;nbsp;— one that treats pipelines as first‑class products and extends their usefulness into the era of AI‑driven systems.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;From Pipelines to Platforms&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;At first, you built&amp;nbsp;&lt;SPAN data-streamdown="strong"&gt;reusable workflows&lt;/SPAN&gt;&amp;nbsp;in a shared repository.&lt;BR /&gt;That small structural change produced an outsized effect:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Reduced maintenance and drift&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Consistent security and compliance&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;One‑click upgrades across every service&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;You proved that&amp;nbsp;&lt;SPAN data-streamdown="strong"&gt;pipeline logic belongs in its own product&lt;/SPAN&gt;&amp;nbsp;— a CI/CD platform.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;From Deterministic to Intelligent Delivery&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;Then the domain changed. Deterministic services gave way to AI agents. &lt;BR /&gt;You responded by extending the same reusable platform into the AI dimension:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Added evaluate-agent.yml&amp;nbsp;for semantic scoring&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Introduced Foundry as the runtime for intelligent components&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Unified evaluation, governance, and deployment under the same contracts&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The underlying philosophy remained identical:&amp;nbsp;&lt;EM&gt;don’t duplicate delivery logic — standardize it.&lt;/EM&gt;&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;The Broader Pattern&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;This architecture expresses a clear maturity pathway:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Stage&lt;/th&gt;&lt;th&gt;What Changes&lt;/th&gt;&lt;th&gt;Technical Lever&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;CI/CD as Automation&lt;/td&gt;&lt;td&gt;Build pipelines per project&lt;/td&gt;&lt;td&gt;YAML and Actions&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;CI/CD as Product&lt;/td&gt;&lt;td&gt;Reusable workflows, shared logic&lt;/td&gt;&lt;td&gt;Platform Repo&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;CI/CD as Governance&lt;/td&gt;&lt;td&gt;Environments, approvals, tracking&lt;/td&gt;&lt;td&gt;GitHub Environments + Azure&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AI Delivery Platform&lt;/td&gt;&lt;td&gt;Evaluation + behavioral policy&lt;/td&gt;&lt;td&gt;Foundry Integration&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Every step adds structure, traceability, and scale, without sacrificing developer velocity.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Cultural Impact&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;Moving to a platform model does more than streamline releases.&lt;BR /&gt;It elevates&amp;nbsp;&lt;SPAN data-streamdown="strong"&gt;DevOps&lt;/SPAN&gt;&amp;nbsp;to a&amp;nbsp;&lt;SPAN data-streamdown="strong"&gt;product discipline&lt;/SPAN&gt;:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Platform engineers design contracts, not scripts.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Application teams consume delivery APIs, not ad‑hoc builds.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;AI teams get reliable evaluation and rollback mechanisms.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;In short:&amp;nbsp;&lt;EM&gt;velocity meets governance.&lt;/EM&gt;&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;The Next Frontier&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;As this pattern matures, two frontiers are emerging:&lt;/P&gt;
&lt;OL data-streamdown="ordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Autonomous Evaluation&lt;/SPAN&gt; — Agents that assess other agents in continuous feedback loops.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;&lt;SPAN data-streamdown="strong"&gt;Dynamic Policy Enforcement&lt;/SPAN&gt; — Pipelines that adjust deployment thresholds and configurations in real time based on observed performance.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;The foundations you’ve built — centralized workflows, evaluation gates, and Foundry integration — already support that trajectory.&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;CI/CD maturity is not about writing workflows; it’s about designing reusable systems of workflows.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;What you’ve built is more than CI/CD. It’s a platform that defines how modern software and AI move from idea to production safely.&lt;/P&gt;
&lt;P&gt;To close the series, the final&amp;nbsp;&lt;SPAN data-streamdown="strong"&gt;“What’s Next”&lt;/SPAN&gt;&amp;nbsp;section outlines concrete next steps for building upon this foundation.&lt;/P&gt;
&lt;H2 data-streamdown="heading-2"&gt;&lt;SPAN data-streamdown="strong"&gt;15 — What’s Next&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;You’ve gone from&amp;nbsp;&lt;EM&gt;writing pipelines&lt;/EM&gt;&amp;nbsp;to&amp;nbsp;&lt;EM&gt;designing platforms.&lt;/EM&gt;&lt;BR /&gt;The CI/CD model you created now governs the lifecycle of both microservices and AI agents — and it’s only the beginning.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Step 1 — Publish Your Platform&lt;/SPAN&gt;&lt;/H3&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Make both repositories public (read‑only) so others can learn from the pattern:
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;ns-github-design/ci-platform – your reusable workflow product&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;ns-github-design/fastapi-app – your minimal consumer example&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Tag the current stable version as v1.0 in both repos.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Add concise READMEs explaining purpose, usage, and version policy.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This turns your repos into&amp;nbsp;&lt;SPAN data-streamdown="strong"&gt;live documentation&lt;/SPAN&gt;&amp;nbsp;— a working reference architecture.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Step 2 — Add Automated Docs and Visuals&lt;/SPAN&gt;&lt;/H3&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Export your Draw.io architecture to SVG and embed it in each README.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Use GitHub Pages or Docsify to render a small site explaining:
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;platform repo overview;&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;how workflow_call works;&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;how to set up Azure auth;&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;example runs and outputs.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Readers love code + architecture in one place.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Step 3 — Extend to AI Agents&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;Add a third demo:&lt;BR /&gt;agent-evaluator — a lightweight agent that runs eval.py and demonstrates the evaluation gate.&lt;/P&gt;
&lt;P&gt;In that repo:&lt;/P&gt;
&lt;OL data-streamdown="ordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Call&amp;nbsp;evaluate-agent.yml&amp;nbsp;from your platform.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Push commits that sometimes fail thresholds.&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Show screenshots of blocked vs. approved runs.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;You’ll have a fully working&amp;nbsp;&lt;SPAN data-streamdown="strong"&gt;AI evaluation demo&lt;/SPAN&gt;&amp;nbsp;powered by your platform.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Step 4 — Instrument Foundry Feedback&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;Use Foundry’s APIs to stream live evaluation results or observability data back into GitHub Actions artifacts:&lt;/P&gt;
&lt;P data-language="yaml" data-streamdown="code-block-header"&gt;yaml&lt;/P&gt;
&lt;P data-language="yaml" data-streamdown="code-block"&gt;- name: Collect Foundry feedback run: foundry metrics export --project my-ai-agent --output metrics.json&lt;/P&gt;
&lt;P&gt;That feedback loop lets you build dashboards of quality trends alongside the deployment timeline.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Step 5 — Prepare Part 3 (Next Blog)&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;You now have a natural foundation for the next article:&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;“Autonomous Delivery Loops: Continuous Evaluation and Guardrails for AI Agents.”&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;Outline:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;Continuous evaluation with scheduled runs&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Self‑healing approval flows&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Dynamic policy adjustment based on metrics&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;Cross‑team Governance as Code&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;That installment makes your series visionary and future‑ready.&lt;/P&gt;
&lt;H3 data-streamdown="heading-3"&gt;&lt;SPAN data-streamdown="strong"&gt;Quick Recap&lt;/SPAN&gt;&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Phase&lt;/th&gt;&lt;th&gt;Achievement&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;1 – 4&lt;/td&gt;&lt;td&gt;Built CI/CD Platform + App Repo&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt;Configured Azure + OIDC&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;6&lt;/td&gt;&lt;td&gt;Verified Pipeline End‑to‑End&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;8 – 15&lt;/td&gt;&lt;td&gt;Documented Demo → AI Integration → Enterprise Practices → Vision&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;You now have a&amp;nbsp;&lt;SPAN data-streamdown="strong"&gt;complete blog series&lt;/SPAN&gt;&amp;nbsp;that is:&lt;/P&gt;
&lt;UL data-streamdown="unordered-list"&gt;
&lt;LI data-streamdown="list-item"&gt;technically deep,&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;architecturally unique,&lt;/LI&gt;
&lt;LI data-streamdown="list-item"&gt;demonstrably real.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Every diagram, YAML file, and code sample came from a working, reproducible system — the hallmark of strong engineering writing.&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-streamdown="strong"&gt;Final Thought&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Software delivery used to end at deployment.&lt;BR /&gt;AI delivery begins there.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;The future of platforms is not just to ship software faster — but to ensure that every agent behaves as designed.&lt;/P&gt;</description>
      <pubDate>Mon, 30 Mar 2026 10:48:39 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/ci-cd-as-a-platform-shipping-microservices-and-ai-agents-with/ba-p/4504550</guid>
      <dc:creator>nasreensarah</dc:creator>
      <dc:date>2026-03-30T10:48:39Z</dc:date>
    </item>
    <item>
      <title>From Toil to Trust: How Azure SRE Agent Is Redefining Cloud Operations</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/from-toil-to-trust-how-azure-sre-agent-is-redefining-cloud/ba-p/4505875</link>
      <description>&lt;P&gt;As Azure environments scale, infrastructure teams face a familiar challenge:&amp;nbsp;&lt;STRONG&gt;operating reliably at scale&lt;/STRONG&gt;.&lt;BR /&gt;Outages are no longer caused by a single VM or misconfigured service—they emerge from complex dependencies across compute, networking, storage, and platform services.&lt;/P&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/sre-agent/" target="_blank" rel="noopener"&gt;Azure SRE Agent&lt;/A&gt; is designed to help infrastructure and SRE teams meet this challenge by bringing &lt;STRONG&gt;AI‑assisted diagnostics and remediation directly into Azure operations&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;This post focuses on:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Infrastructure‑centric scenarios where Azure SRE Agent is most useful&lt;/LI&gt;
&lt;LI&gt;How infra teams can access it from the Azure portal&lt;/LI&gt;
&lt;LI&gt;Prerequisites required before onboarding&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;What Is Azure SRE Agent (From an Infrastructure Lens)&lt;/H2&gt;
&lt;P&gt;Azure SRE Agent is an &lt;STRONG&gt;AI‑powered reliability assistant&lt;/STRONG&gt; integrated into Azure.&lt;BR /&gt;It continuously observes telemetry from Azure Monitor, Log Analytics, and service APIs to help engineers &lt;STRONG&gt;diagnose, investigate, and remediate production issues&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;From an infrastructure standpoint, the agent understands:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Azure resource topology and dependencies&lt;/LI&gt;
&lt;LI&gt;Common failure patterns across Azure services&lt;/LI&gt;
&lt;LI&gt;Safe operational actions using Azure CLI and REST APIs&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;It can either &lt;STRONG&gt;recommend actions&lt;/STRONG&gt; or &lt;STRONG&gt;execute remediation steps&lt;/STRONG&gt; with appropriate guardrails and approvals.&lt;/P&gt;
&lt;P&gt;The agent operates through multiple automation mechanisms, including:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Built-in Azure knowledge&lt;/STRONG&gt;: Preconfigured understanding of Azure services with optimized operational patterns&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Custom runbooks&lt;/STRONG&gt;: Execute Azure CLI commands and REST API calls for any Azure service&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Subagent extensibility&lt;/STRONG&gt;: Build specialized agents for specific services like VMs, databases, or networking components&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;External integrations&lt;/STRONG&gt;: Connect to monitoring, incident management, and source control systems&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Infrastructure Scenarios Where Azure SRE Agent Helps the Most&lt;/H2&gt;
&lt;H3&gt;1. Incident Investigation &amp;amp; Production Outages&lt;/H3&gt;
&lt;P&gt;During an incident, infra engineers typically pivot between alerts, metrics, logs, and dashboards. Azure SRE Agent simplifies this by correlating telemetry automatically and summarizing the issue in natural language.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Typical infrastructure issues:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Virtual machines becoming unresponsive&lt;/LI&gt;
&lt;LI&gt;App Service or Container App failures&lt;/LI&gt;
&lt;LI&gt;Network connectivity or NSG misconfigurations&lt;/LI&gt;
&lt;LI&gt;Storage throttling or capacity exhaustion&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Instead of manually querying logs, engineers can ask the agent &lt;EM&gt;why&lt;/EM&gt; something failed and get a reasoned response.&lt;/P&gt;
&lt;H3&gt;2. Log- and Metric-Driven Root Cause Analysis&lt;/H3&gt;
&lt;P&gt;Azure SRE Agent consumes Azure Monitor and Log Analytics data directly.&lt;BR /&gt;This allows it to perform &lt;STRONG&gt;context‑aware RCA&lt;/STRONG&gt; without engineers needing to manually write KQL for common scenarios.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Example question:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;“Why did my App Service start returning HTTP 500 errors after the last deployment?”&lt;/P&gt;
&lt;P&gt;The agent correlates deployment activity, configuration changes, and telemetry to identify the most likely root cause.&lt;/P&gt;
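&lt;P&gt;The correlation described above amounts to the kind of KQL an engineer would otherwise write by hand. A minimal sketch of that query, assuming the standard &lt;EM&gt;AppServiceHTTPLogs&lt;/EM&gt; table; the &lt;EM&gt;build_rca_query&lt;/EM&gt; helper is our own illustration, not part of the agent or any Azure SDK:&lt;/P&gt;

```python
# Illustrative only: builds the KQL the agent effectively generates and runs
# itself when asked why an App Service started returning 500s.

def build_rca_query(site_name: str, lookback: str = "1h") -> str:
    """Return a KQL query counting HTTP 500s per 5-minute bin for one site."""
    return (
        "AppServiceHTTPLogs\n"
        f"| where TimeGenerated > ago({lookback})\n"
        f"| where CsHost == '{site_name}'\n"
        "| where ScStatus == 500\n"
        "| summarize errors = count() by bin(TimeGenerated, 5m)\n"
        "| order by TimeGenerated asc"
    )

print(build_rca_query("contoso-web"))
```

&lt;P&gt;The agent goes further than a single table: it joins this error timeline against deployment activity and configuration-change events before proposing a root cause.&lt;/P&gt;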
&lt;H3&gt;3. Safe, Controlled Remediation for Infrastructure Issues&lt;/H3&gt;
&lt;P&gt;A key value for infra teams is &lt;STRONG&gt;controlled automation&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;Azure SRE Agent supports:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Review mode&lt;/STRONG&gt; – actions are proposed and require explicit approval&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Autonomous mode&lt;/STRONG&gt; – pre‑approved sub‑agents execute actions automatically&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This is useful for repeatable infra tasks such as:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Restarting unhealthy services&lt;/LI&gt;
&lt;LI&gt;Scaling compute resources&lt;/LI&gt;
&lt;LI&gt;Rolling back failed deployments&lt;/LI&gt;
&lt;LI&gt;Correcting known configuration drift&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Automation is applied &lt;STRONG&gt;with guardrails&lt;/STRONG&gt;, not blindly.&lt;/P&gt;
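&lt;P&gt;One way to picture the two modes is as an approval gate in front of every proposed action. A toy sketch under our own naming (the action names and allow-list are hypothetical, not the agent's API):&lt;/P&gt;

```python
# Toy model of the review/autonomous split: every action passes a gate, and
# only actions on an explicit allow-list may run without a human in the loop.
AUTONOMY_ALLOW_LIST = {"restart_app_service", "scale_out_plan"}  # hypothetical

def gate(action: str, mode: str, human_approved: bool = False) -> bool:
    """Return True if the action may execute under the given mode."""
    if mode == "review":
        return human_approved                 # nothing runs without sign-off
    if mode == "autonomous":
        return action in AUTONOMY_ALLOW_LIST  # scoped autonomy only
    raise ValueError(f"unknown mode: {mode}")
```

&lt;P&gt;Starting every subagent in review mode and promoting only well-understood actions to the allow-list mirrors how most teams adopt the real feature.&lt;/P&gt;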
&lt;H3&gt;4. Infrastructure Guardrails &amp;amp; Operational Hygiene&lt;/H3&gt;
&lt;P&gt;Beyond incidents, Azure SRE Agent can continuously evaluate infrastructure posture and highlight operational risks.&lt;/P&gt;
&lt;P&gt;Examples include:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Detecting insecure network exposure&lt;/LI&gt;
&lt;LI&gt;Flagging unsupported SKUs or configurations&lt;/LI&gt;
&lt;LI&gt;Identifying operational anti‑patterns&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This helps infra teams move from reactive firefighting to &lt;STRONG&gt;proactive reliability management&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H3&gt;5. Extending Infrastructure Automation with Subagents&lt;/H3&gt;
&lt;P&gt;Infrastructure teams can extend Azure SRE Agent using &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/sre-agent/sub-agents" target="_blank" rel="noopener"&gt;subagents&lt;/A&gt; tailored&amp;nbsp;to specific domains such as networking, databases, or storage.&lt;/P&gt;
&lt;P&gt;Using the Subagent Builder, teams can:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Attach custom runbooks&lt;/LI&gt;
&lt;LI&gt;Integrate external observability tools&lt;/LI&gt;
&lt;LI&gt;Trigger actions on alerts or schedules&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This enables gradual adoption—from advisory assistance to advanced automation.&lt;/P&gt;
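&lt;P&gt;A subagent definition boils down to three ingredients: a scope, some runbooks, and triggers. A hypothetical data shape, purely to illustrate those moving parts (the real Subagent Builder is a portal experience, and none of these field names are its schema):&lt;/P&gt;

```python
# Hypothetical shape of a networking subagent definition; the same three
# ingredients (scope, runbooks, triggers) appear in the real builder too.
networking_subagent = {
    "name": "nsg-triage",
    "scope": ["Microsoft.Network/networkSecurityGroups"],
    "runbooks": [
        # An Azure CLI step the subagent could propose or run (illustrative).
        "az network nsg rule list --nsg-name {nsg} --resource-group {rg} -o table",
    ],
    "triggers": {"alerts": ["NSGRuleChanged"], "schedule": "0 */6 * * *"},
    "mode": "review",  # advisory first; promote to autonomous later
}

def validate(subagent: dict) -> bool:
    """Minimal sanity check before registering a subagent definition."""
    required = {"name", "scope", "runbooks", "triggers", "mode"}
    return required.issubset(subagent) and subagent["mode"] in {"review", "autonomous"}
```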
&lt;H2&gt;How to Get Started with Azure SRE Agent&lt;/H2&gt;
&lt;P&gt;The following sections outline the prerequisites, connectivity considerations, and supported integrations required to onboard Azure SRE Agent in an enterprise Azure environment.&lt;/P&gt;
&lt;H3&gt;Prerequisites and Ownership Model&lt;/H3&gt;
&lt;P&gt;Azure SRE Agent introduces platform‑level prerequisites that span infrastructure, platform, security, and network teams. While infrastructure teams are the primary users, successful onboarding requires cross‑team alignment.&lt;/P&gt;
&lt;P&gt;The sections below explicitly tag ownership for clarity.&lt;/P&gt;
&lt;H4&gt;1. Subscription &amp;amp; Region&lt;/H4&gt;
&lt;P&gt;&lt;STRONG&gt;Owner:&lt;/STRONG&gt; &lt;EM&gt;Platform / Subscription Admin&lt;/EM&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Dedicated Azure subscription or resource group recommended for evaluation or PoC&lt;/LI&gt;
&lt;LI&gt;During preview, the &lt;STRONG&gt;agent control plane must be created in an available location (Sweden Central, Australia East, US East 2)&lt;/STRONG&gt;, while monitored workloads can reside in any Azure region&lt;/LI&gt;
&lt;LI&gt;Subscription may need to be &lt;STRONG&gt;allow‑listed for access&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Infra teams typically request this setup; platform teams implement it.&lt;/P&gt;
&lt;H4&gt;2. Identity &amp;amp; Access (Critical)&lt;/H4&gt;
&lt;P&gt;&lt;STRONG&gt;Owner:&lt;/STRONG&gt; &lt;EM&gt;Platform + Security&lt;/EM&gt;&lt;BR /&gt;&lt;STRONG&gt;Consumer:&lt;/STRONG&gt; &lt;EM&gt;Infra / SRE&lt;/EM&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Ability to create &lt;STRONG&gt;managed identities&lt;/STRONG&gt; (system‑assigned or user‑assigned depending on scenario)&lt;/LI&gt;
&lt;LI&gt;Elevated RBAC permissions required during onboarding:
&lt;UL&gt;
&lt;LI&gt;
&lt;P&gt;&lt;EM&gt;Microsoft.Authorization/roleAssignments/write&lt;/EM&gt; at subscription scope&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;Roles such as &lt;STRONG&gt;Owner&lt;/STRONG&gt;, &lt;STRONG&gt;User Access Administrator&lt;/STRONG&gt;, or &lt;STRONG&gt;RBAC Administrator&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Post‑onboarding, the SRE Agent identity should be granted &lt;STRONG&gt;least‑privilege RBAC&lt;/STRONG&gt;:
&lt;UL&gt;
&lt;LI&gt;Read‑only for investigation scenarios&lt;/LI&gt;
&lt;LI&gt;Scoped write access only where remediation is approved&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Infra teams use the identity; security and platform teams govern access.&lt;/P&gt;
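&lt;P&gt;The least-privilege pattern above maps directly onto an ARM role assignment. A sketch of the REST body a platform team would PUT, granting the agent's managed identity the built-in &lt;STRONG&gt;Reader&lt;/STRONG&gt; role at subscription scope (all GUIDs below are placeholders except Reader's well-known role definition ID):&lt;/P&gt;

```python
import json
import uuid

# Built-in Reader role definition ID (well-known value in Azure RBAC).
READER_ROLE_ID = "acdd72a7-3385-48ef-bd42-f606fba81ae7"

def role_assignment_body(subscription_id: str, principal_id: str) -> dict:
    """Build the ARM role-assignment request body for a managed identity."""
    return {
        "properties": {
            "roleDefinitionId": (
                f"/subscriptions/{subscription_id}"
                f"/providers/Microsoft.Authorization/roleDefinitions/{READER_ROLE_ID}"
            ),
            "principalId": principal_id,
            "principalType": "ServicePrincipal",  # managed identities use this
        }
    }

body = role_assignment_body(str(uuid.uuid4()), str(uuid.uuid4()))
print(json.dumps(body, indent=2))
```

&lt;P&gt;Write access for remediation follows the same shape, just with a narrower scope (a resource group or single resource) and a role scoped to the approved action.&lt;/P&gt;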
&lt;H4&gt;3. Resource Provider Registration&lt;/H4&gt;
&lt;P&gt;&lt;STRONG&gt;Owner:&lt;/STRONG&gt; &lt;EM&gt;Platform&lt;/EM&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Required Azure resource providers must be registered in the subscription&lt;/LI&gt;
&lt;LI&gt;Includes providers used by the agent runtime and dependent services&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Typically, a one‑time platform task.&lt;/P&gt;
&lt;H4&gt;4. Monitoring &amp;amp; Telemetry Baseline (Hard Dependency)&lt;/H4&gt;
&lt;P&gt;&lt;STRONG&gt;Owner:&lt;/STRONG&gt; &lt;EM&gt;Infra / Platform (Shared)&lt;/EM&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Azure Monitor enabled for target workloads&lt;/LI&gt;
&lt;LI&gt;Diagnostic settings configured to send logs and metrics to:
&lt;UL&gt;
&lt;LI&gt;Log Analytics&lt;/LI&gt;
&lt;LI&gt;Application Insights (where applicable)&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;During agent creation, supporting resources may be deployed:
&lt;UL&gt;
&lt;LI&gt;Log Analytics workspace&lt;/LI&gt;
&lt;LI&gt;Application Insights&lt;/LI&gt;
&lt;LI&gt;Smart detector alert rules&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Infra teams depend on this telemetry; platform teams often provide the shared foundation.&lt;/P&gt;
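&lt;P&gt;The telemetry dependency above can be expressed as the diagnostic-settings body platform teams typically deploy. A sketch assuming App Service log categories and an existing Log Analytics workspace (the workspace resource ID is a placeholder):&lt;/P&gt;

```python
# Sketch of an Azure Monitor diagnostic-settings body routing App Service
# logs and metrics to a Log Analytics workspace.
def diagnostic_settings_body(workspace_resource_id: str) -> dict:
    return {
        "properties": {
            "workspaceId": workspace_resource_id,
            "logs": [
                {"category": "AppServiceHTTPLogs", "enabled": True},
                {"category": "AppServiceConsoleLogs", "enabled": True},
            ],
            "metrics": [{"category": "AllMetrics", "enabled": True}],
        }
    }
```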
&lt;H4&gt;5. Network &amp;amp; Connectivity&lt;/H4&gt;
&lt;P&gt;&lt;STRONG&gt;Owner:&lt;/STRONG&gt; &lt;EM&gt;Network / Security&lt;/EM&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Outbound HTTPS connectivity required to:
&lt;UL&gt;
&lt;LI&gt;Azure management endpoints (ARM, Monitor, etc.)&lt;/LI&gt;
&lt;LI&gt;External systems when integrations are enabled (for example, ServiceNow or MCP servers)&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Custom MCP endpoints must be &lt;STRONG&gt;remote and HTTPS‑reachable&lt;/STRONG&gt; (local endpoints not supported)&lt;/LI&gt;
&lt;LI&gt;IP allow‑listing scenarios must be explicitly validated; static egress IPs are not guaranteed&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Required only if the organization enforces strict &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/sre-agent/network-requirements" target="_blank" rel="noopener"&gt;network controls&lt;/A&gt;.&lt;/P&gt;
&lt;H4&gt;6. Connector‑Specific Prerequisites (Conditional)&lt;/H4&gt;
&lt;P&gt;&lt;STRONG&gt;Owner:&lt;/STRONG&gt; &lt;EM&gt;Security + Platform&lt;/EM&gt;&lt;BR /&gt;&lt;STRONG&gt;Consumer:&lt;/STRONG&gt; &lt;EM&gt;Infra / SRE&lt;/EM&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Microsoft Teams / Outlook connectors&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;OAuth consent for Microsoft 365 APIs&lt;/LI&gt;
&lt;LI&gt;User‑assigned managed identity required for connector authentication&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Custom MCP connectors&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;MCP base URL&lt;/LI&gt;
&lt;LI&gt;Authentication material (API key, token, or OAuth)&lt;/LI&gt;
&lt;LI&gt;RBAC permissions to configure and manage connectors&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Applies only when integrations are enabled.&lt;/P&gt;
&lt;H4&gt;7. Automation Readiness&lt;/H4&gt;
&lt;P&gt;&lt;STRONG&gt;Owner:&lt;/STRONG&gt; &lt;EM&gt;Infra / SRE + Security&lt;/EM&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Clear decision on &lt;STRONG&gt;recommendation‑only vs automated remediation&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Defined approval model:
&lt;UL&gt;
&lt;LI&gt;Human‑in‑the‑loop&lt;/LI&gt;
&lt;LI&gt;Scoped autonomy for well‑understood actions&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Agent identity granted write permissions &lt;STRONG&gt;only where automation is explicitly approved&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Infra teams define operational intent; security teams validate boundaries.&lt;/P&gt;
&lt;H4&gt;8. Governance &amp;amp; Data Handling&lt;/H4&gt;
&lt;P&gt;&lt;STRONG&gt;Owner:&lt;/STRONG&gt; &lt;EM&gt;Security / Governance&lt;/EM&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Acceptance that prompts, responses, and analysis data are stored in the agent’s region&lt;/LI&gt;
&lt;LI&gt;Alignment with organizational policies for:
&lt;UL&gt;
&lt;LI&gt;Logging and retention&lt;/LI&gt;
&lt;LI&gt;Auditability&lt;/LI&gt;
&lt;LI&gt;Responsible AI usage and approvals&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This is a governance prerequisite, not an infra task.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Azure SRE Agent operates as a managed control plane layered on Azure Resource Manager, Azure Monitor, Log Analytics, and managed identities. As a result,&amp;nbsp;&lt;STRONG&gt;identity, telemetry, and governance foundations must be in place before infra teams can fully benefit from the agent.&lt;/STRONG&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H3&gt;Integrations It Supports&lt;/H3&gt;
&lt;P&gt;Azure SRE Agent integrates with your operational ecosystem in the following ways:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Monitoring and observability:&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;Azure Monitor (metrics, logs, alerts, workbooks)&lt;/LI&gt;
&lt;LI&gt;Application Insights&lt;/LI&gt;
&lt;LI&gt;Log Analytics&lt;/LI&gt;
&lt;LI&gt;Grafana&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Incident management:&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;Azure Monitor Alerts&lt;/LI&gt;
&lt;LI&gt;PagerDuty&lt;/LI&gt;
&lt;LI&gt;ServiceNow&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Source control and CI/CD:&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;GitHub (repositories, issues)&lt;/LI&gt;
&lt;LI&gt;Azure DevOps (repos, work items)&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Data sources:&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;Azure Data Explorer (Kusto) clusters&lt;/LI&gt;
&lt;LI&gt;Model Context Protocol (MCP) servers&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 data-local-id="3a811bd0492b" data-renderer-start-pos="8143"&gt;Connectivity Matrix&lt;/H3&gt;
&lt;H4 data-local-id="1ab1a66d05df" data-renderer-start-pos="8494"&gt;1. Azure Control Plane Connectivity&lt;/H4&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 99.3519%; height: 302px; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="8534" data-local-id="ed352edb3040"&gt;&lt;STRONG&gt;Source&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="8544" data-local-id="557dea682078"&gt;&lt;STRONG&gt;Destination&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="8559" data-local-id="683d488b3f41"&gt;&lt;STRONG&gt;Direction&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="8572" data-local-id="417c1e9d1484"&gt;&lt;STRONG&gt;Protocol / Port&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="8591" data-local-id="03d2cd3a6c96"&gt;&lt;STRONG&gt;Authentication&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="8609" data-local-id="07eb63362814"&gt;&lt;STRONG&gt;Purpose&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="8622" data-local-id="2dac317e1c32"&gt;SRE Agent service&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="8643" data-local-id="e70f60b4a076"&gt;Azure Resource Manager (ARM)&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="8675" data-local-id="10f2139a529c"&gt;Outbound&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="8687" data-local-id="e5cd8e2794eb"&gt;HTTPS / 443&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="8702" data-local-id="dd714b8e1577"&gt;Managed Identity (OAuth 2.0)&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="8734" data-local-id="7ced970f1ae0"&gt;Read and (with approval) write operations on Azure resources.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="8801" data-local-id="d93f1b416ec7"&gt;SRE Agent service&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="8822" data-local-id="cb2ab48cf890"&gt;Azure Monitor&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="8839" data-local-id="608471999cc0"&gt;Outbound&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="8851" data-local-id="92ab15f1cbd5"&gt;HTTPS / 443&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="8866" data-local-id="c8a525b39c33"&gt;Managed Identity&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="8886" data-local-id="ec918dd41359"&gt;Alert ingestion and metric queries.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="8927" data-local-id="554aa8591962"&gt;SRE Agent service&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="8948" data-local-id="2b79e2c4ebbf"&gt;Log Analytics Workspace&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="8975" data-local-id="248eeab02da9"&gt;Outbound&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="8987" data-local-id="c52e6a1cf934"&gt;HTTPS / 443&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9002" data-local-id="59a55871fc2d"&gt;Managed Identity&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9022" data-local-id="85fab0c43a3f"&gt;Log queries (KQL) for RCA.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9054" data-local-id="adff0ac0c270"&gt;SRE Agent service&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9075" data-local-id="f3d396167772"&gt;Application Insights&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9099" data-local-id="a4cc5c01a24c"&gt;Outbound&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9111" data-local-id="e16d71f665be"&gt;HTTPS / 443&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9126" data-local-id="e617ac143cc9"&gt;Managed Identity&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9146" data-local-id="7b5dfe0fc42a"&gt;Application telemetry analysis.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 16.7609%" /&gt;&lt;col style="width: 16.7609%" /&gt;&lt;col style="width: 16.7609%" /&gt;&lt;col style="width: 16.7609%" /&gt;&lt;col style="width: 16.7609%" /&gt;&lt;col style="width: 16.4616%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;UL data-local-id="d16dd67e-d509-41f4-9a95-7082b5033e03" data-indent-level="1"&gt;
&lt;LI&gt;All Azure control‑plane communication is &lt;STRONG data-renderer-mark="true"&gt;outbound only&lt;/STRONG&gt; from the agent.&lt;/LI&gt;
&lt;LI&gt;No inbound connectivity to customer VNets is required.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4 data-local-id="322863a36268" data-renderer-start-pos="9318"&gt;2. Incident Management Integrations&lt;/H4&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9358" data-local-id="f16c0ea09a23"&gt;&lt;STRONG&gt;Platform&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9370" data-local-id="8ca2bb0344a1"&gt;&lt;STRONG&gt;Connectivity Type&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9391" data-local-id="0e7498c34ee6"&gt;&lt;STRONG&gt;Direction&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9404" data-local-id="678a6415e1c9"&gt;&lt;STRONG&gt;Protocol / Port&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9423" data-local-id="d976893a1f3e"&gt;&lt;STRONG&gt;Auth Mechanism&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9441" data-local-id="866bb8f0a6e1"&gt;&lt;STRONG&gt;Data Exchanged&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9461" data-local-id="106f8566ee17"&gt;Azure Monitor Alerts&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9485" data-local-id="e33762419614"&gt;Native&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9495" data-local-id="455aa01f2d5f"&gt;Inbound → Agent&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9514" data-local-id="8d445f926848"&gt;HTTPS / 443&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9529" data-local-id="9980cbf9e6b4"&gt;Azure‑native&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9545" data-local-id="6370bbe517e9"&gt;Alert payloads, severity, metadata&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9585" data-local-id="183f9a06e4bf"&gt;ServiceNow&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9599" data-local-id="8c03202474a5"&gt;Connector (Webhook/API)&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9626" data-local-id="fa8174ee4578"&gt;Outbound &amp;amp; Inbound&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9648" data-local-id="df84edd4f45b"&gt;HTTPS / 443&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9663" data-local-id="a191930e6631"&gt;OAuth / API token&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9684" data-local-id="4fc4ff4a9f3d"&gt;Incident creation, updates, status sync&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9729" data-local-id="2470bfb3ce5a"&gt;PagerDuty&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9742" data-local-id="991af006eca8"&gt;Connector (Webhook/API)&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9769" data-local-id="54854b75ac2b"&gt;Outbound &amp;amp; Inbound&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9791" data-local-id="c70bbb32ebd4"&gt;HTTPS / 443&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9806" data-local-id="42e8f91068a7"&gt;OAuth / API token&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="9827" data-local-id="21748464ef8d"&gt;Incident events, acknowledgements&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;UL data-local-id="4dcd1ab5-d9a8-4c3b-ae92-94e18b2d457e" data-indent-level="1"&gt;
&lt;LI&gt;Third‑party platforms require explicit connector configuration.&lt;/LI&gt;
&lt;LI&gt;Payloads are limited to incident metadata and diagnostics context.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4 data-local-id="d97c0e467761" data-renderer-start-pos="10006"&gt;3. Collaboration &amp;amp; Notification Channels&lt;/H4&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10051" data-local-id="03b1152b26ae"&gt;&lt;STRONG&gt;Tool&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10059" data-local-id="2be2bdd2519e"&gt;&lt;STRONG&gt;Direction&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10072" data-local-id="113c20c6f6eb"&gt;&lt;STRONG&gt;Protocol / Port&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10091" data-local-id="511c7f7d385f"&gt;&lt;STRONG&gt;Authentication&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10109" data-local-id="2d73ae797c93"&gt;&lt;STRONG&gt;Typical Usage&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10128" data-local-id="7c7212315334"&gt;Microsoft Teams&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10147" data-local-id="8f4a53755d3e"&gt;Outbound&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10159" data-local-id="e9e2ecbf65bb"&gt;HTTPS / 443&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10174" data-local-id="5c785c97763c"&gt;OAuth (User‑assigned Managed Identity)&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10216" data-local-id="a5acffbf48cd"&gt;Post incident summaries, updates&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10254" data-local-id="9755133b7399"&gt;Outlook (Email)&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10273" data-local-id="db28d2da6443"&gt;Outbound&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10285" data-local-id="88c2df5f60c1"&gt;HTTPS / 443&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10300" data-local-id="96f98185c743"&gt;OAuth (User‑assigned Managed Identity)&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10342" data-local-id="eefcb73b9171"&gt;Incident notifications, reports&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;UL data-local-id="cc5bba67-81d4-44bc-86d3-ed0b8c740c6f" data-indent-level="1"&gt;
&lt;LI&gt;Only &lt;STRONG data-renderer-mark="true"&gt;user‑assigned managed identities&lt;/STRONG&gt; are supported for Office 365 connectors.&lt;/LI&gt;
&lt;LI&gt;No mailbox‑level permissions are stored in the agent.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4 data-local-id="17cbd4a3648b" data-renderer-start-pos="10521"&gt;4. External &amp;amp; Custom Integrations&lt;/H4&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10559" data-local-id="73c37bc7d02d"&gt;&lt;STRONG&gt;Integration Type&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10579" data-local-id="aa46948ba80e"&gt;&lt;STRONG&gt;Direction&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10592" data-local-id="d11823506418"&gt;&lt;STRONG&gt;Protocol / Port&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10611" data-local-id="e38eff6e886c"&gt;&lt;STRONG&gt;Auth&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10619" data-local-id="499efd3b4ff6"&gt;&lt;STRONG&gt;Example Use Cases&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10642" data-local-id="bf7427d2e174"&gt;Custom MCP Servers&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10664" data-local-id="944d5585beda"&gt;Outbound&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10676" data-local-id="b897c0c0b757"&gt;HTTPS / 443&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10691" data-local-id="d72394a5e48c"&gt;OAuth / API key&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10710" data-local-id="2cfee9c71a02"&gt;GitHub issues, Dynatrace, custom observability&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10762" data-local-id="e838caf47b0d"&gt;Python Execution Tool&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10787" data-local-id="34a60406a9f7"&gt;Outbound&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10799" data-local-id="09437a18efe1"&gt;HTTPS / 443&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10814" data-local-id="a6e178a7e02b"&gt;As defined by script&lt;/P&gt;
&lt;/td&gt;&lt;td colspan="1" rowspan="1"&gt;
&lt;P data-renderer-start-pos="10838" data-local-id="a12518884bea"&gt;REST checks, custom diagnostics&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;UL data-local-id="ab4258bc-e678-4521-89ae-87ed094b4d48" data-indent-level="1"&gt;
&lt;LI&gt;Endpoints must be explicitly configured and approved.&lt;/LI&gt;
&lt;LI&gt;Secrets do not persist; credentials are injected securely at runtime.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4 data-local-id="a514ef033f55" data-renderer-start-pos="11008"&gt;5. Firewall &amp;amp; Network Considerations&lt;/H4&gt;
&lt;UL data-local-id="4fa104a1-faff-4702-89f6-8b8a1125f107" data-indent-level="1"&gt;
&lt;LI&gt;Add &lt;EM&gt;*.azuresre.ai&lt;/EM&gt; to your firewall allow list. Some networking profiles might block access to this domain by default.&lt;/LI&gt;
&lt;LI&gt;Allow outbound HTTPS (443) to:
&lt;UL data-local-id="e5236c7d-e0b9-4753-979c-a560db6b52dc" data-indent-level="2"&gt;
&lt;LI&gt;Azure control‑plane endpoints&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;*.azuresre.ai&lt;/EM&gt; (SRE Agent service)&lt;/LI&gt;
&lt;LI&gt;Configured third‑party endpoints (ServiceNow, PagerDuty, MCP servers)&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;No inbound firewall rules or private endpoint exposure is required.&lt;/LI&gt;
&lt;LI&gt;Compatible with &lt;STRONG data-renderer-mark="true"&gt;private VNets&lt;/STRONG&gt; and restricted outbound models when allow‑listed.&lt;/LI&gt;
&lt;/UL&gt;
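&lt;P&gt;The egress rules above lend themselves to a quick pre-flight check of your allow list before onboarding. A small sketch using firewall-style wildcard matching; the domain list mirrors the bullets above and is an illustrative subset, not an exhaustive endpoint inventory:&lt;/P&gt;

```python
import fnmatch

# Illustrative subset of the outbound HTTPS destinations the agent needs;
# patterns use firewall-style wildcards.
ALLOW_LIST = [
    "*.azuresre.ai",          # SRE Agent service
    "management.azure.com",   # Azure Resource Manager
]

def is_allowed(host: str) -> bool:
    """Return True if the host matches any allow-list pattern."""
    return any(fnmatch.fnmatch(host, pattern) for pattern in ALLOW_LIST)
```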
&lt;H2&gt;How to Access Azure SRE Agent&lt;/H2&gt;
&lt;P&gt;Azure SRE Agent can be accessed directly from the Azure portal or through its dedicated&amp;nbsp;&lt;A href="https://sre.azure.com/" target="_blank" rel="noopener"&gt;SRE portal&lt;/A&gt;.&lt;/P&gt;
&lt;H4&gt;Related links:&lt;/H4&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/sre-agent/usage" target="_blank" rel="noopener"&gt;How to use Azure SRE Agent&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/sre-agent/automate-workflows" target="_blank" rel="noopener"&gt;Automate workflows with Azure SRE Agent&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/sre-agent/diagnose-azure-observability" target="_blank" rel="noopener"&gt;Diagnose your Azure environment with SRE Agent&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/sre-agent/connectors" target="_blank" rel="noopener"&gt;Connectors for extending reach to external systems&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/sre-agent/first-investigation" target="_blank" rel="noopener"&gt;Set up your first investigation&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/sre-agent/pricing-billing" target="_blank" rel="noopener"&gt;Pricing and billing&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/sre-agent/manage-permissions" target="_blank" rel="noopener"&gt;Manage permissions for SRE Agent&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/sre-agent/mcp-connectors" target="_blank" rel="noopener"&gt;Set up MCP connectors in SRE Agent&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/sre-agent/anthropic-sub-processor" target="_blank" rel="noopener"&gt;Anthropic as a sub-processor in Azure SRE Agent&lt;/A&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;H2&gt;&lt;SPAN style="color: rgb(30, 30, 30); font-size: 32px;"&gt;Why Azure SRE Agent Matters for Infrastructure Teams&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;For infrastructure and SRE teams managing large Azure estates, Azure SRE Agent provides a &lt;STRONG&gt;single, agentic reliability layer&lt;/STRONG&gt; that unifies observability, incident management, and delivery workflows into a governed, intent‑driven operating model.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Reduced Mean Time to Resolution (MTTR):&lt;/STRONG&gt; By integrating natively with &lt;STRONG&gt;Azure Monitor and Log Analytics&lt;/STRONG&gt;, the agent performs deep, multi‑signal investigation and root‑cause analysis without manual query building or correlation.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Faster investigation without dashboard hopping:&lt;/STRONG&gt; Azure SRE Agent reasons across monitoring, incident, and delivery systems from one interface, eliminating context‑switching across tools.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/sre-agent/deep-investigation" target="_blank" rel="noopener"&gt;Deep investigation&lt;/A&gt; &amp;amp; &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/sre-agent/root-cause-analysis" target="_blank" rel="noopener"&gt;root-cause analysis&lt;/A&gt;:&lt;/STRONG&gt; Performs multi‑signal correlation across logs, metrics, configuration state, and recent changes to identify true root causes rather than surface symptoms, with clear, explainable findings.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Lower operational toil:&lt;/STRONG&gt; Repetitive diagnostics and triage tasks are handled by the agent, allowing engineers to focus on higher‑value reliability and platform improvements.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Consistent and auditable &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/sre-agent/incident-response" target="_blank" rel="noopener"&gt;incident response&lt;/A&gt;:&lt;/STRONG&gt; Through Azure Monitor,&amp;nbsp;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/sre-agent/setup-servicenow-indexing" target="_blank" rel="noopener"&gt;ServiceNow&lt;/A&gt; and &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/sre-agent/set-up-pagerduty-indexing" target="_blank" rel="noopener"&gt;PagerDuty&lt;/A&gt; integration, investigations are embedded directly into ITSM and on‑call workflows, ensuring traceability, consistency, and governance.&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;Scheduled tasks and proactive checks:&lt;/STRONG&gt; Using &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/sre-agent/scheduled-tasks?tabs=health-check" target="_blank" rel="noopener"&gt;scheduled workflows&lt;/A&gt;, teams can run daily or periodic health validations, drift checks, and post‑deployment verifications—shifting operations from reactive firefighting to proactive reliability management.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;Extensible sub‑agents and skills:&lt;/STRONG&gt; Infrastructure teams can create&amp;nbsp;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/sre-agent/skills" target="_blank" rel="noopener"&gt;skills&lt;/A&gt;, &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/sre-agent/sub-agents" target="_blank" rel="noopener"&gt;subagents&lt;/A&gt;, and workflows to encode domain expertise into the agent, making reliability knowledge reusable and consistent.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Delivery and code awareness:&lt;/STRONG&gt; Integration with &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/sre-agent/github-connector" target="_blank" rel="noopener"&gt;GitHub&lt;/A&gt; and&amp;nbsp;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/sre-agent/ado-connector" target="_blank" rel="noopener"&gt;Azure DevOps&lt;/A&gt; allows the agent to correlate infrastructure issues with source code, IaC definitions, pipelines, and work items—enabling actionable follow‑ups such as bug creation, PR recommendations, or release fixes.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Safer, governed automation:&lt;/STRONG&gt; Human‑in‑the‑loop controls ensure all recommendations and actions are auditable, reviewable, and aligned with enterprise governance, enabling progressive automation without compromising safety.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Better reliability at infrastructure scale:&lt;/STRONG&gt; By shifting teams from manual diagnostics to intent‑driven, agent‑assisted operations, Azure SRE Agent helps organizations move from reactive firefighting to systematic, scalable reliability engineering.&lt;/LI&gt;
&lt;/UL&gt;
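&lt;P&gt;To make the "investigation without manual query building" point concrete, the sketch below builds the kind of Kusto (KQL) query an engineer would otherwise write by hand against Log Analytics to spot an exception spike. The table name (AppExceptions), column names, and the &lt;CODE&gt;build_investigation_query&lt;/CODE&gt; helper are illustrative assumptions, not the agent's actual internals; the agent generates and correlates such queries itself via Azure Monitor.&lt;/P&gt;

```python
# Minimal sketch of the kind of KQL query the agent automates.
# The table/column names and this helper are hypothetical examples,
# not part of the Azure SRE Agent API.

def build_investigation_query(app_name: str, lookback: str = "1h") -> str:
    """Build a KQL query that surfaces exception spikes for one app role."""
    return (
        f"AppExceptions\n"
        f"| where TimeGenerated > ago({lookback})\n"
        f"| where AppRoleName == '{app_name}'\n"
        f"| summarize Errors = count() by bin(TimeGenerated, 5m), ProblemId\n"
        f"| order by Errors desc"
    )

# Example: investigate a hypothetical 'checkout-api' service.
print(build_investigation_query("checkout-api", lookback="2h"))
```

&lt;P&gt;A query like this could be run against a workspace with the azure-monitor-query SDK or the portal's Logs blade; the value of the agent is that this correlation (and the follow-up across metrics, configuration, and recent changes) happens without an engineer composing the queries.&lt;/P&gt;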
&lt;H2&gt;Closing Thoughts&lt;/H2&gt;
&lt;P&gt;Azure SRE Agent is not just another troubleshooting tool—it represents a shift toward &lt;STRONG&gt;agent‑assisted infrastructure operations&lt;/STRONG&gt;. By embedding AI‑driven reasoning directly into Azure, infrastructure teams can focus less on repetitive investigation and more on building resilient platforms.&lt;/P&gt;</description>
      <pubDate>Sat, 11 Apr 2026 05:36:05 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/from-toil-to-trust-how-azure-sre-agent-is-redefining-cloud/ba-p/4505875</guid>
      <dc:creator>siddhigupta</dc:creator>
      <dc:date>2026-04-11T05:36:05Z</dc:date>
    </item>
  </channel>
</rss>

