<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Azure Infrastructure Blog articles</title>
    <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/bg-p/AzureInfrastructureBlog</link>
    <description>Azure Infrastructure Blog articles</description>
    <pubDate>Sat, 09 May 2026 18:43:40 GMT</pubDate>
    <dc:creator>AzureInfrastructureBlog</dc:creator>
    <dc:date>2026-05-09T18:43:40Z</dc:date>
    <item>
      <title>CHERIoT-Ibex: Closing the door on memory safety vulnerabilities with hardware-enforced protection</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/cheriot-ibex-closing-the-door-on-memory-safety-vulnerabilities/ba-p/4517904</link>
      <description>&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Memory safety vulnerabilities—largely arising&amp;nbsp;from widely used programming languages such as C and C++—remain&amp;nbsp;a leading cause of exploitable software defects across systems, from embedded devices to&amp;nbsp;cloud-scale&amp;nbsp;infrastructure. In simple terms, memory safety ensures that software accesses only the data it is intended to use; when this protection fails, attackers can exploit these defects to gain control of devices or disrupt critical services. &lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Industry data shows that&amp;nbsp;about 70 percent&amp;nbsp;of the vulnerabilities to which Microsoft assigns a&amp;nbsp;Common Vulnerabilities and Exposures (CVE)&amp;nbsp;identifier each year are memory safety issues, highlighting how&amp;nbsp;frequently&amp;nbsp;these software defects translate into&amp;nbsp;real-world&amp;nbsp;security risk (&lt;/SPAN&gt;&lt;A href="https://www.cisa.gov/news-events/news/urgent-need-memory-safety-software-products" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;CISA – The Urgent Need for Memory Safety in Software Products&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt;). Hardware-enforced&amp;nbsp;protections such&amp;nbsp;as&amp;nbsp;CHERIoT&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;-&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;Ibex&amp;nbsp;can help&amp;nbsp;eliminate&amp;nbsp;these vulnerabilities at their source. This reduces the likelihood that low-level software flaws can be exploited to compromise devices or disrupt workloads, and it supports more trustworthy infrastructure by design. &lt;/SPAN&gt;&amp;nbsp;&lt;BR /&gt;&amp;nbsp;&lt;BR /&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;An open and certified foundation for memory-safe embedded systems&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559739&amp;quot;:0,&amp;quot;335559740&amp;quot;:300}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;CHERIoT-Ibex is the first open-source production-quality implementation of the&amp;nbsp;CHERIoT&amp;nbsp;instruction set architecture and among the first cores certified by the CHERI Alliance (&lt;/SPAN&gt;&lt;A href="https://cheri-alliance.org/cheri-enabled/cheriot/" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;CHERI Alliance – CHERIoT&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt;). CHERIoT is an extension of&amp;nbsp;the&amp;nbsp;CHERI&amp;nbsp;(Capability Hardware Enhanced RISC Instructions)&amp;nbsp;instruction&amp;nbsp;set,&amp;nbsp;with&amp;nbsp;a focus on embedded and Internet of Things (IoT) applications.&amp;nbsp;Ibex is&amp;nbsp;an&amp;nbsp;open&lt;/SPAN&gt;‑&lt;SPAN data-contrast="auto"&gt;source&amp;nbsp;32&lt;/SPAN&gt;‑&lt;SPAN data-contrast="auto"&gt;bit RISC&lt;/SPAN&gt;‑&lt;SPAN data-contrast="auto"&gt;V core developed by&amp;nbsp;lowRISC.&amp;nbsp;CHERIoT&lt;/SPAN&gt;‑&lt;SPAN data-contrast="auto"&gt;Ibex&amp;nbsp;builds on Ibex by adding CHERIoT capability extensions to provide&amp;nbsp;hardware&lt;/SPAN&gt;‑&lt;SPAN data-contrast="auto"&gt;enforced&amp;nbsp;memory safety and&amp;nbsp;fine&lt;/SPAN&gt;‑&lt;SPAN data-contrast="auto"&gt;grained&amp;nbsp;compartmentalization.&amp;nbsp;It&amp;nbsp;is the result of a close partnership between Microsoft Research and Azure Hardware Systems &amp;amp; Infrastructure, combining advanced research innovation with industry-leading silicon IP development expertise. &lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;In 2023, Microsoft&amp;nbsp;open-sourced&amp;nbsp;the CHERIoT&amp;nbsp;Platform&amp;nbsp;to bring hardware-enforced memory safety to embedded systems, including an instruction set architecture, toolchain, real-time operating system, and the&amp;nbsp;RTL implementation of the CHERIoT-Ibex core. The CHERI Alliance certification recognizes its ability to provide spatial and temporal memory safety, fine-grained compartmentalization, and compatibility with the broader CHERI ecosystem. Critically, CHERIoT-Ibex achieves these security guarantees with power and area efficiency comparable to low-cost microcontrollers,&amp;nbsp;demonstrating&amp;nbsp;that security&amp;nbsp;doesn’t&amp;nbsp;have to come at a premium. &lt;/SPAN&gt;&amp;nbsp;&lt;BR /&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Why memory safety&amp;nbsp;remains&amp;nbsp;a&amp;nbsp;foundational security challenge&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Traditional embedded and&amp;nbsp;microcontroller-class&amp;nbsp;designs rely on software hardening and&amp;nbsp;coarse-grained&amp;nbsp;hardware protections that struggle to prevent attacks such as buffer overflows and&amp;nbsp;use-after-free&amp;nbsp;vulnerabilities, often adding complexity while still leaving gaps in protection. &lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Consider a controller that runs privileged firmware responsible for device initialization, telemetry, and system health monitoring, while also hosting networking functionality exposed to external inputs. A&amp;nbsp;memory-safety&amp;nbsp;vulnerability in the networking stack could allow attackers to execute unauthorized code within the firmware environment, potentially affecting other critical services on the device. In tightly integrated&amp;nbsp;systems,&amp;nbsp;these failures can propagate beyond a single&amp;nbsp;component, increasing overall risk.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;BR /&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Constraining failures with&amp;nbsp;hardware-enforced&amp;nbsp;isolation&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;CHERIoT-Ibex&amp;nbsp;enables&amp;nbsp;hardware-enforced&amp;nbsp;isolation between these components, helping ensure that even if the networking stack is compromised, its ability to&amp;nbsp;impact&amp;nbsp;system initialization or telemetry functions&amp;nbsp;remains&amp;nbsp;constrained. By limiting the blast radius of software failures,&amp;nbsp;CHERIoT-Ibex&amp;nbsp;supports a system-level approach to security rather than relying on individual components to defend themselves in isolation.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;BR /&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Advancing&amp;nbsp;memory-safe&amp;nbsp;infrastructure by design&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;201341983&amp;quot;:0,&amp;quot;335559740&amp;quot;:300}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;CHERIoT-Ibex’s certification by the CHERI Alliance marks an important milestone for open-source&amp;nbsp;memory-safe solutions. It&amp;nbsp;validates&amp;nbsp;that strong security guarantees can coexist with efficiency and transparency, reflecting Microsoft’s broader silicon-to-systems strategy&amp;nbsp;of&amp;nbsp;embedding&amp;nbsp;security into&amp;nbsp;the&amp;nbsp;foundational hardware infrastructure.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Explore and engage with the open-source CHERIoT ecosystem by visiting the CHERIoT Platform and the CHERIoT-Ibex GitHub repository (&lt;/SPAN&gt;&lt;A href="https://github.com/microsoft/cheriot-ibex" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;microsoft/cheriot-ibex&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt;). The repositories enable developers and researchers to experiment with, contribute to, and&amp;nbsp;build on&amp;nbsp;memory-safe hardware and software foundations. &lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 09 May 2026 05:08:11 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/cheriot-ibex-closing-the-door-on-memory-safety-vulnerabilities/ba-p/4517904</guid>
      <dc:creator>kunyanliu</dc:creator>
      <dc:date>2026-05-09T05:08:11Z</dc:date>
    </item>
    <item>
      <title>Safely Migrating Terraform Managed Disks on Azure Using Stable Keys and Copilot</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/safely-migrating-terraform-managed-disks-on-azure-using-stable/ba-p/4517509</link>
      <description>&lt;H2&gt;&lt;SPAN style="color: rgb(30, 30, 30); font-size: 32px;"&gt;The Root Cause: Index-Based "for_each" Keys:&lt;/SPAN&gt;&lt;/H2&gt;
&lt;H4&gt;Many Terraform modules flatten VM and disk definitions into a list and use the list index as the for_each key:&lt;/H4&gt;
&lt;H4&gt;&lt;STRONG&gt;for_each = { for index, sp in local.managed_disks : index =&amp;gt; sp }&lt;/STRONG&gt;&lt;/H4&gt;
&lt;H4&gt;This pattern looks harmless, but the index is not stable:&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;EM&gt;Adding a disk to one VM shifts downstream indices&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;Reordering environment JSON changes flatten order&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;Terraform treats shifted indices as new resources&lt;/EM&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;The result: Terraform plans to destroy and recreate all affected managed disks—even though nothing changed in Azure.&lt;/H4&gt;
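&lt;P&gt;The index shift can be reproduced outside Terraform. The following is a minimal Python sketch, assuming each disk is represented as a hypothetical (vm, lun) pair and that the flattened list keeps each VM's disks together; it only mimics how index keys are assigned and is not part of the module.&lt;/P&gt;

```python
# Hypothetical data: each tuple stands for one element of local.managed_disks.
def index_keys(disks):
    """Mimic a for_each map keyed by list index (as strings, like Terraform)."""
    return {str(i): d for i, d in enumerate(disks)}


def changed_addresses(before, after):
    """Return keys whose disk differs or that exist on only one side."""
    keys = sorted(set(before) | set(after), key=int)
    return [k for k in keys if before.get(k) != after.get(k)]


before = [("vm1", 0), ("vm1", 1), ("vm2", 0)]
# Add one disk (LUN 2) to vm1; vm2's disk moves from index 2 to index 3.
after = [("vm1", 0), ("vm1", 1), ("vm1", 2), ("vm2", 0)]

churn = changed_addresses(index_keys(before), index_keys(after))
print(churn)  # ['2', '3']
```

&lt;P&gt;Only one disk was added, yet two addresses change: key "2" now describes a different disk, and key "3" is new. Terraform sees this as a replacement plus a creation, even though vm2's disk is untouched in Azure.&lt;/P&gt;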
&lt;H2&gt;Why This Is Especially Risky on Azure:&lt;/H2&gt;
&lt;H4&gt;Azure managed disks are often:&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;EM&gt;Attached to stateful application tiers&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;Used for databases, middleware, or batch workloads&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;Deployed across zones for resiliency&lt;/EM&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;A forced disk replacement can mean:&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;EM&gt;Data loss&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;Extended outages&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;Failed change windows&lt;/EM&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;This makes state stability a first-class design concern—not an implementation detail.&lt;/H4&gt;
&lt;H2&gt;The Stable Key Pattern:&lt;/H2&gt;
&lt;H4&gt;The fix is conceptually simple: use a domain-stable identifier for each disk.&lt;/H4&gt;
&lt;H4&gt;A proven pattern is:&lt;/H4&gt;
&lt;H4&gt;&lt;STRONG&gt;"${sp.vm}-${sp.data_disk.lun}"&lt;/STRONG&gt;&lt;/H4&gt;
&lt;H4&gt;This key is:&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;Deterministic&lt;/LI&gt;
&lt;LI&gt;Independent of ordering&lt;/LI&gt;
&lt;LI&gt;Human-readable&lt;/LI&gt;
&lt;LI&gt;Stable across environments&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;Example:&lt;/H4&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;VM&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;LUN&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Stable Key&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;vm1&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;0&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;vm1-0&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;vm1&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;1&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;vm1-1&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;vm2&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;0&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;vm2-0&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H4&gt;Once applied, adding a new disk results in exactly one new resource, with zero churn.&lt;/H4&gt;
&lt;H2&gt;The Migration Challenge: Terraform State:&lt;/H2&gt;
&lt;H4&gt;Changing &lt;STRONG&gt;for_each&lt;/STRONG&gt; keys alone is not enough.&lt;/H4&gt;
&lt;H4&gt;Terraform tracks resources by their state address, not by Azure resource ID. When keys change, Terraform believes:&lt;/H4&gt;
&lt;H4&gt;&lt;STRONG&gt;“The old disks were deleted, and new ones must be created.”&lt;/STRONG&gt;&lt;/H4&gt;
&lt;H4&gt;To prevent this, we must move the state, not recreate the resource.&lt;/H4&gt;
&lt;H4&gt;That is where terraform state mv comes in.&lt;/H4&gt;
&lt;H2&gt;Automating the Migration with GitHub Copilot Skills:&lt;/H2&gt;
&lt;H4&gt;To remove risk and human error, the team created a reusable Copilot skill for managed disk key migration.&lt;/H4&gt;
&lt;H3&gt;What the Skill Does:&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;EM&gt;Inspects Terraform modules for index-based for_each&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;Reads environment JSON files (such as ALZ variable abstractions)&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;Reconstructs the exact flatten order used by Terraform&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;Generates precise terraform state mv commands&lt;/EM&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;This ensures:&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;EM&gt;No guessing&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;No manual address mapping&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;No production surprises&lt;/EM&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;The skill is stored directly inside the repository under .github/skills, making it:&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;EM&gt;Discoverable&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;Versioned&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;Shareable across teams&lt;/EM&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Example: Generating State Move Commands:&lt;/H2&gt;
&lt;H4&gt;Based on environment JSON, Copilot can generate commands like:&lt;/H4&gt;
&lt;P&gt;&lt;STRONG&gt;terraform state mv \&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&amp;nbsp; 'module.managed_disk_windowsvm_app["0"]' \&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&amp;nbsp; 'module.managed_disk_windowsvm_app["vm1-0"]'&lt;/STRONG&gt;&lt;/P&gt;
&lt;H4&gt;This is repeated deterministically for every existing disk—before any plan or apply.&lt;/H4&gt;
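&lt;P&gt;The mapping from old index keys to stable keys is mechanical, which is what makes it safe to generate. The following Python sketch shows one way to produce the commands; it is not the Copilot skill itself. It assumes the disks are supplied in the same order Terraform's flatten produced, and the dictionary layout (vm, lun) is a simplified, hypothetical stand-in for the real environment JSON.&lt;/P&gt;

```python
def state_mv_commands(disks, module_address):
    """Build one `terraform state mv` command per existing disk.

    The list position is the old index key, so `disks` must be in the
    same order that Terraform's flatten produced.
    """
    commands = []
    seen = set()
    for index, disk in enumerate(disks):
        new_key = f"{disk['vm']}-{disk['lun']}"
        if new_key in seen:
            # Two disks sharing a VM and LUN would collide on the new key.
            raise ValueError(f"duplicate stable key: {new_key}")
        seen.add(new_key)
        commands.append(
            f"terraform state mv '{module_address}[\"{index}\"]' "
            f"'{module_address}[\"{new_key}\"]'"
        )
    return commands


disks = [
    {"vm": "vm1", "lun": 0},
    {"vm": "vm1", "lun": 1},
    {"vm": "vm2", "lun": 0},
]
for command in state_mv_commands(disks, "module.managed_disk_windowsvm_app"):
    print(command)
```

&lt;P&gt;The duplicate check matters: if two entries produced the same stable key, the second move would target an address that already exists, so it is better to stop before touching state.&lt;/P&gt;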
&lt;H2&gt;Recommended Migration Workflow:&lt;/H2&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;H4&gt;Confirm clean state&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;EM&gt;terraform plan shows no pending changes&lt;/EM&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;H4&gt;Update the module&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;EM&gt;Replace index-based keys with stable keys&lt;/EM&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;H4&gt;Back up the state&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;EM&gt;Especially critical with remote backends (Azure Storage)&lt;/EM&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;H4&gt;Run terraform state mv&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;EM&gt;Only after terraform init is connected to the correct backend&lt;/EM&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;H4&gt;Re-run plan&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;EM&gt;Existing disks should show no changes&lt;/EM&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;H4&gt;Add new disks safely&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;EM&gt;Terraform creates only the new disk&lt;/EM&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;
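&lt;P&gt;The "Re-run plan" step can be checked by a script as well as by eye. The sketch below assumes the plan has been saved and exported with terraform show -json, and inspects the resource_changes entries for any delete action. The sample data is hand-written for illustration, and the resource name azurerm_managed_disk.this is hypothetical.&lt;/P&gt;

```python
def destructive_changes(plan):
    """Return addresses whose planned actions include a delete.

    A replacement is reported with both "delete" and "create" in its
    actions list, so it is caught by the same test.
    """
    return [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if "delete" in rc["change"]["actions"]
    ]


# Hand-written sample shaped like the JSON plan output, not real output.
prefix = "module.managed_disk_windowsvm_app"
sample_plan = {
    "resource_changes": [
        {"address": prefix + '["vm1-0"].azurerm_managed_disk.this',
         "change": {"actions": ["no-op"]}},
        {"address": prefix + '["vm1-2"].azurerm_managed_disk.this',
         "change": {"actions": ["create"]}},
        {"address": prefix + '["2"].azurerm_managed_disk.this',
         "change": {"actions": ["delete", "create"]}},
    ]
}

print(destructive_changes(sample_plan))
```

&lt;P&gt;In this sample the leftover index key "2" is flagged, which would indicate a disk whose state was not moved. A pipeline could fail the run whenever the returned list is not empty.&lt;/P&gt;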
&lt;H2&gt;CI/CD and Remote Backend Considerations:&lt;/H2&gt;
&lt;H4&gt;A critical finding from this migration:&lt;/H4&gt;
&lt;H4&gt;&lt;STRONG&gt;terraform state mv&lt;/STRONG&gt; always modifies the currently initialized backend.&lt;/H4&gt;
&lt;H4&gt;In pipeline-driven environments:&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;EM&gt;Ensure the correct environment is initialized&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;Run migrations once per environment&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;Never merge stable-key code before migrating all environments&lt;/EM&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;Failing to align code and state can cause disk destruction in production.&lt;/H4&gt;
&lt;H2&gt;Key Takeaways:&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;EM&gt;Index-based for_each keys are unsafe for long-lived Azure disks&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;Stable keys such as vm-lun eliminate accidental resource churn&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;State migration is mandatory—not optional&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;Copilot skills are powerful for institutionalizing safe patterns&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;Small Terraform design choices can have enterprise-scale impact&lt;/EM&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Closing Thoughts:&lt;/H2&gt;
&lt;H4&gt;This pattern is broadly applicable beyond disks—to NICs, extensions, and any resource where identity must outlive ordering.&lt;/H4&gt;
&lt;H4&gt;By combining:&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P&gt;&lt;EM&gt;Stable Terraform design&lt;/EM&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;EM&gt;State-aware migrations&lt;/EM&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;EM&gt;GitHub Copilot automation&lt;/EM&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;Teams can make infrastructure changes boring again—and that is the ultimate reliability goal.&lt;/H4&gt;</description>
      <pubDate>Fri, 08 May 2026 16:02:01 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/safely-migrating-terraform-managed-disks-on-azure-using-stable/ba-p/4517509</guid>
      <dc:creator>shwetayadav</dc:creator>
      <dc:date>2026-05-08T16:02:01Z</dc:date>
    </item>
    <item>
      <title>Building Secure AI Platforms in Banking Using Azure Enterprise Architecture</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/building-secure-ai-platforms-in-banking-using-azure-enterprise/ba-p/4517531</link>
      <description>&lt;H4&gt;&lt;STRONG&gt;1. Introduction: AI in Banking Is Not Just a Model Problem&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;Modern banking institutions are no longer asking &lt;EM data-start="1449" data-end="1467"&gt;“Can we use AI?”&lt;/EM&gt;&lt;BR data-start="1467" data-end="1470" /&gt;The real question is:&lt;BR /&gt;&lt;STRONG data-start="1496" data-end="1587"&gt;“Can we use AI without violating regulatory, security, and data residency constraints?”&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Unlike public AI applications, banking systems must ensure:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;No public internet exposure&lt;/LI&gt;
&lt;LI&gt;Strict identity-based access control&lt;/LI&gt;
&lt;LI&gt;End-to-end auditability&lt;/LI&gt;
&lt;LI&gt;Data residency compliance&lt;/LI&gt;
&lt;LI&gt;Fully controlled inference pipelines&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;👉 In enterprise environments, &lt;STRONG&gt;AI success is driven by secure infrastructure—not just model accuracy&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;2. Core Design Principle: Controlled Intelligence System&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;Every AI request must follow a &lt;STRONG&gt;security-enforced execution pipeline&lt;/STRONG&gt;:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;User Request
   ↓
Secure Edge (Application Gateway + WAF)
   ↓
API Governance Layer (API Management - Internal Mode)
   ↓
AI Orchestration Layer (AKS / App Services)
   ↓
Retrieval + Policy Layer (RAG + Guardrails)
   ↓
Private AI Services (Azure OpenAI)
   ↓
Observability Layer (AMPLS)
   ↓
Final Response&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;Key Insight:&lt;/STRONG&gt;&lt;BR /&gt;This is not just an architecture—it is a &lt;STRONG&gt;controlled and auditable execution model&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;3. Azure Enterprise AI Architecture (Production-Ready Pattern)&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;A real-world architecture used in banking environments:&lt;/P&gt;
&lt;img /&gt;
&lt;H4&gt;&lt;STRONG&gt;4. Private Connectivity Model (Critical for Compliance)&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;Key components:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Private Endpoints&lt;/STRONG&gt; → Secure PaaS isolation&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Private DNS Zones&lt;/STRONG&gt; → Controlled name resolution&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;VNet Integration&lt;/STRONG&gt; → Internal service communication&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Azure Firewall&lt;/STRONG&gt; → Traffic inspection and control&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;⚠️ &lt;STRONG&gt;Common Production Failure:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;AKS pods fail to resolve Azure OpenAI private endpoint&lt;/LI&gt;
&lt;LI&gt;Root cause:
&lt;UL&gt;
&lt;LI&gt;Missing Private DNS links&lt;/LI&gt;
&lt;LI&gt;Incorrect VNet configuration&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;👉 This is one of the most frequent failures in enterprise AI deployments.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Debugging Private Endpoint Failures&lt;/STRONG&gt;&lt;/P&gt;
&lt;P data-start="2681" data-end="2689"&gt;When a private endpoint fails to resolve, work through the following:&lt;/P&gt;
&lt;UL data-start="2691" data-end="2800"&gt;
&lt;LI data-start="2691" data-end="2717"&gt;nslookup behavior in AKS&lt;/LI&gt;
&lt;LI data-start="2718" data-end="2742"&gt;DNS zone linking check&lt;/LI&gt;
&lt;LI data-start="2743" data-end="2772"&gt;VNet integration validation&lt;/LI&gt;
&lt;LI data-start="2773" data-end="2800"&gt;UDR / Firewall inspection&lt;/LI&gt;
&lt;/UL&gt;
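&lt;P&gt;The name-resolution check can also be scripted from inside a pod. This is a minimal sketch using only the Python standard library: it resolves a hostname and reports whether every returned address is private. The literal addresses in the example stand in for resolver output so the sketch runs offline; they are illustrative, not real endpoints.&lt;/P&gt;

```python
import ipaddress
import socket


def resolve_ipv4(hostname):
    """Return the IPv4 addresses the local resolver gives for hostname."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return addresses


def resolves_privately(addresses):
    """True only if there is at least one address and all are private."""
    return bool(addresses) and all(
        ipaddress.ip_address(a).is_private for a in addresses
    )


# Literal addresses stand in for resolver output.
print(resolves_privately(["10.20.0.5"]))   # True
print(resolves_privately(["20.50.1.10"]))  # False
print(resolves_privately([]))              # False
```

&lt;P&gt;Inside a pod, passing the output of resolve_ipv4 for the service hostname to resolves_privately gives a quick signal: a public address suggests the pod's VNet is not linked to the private DNS zone, which points at the DNS checks before the routing ones.&lt;/P&gt;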
&lt;H4&gt;&lt;STRONG&gt;5. Identity-First Security Model (No Secrets Architecture)&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;Modern banking architectures aim to eliminate static credentials entirely.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Authentication Flow:&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang=""&gt;AKS Workload → Managed Identity → Azure AD → Azure Services&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;Key Principle:&lt;/STRONG&gt;&lt;BR /&gt;👉 &lt;EM&gt;Identity is the new security perimeter.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Benefits:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;No API keys or secrets&lt;/LI&gt;
&lt;LI&gt;Simplified access management&lt;/LI&gt;
&lt;LI&gt;RBAC-based governance&lt;/LI&gt;
&lt;LI&gt;Fully auditable access&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;&lt;STRONG&gt;6. Secure AI Inference Pipeline&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;A production AI request flow:&lt;/P&gt;
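&lt;LI-CODE lang="python"&gt;# Illustrative flow: authenticate_aad, rate_limit, circuit_breaker, log_to_ampls, the other
# helpers, and ManagedIdentity are placeholders for platform components, not library calls.
def process_request(user_request):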
    # 1. Authenticate user via Azure AD
    identity = authenticate_aad(user_request.token)
    if not identity or not identity.is_valid:
        return "ACCESS_DENIED"
    # 2. Enforce rate limiting per identity
    if not rate_limit(identity):
        return "RATE_LIMIT_EXCEEDED"
    # 3. Apply prompt security guardrails (injection protection)
    safe_prompt = apply_prompt_guardrails(user_request.prompt)
    # 4. Content safety filtering (PII / harmful content detection)
    if not content_filter(safe_prompt):
        return "CONTENT_BLOCKED"
    # 5. Retrieve secure RAG context
    context = retrieve_rag_context(
        query=safe_prompt,
        secure_mode=True
    )
    # 6. Build final prompt
    final_prompt = merge_prompt_and_context(safe_prompt, context)
    # 7. Call Azure OpenAI with circuit breaker protection
    response = circuit_breaker(
        lambda: call_openai(
            prompt=final_prompt,
            identity=ManagedIdentity()
        )
    )
    # 8. Validate and sanitize model output
    validated_output = sanitize(response)
    # 9. Log everything for audit + compliance (AMPLS / SIEM)
    log_to_ampls(
        identity=identity,
        request=user_request,
        response=validated_output
    )
    return validated_output&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;Security controls include:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Prompt injection filtering&lt;/LI&gt;
&lt;LI&gt;Context grounding (RAG)&lt;/LI&gt;
&lt;LI&gt;Output sanitization&lt;/LI&gt;
&lt;LI&gt;Full audit logging&lt;/LI&gt;
&lt;/UL&gt;
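&lt;P&gt;The pipeline above wraps the model call in a circuit breaker without showing one. The following is a minimal sketch of the idea, assuming a simple consecutive-failure threshold and a fixed cool-down; the class and parameter names are hypothetical, and a production system would more likely rely on an established resilience library or gateway policy.&lt;/P&gt;

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at >= self.reset_after:
                # Half-open: let one trial call through; one more
                # failure reopens the circuit immediately.
                self.opened_at = None
                self.failures = self.max_failures - 1
            else:
                raise CircuitOpenError("circuit open; call rejected")
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def failing_call():
    raise TimeoutError("simulated upstream timeout")


breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
outcomes = []
for _ in range(3):
    try:
        breaker.call(failing_call)
    except CircuitOpenError:
        outcomes.append("rejected")
    except TimeoutError:
        outcomes.append("failed")
print(outcomes)  # ['failed', 'failed', 'rejected']
```

&lt;P&gt;After two consecutive failures the third call is rejected without reaching the upstream service, which protects a rate-limited model endpoint from retry storms.&lt;/P&gt;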
&lt;H4&gt;&lt;STRONG&gt;7. RAG Architecture: Enterprise AI Backbone&lt;/STRONG&gt;&lt;/H4&gt;
&lt;LI-CODE lang=""&gt;User Query → Embedding Model → Azure AI Search (Vector Store) → Context Retrieval → Azure OpenAI → Final Response&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;Why RAG is preferred in banking:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;No model retraining required&lt;/LI&gt;
&lt;LI&gt;Controlled data exposure&lt;/LI&gt;
&lt;LI&gt;Easier compliance validation&lt;/LI&gt;
&lt;LI&gt;Real-time knowledge updates&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;In banking systems, retrieval is not just about relevance; it is about &lt;STRONG data-start="3993" data-end="4039"&gt;controlled disclosure of sensitive context&lt;/STRONG&gt;.&lt;/P&gt;
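&lt;P&gt;One way to read "controlled disclosure" is that entitlement filtering happens before ranking, so a highly relevant document the caller may not see never reaches the prompt. The sketch below illustrates this with an in-memory list and cosine similarity; the document fields (vector, allowed_groups) and group names are hypothetical, and a real deployment would apply the equivalent filter inside the vector store.&lt;/P&gt;

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, documents, caller_groups, top_k=2):
    """Rank only the documents the caller is entitled to read."""
    allowed = [
        d for d in documents
        if not caller_groups.isdisjoint(d["allowed_groups"])
    ]
    ranked = sorted(
        allowed, key=lambda d: cosine(query_vec, d["vector"]), reverse=True
    )
    return [d["id"] for d in ranked[:top_k]]


documents = [
    {"id": "retail-faq", "vector": [1.0, 0.0],
     "allowed_groups": {"retail", "treasury"}},
    {"id": "treasury-limits", "vector": [0.9, 0.1],
     "allowed_groups": {"treasury"}},
    {"id": "branch-hours", "vector": [0.0, 1.0],
     "allowed_groups": {"retail"}},
]

print(retrieve([1.0, 0.0], documents, {"retail"}))
# ['retail-faq', 'branch-hours']
```

&lt;P&gt;The treasury document is the second-closest match to the query, but a retail caller never receives it; a less relevant document fills the slot instead.&lt;/P&gt;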
&lt;H4&gt;&lt;STRONG&gt;8. Observability with AMPLS (A Critical Yet Overlooked Layer)&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;AI telemetry flows through:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;Azure Services → Private Link → AMPLS → Log Analytics / App Insights&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;Why this matters:&lt;/STRONG&gt; Logs may contain:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Sensitive financial data&lt;/LI&gt;
&lt;LI&gt;PII&lt;/LI&gt;
&lt;LI&gt;Prompt inputs&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;👉 AMPLS ensures &lt;STRONG&gt;telemetry remains private and compliant&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;9. Regulatory Mapping: Banking Requirements to Azure Capabilities&lt;/STRONG&gt;&lt;/H4&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Requirement&lt;/th&gt;&lt;th&gt;Azure Implementation&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;No public exposure&lt;/td&gt;&lt;td&gt;Private Endpoints&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Identity-based security&lt;/td&gt;&lt;td&gt;Azure AD + Managed Identity&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Audit compliance&lt;/td&gt;&lt;td&gt;Log Analytics + AMPLS&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Data protection&lt;/td&gt;&lt;td&gt;Customer-Managed Keys (CMK)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Network isolation&lt;/td&gt;&lt;td&gt;VNet + Firewall&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Access governance&lt;/td&gt;&lt;td&gt;RBAC + PIM&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;
&lt;H4&gt;&lt;STRONG&gt;10. Real-World Production Challenges&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;Common failure points in enterprise AI systems:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;DNS Misconfiguration&lt;/STRONG&gt; – Private endpoints fail resolution&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Latency Chains&lt;/STRONG&gt; – Excessive service hops&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;OpenAI Rate Limits&lt;/STRONG&gt; – High enterprise load&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Identity Propagation Issues&lt;/STRONG&gt; – Cross-subscription failures&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Observability Gaps&lt;/STRONG&gt; – Missing distributed tracing&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;&lt;STRONG&gt;11. Enterprise Architecture Best Practices&lt;/STRONG&gt;&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;Design with &lt;STRONG&gt;zero-trust principles&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Treat AI as a &lt;STRONG&gt;distributed system&lt;/STRONG&gt;, not a single component&lt;/LI&gt;
&lt;LI&gt;Centralize governance using API Management&lt;/LI&gt;
&lt;LI&gt;Never expose AI services publicly&lt;/LI&gt;
&lt;LI&gt;Use &lt;STRONG&gt;identity everywhere—no secrets&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Separate:
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Control Plane&lt;/STRONG&gt; (governance)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Data Plane&lt;/STRONG&gt; (inference execution)&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;&lt;STRONG&gt;12. Azure Service Mapping (Quick Reference)&lt;/STRONG&gt;&lt;/H4&gt;
&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Layer&lt;/th&gt;&lt;th&gt;Azure Services&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Edge Security&lt;/td&gt;&lt;td&gt;Application Gateway (WAF)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;API Layer&lt;/td&gt;&lt;td&gt;API Management&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Compute&lt;/td&gt;&lt;td&gt;AKS / App Services&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AI Services&lt;/td&gt;&lt;td&gt;Azure OpenAI&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Retrieval&lt;/td&gt;&lt;td&gt;Azure AI Search&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Data&lt;/td&gt;&lt;td&gt;Azure Storage / SQL&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Identity&lt;/td&gt;&lt;td&gt;Azure AD + Managed Identity&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Networking&lt;/td&gt;&lt;td&gt;Private Link + VNet&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Observability&lt;/td&gt;&lt;td&gt;AMPLS + Log Analytics&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;
&lt;H4&gt;&lt;STRONG&gt;13. Common Failure Patterns&lt;/STRONG&gt;&lt;/H4&gt;
&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;th&gt;Issue&lt;/th&gt;&lt;th&gt;Root Cause&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AI endpoint unreachable&lt;/td&gt;&lt;td&gt;DNS / Private endpoint misconfig&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Data leakage risk&lt;/td&gt;&lt;td&gt;Missing prompt filtering&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;High latency&lt;/td&gt;&lt;td&gt;Over-layered architecture&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Unauthorized access&lt;/td&gt;&lt;td&gt;Identity misconfiguration&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Poor response quality&lt;/td&gt;&lt;td&gt;Weak RAG implementation&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;
&lt;H4&gt;&lt;STRONG&gt;14. Final Thought&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;In enterprise banking AI systems:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Models are replaceable. Architecture is not.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The real challenge is designing a system where AI is:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Secure&lt;/LI&gt;
&lt;LI&gt;Controlled&lt;/LI&gt;
&lt;LI&gt;Observable&lt;/LI&gt;
&lt;LI&gt;Fully compliant&lt;/LI&gt;
&lt;/UL&gt;
&lt;/DIV&gt;</description>
      <pubDate>Thu, 07 May 2026 16:03:01 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/building-secure-ai-platforms-in-banking-using-azure-enterprise/ba-p/4517531</guid>
      <dc:creator>divyanshi_varshney</dc:creator>
      <dc:date>2026-05-07T16:03:01Z</dc:date>
    </item>
    <item>
      <title>How Validation‑Driven Terraform Made Our Azure Function Deployments Predictable</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/how-validation-driven-terraform-made-our-azure-function/ba-p/4517375</link>
      <description>&lt;P&gt;When Terraform deploys Azure Functions, the most expensive failures are rarely “syntax” problems. They’re environmental mismatches discovered too late—during&amp;nbsp;&lt;STRONG&gt;terraform apply&lt;/STRONG&gt;, after approvals, after a change window starts, and often after multiple teams are already watching the pipeline.&lt;/P&gt;
&lt;P&gt;After a few painful production-grade rollouts, we shifted to a &lt;STRONG&gt;validation-driven&lt;/STRONG&gt; approach: instead of letting Azure reject misconfigurations at apply time, we &lt;STRONG&gt;fail fast at PR/plan time&lt;/STRONG&gt; with clear messages that engineers can fix immediately.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;What we mean by “validation‑driven”&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Validation-driven Terraform uses three guardrails together:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;PR checks&lt;/STRONG&gt;: formatting, linting, security scanning, module contract tests&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Pre-flight checks&lt;/STRONG&gt;: quick Azure sanity checks (provider registration, storage prerequisites, RBAC basics)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Terraform-native validations&lt;/STRONG&gt;: input validations + preconditions that stop invalid configurations before they reach Azure&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;The idea is simple: &lt;STRONG&gt;apply should be boring&lt;/STRONG&gt;. If something is going to fail, it should fail &lt;EM&gt;earlier&lt;/EM&gt; with &lt;EM&gt;better&lt;/EM&gt; errors.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Azure Functions as the example: where failures actually happen&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Azure Functions bring a few recurring “gotchas” that tend to show up late:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;mismatch between &lt;STRONG&gt;plan/SKU&lt;/STRONG&gt; and features (e.g., VNet integration expectations)&lt;/LI&gt;
&lt;LI&gt;missing or inaccessible &lt;STRONG&gt;storage account settings&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;unsupported/incorrect &lt;STRONG&gt;runtime stack/version&lt;/STRONG&gt; for a chosen hosting model/region/policy&lt;/LI&gt;
&lt;LI&gt;inconsistent &lt;STRONG&gt;app settings&lt;/STRONG&gt; required by your org platform standards&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;We converted these into guardrails with &lt;STRONG&gt;minimal code&lt;/STRONG&gt; and clearer pipeline signals.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&lt;STRONG&gt;Case study 1: Wrong plan SKU causing runtime capability failures&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Problem&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;A team deployed a Function App expecting network integration behavior, but the selected plan/SKU didn’t align with what the workload required. The pipeline failed late, after approvals, and the rollback discussion took longer than the fix.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;What we changed&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;We added a small validation/precondition rule: if a team enables a capability that requires a certain class of plan, Terraform fails early with a targeted message.&lt;/P&gt;
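&lt;P&gt;To make the shape of such a rule concrete, here is a minimal sketch written as a Python pre-flight function rather than as the Terraform precondition itself. The SKU names, the vnet_integration_enabled flag, and the approved list are all assumptions for illustration; substitute whatever your platform actually approves.&lt;/P&gt;

```python
# Hypothetical rule table: the SKU names and the capability flag are
# illustrative assumptions, not the module's real inputs.
PLANS_WITH_VNET_INTEGRATION = {"EP1", "EP2", "EP3"}


def check_plan_supports_vnet(sku_name, vnet_integration_enabled):
    """Return an error message if the plan is not approved for the capability, else None."""
    if not vnet_integration_enabled:
        return None
    if sku_name in PLANS_WITH_VNET_INTEGRATION:
        return None
    allowed = ", ".join(sorted(PLANS_WITH_VNET_INTEGRATION))
    return (
        f"vnet_integration_enabled is true, but plan SKU '{sku_name}' is not in the "
        f"approved list for that capability ({allowed}). Choose one of those SKUs "
        "or disable VNet integration."
    )


if __name__ == "__main__":
    # Prints the targeted message an engineer would see for a mismatched plan.
    print(check_plan_supports_vnet("Y1", True))
```

&lt;P&gt;The message names both the offending input and the accepted values, which is what lets an engineer fix the problem without opening a ticket.&lt;/P&gt;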
&lt;P&gt;&lt;STRONG&gt;Outcome&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Failure moved from apply-time → plan-time&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;2+ hours&lt;/STRONG&gt; saved per failed deployment cycle&lt;/LI&gt;
&lt;LI&gt;Zero repeat incidents for that class of issue&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Case study 2: Missing storage configuration blocking deployments&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Problem&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Function Apps depend heavily on storage configuration. We saw intermittent failures when storage settings pointed to deleted/incorrect resources, or when access expectations didn’t match reality.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;What we changed&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;We introduced a &lt;STRONG&gt;pre-flight check&lt;/STRONG&gt; step in Azure DevOps: verify storage existence/access and fail fast before plan/apply.&lt;/P&gt;
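&lt;P&gt;A pre-flight step of this kind might look like the following sketch. It assumes the Azure CLI is installed and already logged in on the build agent, and it shells out to az storage account show; the function names and the message wording are hypothetical. The command runner is injectable so the logic can be exercised without Azure.&lt;/P&gt;

```python
import subprocess


def storage_account_exists(name, resource_group, run=subprocess.run):
    """Return True if `az storage account show` succeeds for the given account.

    `run` is injectable so the check can be exercised without the Azure CLI.
    """
    result = run(
        ["az", "storage", "account", "show",
         "--name", name, "--resource-group", resource_group],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def preflight(accounts, run=subprocess.run):
    """Check each (name, resource_group) pair and return readable failure messages."""
    failures = []
    for name, resource_group in accounts:
        if not storage_account_exists(name, resource_group, run=run):
            failures.append(
                f"Storage account '{name}' in resource group '{resource_group}' "
                "was not found or is not readable by the pipeline identity."
            )
    return failures
```

&lt;P&gt;In a pipeline, the step would exit with a non-zero code whenever the returned list is not empty, so the run stops before plan or apply.&lt;/P&gt;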
&lt;P&gt;&lt;STRONG&gt;Outcome&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Deployments stopped failing mid-run&lt;/LI&gt;
&lt;LI&gt;Fewer “investigation loops” across teams&lt;/LI&gt;
&lt;LI&gt;Reduced incident noise (the pipeline became self-explanatory)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Case study 3: Unsupported runtime version (region/org guardrails mismatch)&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Problem&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Engineers selected a runtime stack/version that was valid in isolation, but not aligned with platform support or readiness in the target environment. Failures appeared in apply or after release.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;What we changed&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;We centralized an “allowed runtime list” (per org standards) and validated runtime inputs at plan time.&lt;/P&gt;
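&lt;P&gt;The allow-list check can be sketched in a few lines. The version below is Python for illustration only; the stacks and versions are placeholders rather than a statement of what Azure Functions or any organization supports, and in Terraform the same condition would sit in an input validation.&lt;/P&gt;

```python
# Hypothetical allow-list: the stacks and versions below are placeholders,
# not a statement of what Azure Functions or any organization supports.
ALLOWED_RUNTIMES = {
    "python": {"3.11", "3.12"},
    "node": {"20"},
    "dotnet-isolated": {"8.0"},
}


def validate_runtime(stack, version):
    """Return None when the pair is allowed, otherwise an actionable message."""
    versions = ALLOWED_RUNTIMES.get(stack)
    if versions is None:
        stacks = ", ".join(sorted(ALLOWED_RUNTIMES))
        return f"Runtime stack '{stack}' is not approved. Approved stacks: {stacks}."
    if version not in versions:
        approved = ", ".join(sorted(versions))
        return (
            f"Version '{version}' is not approved for '{stack}'. "
            f"Approved versions: {approved}."
        )
    return None
```

&lt;P&gt;Keeping the list in one place means a platform team can retire a runtime by editing a single data structure instead of chasing every module that references it.&lt;/P&gt;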
&lt;P&gt;&lt;STRONG&gt;Outcome&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Plan failed fast with a clear explanation&lt;/LI&gt;
&lt;LI&gt;No redeploy cycles&lt;/LI&gt;
&lt;LI&gt;Better compliance posture (standards became enforceable code)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Why this matters (beyond “fewer red pipelines”)&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Validation-driven Terraform improved more than deployment success rate:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Developer experience&lt;/STRONG&gt;: errors became precise and actionable&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Operational safety&lt;/STRONG&gt;: fewer emergency approvals and late-night fixes&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Standardization&lt;/STRONG&gt;: platform rules stopped living in tribal knowledge and wikis&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Manager-visible impact&lt;/STRONG&gt;: less delivery friction, fewer escalations, faster releases&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The best part: this wasn’t achieved by writing massive frameworks. It was achieved by adding &lt;STRONG&gt;small, high-leverage validations&lt;/STRONG&gt; in the right places.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Minimal code approach (what we actually used)&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;We intentionally kept Terraform code small:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;A few &lt;STRONG&gt;input validations&lt;/STRONG&gt; (environment, runtime, naming contracts)&lt;/LI&gt;
&lt;LI&gt;A few &lt;STRONG&gt;preconditions&lt;/STRONG&gt; (must-have settings and plan constraints)&lt;/LI&gt;
&lt;LI&gt;A light &lt;STRONG&gt;Azure DevOps pre-flight&lt;/STRONG&gt; step for checks Terraform can’t reliably infer (like “does this dependency exist and is it accessible?”)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This way, the module stays readable, and the pipeline remains fast.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Conclusion&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;In one of our Azure Function platforms, repeated deployment failures were not caused by bugs in application code or gaps in Terraform itself—they were caused by &lt;STRONG&gt;discovering platform constraints too late&lt;/STRONG&gt;. Each failed terraform apply triggered rework, additional approvals, and unnecessary operational noise across teams.&lt;/P&gt;
&lt;P&gt;By introducing a validation‑driven approach—combining Terraform input validations, targeted preconditions, and lightweight Azure DevOps pre‑flight checks—we moved failure discovery to the right place: &lt;STRONG&gt;pull requests and plan stages&lt;/STRONG&gt;. Azure Function‑specific issues such as incorrect plan capabilities, unsupported runtimes, and missing storage prerequisites were surfaced early, with clear, actionable messages.&lt;/P&gt;
&lt;P&gt;If your Azure DevOps pipelines still use terraform apply as a discovery mechanism, validation is not an optimization—it’s a foundational platform capability.&lt;/P&gt;</description>
      <pubDate>Thu, 07 May 2026 05:12:06 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/how-validation-driven-terraform-made-our-azure-function/ba-p/4517375</guid>
      <dc:creator>AkshitaBajpai</dc:creator>
      <dc:date>2026-05-07T05:12:06Z</dc:date>
    </item>
    <item>
      <title>🚀 Modernizing Azure Landing Zone Deployments: From Terraform Scripts to GHCP-Driven Engineering</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/modernizing-azure-landing-zone-deployments-from-terraform/ba-p/4517168</link>
      <description>&lt;P&gt;&lt;STRONG&gt;The code nobody wants to write anymore&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Most Azure Landing Zone (ALZ) deployments we see today still follow the same pattern they did a few years ago. Engineers manually write Terraform modules, copy networking blocks from older repos, tweak management group hierarchies, and wire pipelines step by step.&lt;/P&gt;
&lt;P&gt;It works. But it’s slow, repetitive, and heavily dependent on individual expertise.&lt;/P&gt;
&lt;P&gt;The real problem isn’t complexity. It’s &lt;STRONG&gt;friction&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;Every new Landing Zone starts with:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Rebuilding Terraform structures&lt;/LI&gt;
&lt;LI&gt;Rewriting GitHub Actions pipelines&lt;/LI&gt;
&lt;LI&gt;Re-validating OIDC authentication&lt;/LI&gt;
&lt;LI&gt;Re-implementing policy assignments&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;And despite all that effort, most implementations still look… slightly different.&lt;/P&gt;
&lt;P&gt;This is exactly where &lt;STRONG&gt;GitHub Copilot (GHCP)&lt;/STRONG&gt; changes the game.&lt;/P&gt;
&lt;P data-start="919" data-end="966"&gt;Instead of writing infrastructure line by line, we now &lt;STRONG&gt;describe intent&lt;/STRONG&gt; and let AI generate the implementation.&lt;/P&gt;
&lt;H2 data-section-id="1agdkld" data-start="1278" data-end="1335"&gt;From Infrastructure as Code → Infrastructure by Prompt&lt;/H2&gt;
&lt;P data-start="1337" data-end="1406"&gt;The biggest shift isn’t Terraform. It’s how we &lt;EM data-start="1384" data-end="1392"&gt;arrive&lt;/EM&gt; at Terraform.&lt;/P&gt;
&lt;P data-start="1408" data-end="1415"&gt;Before:&lt;/P&gt;
&lt;img /&gt;
&lt;P data-start="1072" data-end="1090"&gt;We now start with:&lt;/P&gt;
&lt;img /&gt;
&lt;P data-start="1723" data-end="1776"&gt;That prompt doesn’t just generate code. It generates:&lt;/P&gt;
&lt;UL data-start="1777" data-end="1871"&gt;
&lt;LI data-section-id="oc99q6" data-start="1777" data-end="1803"&gt;Architecture decisions&lt;/LI&gt;
&lt;LI data-section-id="1h39nnr" data-start="1804" data-end="1824"&gt;Module structure&lt;/LI&gt;
&lt;LI data-section-id="1vhjdtm" data-start="1825" data-end="1847"&gt;Naming conventions&lt;/LI&gt;
&lt;LI data-section-id="1fc7x0q" data-start="1848" data-end="1871"&gt;Deployment patterns&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="1873" data-end="1919"&gt;The engineer moves from &lt;STRONG data-start="1897" data-end="1918"&gt;author → reviewer&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2 data-section-id="1hakbxc" data-start="1926" data-end="1973"&gt;How GHCP actually accelerates ALZ deployment&lt;/H2&gt;
&lt;P data-start="1975" data-end="2045"&gt;This isn’t about autocomplete. It’s about &lt;STRONG data-start="2017" data-end="2044"&gt;end-to-end acceleration&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P data-start="2047" data-end="2112"&gt;Let’s break it down the way it actually happens in real projects.&lt;/P&gt;
&lt;H2 data-section-id="1jrblyd" data-start="2119" data-end="2174"&gt;1. Designing the Landing Zone (in minutes, not days)&lt;/H2&gt;
&lt;P data-start="2176" data-end="2268"&gt;The traditional design phase involves whiteboarding, documentation, and multiple iterations.&lt;/P&gt;
&lt;P data-start="2270" data-end="2295"&gt;With GHCP, we start with:&lt;/P&gt;
&lt;P data-start="2270" data-end="2295"&gt;🔹 Prompt&lt;/P&gt;
&lt;img /&gt;
&lt;H6 data-section-id="3l1s2e" data-start="1684" data-end="1706"&gt;✅ Output from GHCP&lt;/H6&gt;
&lt;P data-start="2525" data-end="2541"&gt;What comes back:&lt;/P&gt;
&lt;UL data-start="2542" data-end="2653"&gt;
&lt;LI data-section-id="18ytj" data-start="2542" data-end="2565"&gt;A full MG hierarchy&lt;/LI&gt;
&lt;LI data-section-id="m4siw1" data-start="2566" data-end="2598"&gt;Suggested subscription model&lt;/LI&gt;
&lt;LI data-section-id="tmvx5l" data-start="2599" data-end="2629"&gt;Terraform module breakdown&lt;/LI&gt;
&lt;LI data-section-id="agsxh9" data-start="2630" data-end="2653"&gt;Governance baseline&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="2655" data-end="2741"&gt;You’re not starting from scratch anymore. You’re starting from a &lt;STRONG data-start="2720" data-end="2740"&gt;structured draft&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2 data-section-id="1tnpvul" data-start="2748" data-end="2806"&gt;2. Generating Terraform Modules instead of writing them&lt;/H2&gt;
&lt;P data-start="2808" data-end="2845"&gt;Instead of building modules manually:&lt;/P&gt;
&lt;P data-start="2808" data-end="2845"&gt;🔹 Prompt&lt;/P&gt;
&lt;img /&gt;
&lt;H6 data-section-id="1lfzu7w" data-start="2102" data-end="2114"&gt;✅ Output&lt;/H6&gt;
&lt;UL&gt;
&lt;LI&gt;Folder structure&lt;/LI&gt;
&lt;LI&gt;Input/output variables&lt;/LI&gt;
&lt;LI&gt;Reusable module patterns&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="3103" data-end="3166"&gt;What used to take hours of setup becomes a &lt;STRONG data-start="3146" data-end="3165"&gt;review exercise&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2 data-section-id="1jlec5i" data-start="3173" data-end="3213"&gt;3. Networking without copy-paste debt&lt;/H2&gt;
&lt;P data-start="3215" data-end="3268"&gt;Networking is where most ALZ implementations diverge.&lt;/P&gt;
&lt;P data-start="3270" data-end="3307"&gt;Instead of digging through old repos:&lt;/P&gt;
&lt;H6 data-section-id="3fejye" data-start="2312" data-end="2325"&gt;🔹 Prompt&lt;/H6&gt;
&lt;img /&gt;
&lt;H6 data-section-id="1lfzu7w" data-start="2518" data-end="2530"&gt;✅ Output&lt;/H6&gt;
&lt;P data-start="3458" data-end="3466"&gt;You get:&lt;/P&gt;
&lt;UL data-start="3467" data-end="3532"&gt;
&lt;LI data-section-id="15u47oz" data-start="3467" data-end="3495"&gt;Clean networking configuration&lt;/LI&gt;
&lt;LI data-section-id="ml0bc2" data-start="3496" data-end="3514"&gt;Hub definition&lt;/LI&gt;
&lt;LI data-section-id="1cl63yr" data-start="3515" data-end="3532"&gt;Routing setup&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="3534" data-end="3583"&gt;More importantly, it’s &lt;STRONG data-start="3557" data-end="3582"&gt;consistent every time&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2 data-section-id="14pvp2s" data-start="3590" data-end="3641"&gt;4. OIDC setup without documentation rabbit holes&lt;/H2&gt;
&lt;P data-start="3643" data-end="3733"&gt;OIDC is well documented. But stitching it together across Azure + GitHub still takes time.&lt;/P&gt;
&lt;P data-start="3643" data-end="3733"&gt;🔹 Prompt&lt;/P&gt;
&lt;img /&gt;
&lt;H6 data-section-id="1lfzu7w" data-start="2518" data-end="2530"&gt;✅ Output&lt;/H6&gt;
&lt;UL data-start="3925" data-end="3996"&gt;
&lt;LI data-section-id="1a919e9" data-start="3925" data-end="3947"&gt;Exact CLI commands&lt;/LI&gt;
&lt;LI data-section-id="u2jvc9" data-start="3948" data-end="3974"&gt;Correct subject format&lt;/LI&gt;
&lt;LI data-section-id="30oi0q" data-start="3975" data-end="3996"&gt;Proper RBAC scope&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="3998" data-end="4031"&gt;No guesswork. No trial-and-error.&lt;/P&gt;
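&lt;P&gt;For reference, the subject format is the part most often mistyped, so it is worth checking whatever GHCP returns. The sketch below builds the subject strings GitHub Actions presents for environment, branch, and pull request triggers, along with the issuer and audience values used for Azure workload identity federation. The function names and the contoso/alz example values are hypothetical.&lt;/P&gt;

```python
GITHUB_OIDC_ISSUER = "https://token.actions.githubusercontent.com"
AZURE_TOKEN_EXCHANGE_AUDIENCE = "api://AzureADTokenExchange"


def federated_subject(org, repo, *, environment=None, branch=None, pull_request=False):
    """Build the subject claim GitHub Actions presents for one trigger type."""
    selected = [environment is not None, branch is not None, pull_request]
    if sum(selected) != 1:
        raise ValueError("Specify exactly one of environment, branch, or pull_request")
    prefix = f"repo:{org}/{repo}"
    if environment is not None:
        return f"{prefix}:environment:{environment}"
    if branch is not None:
        return f"{prefix}:ref:refs/heads/{branch}"
    return f"{prefix}:pull_request"


def federated_credential(name, subject):
    """Return the parameters for a federated identity credential as a dict."""
    return {
        "name": name,
        "issuer": GITHUB_OIDC_ISSUER,
        "subject": subject,
        "audiences": [AZURE_TOKEN_EXCHANGE_AUDIENCE],
    }


if __name__ == "__main__":
    # Example values are placeholders, not a real organization or repository.
    print(federated_credential("alz-prod", federated_subject("contoso", "alz", environment="prod")))
```

&lt;P&gt;The subject must match the workflow trigger exactly: a credential created for an environment will not authenticate a job that runs on a plain branch push, which is a common source of login failures.&lt;/P&gt;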
&lt;H2 data-section-id="hkpljg" data-start="4038" data-end="4089"&gt;5. GitHub Actions workflows generated in seconds&lt;/H2&gt;
&lt;P data-start="4091" data-end="4159"&gt;Pipeline creation is one of the most repetitive tasks in ALZ setups.&lt;/P&gt;
&lt;P data-start="4091" data-end="4159"&gt;🔹 Prompt&lt;/P&gt;
&lt;img /&gt;
&lt;H6 data-section-id="1lfzu7w" data-start="2518" data-end="2530"&gt;✅ Output&lt;/H6&gt;
&lt;UL data-start="4356" data-end="4462"&gt;
&lt;LI data-section-id="154t2ri" data-start="4356" data-end="4385"&gt;Fully functional workflow&lt;/LI&gt;
&lt;LI data-section-id="mvh7bn" data-start="4386" data-end="4429"&gt;Correct permissions (id-token: write)&lt;/LI&gt;
&lt;LI data-section-id="1h7hyig" data-start="4430" data-end="4462"&gt;Environment-based deployment&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="4464" data-end="4523"&gt;What used to be boilerplate is now &lt;STRONG data-start="4499" data-end="4522"&gt;instant scaffolding&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2 data-section-id="qmh3q6" data-start="4530" data-end="4586"&gt;6. Policy as Code without digging through definitions&lt;/H2&gt;
&lt;P data-start="4588" data-end="4644"&gt;Policy assignment is often delayed because it’s tedious.&lt;/P&gt;
&lt;P data-start="4588" data-end="4644"&gt;🔹 Prompt&lt;/P&gt;
&lt;img /&gt;
&lt;H6 data-section-id="1lfzu7w" data-start="2518" data-end="2530"&gt;✅ Output&lt;/H6&gt;
&lt;UL data-start="4833" data-end="4912"&gt;
&lt;LI data-section-id="onxfh7" data-start="4833" data-end="4868"&gt;Ready-to-use policy assignments&lt;/LI&gt;
&lt;LI data-section-id="1drmbb5" data-start="4869" data-end="4893"&gt;Initiative structure&lt;/LI&gt;
&lt;LI data-section-id="1y0jzsf" data-start="4894" data-end="4912"&gt;Correct scopes&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="4914" data-end="4954"&gt;Governance is no longer an afterthought.&lt;/P&gt;
&lt;H2 data-section-id="jdltoa" data-start="4961" data-end="5002"&gt;What actually changes in real projects&lt;/H2&gt;
&lt;P data-start="5004" data-end="5060"&gt;This isn’t theoretical. The impact shows up immediately.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Area&lt;/th&gt;&lt;th&gt;Before GHCP&lt;/th&gt;&lt;th&gt;After GHCP&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Design&lt;/td&gt;&lt;td&gt;Whiteboarding + docs&lt;/td&gt;&lt;td&gt;Prompt-driven&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Terraform&lt;/td&gt;&lt;td&gt;Manual authoring&lt;/td&gt;&lt;td&gt;AI-generated + reviewed&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Pipelines&lt;/td&gt;&lt;td&gt;Built from scratch&lt;/td&gt;&lt;td&gt;Generated instantly&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;OIDC setup&lt;/td&gt;&lt;td&gt;Trial &amp;amp; error&lt;/td&gt;&lt;td&gt;Prompt-guided&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Consistency&lt;/td&gt;&lt;td&gt;Varies per engineer&lt;/td&gt;&lt;td&gt;Standardized&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P data-start="5399" data-end="5459"&gt;The biggest gain isn’t speed. It’s &lt;STRONG data-start="5434" data-end="5458"&gt;consistency at scale&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2 data-section-id="1fm8ajg" data-start="5466" data-end="5512"&gt;The new skill: Prompt Engineering for Cloud&lt;/H2&gt;
&lt;P data-start="5514" data-end="5601"&gt;GHCP doesn’t remove the need for expertise. It changes where that expertise is applied.&lt;/P&gt;
&lt;P data-start="5603" data-end="5614"&gt;Bad prompt:&lt;/P&gt;
&lt;img /&gt;
&lt;P data-start="5655" data-end="5667"&gt;Good prompt:&lt;/P&gt;
&lt;img /&gt;
&lt;P data-start="5875" data-end="5947"&gt;The quality of output is directly tied to the &lt;STRONG data-start="5921" data-end="5946"&gt;quality of the prompt&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2 data-section-id="qqnz2k" data-start="5954" data-end="5993"&gt;What still matters (and always will)&lt;/H2&gt;
&lt;P data-start="5995" data-end="6036"&gt;Even with GHCP, some things don’t change:&lt;/P&gt;
&lt;UL data-start="6038" data-end="6199"&gt;
&lt;LI data-section-id="1uxcvud" data-start="6038" data-end="6083"&gt;You still validate architecture decisions&lt;/LI&gt;
&lt;LI data-section-id="nuimz8" data-start="6084" data-end="6130"&gt;You still review Terraform before applying&lt;/LI&gt;
&lt;LI data-section-id="1fnoj16" data-start="6131" data-end="6166"&gt;You still design RBAC carefully&lt;/LI&gt;
&lt;LI data-section-id="x7pbwd" data-start="6167" data-end="6199"&gt;You still enforce governance&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="6201" data-end="6243"&gt;GHCP accelerates the &lt;EM data-start="6222" data-end="6227"&gt;how&lt;/EM&gt;, not the &lt;EM data-start="6237" data-end="6242"&gt;why&lt;/EM&gt;.&lt;/P&gt;
&lt;H2 data-section-id="1x98aaj" data-start="6250" data-end="6277"&gt;Where this is going next&lt;/H2&gt;
&lt;P data-start="6279" data-end="6319"&gt;We’re already seeing GHCP extend beyond landing zone setup into:&lt;/P&gt;
&lt;UL data-start="6321" data-end="6414"&gt;
&lt;LI data-section-id="62819u" data-start="6321" data-end="6345"&gt;Subscription vending&lt;/LI&gt;
&lt;LI data-section-id="1xx85fi" data-start="6346" data-end="6374"&gt;Multi-region deployments&lt;/LI&gt;
&lt;LI data-section-id="1ns64cq" data-start="6375" data-end="6394"&gt;Drift detection&lt;/LI&gt;
&lt;LI data-section-id="a05zcd" data-start="6395" data-end="6414"&gt;Cost governance&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="6416" data-end="6437"&gt;The pattern is clear:&lt;/P&gt;
&lt;P data-start="6440" data-end="6516"&gt;Infrastructure is no longer written first. It’s &lt;STRONG data-start="6488" data-end="6515"&gt;generated, then refined&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2 data-section-id="wivlgk" data-start="6523" data-end="6537"&gt;Wrapping up&lt;/H2&gt;
&lt;P data-start="6539" data-end="6588"&gt;If you do one thing after reading this, try this:&lt;/P&gt;
&lt;P data-start="6590" data-end="6616"&gt;Open your repo and prompt:&lt;/P&gt;
&lt;img /&gt;
&lt;P data-start="6723" data-end="6742"&gt;Look at the output.&lt;/P&gt;
&lt;P data-start="6744" data-end="6791"&gt;That’s not a shortcut. That’s the new baseline.&lt;/P&gt;
&lt;P data-start="6798" data-end="6874"&gt;GHCP doesn’t replace Terraform.&lt;BR data-start="6829" data-end="6832" /&gt;It removes the friction around writing it.&lt;/P&gt;
&lt;P data-start="6876" data-end="6945"&gt;And in large-scale Azure environments, that’s the difference between:&lt;/P&gt;
&lt;UL data-start="6946" data-end="7000" data-is-last-node="" data-is-only-node=""&gt;
&lt;LI data-section-id="1rjfgn0" data-start="6946" data-end="6961"&gt;Moving fast&lt;/LI&gt;
&lt;LI data-section-id="z2cx2n" data-start="6962" data-end="7000" data-is-last-node=""&gt;And &lt;STRONG data-start="6968" data-end="7000" data-is-last-node=""&gt;moving consistently at scale&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Wed, 06 May 2026 06:05:20 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/modernizing-azure-landing-zone-deployments-from-terraform/ba-p/4517168</guid>
      <dc:creator>gurkirat</dc:creator>
      <dc:date>2026-05-06T06:05:20Z</dc:date>
    </item>
    <item>
      <title>Operationalizing Responsible AI in Microsoft Foundry within Enterprise Network Boundaries</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/operationalizing-responsible-ai-in-microsoft-foundry-within/ba-p/4516869</link>
      <description>&lt;H2&gt;Strategic Overview&lt;/H2&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-teams="true"&gt;Deploying &lt;STRONG&gt;Microsoft Foundry&lt;/STRONG&gt; within a VNet-integrated landing zone requires a thoughtful balance between innovation and enterprise-grade security, especially in highly regulated industries like banking. This architecture enforces Responsible AI (RAI) principles and robust content safety controls while aligning with stringent security and compliance requirements. &lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-teams="true"&gt;By adopting a dual-stream design, comprising an AI Platform Layer and a Data Integration Layer, you can decouple model orchestration from data ingestion so that each can change and scale independently. Private networking constructs such as VNets, subnets, NSGs, and controlled routing keep all AI workloads within secure boundaries, while integration with services like Azure AI Search, Azure Cosmos DB, Azure SQL Database, and Azure Document Intelligence gives those workloads access to enterprise data. &lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-teams="true"&gt;Event-driven ingestion patterns powered by Azure Data Factory and Azure Event Grid further enable real-time responsiveness. At the same time, real-world constraints, such as IP range allowlisting for Microsoft Foundry and private networking limitations, must be carefully accounted for. Ultimately, this approach provides a secure, compliant, and scalable foundation for enterprise AI adoption.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-teams="true"&gt;This post focuses on the following points:&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL data-spread="false"&gt;
&lt;LI&gt;
&lt;DIV class="lia-align-justify"&gt;Deploy Microsoft Foundry in a &lt;STRONG&gt;VNet-integrated landing zone&lt;/STRONG&gt;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="lia-align-justify"&gt;Enforce &lt;STRONG&gt;Responsible AI (RAI) policies and content safety controls&lt;/STRONG&gt;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="lia-align-justify"&gt;Align AI architecture with &lt;STRONG&gt;enterprise (banking) security requirements&lt;/STRONG&gt;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="lia-align-justify"&gt;Implement a dual-stream architecture:
&lt;UL data-spread="false"&gt;
&lt;LI&gt;&lt;STRONG&gt;AI Platform Layer&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Data Integration Layer&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="lia-align-justify"&gt;Use private networking with &lt;STRONG&gt;VNet, subnets, NSGs, and routing&lt;/STRONG&gt;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="lia-align-justify"&gt;Integrate with &lt;STRONG&gt;Azure AI Search, Cosmos DB, SQL, and Document Intelligence&lt;/STRONG&gt;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="lia-align-justify"&gt;Enable event-driven ingestion using &lt;STRONG&gt;Azure Data Factory and Event Grid&lt;/STRONG&gt;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="lia-align-justify"&gt;Account for real-world constraints:
&lt;UL data-spread="false"&gt;
&lt;LI&gt;&lt;STRONG&gt;IP range allowlisting for Microsoft Foundry&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Private networking limitations&lt;/LI&gt;
&lt;/UL&gt;
&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="lia-align-justify"&gt;Design for &lt;STRONG&gt;secure, compliant, and scalable AI consumption&lt;/STRONG&gt;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Problem Statement&lt;/H2&gt;
&lt;P&gt;Operationalizing Responsible AI in enterprise environments requires more than defining policies.&lt;/P&gt;
&lt;P&gt;Key challenges include:&lt;/P&gt;
&lt;UL data-spread="false"&gt;
&lt;LI&gt;Translating Responsible AI principles into &lt;STRONG&gt;enforceable platform controls&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Deploying AI services within &lt;STRONG&gt;private, enterprise-grade networks&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Managing &lt;STRONG&gt;network constraints and service limitations&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Ensuring consistent integration across:
&lt;UL data-spread="false"&gt;
&lt;LI&gt;AI services&lt;/LI&gt;
&lt;LI&gt;Data pipelines&lt;/LI&gt;
&lt;LI&gt;Application layers&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Without a structured approach, AI platforms risk being &lt;STRONG&gt;non-compliant, insecure, or difficult to scale&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2&gt;Goals&lt;/H2&gt;
&lt;P&gt;Design an AI landing zone that:&lt;/P&gt;
&lt;UL data-spread="false"&gt;
&lt;LI&gt;Enforces &lt;STRONG&gt;Responsible AI at the platform level&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Enables &lt;STRONG&gt;secure model deployment and access&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Operates fully within &lt;STRONG&gt;private network boundaries&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Integrates &lt;STRONG&gt;AI and Data services seamlessly&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Provides &lt;STRONG&gt;governed and controlled AI consumption&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Architecture Overview&lt;/H2&gt;
&lt;P&gt;Structure the platform into four layers:&lt;/P&gt;
&lt;UL data-spread="false"&gt;
&lt;LI&gt;&lt;STRONG&gt;Network Layer&lt;/STRONG&gt; → VNet, subnets, NSGs, routing&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;AI Platform Layer&lt;/STRONG&gt; → Microsoft Foundry, models, RAI policies&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Data Layer&lt;/STRONG&gt; → Azure Data Factory (ADF), self-hosted integration runtime (SHIR), Event Grid&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Application Layer&lt;/STRONG&gt; → Function Apps, Web Apps&lt;/LI&gt;
&lt;/UL&gt;
&lt;img /&gt;
&lt;H2&gt;Microsoft Foundry Setup in Enterprise Context&lt;/H2&gt;
&lt;P&gt;Set up Microsoft Foundry as the &lt;STRONG&gt;core AI platform layer&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H3&gt;Key steps:&lt;/H3&gt;
&lt;UL data-spread="false"&gt;
&lt;LI&gt;Create &lt;STRONG&gt;Microsoft Foundry projects&lt;/STRONG&gt; to isolate use cases&lt;/LI&gt;
&lt;LI&gt;Deploy models within controlled project boundaries&lt;/LI&gt;
&lt;LI&gt;Restrict access using:
&lt;UL data-spread="false"&gt;
&lt;LI&gt;VNet integration&lt;/LI&gt;
&lt;LI&gt;Private endpoints&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Disable public access wherever possible&lt;/LI&gt;
&lt;LI&gt;Integrate with supporting services:
&lt;UL data-spread="false"&gt;
&lt;LI&gt;Azure AI Search (retrieval)&lt;/LI&gt;
&lt;LI&gt;Cosmos DB / SQL (data storage)&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Design Principle&lt;/H3&gt;
&lt;P&gt;Treat AI services as &lt;STRONG&gt;governed platform components&lt;/STRONG&gt;, not standalone resources.&lt;/P&gt;
&lt;H2&gt;Responsible AI Implementation&lt;/H2&gt;
&lt;H3&gt;1. RAI Policies&lt;/H3&gt;
&lt;UL data-spread="false"&gt;
&lt;LI&gt;Define policies at the &lt;STRONG&gt;model interaction layer&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Configure controls for:
&lt;UL data-spread="false"&gt;
&lt;LI&gt;Output moderation&lt;/LI&gt;
&lt;LI&gt;Prompt handling&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Align policies with &lt;STRONG&gt;organizational compliance requirements&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;2. Content Safety&lt;/H3&gt;
&lt;UL data-spread="false"&gt;
&lt;LI&gt;Integrate content safety as a &lt;STRONG&gt;mandatory runtime layer&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Ensure all model outputs pass through filtering before reaching applications&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Figure: Content Safety Flow&lt;/STRONG&gt;&lt;/P&gt;
&lt;H3&gt;3. Model Governance&lt;/H3&gt;
&lt;UL data-spread="false"&gt;
&lt;LI&gt;Control model deployment via Microsoft Foundry&lt;/LI&gt;
&lt;LI&gt;Restrict direct access to models&lt;/LI&gt;
&lt;LI&gt;Route all interactions through:
&lt;UL data-spread="false"&gt;
&lt;LI&gt;Function Apps&lt;/LI&gt;
&lt;LI&gt;API layers&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
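&lt;P&gt;The routing and filtering rules above can be sketched as a thin gateway that sits between applications and the model. This is a minimal sketch, not a Foundry or Content Safety SDK example: &lt;CODE&gt;make_gateway&lt;/CODE&gt;, &lt;CODE&gt;call_model&lt;/CODE&gt;, and &lt;CODE&gt;is_safe&lt;/CODE&gt; are hypothetical names, and the stand-in filter is an assumption made purely for illustration.&lt;/P&gt;

```python
BLOCKED_MESSAGE = "Response withheld by content policy."


def make_gateway(call_model, is_safe):
    """Wrap a model call so every prompt and every output passes a safety check.

    call_model: hypothetical callable that takes a prompt and returns text.
    is_safe:    hypothetical callable that returns True when text is acceptable.
    """

    def handle(prompt):
        # Screen the prompt before it reaches the model.
        if not is_safe(prompt):
            return BLOCKED_MESSAGE
        output = call_model(prompt)
        # Screen the output before it reaches the application.
        if not is_safe(output):
            return BLOCKED_MESSAGE
        return output

    return handle


if __name__ == "__main__":
    # Stand-ins only; a real deployment would call the model endpoint and a
    # content safety service over private endpoints.
    def fake_model(prompt):
        return "echo: " + prompt

    def fake_filter(text):
        return "forbidden" not in text.lower()

    gateway = make_gateway(fake_model, fake_filter)
    print(gateway("hello"))
    print(gateway("something forbidden"))
```

&lt;P&gt;Because applications only ever receive the wrapped &lt;CODE&gt;handle&lt;/CODE&gt; function, there is no code path that returns unfiltered model output.&lt;/P&gt;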
&lt;H2&gt;Handling VNet-Integrated Deployment Challenges&lt;/H2&gt;
&lt;P&gt;Enterprise deployments introduce constraints that must be addressed early.&lt;/P&gt;
&lt;H5&gt;Challenge 1: Microsoft Foundry VNet Integration&lt;/H5&gt;
&lt;UL data-spread="false"&gt;
&lt;LI&gt;Microsoft Foundry requires &lt;STRONG&gt;careful network planning&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Standard enterprise patterns may not work without validation&lt;/LI&gt;
&lt;/UL&gt;
&lt;H5&gt;Challenge 2: IP Range Constraints&lt;/H5&gt;
&lt;P&gt;When designing the VNet:&lt;/P&gt;
&lt;UL data-spread="true"&gt;
&lt;LI&gt;10.x.x.x range (10.0.0.0/8)&lt;BR /&gt;→ Not generally available (GA) in all Azure regions by default&lt;/LI&gt;
&lt;LI&gt;Where it is not GA, using it requires:&lt;BR /&gt;→ &lt;STRONG&gt;Allowlisting via the Microsoft Product Group&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Supported ranges (commonly observed):
&lt;UL data-spread="false"&gt;
&lt;LI&gt;172.x.x.x (private block 172.16.0.0/12)&lt;/LI&gt;
&lt;LI&gt;192.x.x.x (private block 192.168.0.0/16)&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Recommended Approach&lt;/H3&gt;
&lt;UL data-spread="false"&gt;
&lt;LI&gt;Validate supported IP ranges &lt;STRONG&gt;before finalizing network design&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Avoid assuming default enterprise CIDR blocks will work&lt;/LI&gt;
&lt;LI&gt;Plan subnets specifically for &lt;STRONG&gt;AI workloads&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
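&lt;P&gt;The range validation recommended above can be run as a quick pre-deployment check. The following is a minimal sketch using Python's standard &lt;CODE&gt;ipaddress&lt;/CODE&gt; module. It assumes the commonly observed 172.x.x.x and 192.x.x.x ranges correspond to the private blocks 172.16.0.0/12 and 192.168.0.0/16; that list is an assumption, not an official support matrix, and &lt;CODE&gt;is_supported_cidr&lt;/CODE&gt; is a hypothetical helper name.&lt;/P&gt;

```python
import ipaddress

# Assumption: the "commonly observed" ranges, read as RFC 1918 private blocks.
# Confirm the current support matrix for your region before relying on this.
ALLOWED_RANGES = [
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_supported_cidr(cidr, allowed=ALLOWED_RANGES):
    """Return True if the candidate VNet CIDR sits fully inside an allowed range."""
    candidate = ipaddress.ip_network(cidr, strict=True)
    return any(candidate.subnet_of(parent) for parent in allowed)


if __name__ == "__main__":
    for cidr in ("172.20.0.0/16", "192.168.10.0/24", "10.1.0.0/16"):
        status = "ok" if is_supported_cidr(cidr) else "needs validation or allowlisting"
        print(cidr, "-", status)
```

&lt;P&gt;A check like this fits naturally into a pipeline stage that runs before the network design is finalized, so an unsupported CIDR is caught before any subnets are provisioned.&lt;/P&gt;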
&lt;H5&gt;Challenge 3: Platform Constraints&lt;/H5&gt;
&lt;UL data-spread="false"&gt;
&lt;LI&gt;AI services may behave differently compared to traditional PaaS services&lt;/LI&gt;
&lt;LI&gt;Validate:
&lt;UL data-spread="false"&gt;
&lt;LI&gt;Private endpoint compatibility&lt;/LI&gt;
&lt;LI&gt;Service integration within VNet&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H5&gt;Challenge 4: Security vs Accessibility&lt;/H5&gt;
&lt;UL data-spread="false"&gt;
&lt;LI&gt;Private deployments improve security but add complexity&lt;/LI&gt;
&lt;LI&gt;Address this by:
&lt;UL data-spread="false"&gt;
&lt;LI&gt;Providing controlled access paths&lt;/LI&gt;
&lt;LI&gt;Using jump hosts or secure access mechanisms&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Key Design Considerations&lt;/H2&gt;
&lt;UL data-spread="false"&gt;
&lt;LI&gt;Treat networking as a &lt;STRONG&gt;core dependency for AI platforms&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Enforce Responsible AI across:
&lt;UL data-spread="false"&gt;
&lt;LI&gt;Model layer&lt;/LI&gt;
&lt;LI&gt;Platform layer&lt;/LI&gt;
&lt;LI&gt;Runtime layer&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Use &lt;STRONG&gt;layered security architecture&lt;/STRONG&gt;:
&lt;UL data-spread="false"&gt;
&lt;LI&gt;Network isolation&lt;/LI&gt;
&lt;LI&gt;Policy enforcement&lt;/LI&gt;
&lt;LI&gt;Content filtering&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Validate constraints early to avoid redesign&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Best Practices&lt;/H2&gt;
&lt;UL data-spread="false"&gt;
&lt;LI&gt;Plan IP addressing specifically for &lt;STRONG&gt;AI workloads&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Use &lt;STRONG&gt;private endpoints and VNet integration by default&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Centralize model access through &lt;STRONG&gt;application layers&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Apply Responsible AI controls as &lt;STRONG&gt;mandatory, not optional&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Design AI platforms with &lt;STRONG&gt;governance built-in from the start&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Conclusion&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-teams="true"&gt;Operationalizing Responsible AI in Microsoft Azure goes beyond defining policies—it demands tight alignment across AI services, infrastructure, networking, and governance controls. A well-architected AI landing zone provides the foundation for securely deploying models, enforcing content filtering on outputs, and ensuring that access remains strictly within enterprise-defined boundaries.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;That alignment spans four areas:&lt;/P&gt;
&lt;UL data-spread="false"&gt;
&lt;LI&gt;AI services&lt;/LI&gt;
&lt;LI&gt;Infrastructure&lt;/LI&gt;
&lt;LI&gt;Networking&lt;/LI&gt;
&lt;LI&gt;Governance controls&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;A well-designed AI landing zone ensures that:&lt;/P&gt;
&lt;UL data-spread="false"&gt;
&lt;LI&gt;Models are deployed securely&lt;/LI&gt;
&lt;LI&gt;Outputs are governed and filtered&lt;/LI&gt;
&lt;LI&gt;Access is controlled within enterprise boundaries&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Responsible AI is not just a policy—it is an &lt;STRONG&gt;architectural outcome driven by platform design, network constraints, and enforcement mechanisms&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-teams="true"&gt;This holistic approach transforms Responsible AI from a conceptual guideline into a practical, enforceable outcome of system design. Crucially, early architectural decisions—particularly those related to networking, private access, and service compatibility—have a lasting impact on how effectively Responsible AI can be scaled across the organization.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 06 May 2026 05:23:44 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/operationalizing-responsible-ai-in-microsoft-foundry-within/ba-p/4516869</guid>
      <dc:creator>Shruti9162</dc:creator>
      <dc:date>2026-05-06T05:23:44Z</dc:date>
    </item>
    <item>
      <title>Deploying Azure Resources with Managed HSM Keys Using Bicep</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/deploying-azure-resources-with-managed-hsm-keys-using-bicep/ba-p/4516971</link>
      <description>&lt;H2 data-section-id="14kwlto" data-start="766" data-end="793"&gt;Architecture Overview&lt;/H2&gt;
&lt;P data-start="795" data-end="819"&gt;The deployment includes:&lt;/P&gt;
&lt;UL data-start="821" data-end="1029"&gt;
&lt;LI data-section-id="12vqy4y" data-start="821" data-end="845"&gt;Managed HSM instance&lt;/LI&gt;
&lt;LI data-section-id="16xrtlo" data-start="846" data-end="873"&gt;Key creation inside HSM&lt;/LI&gt;
&lt;LI data-section-id="8xx89z" data-start="874" data-end="928"&gt;User-assigned managed identity / service principal&lt;/LI&gt;
&lt;LI data-section-id="19unc8k" data-start="929" data-end="964"&gt;Role assignments for key access&lt;/LI&gt;
&lt;LI data-section-id="1ulu1bd" data-start="965" data-end="1029"&gt;Azure resource (e.g., Storage / Databricks / Disk) using CMK&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="1031" data-end="1040"&gt;&lt;STRONG data-start="1031" data-end="1040"&gt;Flow:&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL data-start="1041" data-end="1155"&gt;
&lt;LI data-section-id="1xiu8q5" data-start="1041" data-end="1064"&gt;Create Managed HSM&lt;/LI&gt;
&lt;LI data-section-id="bt4lol" data-start="1065" data-end="1091"&gt;Create encryption key&lt;/LI&gt;
&lt;LI data-section-id="q080sh" data-start="1092" data-end="1115"&gt;Assign permissions&lt;/LI&gt;
&lt;LI data-section-id="7f9jyq" data-start="1116" data-end="1155"&gt;Deploy resource with CMK reference&lt;/LI&gt;
&lt;/OL&gt;
&lt;H2 data-section-id="16qpaeu" data-start="1162" data-end="1181"&gt;Prerequisites&lt;/H2&gt;
&lt;P data-start="1183" data-end="1207"&gt;Before starting, ensure:&lt;/P&gt;
&lt;UL data-start="1208" data-end="1352"&gt;
&lt;LI data-section-id="1qk79he" data-start="1208" data-end="1254"&gt;Azure subscription with proper permissions&lt;/LI&gt;
&lt;LI data-section-id="6c9rgg" data-start="1255" data-end="1287"&gt;Access to create Managed HSM&lt;/LI&gt;
&lt;LI data-section-id="1i2075y" data-start="1288" data-end="1328"&gt;Knowledge of RBAC vs access policies&lt;/LI&gt;
&lt;LI data-section-id="lccr77" data-start="1329" data-end="1352"&gt;Bicep CLI installed&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 data-section-id="18wqp5s" data-start="1359" data-end="1392"&gt;Step 1: Deploy Managed HSM&lt;/H2&gt;
&lt;P data-start="1394" data-end="1442"&gt;Managed HSM is different from regular Key Vault:&lt;/P&gt;
&lt;UL data-start="1443" data-end="1530"&gt;
&lt;LI data-section-id="zmvddo" data-start="1443" data-end="1484"&gt;Uses &lt;STRONG data-start="1450" data-end="1463"&gt;RBAC only&lt;/STRONG&gt; (no access policies)&lt;/LI&gt;
&lt;LI data-section-id="lmut00" data-start="1485" data-end="1530"&gt;Requires &lt;STRONG data-start="1496" data-end="1530"&gt;security domain initialization&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="1532" data-end="1550"&gt;&lt;STRONG data-start="1532" data-end="1550"&gt;Bicep snippet:&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 100%; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;EM&gt;resource managedHsm 'Microsoft.KeyVault/managedHSMs@2023-02-01' = {&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;name: hsmName&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;location: location&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;sku: {&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;name: 'Standard_B1'&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;family: 'B'&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;}&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;properties: {&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;tenantId: tenant().tenantId&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;initialAdminObjectIds: [&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;adminObjectId&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;]&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;}&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;}&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H2 data-section-id="18wqp5s" data-start="1359" data-end="1392"&gt;Step 2: Create Key in Managed HSM&lt;/H2&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 100%; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;EM&gt;resource key 'Microsoft.KeyVault/managedHSMs/keys@2023-02-01' = {&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;name: '${managedHsm.name}/cmk-key'&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;properties: {&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;kty: 'RSA-HSM'&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;keySize: 2048&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;}&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;}&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H2 data-section-id="daskzu" data-start="2059" data-end="2091"&gt;Step 3: Assign Permissions&lt;/H2&gt;
&lt;P data-start="2093" data-end="2140"&gt;Since Managed HSM uses RBAC, assign roles like:&lt;/P&gt;
&lt;UL data-start="2142" data-end="2204"&gt;
&lt;LI data-section-id="ttxtbd" data-start="2142" data-end="2171"&gt;&lt;STRONG data-start="2144" data-end="2171"&gt;Managed HSM Crypto User&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-section-id="h48emi" data-start="2172" data-end="2204"&gt;&lt;STRONG data-start="2174" data-end="2204"&gt;Managed HSM Crypto Officer&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 100%; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;EM&gt;resource roleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;name: guid(resourceGroup().id, principalId, roleDefinitionId)&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;// scope is a top-level property of the resource, not part of properties&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;scope: managedHsm&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;properties: {&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;principalId: principalId&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;roleDefinitionId: roleDefinitionId&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;}&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;}&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H2 data-section-id="1da8j2j" data-start="2482" data-end="2523"&gt;Step 4: Configure Resource with CMK&lt;/H2&gt;
&lt;P data-start="2525" data-end="2560"&gt;Example: Storage Account encryption&lt;/P&gt;
&lt;P data-start="2525" data-end="2560"&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 100%; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;EM&gt;resource storage 'Microsoft.Storage/storageAccounts@2023-01-01' = {&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;name: storageName&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;location: location&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;kind: 'StorageV2'&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;sku: {&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;name: 'Standard_LRS'&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;}&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;properties: {&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;encryption: {&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;keySource: 'Microsoft.Keyvault'&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;keyvaultproperties: {&lt;/EM&gt;&lt;/P&gt;
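&lt;P&gt;&lt;EM&gt;// key.name is 'hsmName/cmk-key' as declared in Step 2, so keep only the last segment&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;keyname: last(split(key.name, '/'))&lt;/EM&gt;&lt;/P&gt;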
&lt;P&gt;&lt;EM&gt;keyvaulturi: managedHsm.properties.hsmUri&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;}&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;}&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;}&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;}&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2 data-section-id="8bi4hj" data-start="2944" data-end="2967"&gt;Common Challenges&lt;/H2&gt;
&lt;H3 data-section-id="161slp4" data-start="2969" data-end="2993"&gt;1. Permission Issues&lt;/H3&gt;
&lt;UL data-start="2994" data-end="3081"&gt;
&lt;LI data-section-id="ijcy0f" data-start="2994" data-end="3043"&gt;Resource identity must have access to HSM key&lt;/LI&gt;
&lt;LI data-section-id="jwzwfn" data-start="3044" data-end="3081"&gt;Missing role → deployment failure&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 data-section-id="11r9wu3" data-start="3083" data-end="3109"&gt;2. Key Rotation Impact&lt;/H3&gt;
&lt;P data-start="3110" data-end="3132"&gt;When keys are rotated:&lt;/P&gt;
&lt;UL data-start="3133" data-end="3242"&gt;
&lt;LI data-section-id="1j42t7b" data-start="3133" data-end="3189"&gt;Resource may &lt;STRONG data-start="3148" data-end="3189"&gt;not automatically pick latest version&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-section-id="11i2ju7" data-start="3190" data-end="3242"&gt;You may need to redeploy or update configuration&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 data-section-id="1g57nqh" data-start="3244" data-end="3268"&gt;3. Deployment Errors&lt;/H3&gt;
&lt;P data-start="3269" data-end="3283"&gt;Typical issue:&lt;/P&gt;
&lt;P data-start="3286" data-end="3328"&gt;Storage/Databricks cannot access HSM key&lt;/P&gt;
&lt;P data-start="3330" data-end="3334"&gt;Fix:&lt;/P&gt;
&lt;UL data-start="3335" data-end="3427"&gt;
&lt;LI data-section-id="1og5zq1" data-start="3335" data-end="3376"&gt;Ensure correct &lt;STRONG data-start="3352" data-end="3376"&gt;RBAC role assignment&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-section-id="lhn1m4" data-start="3377" data-end="3427"&gt;Validate &lt;STRONG data-start="3388" data-end="3427"&gt;principal ID used during deployment&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2 data-section-id="j8by84" data-start="3434" data-end="3461"&gt;Key Rotation Strategy&lt;/H2&gt;
&lt;P data-start="3463" data-end="3484"&gt;Managed HSM supports:&lt;/P&gt;
&lt;UL data-start="3485" data-end="3526"&gt;
&lt;LI data-section-id="xlgvnq" data-start="3485" data-end="3504"&gt;Manual rotation&lt;/LI&gt;
&lt;LI data-section-id="jetw2i" data-start="3505" data-end="3526"&gt;Rotation policies&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="3528" data-end="3542"&gt;Best practice:&lt;/P&gt;
&lt;UL data-start="3543" data-end="3619"&gt;
&lt;LI data-section-id="yh468d" data-start="3543" data-end="3584"&gt;Use version-less key URI if supported&lt;/LI&gt;
&lt;LI data-section-id="2lnggm" data-start="3585" data-end="3619"&gt;Automate redeployment pipeline&lt;/LI&gt;
&lt;/UL&gt;
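&lt;P&gt;A version-less key URI is the key URI with its trailing version segment removed, so the consuming resource is not pinned to one key version. The sketch below shows the transformation in Python. It assumes the path layout &lt;CODE&gt;/keys/KEY-NAME/KEY-VERSION&lt;/CODE&gt;; the HSM name, key version, and the helper name &lt;CODE&gt;to_versionless_key_uri&lt;/CODE&gt; are hypothetical.&lt;/P&gt;

```python
from urllib.parse import urlsplit, urlunsplit


def to_versionless_key_uri(key_uri):
    """Drop the trailing version segment from a key URI, if one is present.

    Assumes the path layout /keys/KEY-NAME[/KEY-VERSION].
    """
    parts = urlsplit(key_uri)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) not in (2, 3) or segments[0] != "keys":
        raise ValueError("not a key URI: " + key_uri)
    versionless_path = "/" + "/".join(segments[:2])
    return urlunsplit((parts.scheme, parts.netloc, versionless_path, "", ""))


if __name__ == "__main__":
    # Hypothetical HSM name and key version, for illustration only.
    uri = "https://contoso-hsm.managedhsm.azure.net/keys/cmk-key/0123456789abcdef"
    print(to_versionless_key_uri(uri))
```

&lt;P&gt;Whether a given service follows new key versions automatically when handed a version-less URI depends on that service, which is why the list above qualifies the advice with "if supported".&lt;/P&gt;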
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2 data-section-id="1im216t" data-start="3898" data-end="3940"&gt;When to Use Managed HSM vs Key Vault&lt;/H2&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Feature&lt;/th&gt;&lt;th&gt;Managed HSM&lt;/th&gt;&lt;th&gt;Key Vault&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;FIPS Level&lt;/td&gt;&lt;td&gt;Level 3&lt;/td&gt;&lt;td&gt;Level 2&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Tenancy&lt;/td&gt;&lt;td&gt;Single-tenant (dedicated)&lt;/td&gt;&lt;td&gt;Multi-tenant (shared)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;RBAC only&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;Optional&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Cost&lt;/td&gt;&lt;td&gt;Higher&lt;/td&gt;&lt;td&gt;Lower&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2 data-section-id="20nd6j" data-start="4163" data-end="4179"&gt;Conclusion&lt;/H2&gt;
&lt;P data-start="4181" data-end="4218"&gt;Using Managed HSM with Bicep enables:&lt;/P&gt;
&lt;UL data-start="4219" data-end="4345"&gt;
&lt;LI data-section-id="11thqyt" data-start="4219" data-end="4266"&gt;Stronger security with hardware-backed keys&lt;/LI&gt;
&lt;LI data-section-id="1w99h6q" data-start="4267" data-end="4313"&gt;Full automation via Infrastructure as Code&lt;/LI&gt;
&lt;LI data-section-id="1dqn9x4" data-start="4314" data-end="4345"&gt;Enterprise-grade compliance&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="4347" data-end="4388"&gt;However, it requires careful handling of:&lt;/P&gt;
&lt;UL data-start="4389" data-end="4451"&gt;
&lt;LI data-section-id="16zsxky" data-start="4389" data-end="4409"&gt;RBAC permissions&lt;/LI&gt;
&lt;LI data-section-id="1yu906j" data-start="4410" data-end="4426"&gt;Key rotation&lt;/LI&gt;
&lt;LI data-section-id="vs4gxo" data-start="4427" data-end="4451"&gt;Resource integration&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Tue, 05 May 2026 09:30:46 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/deploying-azure-resources-with-managed-hsm-keys-using-bicep/ba-p/4516971</guid>
      <dc:creator>Roslin_Nivetha</dc:creator>
      <dc:date>2026-05-05T09:30:46Z</dc:date>
    </item>
    <item>
      <title>Building an AI Agent for Azure Infrastructure Validation</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/building-an-ai-agent-for-azure-infrastructure-validation/ba-p/4516936</link>
      <description>&lt;H2&gt;1. Introduction&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Infrastructure consistency is critical in large-scale Azure environments, especially in migration programs and DevOps-driven deployments. While Infrastructure as Code (IaC) using Terraform improves reproducibility, it does not fully eliminate:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Manual errors in design specifications&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Drift between Terraform and deployed resources&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Misalignment between approved design (Excel/architecture docs) and deployed state&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;To address this, we propose building an AI-powered Infrastructure Validation Agent that continuously validates and reconciles:&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Excel (Source of Truth)&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Terraform (.tf files)&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Azure Deployed Resources&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;STRONG&gt;This blog explains the architecture, implementation, validation logic, and real-world applicability of such an agent.&lt;/STRONG&gt;&lt;/P&gt;
&lt;H2&gt;2. Problem Statement&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;In enterprise environments, infrastructure data flows through multiple stages:&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;
&lt;TABLE style="" border="1"&gt;
&lt;TBODY&gt;
&lt;TR style=""&gt;
&lt;TH style=""&gt;&lt;STRONG&gt;Source&lt;/STRONG&gt;&lt;/TH&gt;
&lt;TH style=""&gt;&lt;STRONG&gt;Purpose&lt;/STRONG&gt;&lt;/TH&gt;
&lt;/TR&gt;
&lt;TR style=""&gt;
&lt;TD style=""&gt;&lt;STRONG&gt;Excel / Design Sheets&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD style=""&gt;&lt;STRONG&gt;Approved architecture specifications&lt;/STRONG&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style=""&gt;
&lt;TD style=""&gt;&lt;STRONG&gt;Terraform&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD style=""&gt;&lt;STRONG&gt;Infrastructure as Code implementation&lt;/STRONG&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style=""&gt;
&lt;TD style=""&gt;&lt;STRONG&gt;Azure Portal&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD style=""&gt;&lt;STRONG&gt;Actual deployed infrastructure&lt;/STRONG&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;COLGROUP&gt;&lt;COL style="width: 50.00%" /&gt;&lt;COL style="width: 50.00%" /&gt;&lt;/COLGROUP&gt;&lt;/TABLE&gt;
&lt;/DIV&gt;
&lt;H2&gt;3. Common Challenges&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Configuration mismatches across stages&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Drift due to manual portal changes&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Incorrect SKU, region, or configuration deployment&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Lack of automated validation before and after deployment&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;The absence of unified validation leads to compliance risks, deployment errors, and operational inefficiencies.&lt;/STRONG&gt;&lt;/P&gt;
&lt;H2&gt;4. Solution Overview&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;The proposed solution is an AI-powered validation agent that:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Ingests Excel as configuration input&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Parses Terraform configurations&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Fetches deployed resource details from Azure&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Compares the three sources and reports mismatches&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;5. Architecture Overview&lt;/H2&gt;
&lt;H3&gt;High-Level Architecture Components&lt;/H3&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Input Layer&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Excel file (configuration source)&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Processing Layer&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Terraform Parser&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Azure Resource Fetcher&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;AI-based Validator (optional reasoning layer)&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Comparison Engine&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Schema-based comparison&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Drift detection logic&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Output Layer&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Validation report (JSON / Excel / HTML)&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Hosting&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Azure Function App&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Optional Enhancements&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Azure AI Search for semantic matching and reasoning&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H2&gt;6. Agent Design (Modular Components)&lt;/H2&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;
&lt;TABLE style="" border="1"&gt;
&lt;TBODY&gt;
&lt;TR style=""&gt;
&lt;TH style=""&gt;&lt;STRONG&gt;Module&lt;/STRONG&gt;&lt;/TH&gt;
&lt;TH style=""&gt;&lt;STRONG&gt;Description&lt;/STRONG&gt;&lt;/TH&gt;
&lt;/TR&gt;
&lt;TR style=""&gt;
&lt;TD style=""&gt;&lt;STRONG&gt;Excel Reader&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD style=""&gt;&lt;STRONG&gt;Reads and standardizes input&lt;/STRONG&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style=""&gt;
&lt;TD style=""&gt;&lt;STRONG&gt;Terraform Parser&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD style=""&gt;&lt;STRONG&gt;Extracts resource configuration&lt;/STRONG&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style=""&gt;
&lt;TD style=""&gt;&lt;STRONG&gt;Azure Fetcher&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD style=""&gt;&lt;STRONG&gt;Queries deployed resources&lt;/STRONG&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style=""&gt;
&lt;TD style=""&gt;&lt;STRONG&gt;Comparator Engine&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD style=""&gt;&lt;STRONG&gt;Identifies mismatches&lt;/STRONG&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style=""&gt;
&lt;TD style=""&gt;&lt;STRONG&gt;AI Validator&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD style=""&gt;&lt;STRONG&gt;Enhances validation and recommendations&lt;/STRONG&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style=""&gt;
&lt;TD style=""&gt;&lt;STRONG&gt;Report Generator&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD style=""&gt;&lt;STRONG&gt;Produces actionable outputs&lt;/STRONG&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;COLGROUP&gt;&lt;COL style="width: 50.00%" /&gt;&lt;COL style="width: 50.00%" /&gt;&lt;/COLGROUP&gt;&lt;/TABLE&gt;
&lt;/DIV&gt;
&lt;H2&gt;7. Implementation&lt;/H2&gt;
&lt;H2&gt;Step 1: Read Excel Input&lt;/H2&gt;
&lt;PRE&gt;&lt;CODE&gt;import pandas as pd

def read_excel(file_path):
    # Load the design sheet and strip stray whitespace from the column headers.
    df = pd.read_excel(file_path)
    df.columns = df.columns.str.strip()
    return df

excel_df = read_excel("infra_config.xlsx")
print(excel_df.head())&lt;/CODE&gt;&lt;/PRE&gt;
&lt;H2&gt;Step 2: Parse Terraform Files&lt;/H2&gt;
&lt;PRE&gt;&lt;CODE&gt;import hcl2

def parse_terraform(file_path):
    # python-hcl2 parses a .tf file into nested dicts and lists.
    with open(file_path, 'r') as file:
        data = hcl2.load(file)

    resources = []
    # Each entry maps a resource type to its named instances.
    for resource_type in data.get('resource', []):
        for rtype, instances in resource_type.items():
            for name, config in instances.items():
                resource = {
                    "resource_type": rtype,
                    "resource_name": name,
                    "config": config
                }
                resources.append(resource)
    return resources

tf_resources = parse_terraform("main.tf")
print(tf_resources)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;H2&gt;Step 3: Fetch Azure Deployed Resources&lt;/H2&gt;
&lt;PRE&gt;&lt;CODE&gt;from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

credential = DefaultAzureCredential()
subscription_id = "your-subscription-id"
resource_client = ResourceManagementClient(credential, subscription_id)

def fetch_azure_resources():
    # List every resource in the subscription and keep the fields used for comparison.
    resources = []
    for resource in resource_client.resources.list():
        res = {
            "name": resource.name,
            "type": resource.type,
            "location": resource.location,
            "id": resource.id
        }
        resources.append(res)

    return resources

azure_resources = fetch_azure_resources()
print(azure_resources)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;H2&gt;Step 4: Normalize Data&lt;/H2&gt;
&lt;PRE&gt;&lt;CODE&gt;def normalize_excel(df):
    # One dict per row, keyed by column name.
    return df.to_dict(orient='records')

def normalize_tf(tf_resources):
    normalized = []

    for res in tf_resources:
        normalized.append({
            "resource_name": res["resource_name"],
            "resource_type": res["resource_type"],
            "config": res["config"]
        })

    return normalized

def normalize_azure(azure_resources):
    # Rename the Azure fields so all three sources share the same keys.
    normalized = []

    for res in azure_resources:
        normalized.append({
            "resource_name": res["name"],
            "resource_type": res["type"],
            "location": res["location"]
        })

    return normalized&lt;/CODE&gt;&lt;/PRE&gt;
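&lt;P&gt;One normalization detail is worth handling explicitly: Azure Resource Manager returns locations in a lowercase, space-free form such as &lt;CODE&gt;eastus&lt;/CODE&gt;, while design sheets often use display names such as "East US". Comparing the raw strings would flag every such resource as a region mismatch. The following is a minimal sketch of one way to align them. It assumes the sheet uses plain display names; &lt;CODE&gt;normalize_region&lt;/CODE&gt; and &lt;CODE&gt;regions_match&lt;/CODE&gt; are hypothetical helper names.&lt;/P&gt;

```python
def normalize_region(value):
    """Reduce a region label to a lowercase, space-free form for comparison."""
    if value is None:
        return None
    # Remove all whitespace, then lowercase: "East US" becomes "eastus".
    return "".join(str(value).split()).lower()


def regions_match(expected, actual):
    """Compare two region labels after normalization."""
    return normalize_region(expected) == normalize_region(actual)


if __name__ == "__main__":
    print(regions_match("East US", "eastus"))
    print(regions_match("East US", "westeurope"))
```

&lt;P&gt;This simple rule does not cover abbreviations or custom aliases in a design sheet; those would need an explicit lookup table.&lt;/P&gt;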
&lt;H2&gt;Step 5: Validation Logic (Drift Detection)&lt;/H2&gt;
&lt;LI-CODE lang="python"&gt;def compare_resources(excel_data, tf_data, azure_data):
    # Treat the Excel sheet as the baseline and record every deviation found
    issues = []

    for excel_res in excel_data:
        name = excel_res['resource_name']

        # Look up the resource by name in each source (None when absent)
        tf_match = next((r for r in tf_data if r['resource_name'] == name), None)
        az_match = next((r for r in azure_data if r['resource_name'] == name), None)

        if not tf_match:
            issues.append({
                "resource": name,
                "issue": "Missing in Terraform",
                "severity": "High"
            })

        if not az_match:
            issues.append({
                "resource": name,
                "issue": "Missing in Azure",
                "severity": "Critical"
            })

        # Compare regions only when the resource exists in both sources
        if tf_match and az_match:
            if excel_res['region'] != az_match.get('location'):
                issues.append({
                    "resource": name,
                    "issue": "Region mismatch",
                    "expected": excel_res['region'],
                    "actual": az_match.get('location'),
                    "severity": "High"
                })

    return issues


drift_report = compare_resources(
    normalize_excel(excel_df),
    normalize_tf(tf_resources),
    normalize_azure(azure_resources)
)

print(drift_report)&lt;/LI-CODE&gt;
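&lt;P&gt;One caveat for the region check: Azure APIs typically return the location as a lowercase name without spaces (for example, eastus), while a spreadsheet may hold the display name (East US). The following is a minimal sketch, assuming that naming difference applies to your data, that normalizes both sides before comparing. The helper names normalize_region and regions_match are illustrative and are not part of the original script.&lt;/P&gt;

```python
def normalize_region(value):
    """Lowercase a region name and drop every non-alphanumeric character."""
    if value is None:
        return ""
    return "".join(ch for ch in str(value).lower() if ch.isalnum())


def regions_match(expected, actual):
    """Return True when two region names share the same normalized form."""
    return normalize_region(expected) == normalize_region(actual)


print(regions_match("East US", "eastus"))      # True
print(regions_match("East US", "westeurope"))  # False
```

&lt;P&gt;With a helper like this, the region comparison could call regions_match instead of comparing the raw strings, which avoids false mismatches caused only by formatting.&lt;/P&gt;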
&lt;H2&gt;Step 6: Export Report to Excel&lt;/H2&gt;
&lt;P&gt;Sample validation report:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;
&lt;TABLE style="" border="1"&gt;
&lt;THEAD&gt;
&lt;TR style=""&gt;
&lt;TH style=""&gt;Resource&lt;/TH&gt;
&lt;TH style=""&gt;Issue&lt;/TH&gt;
&lt;TH style=""&gt;Expected&lt;/TH&gt;
&lt;TH style=""&gt;Actual&lt;/TH&gt;
&lt;TH style=""&gt;Severity&lt;/TH&gt;
&lt;/TR&gt;
&lt;/THEAD&gt;
&lt;TBODY&gt;
&lt;TR style=""&gt;
&lt;TD style=""&gt;func-app-01&lt;/TD&gt;
&lt;TD style=""&gt;Missing in Terraform&lt;/TD&gt;
&lt;TD style=""&gt;-&lt;/TD&gt;
&lt;TD style=""&gt;-&lt;/TD&gt;
&lt;TD style=""&gt;High&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style=""&gt;
&lt;TD style=""&gt;search-01&lt;/TD&gt;
&lt;TD style=""&gt;SKU mismatch&lt;/TD&gt;
&lt;TD style=""&gt;Standard&lt;/TD&gt;
&lt;TD style=""&gt;Basic&lt;/TD&gt;
&lt;TD style=""&gt;Medium&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style=""&gt;
&lt;TD style=""&gt;webapp-01&lt;/TD&gt;
&lt;TD style=""&gt;Region mismatch&lt;/TD&gt;
&lt;TD style=""&gt;East US&lt;/TD&gt;
&lt;TD style=""&gt;West Europe&lt;/TD&gt;
&lt;TD style=""&gt;High&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;COLGROUP&gt;&lt;COL style="width: 20.00%" /&gt;&lt;COL style="width: 20.00%" /&gt;&lt;COL style="width: 20.00%" /&gt;&lt;COL style="width: 20.00%" /&gt;&lt;COL style="width: 20.00%" /&gt;&lt;/COLGROUP&gt;&lt;/TABLE&gt;
&lt;/DIV&gt;
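&lt;P&gt;A report like the one above needs every issue to carry the same five columns, even though some issues have no expected or actual value. The following is a minimal sketch, assuming issues are dictionaries shaped like those built in Step 5; COLUMNS, to_report_rows, and write_report_csv are illustrative names. It writes CSV with the standard library, which Excel opens directly.&lt;/P&gt;

```python
import csv
import io

COLUMNS = ["resource", "issue", "expected", "actual", "severity"]


def to_report_rows(issues):
    """Return one dict per issue with every report column present ("-" when missing)."""
    return [{col: issue.get(col, "-") for col in COLUMNS} for issue in issues]


def write_report_csv(issues, handle):
    """Write the drift report as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(to_report_rows(issues))


sample = [
    {"resource": "func-app-01", "issue": "Missing in Terraform", "severity": "High"},
    {"resource": "webapp-01", "issue": "Region mismatch",
     "expected": "East US", "actual": "West Europe", "severity": "High"},
]
buffer = io.StringIO()
write_report_csv(sample, buffer)
print(buffer.getvalue())
```

&lt;P&gt;If pandas and openpyxl are installed, pandas.DataFrame(to_report_rows(drift_report)).to_excel("drift_report.xlsx", index=False) would write a native workbook instead of CSV.&lt;/P&gt;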
&lt;BR /&gt;&lt;BR /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 05 May 2026 06:05:23 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/building-an-ai-agent-for-azure-infrastructure-validation/ba-p/4516936</guid>
      <dc:creator>ranjsharma</dc:creator>
      <dc:date>2026-05-05T06:05:23Z</dc:date>
    </item>
    <item>
      <title>Building Multi-File Refactoring Agents with GitHub Copilot Workspace</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/building-multi-file-refactoring-agents-with-github-copilot/ba-p/4516932</link>
      <description>&lt;H2 data-section-id="13ax1s5" data-start="266" data-end="281"&gt;Introduction&lt;/H2&gt;
&lt;P data-start="283" data-end="626"&gt;AI-assisted development continues to evolve beyond inline code suggestions toward &lt;STRONG data-start="365" data-end="401"&gt;end-to-end engineering workflows&lt;/STRONG&gt;. While tools such as GitHub Copilot have significantly improved developer productivity at the function and file level, modern applications demand capabilities that operate across the entire repository.&lt;/P&gt;
&lt;P data-start="628" data-end="790"&gt;Refactoring, modernization, and architectural changes rarely occur in isolation. They require &lt;STRONG data-start="722" data-end="789"&gt;coordinated updates across multiple files, services, and layers&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P data-start="792" data-end="946"&gt;With GitHub Copilot Workspace, developers can now move from incremental edits to &lt;STRONG data-start="886" data-end="931"&gt;intent-driven, multi-file transformations&lt;/STRONG&gt; powered by AI.&lt;/P&gt;
&lt;P data-start="948" data-end="975"&gt;This article walks through:&lt;/P&gt;
&lt;UL data-start="976" data-end="1201"&gt;
&lt;LI data-section-id="1g6i0g0" data-start="976" data-end="1041"&gt;The role of Copilot Workspace in modern development workflows&lt;/LI&gt;
&lt;LI data-section-id="tti5ql" data-start="1042" data-end="1092"&gt;How to access and use the Workspace experience&lt;/LI&gt;
&lt;LI data-section-id="7layzd" data-start="1093" data-end="1141"&gt;A practical, end-to-end refactoring scenario&lt;/LI&gt;
&lt;LI data-section-id="tafst" data-start="1142" data-end="1201"&gt;Key benefits and considerations for enterprise adoption&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 data-section-id="12pib85" data-start="1208" data-end="1253"&gt;From Code Assistance to Code Orchestration&lt;/H2&gt;
&lt;P data-start="1255" data-end="1418"&gt;Traditional AI-assisted development focuses on generating code snippets in response to local context. While effective, this approach is limited when tasks require:&lt;/P&gt;
&lt;UL data-start="1420" data-end="1502"&gt;
&lt;LI data-section-id="ico30z" data-start="1420" data-end="1446"&gt;Cross-file consistency&lt;/LI&gt;
&lt;LI data-section-id="1oocydm" data-start="1447" data-end="1474"&gt;Architectural alignment&lt;/LI&gt;
&lt;LI data-section-id="199ms3e" data-start="1475" data-end="1502"&gt;Large-scale refactoring&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="1504" data-end="1551"&gt;Copilot Workspace introduces a different model:&lt;/P&gt;
&lt;P data-start="1555" data-end="1631"&gt;Developers define &lt;EM data-start="1573" data-end="1581"&gt;intent&lt;/EM&gt;, and AI orchestrates &lt;EM data-start="1603" data-end="1630"&gt;repository-wide execution&lt;/EM&gt;.&lt;/P&gt;
&lt;P data-start="1633" data-end="1661"&gt;This enables a shift toward:&lt;/P&gt;
&lt;UL data-start="1662" data-end="1767"&gt;
&lt;LI data-section-id="dwf1qp" data-start="1662" data-end="1691"&gt;Task-oriented development&lt;/LI&gt;
&lt;LI data-section-id="10enixb" data-start="1692" data-end="1732"&gt;Structured planning before execution&lt;/LI&gt;
&lt;LI data-section-id="vdkmoe" data-start="1733" data-end="1767"&gt;Coordinated multi-file updates&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 data-section-id="9fc943" data-start="1774" data-end="1815"&gt;Getting Started with Copilot Workspace&lt;/H2&gt;
&lt;P data-start="1817" data-end="1973"&gt;Access to GitHub Copilot Workspace depends on feature availability and organizational enablement. The following entry points are commonly used:&lt;/P&gt;
&lt;H3 data-section-id="r1un9z" data-start="1975" data-end="2003"&gt;Access from a Repository&lt;/H3&gt;
&lt;UL data-start="2005" data-end="2133"&gt;
&lt;LI data-section-id="1wi2ga5" data-start="2005" data-end="2074"&gt;Navigate to a repository in GitHub&lt;/LI&gt;
&lt;LI data-section-id="1enzcsz" data-start="2075" data-end="2133"&gt;Select the &lt;STRONG data-start="2088" data-end="2099"&gt;Copilot&lt;/STRONG&gt; option or &lt;STRONG data-start="2110" data-end="2131"&gt;Open in Workspace&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 data-section-id="ow11gq" data-start="2330" data-end="2357"&gt;Direct Workspace Access&lt;/H3&gt;
&lt;P data-start="2359" data-end="2384"&gt;You can also navigate to:&lt;/P&gt;
&lt;P&gt;https://github.com/copilot/workspace&lt;/P&gt;
&lt;P data-start="2431" data-end="2478"&gt;If enabled, this opens the Workspace interface.&lt;/P&gt;
&lt;H2 data-section-id="1hin9pf" data-start="2485" data-end="2526"&gt;Understanding the Workspace Experience&lt;/H2&gt;
&lt;P data-start="2528" data-end="2606"&gt;Copilot Workspace provides a structured interface designed for task execution:&lt;/P&gt;
&lt;UL data-start="2608" data-end="2830"&gt;
&lt;LI data-section-id="1ci5sm7" data-start="2608" data-end="2657"&gt;&lt;STRONG data-start="2610" data-end="2626"&gt;Intent Panel&lt;/STRONG&gt; – Define the desired outcome&lt;/LI&gt;
&lt;LI data-section-id="5nkqje" data-start="2658" data-end="2707"&gt;&lt;STRONG data-start="2660" data-end="2677"&gt;Planning View&lt;/STRONG&gt; – Review AI-generated steps&lt;/LI&gt;
&lt;LI data-section-id="yw1p0j" data-start="2708" data-end="2762"&gt;&lt;STRONG data-start="2710" data-end="2731"&gt;Multi-file Editor&lt;/STRONG&gt; – Inspect and refine changes&lt;/LI&gt;
&lt;LI data-section-id="y2ujnn" data-start="2763" data-end="2830"&gt;&lt;STRONG data-start="2765" data-end="2787"&gt;Execution Controls&lt;/STRONG&gt; – Apply updates and create pull requests&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="2832" data-end="2926"&gt;This workflow emphasizes &lt;STRONG data-start="2857" data-end="2885"&gt;transparency and control&lt;/STRONG&gt;, ensuring developers remain in the loop.&lt;/P&gt;
&lt;H2 data-section-id="17xbmye" data-start="2933" data-end="2948"&gt;Key Benefits&lt;/H2&gt;
&lt;H3 data-section-id="1dmkajg" data-start="2950" data-end="2985"&gt;Repository-Aware Intelligence&lt;/H3&gt;
&lt;P data-start="2986" data-end="3095"&gt;Copilot Workspace analyzes relationships across files, enabling more accurate and consistent transformations.&lt;/P&gt;
&lt;H3 data-section-id="snf45" data-start="3102" data-end="3131"&gt;Intent-Driven Workflows&lt;/H3&gt;
&lt;P data-start="3132" data-end="3225"&gt;Developers focus on &lt;EM data-start="3152" data-end="3175"&gt;what needs to be done&lt;/EM&gt;, while the system determines &lt;EM data-start="3205" data-end="3224"&gt;how to execute it&lt;/EM&gt;.&lt;/P&gt;
&lt;H3 data-section-id="1ysdw4m" data-start="3232" data-end="3267"&gt;Consistent Multi-File Updates&lt;/H3&gt;
&lt;P data-start="3268" data-end="3354"&gt;Changes are applied uniformly across controllers, services, and supporting components.&lt;/P&gt;
&lt;H3 data-section-id="1ac01hp" data-start="3361" data-end="3408"&gt;Accelerated Refactoring and Modernization&lt;/H3&gt;
&lt;P data-start="3409" data-end="3490"&gt;Large-scale changes can be executed efficiently, reducing manual effort and risk.&lt;/P&gt;
&lt;H2 data-section-id="qxsl9q" data-start="3497" data-end="3546"&gt;Practical Scenario: Modernizing Authentication&lt;/H2&gt;
&lt;P data-start="3548" data-end="3639"&gt;To illustrate the capabilities of Copilot Workspace, consider a common enterprise scenario:&lt;/P&gt;
&lt;P data-start="3641" data-end="3835"&gt;An application currently uses &lt;STRONG data-start="3671" data-end="3704"&gt;password-based authentication&lt;/STRONG&gt;, implemented across multiple layers. The goal is to migrate to a &lt;STRONG data-start="3770" data-end="3806"&gt;token-based authentication model&lt;/STRONG&gt; using a centralized service.&lt;/P&gt;
&lt;H2 data-section-id="in2amk" data-start="3842" data-end="3858"&gt;Initial State&lt;/H2&gt;
&lt;P data-start="3860" data-end="3919"&gt;Authentication logic is distributed across the application:&lt;/P&gt;
&lt;LI-CODE lang="csharp"&gt;ValidateUser(username, password)&lt;/LI-CODE&gt;
&lt;P data-start="3969" data-end="3993"&gt;This pattern appears in:&lt;/P&gt;
&lt;UL data-start="3994" data-end="4037"&gt;
&lt;LI data-section-id="60xd7r" data-start="3994" data-end="4009"&gt;Controllers&lt;/LI&gt;
&lt;LI data-section-id="1cyzp5y" data-start="4010" data-end="4022"&gt;Services&lt;/LI&gt;
&lt;LI data-section-id="eiujd0" data-start="4023" data-end="4037"&gt;Middleware&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 data-section-id="6sdecw" data-start="4044" data-end="4066"&gt;Defining the Intent&lt;/H2&gt;
&lt;P data-start="4068" data-end="4162"&gt;Within GitHub Copilot Workspace, the developer provides a structured instruction:&lt;/P&gt;
&lt;P data-start="4166" data-end="4360"&gt;Replace all password-based authentication with token-based authentication using AuthService. Update all references, introduce dependency injection, and ensure consistency across the application.&lt;/P&gt;
&lt;H2 data-section-id="scpyak" data-start="4367" data-end="4387"&gt;AI-Generated Plan&lt;/H2&gt;
&lt;P data-start="4389" data-end="4465"&gt;Copilot Workspace analyzes the repository and produces a plan that includes:&lt;/P&gt;
&lt;UL data-start="4467" data-end="4691"&gt;
&lt;LI data-section-id="n8q085" data-start="4467" data-end="4511"&gt;Identifying all usages of ValidateUser&lt;/LI&gt;
&lt;LI data-section-id="1gcti0p" data-start="4512" data-end="4564"&gt;Introducing a centralized authentication service&lt;/LI&gt;
&lt;LI data-section-id="bjw9oq" data-start="4565" data-end="4606"&gt;Updating controllers to return tokens&lt;/LI&gt;
&lt;LI data-section-id="l424ad" data-start="4607" data-end="4654"&gt;Refactoring middleware for token validation&lt;/LI&gt;
&lt;LI data-section-id="1qp0s59" data-start="4655" data-end="4691"&gt;Configuring dependency injection&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="4693" data-end="4766"&gt;This plan provides a &lt;STRONG data-start="4714" data-end="4765"&gt;transparent view of the proposed transformation&lt;/STRONG&gt;.&lt;/P&gt;
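&lt;P&gt;Before accepting a plan, it can help to cross-check its first step independently. The following is a minimal sketch, assuming a C# code base with .cs files, that lists every call to a given method. The helper find_usages is hypothetical and is not part of Copilot Workspace; the demo runs against a throwaway directory.&lt;/P&gt;

```python
import re
import tempfile
from pathlib import Path


def find_usages(root, symbol, pattern="*.cs"):
    """Return (relative path, line number, line text) for each call to symbol under root."""
    call = re.compile(r"\b" + re.escape(symbol) + r"\s*\(")
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if call.search(line):
                hits.append((str(path.relative_to(root)), number, line.strip()))
    return hits


# Demo on a temporary directory so the sketch runs anywhere
with tempfile.TemporaryDirectory() as repo:
    Path(repo, "UserController.cs").write_text(
        "public IActionResult Login()\n{\n    return ValidateUser(username, password);\n}\n",
        encoding="utf-8",
    )
    demo_hits = find_usages(repo, "ValidateUser")
    for file_name, number, line in demo_hits:
        print(f"{file_name}:{number}: {line}")
```

&lt;P&gt;Comparing a list like this with the files named in the generated plan is a quick way to spot usages the plan may have missed.&lt;/P&gt;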
&lt;H2 data-section-id="xvky6h" data-start="4773" data-end="4792"&gt;Refactored State&lt;/H2&gt;
&lt;H3 data-section-id="y2g7m2" data-start="4794" data-end="4832"&gt;Centralized Authentication Service&lt;/H3&gt;
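&lt;LI-CODE lang="csharp"&gt;public interface IAuthService
{
    // Issues a token for an already-authenticated user
    string GenerateToken(string username);

    // Returns true when the supplied token is acceptable
    bool ValidateToken(string token);
}&lt;/LI-CODE&gt;
&lt;LI-CODE lang="csharp"&gt;using System;
using System.Text;

public class AuthService : IAuthService
{
    // Placeholder only: Base64 is an encoding, not a signature.
    // Replace with a signed token (for example, a JWT) before real use.
    public string GenerateToken(string username)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(username));
    }

    // Placeholder only: accepts any non-empty token.
    public bool ValidateToken(string token)
    {
        return !string.IsNullOrEmpty(token);
    }
}&lt;/LI-CODE&gt;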
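&lt;P&gt;The AuthService shown here is a placeholder: Base64 encoding is reversible and unsigned, and ValidateToken accepts any non-empty string. The following is a minimal sketch of what a signed token could look like, written in Python as a language-neutral illustration and assuming a shared secret key. SECRET_KEY, generate_token, and validate_token are illustrative names; production systems would normally rely on an established standard such as JWT through a vetted library.&lt;/P&gt;

```python
import base64
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # assumption: loaded from secure configuration


def _sign(payload):
    """Return the hex HMAC-SHA256 signature of the payload bytes."""
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def generate_token(username):
    """Build 'base64(username).signature' so tampering can be detected."""
    payload = base64.urlsafe_b64encode(username.encode("utf-8"))
    return payload.decode("ascii") + "." + _sign(payload)


def validate_token(token):
    """Accept a token only when its signature matches its payload."""
    if not token or token.count(".") != 1:
        return False
    payload, signature = token.split(".")
    expected = _sign(payload.encode("utf-8")).encode("ascii")
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(expected, signature.encode("utf-8"))


token = generate_token("alice")
print(validate_token(token))        # True
print(validate_token(token + "x"))  # False
print(validate_token("anything"))   # False
```

&lt;P&gt;This sketch omits expiry and revocation, which a real token scheme would also need.&lt;/P&gt;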
&lt;H3 data-section-id="1lduxpr" data-start="5264" data-end="5286"&gt;Updated Controller&lt;/H3&gt;
&lt;LI-CODE lang="csharp"&gt;public class UserController : Controller
{
    private readonly IAuthService _authService;

    // IAuthService is supplied through dependency injection
    public UserController(IAuthService authService)
    {
        _authService = authService;
    }

    public IActionResult Login(string username, string password)
    {
        // Verify the credentials against your user store before issuing a token (omitted here)
        var token = _authService.GenerateToken(username);
        return Ok(new { Token = token });
    }
}&lt;/LI-CODE&gt;
&lt;H3 data-section-id="pewolb" data-start="5680" data-end="5702"&gt;Updated Middleware&lt;/H3&gt;
&lt;LI-CODE lang="csharp"&gt;public class AuthMiddleware
{
    private readonly RequestDelegate _next;
    private readonly IAuthService _authService;

    public AuthMiddleware(RequestDelegate next, IAuthService authService)
    {
        _next = next;
        _authService = authService;
    }

    public async Task Invoke(HttpContext context)
    {
        // Read the token from the Authorization header
        var token = context.Request.Headers["Authorization"];

        // Reject the request with 401 Unauthorized when validation fails
        if (!_authService.ValidateToken(token))
        {
            context.Response.StatusCode = 401;
            return;
        }

        // Otherwise pass the request to the next middleware in the pipeline
        await _next(context);
    }
}&lt;/LI-CODE&gt;
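&lt;P&gt;The middleware passes the raw Authorization header value to ValidateToken. Clients conventionally send this header as the word Bearer followed by the token (RFC 6750), so the scheme prefix usually needs to be stripped before validation. The following is a sketch of that parsing step, written in Python as a language-neutral illustration; extract_bearer_token is a hypothetical helper.&lt;/P&gt;

```python
def extract_bearer_token(header_value):
    """Return the token from a 'Bearer ...' header value, or None if it is malformed."""
    if not header_value:
        return None
    parts = header_value.strip().split()
    # Expect exactly two parts: the scheme name and the token itself
    if len(parts) != 2 or parts[0].lower() != "bearer":
        return None
    return parts[1]


print(extract_bearer_token("Bearer abc123"))  # abc123
print(extract_bearer_token("Basic abc123"))   # None
print(extract_bearer_token(""))               # None
```

&lt;P&gt;Returning None for a missing or malformed header lets the caller respond with 401 without attempting validation.&lt;/P&gt;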
&lt;H3 data-section-id="stfrq6" data-start="6285" data-end="6309"&gt;Dependency Injection&lt;/H3&gt;
&lt;LI-CODE lang="csharp"&gt;services.AddScoped&amp;lt;IAuthService, AuthService&amp;gt;();&lt;/LI-CODE&gt;
&lt;H2 data-section-id="1273t9r" data-start="6380" data-end="6390"&gt;Outcome&lt;/H2&gt;
&lt;P data-start="6392" data-end="6444"&gt;The transformation delivers several improvements:&lt;/P&gt;
&lt;UL data-start="6446" data-end="6597"&gt;
&lt;LI data-section-id="y6ac6v" data-start="6446" data-end="6486"&gt;&lt;STRONG data-start="6448" data-end="6484"&gt;Centralized authentication logic&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-section-id="4lsd73" data-start="6487" data-end="6518"&gt;&lt;STRONG data-start="6489" data-end="6516"&gt;Improved security model&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-section-id="p1xvzp" data-start="6519" data-end="6566"&gt;&lt;STRONG data-start="6521" data-end="6564"&gt;Consistent implementation across layers&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-section-id="10a1wyw" data-start="6567" data-end="6597"&gt;&lt;STRONG data-start="6569" data-end="6595"&gt;Reduced technical debt&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 data-section-id="1syppf4" data-start="6604" data-end="6634"&gt;Best Practices for Adoption&lt;/H2&gt;
&lt;P data-start="6636" data-end="6662"&gt;To maximize effectiveness:&lt;/P&gt;
&lt;UL data-start="6664" data-end="6854"&gt;
&lt;LI data-section-id="b3o9yr" data-start="6664" data-end="6704"&gt;Provide &lt;STRONG data-start="6674" data-end="6702"&gt;clear, structured intent&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-section-id="fy9ddi" data-start="6705" data-end="6748"&gt;Review generated plans before execution&lt;/LI&gt;
&lt;LI data-section-id="7yicq2" data-start="6749" data-end="6801"&gt;Validate changes through testing and code review&lt;/LI&gt;
&lt;LI data-section-id="xdb84j" data-start="6802" data-end="6854"&gt;Start with &lt;STRONG data-start="6815" data-end="6837"&gt;targeted scenarios&lt;/STRONG&gt; before scaling&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 data-section-id="8dtpi" data-start="7101" data-end="7114"&gt;Conclusion&lt;/H2&gt;
&lt;P data-start="7116" data-end="7370"&gt;GitHub Copilot Workspace represents a meaningful advancement in AI-assisted development. By enabling developers to define intent and delegate execution, it supports &lt;STRONG data-start="7294" data-end="7369"&gt;repository-wide transformations with greater consistency and efficiency&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P data-start="7372" data-end="7540"&gt;As development workflows continue to evolve, tools that combine &lt;STRONG data-start="7436" data-end="7482"&gt;context awareness, planning, and execution&lt;/STRONG&gt; will play a central role in modern engineering practices.&lt;/P&gt;</description>
      <pubDate>Tue, 05 May 2026 03:49:40 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/building-multi-file-refactoring-agents-with-github-copilot/ba-p/4516932</guid>
      <dc:creator>Devi_Priya</dc:creator>
      <dc:date>2026-05-05T03:49:40Z</dc:date>
    </item>
    <item>
      <title>Build and Deploy Logic App Workflows Using Visual Studio Code and CI/CD Pipeline</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/build-and-deploy-logic-app-workflows-using-visual-studio-code/ba-p/4516931</link>
      <description>&lt;P&gt;Throughout this guide, you'll create a Standard logic app workspace and project, build your workflow, and deploy it as a Standard logic app resource in Azure. This enables your workflow to run in a single-tenant Azure Logic Apps environment or within an App Service Environment v3 (restricted to Windows-based App Service plans).&lt;/P&gt;
&lt;P&gt;A key advantage of Standard logic apps is that you can develop, debug, run, and test workflows locally in Visual Studio Code. Although both the Azure portal and Visual Studio Code support building, running, and deploying Standard logic app resources and workflows, only Visual Studio Code lets you perform all these actions locally, which offers greater flexibility during development.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Prerequisites&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Visual Studio Code&lt;/LI&gt;
&lt;LI&gt;Azure Account extension for Visual Studio Code&lt;/LI&gt;
&lt;LI&gt;Download and install the following Visual Studio Code dependencies for your specific operating system using either method:&lt;/LI&gt;
&lt;/OL&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;U&gt;Install all dependencies manually&lt;/U&gt;&amp;nbsp;--&amp;gt;&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/logic-apps/create-single-tenant-workflows-visual-studio-code#dependency-installer:~:text=ignore%20this%20message.-,Install%20each%20dependency%20separately,Expand%20table,-Dependency" target="_blank"&gt;For manual installation&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;U&gt;Install all dependencies automatically.&lt;/U&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Starting with version&amp;nbsp;&lt;STRONG&gt;2.81.5&lt;/STRONG&gt;, the&amp;nbsp;&lt;STRONG&gt;Azure Logic Apps (Standard) extension&lt;/STRONG&gt;&amp;nbsp;for Visual Studio Code includes a dependency installer that automatically installs all the required dependencies in a new binary folder and leaves any existing dependencies unchanged.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For more information, see&amp;nbsp;&lt;A href="https://techcommunity.microsoft.com/t5/azure-integration-services-blog/making-it-easy-to-get-started-with-the-azure-logic-apps-standard/ba-p/3979643" target="_blank"&gt;Get started more easily with the Azure Logic Apps (Standard) extension for Visual Studio Code&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;This extension includes the following dependencies:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Dependency&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Description&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://marketplace.visualstudio.com/items?itemName=ms-vscode.csharp" target="_blank"&gt;C# for Visual Studio Code&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Enables F5 functionality to run your workflow.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://github.com/Azure/Azurite#visual-studio-code-extension" target="_blank"&gt;Azurite for Visual Studio Code&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Provides a local data store and emulator to use with Visual Studio Code so that you can work on your logic app project and run your workflows in your local development environment. If you don't want Azurite to automatically start, you can disable this option:&lt;BR /&gt;&lt;BR /&gt;1. On the&amp;nbsp;File&amp;nbsp;menu, select&amp;nbsp;Preferences&amp;nbsp;&amp;gt;&amp;nbsp;Settings.&lt;BR /&gt;&lt;BR /&gt;2. On the&amp;nbsp;User&amp;nbsp;tab, select&amp;nbsp;Extensions&amp;nbsp;&amp;gt;&amp;nbsp;Azure Logic Apps (Standard).&lt;BR /&gt;&lt;BR /&gt;3. Find the setting named&amp;nbsp;Azure Logic Apps Standard: Auto Start Azurite, and clear the selected checkbox.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;A href="https://dotnet.microsoft.com/download/dotnet/6.0" target="_blank"&gt;.NET SDK 6.x.x&lt;/A&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Includes the .NET Runtime 6.x.x, a prerequisite for the Azure Logic Apps (Standard) runtime.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Azure Functions Core Tools - 4.x version&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Installs the version based on your operating system (&lt;A href="https://github.com/Azure/azure-functions-core-tools/releases" target="_blank"&gt;Windows&lt;/A&gt;,&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/azure-functions/functions-run-local?tabs=macos#install-the-azure-functions-core-tools" target="_blank"&gt;macOS&lt;/A&gt;, or&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/azure-functions/functions-run-local?tabs=linux#install-the-azure-functions-core-tools" target="_blank"&gt;Linux&lt;/A&gt;).&lt;BR /&gt;&lt;BR /&gt;These tools include a version of the same runtime that powers the Azure Functions runtime, which the Azure Logic Apps (Standard) extension uses in Visual Studio Code.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://nodejs.org/en/download/releases/" target="_blank"&gt;Node.js version 16.x.x unless a newer version is already installed&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Required to enable the&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/logic-apps/logic-apps-add-run-inline-code" target="_blank"&gt;Inline Code Operations action&lt;/A&gt; that runs JavaScript.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&lt;STRONG&gt;Set up Visual Studio Code&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;To make sure that all the extensions are correctly installed, reload or restart Visual Studio Code.&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI&gt;Confirm that the&amp;nbsp;&lt;STRONG&gt;Azure Logic Apps Standard: Project Runtime&lt;/STRONG&gt;&amp;nbsp;setting for the Azure Logic Apps (Standard) extension is set to version&amp;nbsp;&lt;STRONG&gt;~4&lt;/STRONG&gt;:&lt;/LI&gt;
&lt;LI&gt;On the File menu, go to Preferences &amp;gt; Settings.&lt;/LI&gt;
&lt;LI&gt;On the User tab, go to Extensions &amp;gt; Azure Logic Apps (Standard).&lt;/LI&gt;
&lt;LI&gt;You can find the Azure Logic Apps Standard: Project Runtime setting here or use the search box to find other settings:&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Connect to your Azure account&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;On the Visual Studio Code Activity Bar, select the Azure icon.&lt;/LI&gt;
&lt;/OL&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In the&amp;nbsp;&lt;STRONG&gt;Azure&lt;/STRONG&gt;&amp;nbsp;window, on the&amp;nbsp;&lt;STRONG&gt;Workspace&lt;/STRONG&gt;&amp;nbsp;section toolbar, from the&amp;nbsp;&lt;STRONG&gt;Azure Logic Apps&lt;/STRONG&gt;&amp;nbsp;menu, select&amp;nbsp;&lt;STRONG&gt;Create New Project&lt;/STRONG&gt;.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;From the templates list that appears, select either&amp;nbsp;&lt;STRONG&gt;Stateful Workflow&lt;/STRONG&gt;&amp;nbsp;or&amp;nbsp;&lt;STRONG&gt;Stateless Workflow&lt;/STRONG&gt;.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;Provide a name for your workflow and press Enter.&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;If Visual Studio Code prompts you to open your project in the current window or in a new window, select&amp;nbsp;&lt;STRONG&gt;Open in current window&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;Visual Studio Code finishes creating your project.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;The Explorer pane shows your project, which now includes automatically generated project files. For example, the project has a folder that shows your workflow's name. Inside this folder, the&amp;nbsp;&lt;STRONG&gt;workflow.json&lt;/STRONG&gt; file contains your workflow's underlying JSON definition.&lt;/P&gt;
&lt;img /&gt;
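&lt;P&gt;Because workflow.json is plain JSON, it can also be inspected outside the designer. The following is a minimal sketch, assuming the usual Standard layout with a top-level definition object that holds triggers and actions, plus a kind field; summarize_workflow and the sample content are illustrative.&lt;/P&gt;

```python
import json


def summarize_workflow(workflow_text):
    """Return the kind, trigger names, and action names from a workflow.json string."""
    workflow = json.loads(workflow_text)
    definition = workflow.get("definition", {})
    return {
        "kind": workflow.get("kind", "unknown"),
        "triggers": sorted(definition.get("triggers", {})),
        "actions": sorted(definition.get("actions", {})),
    }


sample = json.dumps({
    "definition": {
        "triggers": {"manual": {"type": "Request", "kind": "Http"}},
        "actions": {"Response": {"type": "Response", "inputs": {"statusCode": 200}}},
    },
    "kind": "Stateful",
})
print(summarize_workflow(sample))
```

&lt;P&gt;A check like this can run in a pipeline to confirm that each workflow has the expected trigger before deployment.&lt;/P&gt;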
&lt;P&gt;Open the&amp;nbsp;&lt;STRONG&gt;workflow.json&lt;/STRONG&gt;&amp;nbsp;file's shortcut menu, and select&amp;nbsp;&lt;STRONG&gt;Open Designer&lt;/STRONG&gt;.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;If prompted with&amp;nbsp;&lt;STRONG&gt;Enable connectors in Azure&lt;/STRONG&gt;, select&amp;nbsp;&lt;STRONG&gt;Use connectors from Azure&lt;/STRONG&gt;.&lt;/P&gt;
&lt;img /&gt;
&lt;UL&gt;
&lt;LI&gt;After the&amp;nbsp;Select subscription&amp;nbsp;list opens, select the Azure subscription to use for your logic app project.&lt;/LI&gt;
&lt;LI&gt;After the resource groups list opens, select&amp;nbsp;the resource group to use for your logic app project.&lt;/LI&gt;
&lt;LI&gt;After you perform this step, Visual Studio Code opens the workflow designer.&lt;/LI&gt;
&lt;/UL&gt;
&lt;img /&gt;
&lt;P&gt;After you open a blank workflow in the designer, the Add a trigger prompt appears. You can now build your workflow by adding a trigger and actions, and then save it.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&lt;STRONG&gt;Run, test, and debug locally&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Make sure to start the emulator before you run your workflow:&lt;/LI&gt;
&lt;LI&gt;In Visual Studio Code, from the&amp;nbsp;View&amp;nbsp;menu, select&amp;nbsp;Command Palette.&lt;/LI&gt;
&lt;LI&gt;After the command palette appears, enter&amp;nbsp;Azurite: Start.&lt;/LI&gt;
&lt;LI&gt;On the Visual Studio Code Activity Bar, open the Run menu, and select Start Debugging (F5).&lt;/LI&gt;
&lt;/OL&gt;
&lt;img /&gt;
&lt;P&gt;The Terminal window opens so that you can review the debugging session.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;Now, find the callback URL for the endpoint on the Request trigger.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Reopen the Explorer pane so that you can view your project.&lt;/LI&gt;
&lt;LI&gt;From the workflow.json file's shortcut menu, select Overview.&lt;/LI&gt;
&lt;/OL&gt;
&lt;img /&gt;&lt;img /&gt;
&lt;P&gt;Select Run trigger.&lt;/P&gt;
&lt;P&gt;If it is a stateful workflow, the run status appears as shown below.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;To view the run details, select the run identifier.&lt;/P&gt;
&lt;P&gt;A new window opens with the results.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;Note: If a workflow that uses a storage account returns a Forbidden error, add your IP address to that storage account's firewall allow list, and then rerun the workflow by selecting Run and Debug in Visual Studio Code.&lt;/P&gt;
&lt;P&gt;When you finish, stop the debugging session by selecting the stop button, and then push the code to your Azure Repos repository by using Git.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Use a Pipeline to Deploy the Created Workflow&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Build.yaml&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="yaml"&gt;jobs:
- job: logic_app_build
  displayName: "Build and publish Logic App"
  steps:
  - script: sudo apt-get update &amp;amp;&amp;amp; sudo apt-get install -y zip
    displayName: 'Install zip utility'
  - task: CopyFiles@2
    displayName: 'Create project folder'
    inputs:
      sourceFolder: '$(System.DefaultWorkingDirectory)'
      contents: |
        azure_logicapps/**
      targetFolder: 'project_output'
  - task: ArchiveFiles@2
    displayName: 'Create project Zip'
    inputs:
      rootFolderOrFile: '$(System.DefaultWorkingDirectory)/project_output/azure_logicapps'
      includeRootFolder: false
      archiveType: 'zip'
      archiveFile: '$(Build.ArtifactStagingDirectory)/$(Build.BuildId).zip'
      replaceExistingArchive: true
  - task: PublishPipelineArtifact@1
    displayName: 'Publish project zip artifact'
    inputs:
      targetPath: '$(Build.ArtifactStagingDirectory)/$(Build.BuildId).zip'
      artifact: 'logicAppCIArtifact'
      publishLocation: 'pipeline'&lt;/LI-CODE&gt;
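&lt;P&gt;The ArchiveFiles@2 step sets includeRootFolder to false so that the project files sit at the root of the zip rather than inside an azure_logicapps folder, which is the layout a zip deployment generally expects. The following is a minimal sketch that reproduces the same layout locally for inspection; zip_folder_contents and the sample file names are illustrative.&lt;/P&gt;

```python
import tempfile
import zipfile
from pathlib import Path


def zip_folder_contents(folder, archive_path):
    """Zip every file under folder using paths relative to it, then return the entry names."""
    folder = Path(folder)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # The archive name omits the root folder itself
                archive.write(path, path.relative_to(folder).as_posix())
    with zipfile.ZipFile(archive_path) as archive:
        return archive.namelist()


with tempfile.TemporaryDirectory() as work:
    project = Path(work, "azure_logicapps")
    Path(project, "my_workflow").mkdir(parents=True)
    Path(project, "host.json").write_text("{}", encoding="utf-8")
    Path(project, "my_workflow", "workflow.json").write_text("{}", encoding="utf-8")
    names = zip_folder_contents(project, Path(work, "build.zip"))
    print(names)
```

&lt;P&gt;Listing the entries this way makes it easy to confirm that host.json is at the top level of the archive before the artifact is published.&lt;/P&gt;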
&lt;P&gt;&lt;STRONG&gt;Deploy.yaml&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="yaml"&gt;jobs:
- deployment: deploy_logicapp_resources
  displayName: Deploy Logic App
  environment: ${{ parameters.environmentToDeploy }}
  strategy:
    runOnce:
      deploy:
        steps:
        - download: current
          artifact: logicAppCIArtifact
        - task: AzureFunctionApp@1
          displayName: 'Deploy Logic App workflows'
          inputs:
            azureSubscription: ${{ parameters.azureServiceConnection }}
            appType: 'functionApp'
            appName: ${{ parameters.vars.LogicAppName }}
            package: '$(Pipeline.Workspace)/logicAppCIArtifact/$(Build.BuildId).zip'
            deploymentMethod: 'zipDeploy'&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 05 May 2026 03:39:26 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/build-and-deploy-logic-app-workflows-using-visual-studio-code/ba-p/4516931</guid>
      <dc:creator>Devi_Priya</dc:creator>
      <dc:date>2026-05-05T03:39:26Z</dc:date>
    </item>
    <item>
      <title>Running GitHub Actions Runners on Azure Container Apps with KEDA Autoscaling</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/running-github-actions-runners-on-azure-container-apps-with-keda/ba-p/4512980</link>
      <description>&lt;P data-line="8"&gt;GitHub-hosted runners work well for most scenarios. But as workloads grow, teams often need:&lt;/P&gt;
&lt;UL data-line="10"&gt;
&lt;LI data-line="10"&gt;&lt;STRONG&gt;Better cost optimization&lt;/STRONG&gt;&amp;nbsp;— pay only when jobs run&lt;/LI&gt;
&lt;LI data-line="11"&gt;&lt;STRONG&gt;More control&lt;/STRONG&gt;&amp;nbsp;over execution environments and installed tools&lt;/LI&gt;
&lt;LI data-line="12"&gt;&lt;STRONG&gt;Scalable parallel execution&lt;/STRONG&gt;&amp;nbsp;— run 10, 20, or 50 jobs simultaneously&lt;/LI&gt;
&lt;LI data-line="13"&gt;&lt;STRONG&gt;Network access&lt;/STRONG&gt;&amp;nbsp;to private resources (databases, internal APIs)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="15"&gt;Traditionally, this is solved using&amp;nbsp;&lt;STRONG&gt;self-hosted runners on Virtual Machines&lt;/STRONG&gt;. But VMs come with challenges — always-on cost, manual scaling, patching, and maintenance overhead.&lt;/P&gt;
&lt;P data-line="17"&gt;In this guide, we'll walk through a&amp;nbsp;&lt;STRONG&gt;modern, serverless alternative&lt;/STRONG&gt;:&lt;/P&gt;
&lt;P data-line="19"&gt;👉&amp;nbsp;&lt;STRONG&gt;Running self-hosted GitHub runners on Azure Container Apps Jobs with KEDA autoscaling&lt;/STRONG&gt;&lt;/P&gt;
&lt;H3 data-line="21"&gt;What You Will Build&lt;/H3&gt;
&lt;P data-line="23"&gt;By the end of this guide:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Capability&lt;/th&gt;&lt;th&gt;What You Get&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Runner type&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Self-hosted, ephemeral (one job = one container)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Scaling&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Automatic via KEDA; scales to zero when idle&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Cost&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;No runner compute cost when no jobs are running&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Infrastructure&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Fully managed by Azure Container Apps&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Security&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Secrets stored in Azure Key Vault, Managed Identity for auth&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H1 data-section-id="1xnsmb8" data-start="1135" data-end="1162"&gt;Architecture Overview&lt;/H1&gt;
&lt;P&gt;Here's how the system works at a high level:&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;[Figure: architecture overview diagram]&lt;/EM&gt;&lt;/P&gt;
&lt;H3 data-line="51"&gt;How It Works (Step by Step):&lt;/H3&gt;
&lt;OL data-line="53"&gt;
&lt;LI data-line="53"&gt;A developer pushes code or triggers a GitHub Actions workflow&lt;/LI&gt;
&lt;LI data-line="54"&gt;GitHub queues the job and looks for a runner with matching labels&lt;/LI&gt;
&lt;LI data-line="55"&gt;&lt;STRONG&gt;KEDA&lt;/STRONG&gt;&amp;nbsp;(built into Container Apps) polls the GitHub Actions API for pending jobs&lt;/LI&gt;
&lt;LI data-line="56"&gt;When a pending job is detected, KEDA triggers the&amp;nbsp;&lt;STRONG&gt;Container App Job&lt;/STRONG&gt;&amp;nbsp;to start a new execution&lt;/LI&gt;
&lt;LI data-line="57"&gt;A fresh container starts, registers itself as a&amp;nbsp;&lt;STRONG&gt;self-hosted runner&lt;/STRONG&gt;&amp;nbsp;with GitHub&lt;/LI&gt;
&lt;LI data-line="58"&gt;The runner picks up the job, executes it, and reports results back to GitHub&lt;/LI&gt;
&lt;LI data-line="59"&gt;The container&amp;nbsp;&lt;STRONG&gt;shuts down and is destroyed&lt;/STRONG&gt;&amp;nbsp;— fully ephemeral&lt;/LI&gt;
&lt;LI data-line="60"&gt;When no jobs are pending, KEDA scales back to&amp;nbsp;&lt;STRONG&gt;zero&lt;/STRONG&gt; — no cost&lt;/LI&gt;
&lt;/OL&gt;
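&lt;P&gt;The scale-out and scale-to-zero behavior in the steps above can be pictured with a small sketch. This is a simplified model, not KEDA's actual algorithm: the function name and the max_executions cap are hypothetical, and real behavior depends on how the scale rule is configured.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;#!/bin/bash
# Simplified model of the scaling decision (an assumption, not KEDA's code):
# start one job execution per pending job, capped at a configured maximum.
executions_to_start() {
  local pending_jobs="$1"     # pending GitHub Actions jobs seen in this poll
  local max_executions="$2"   # hypothetical upper limit on parallel executions
  if [ "$pending_jobs" -gt "$max_executions" ]; then
    echo "$max_executions"
  else
    echo "$pending_jobs"
  fi
}

executions_to_start 0 10    # prints 0  (nothing pending, nothing runs)
executions_to_start 3 10    # prints 3  (one container per pending job)
executions_to_start 25 10   # prints 10 (capped at the maximum)&lt;/LI-CODE&gt;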
&lt;H1 data-section-id="yczsi9" data-start="1378" data-end="1395"&gt;Runtime Flow&lt;/H1&gt;
&lt;P&gt;&lt;EM&gt;[Figure: runtime flow diagram]&lt;/EM&gt;&lt;/P&gt;
&lt;H1 data-section-id="mj9kq3" data-start="1444" data-end="1462"&gt;Pre-requisites&lt;/H1&gt;
&lt;P data-line="66"&gt;Before you begin, make sure you have:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Requirement&lt;/th&gt;&lt;th&gt;Details&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;GitHub account&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;With a repository or organization where you want to run workflows&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Azure subscription&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;With permissions to create resources (Contributor role or higher)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Azure CLI&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Installed locally, OR use Azure Cloud Shell (no install needed)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Basic knowledge&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Familiarity with GitHub Actions and Azure Portal&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3 data-line="75"&gt;Where You'll Run Commands&lt;/H3&gt;
&lt;P data-line="77"&gt;Throughout this guide, you'll need a terminal to create files and run CLI commands. You have&amp;nbsp;&lt;STRONG&gt;three options&lt;/STRONG&gt;:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Option&lt;/th&gt;&lt;th&gt;When to Use&lt;/th&gt;&lt;th&gt;How to Open&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;VS Code Terminal&lt;/STRONG&gt; (Recommended)&lt;/td&gt;&lt;td&gt;You have VS Code installed locally&lt;/td&gt;&lt;td&gt;Open VS Code →&amp;nbsp;Ctrl + `&amp;nbsp;(backtick) → Terminal opens at the bottom&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Azure Cloud Shell&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;No local tools installed, or restricted machine&lt;/td&gt;&lt;td&gt;Go to&amp;nbsp;&lt;A href="https://portal.azure.com/" target="_blank" rel="noopener" data-href="https://portal.azure.com"&gt;portal.azure.com&lt;/A&gt;&amp;nbsp;→ click the&amp;nbsp;&amp;gt;_&amp;nbsp;icon in the top toolbar&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Any terminal&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;PowerShell, CMD, or Bash, whichever you prefer&lt;/td&gt;&lt;td&gt;Just ensure Azure CLI (az) is installed&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P data-line="85"&gt;💡&amp;nbsp;&lt;STRONG&gt;We recommend VS Code&lt;/STRONG&gt;&amp;nbsp;because you'll create files (Dockerfile, start.sh) AND run commands, and VS Code lets you do both in one place.&lt;/P&gt;
&lt;H3 data-line="87"&gt;Azure Resources We Will Create&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Resource&lt;/th&gt;&lt;th&gt;Purpose&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Resource Group&lt;/td&gt;&lt;td&gt;Logical container for all resources&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Azure Container Registry (ACR)&lt;/td&gt;&lt;td&gt;Stores the runner Docker image&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Azure Container Apps Environment&lt;/td&gt;&lt;td&gt;Hosting environment for container jobs&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Azure Container App Job&lt;/td&gt;&lt;td&gt;The actual runner job definition&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Azure Key Vault&lt;/td&gt;&lt;td&gt;Securely stores the GitHub PAT&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Managed Identity&lt;/td&gt;&lt;td&gt;Allows the container job to access ACR and Key Vault without passwords&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-line="98"&gt;💡&amp;nbsp;&lt;STRONG&gt;Note&lt;/STRONG&gt;: This guide covers&amp;nbsp;&lt;STRONG&gt;organization-level runners&lt;/STRONG&gt;. For&amp;nbsp;&lt;STRONG&gt;repository-level runners&lt;/STRONG&gt;, the only difference is the GitHub API endpoint used for registration. We'll call out the differences where applicable.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
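&lt;P&gt;The two registration endpoints mentioned in the note differ only in their path. The sketch below assumes the GitHub REST API paths for creating a runner registration token. The helper function is hypothetical, and my-org and my-repo are placeholder names.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;#!/bin/bash
# Build the runner registration-token URL for either scope.
# Assumption: GitHub REST API paths; my-org and my-repo are placeholders.
registration_token_url() {
  local scope="$1"    # "org" or "repo"
  local target="$2"   # "my-org" for org scope, "my-org/my-repo" for repo scope
  case "$scope" in
    org)  echo "https://api.github.com/orgs/${target}/actions/runners/registration-token" ;;
    repo) echo "https://api.github.com/repos/${target}/actions/runners/registration-token" ;;
    *)    echo "unknown scope: ${scope}"; return 1 ;;
  esac
}

registration_token_url org my-org
registration_token_url repo my-org/my-repo&lt;/LI-CODE&gt;
&lt;P&gt;Either URL is called with an HTTP POST request authenticated by the PAT created in Step 1.&lt;/P&gt;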
&lt;H2 data-line="102"&gt;Step 1: Create a GitHub Personal Access Token (PAT)&lt;/H2&gt;
&lt;P data-line="104"&gt;Before we touch Azure, we need a token that allows our runner to register with GitHub. You can use either a&amp;nbsp;&lt;STRONG&gt;Fine-grained token&lt;/STRONG&gt;&amp;nbsp;(recommended) or a&amp;nbsp;&lt;STRONG&gt;Classic token&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H3 data-line="106"&gt;Option A: Fine-Grained PAT&amp;nbsp; (Recommended)&lt;/H3&gt;
&lt;P data-line="108"&gt;Fine-grained tokens let you scope access to&amp;nbsp;&lt;STRONG&gt;specific repositories only&lt;/STRONG&gt;. This is critical for two reasons:&lt;/P&gt;
&lt;OL data-line="110"&gt;
&lt;LI data-line="110"&gt;&lt;STRONG&gt;Avoid GitHub API rate limits:&lt;/STRONG&gt;&amp;nbsp;KEDA continuously polls the GitHub API for pending jobs. If your token has access to your entire org (potentially hundreds of repos), KEDA scans all of them on every polling cycle. GitHub allows only&amp;nbsp;&lt;STRONG&gt;5,000 API requests/hour&lt;/STRONG&gt;&amp;nbsp;— with broad access, you'll hit this limit quickly and KEDA will stop detecting jobs.&lt;/LI&gt;
&lt;LI data-line="111"&gt;&lt;STRONG&gt;Security:&lt;/STRONG&gt;&amp;nbsp;Least-privilege access — the token only works on the repos you explicitly select.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P data-line="113"&gt;🔑&amp;nbsp;&lt;STRONG&gt;This is why we recommend fine-grained tokens over classic tokens.&lt;/STRONG&gt;&amp;nbsp;By selecting only the repos that need runners, KEDA polls fewer repos and stays well within API limits.&lt;/P&gt;
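&lt;P&gt;To see why the number of repositories matters, here is a rough back-of-the-envelope estimate. It assumes KEDA makes at least one API request per repository on every poll and polls every 30 seconds. Both figures are assumptions for illustration, and real request counts will differ.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;#!/bin/bash
# Rough estimate of GitHub API requests per hour caused by KEDA polling.
# Assumptions (not from this guide): one request per repository per poll
# and a default polling interval of 30 seconds.
estimate_requests_per_hour() {
  local repo_count="$1"
  local polling_interval_seconds="${2:-30}"
  local polls_per_hour=$(( 3600 / polling_interval_seconds ))
  echo $(( repo_count * polls_per_hour ))
}

estimate_requests_per_hour 2     # prints 240   (well under the 5,000/hour limit)
estimate_requests_per_hour 200   # prints 24000 (far over the 5,000/hour limit)&lt;/LI-CODE&gt;
&lt;P&gt;Under these assumptions, a token scoped to roughly 40 repositories already approaches the hourly limit.&lt;/P&gt;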
&lt;OL data-line="115"&gt;
&lt;LI data-line="115"&gt;Go to&amp;nbsp;&lt;A href="https://github.com/" target="_blank" rel="noopener" data-href="https://github.com"&gt;github.com&lt;/A&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Settings&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Developer settings&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="116"&gt;Click&amp;nbsp;&lt;STRONG&gt;Personal access tokens&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Fine-grained tokens&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="117"&gt;Click&amp;nbsp;&lt;STRONG&gt;Generate new token&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="118"&gt;Fill in:&lt;/LI&gt;
&lt;/OL&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Field&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Token name&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;container-app-runner&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Expiration&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Choose based on your needs (e.g., 90 days)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Resource owner&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Your GitHub username or org&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Repository access&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;Only select repositories&lt;/STRONG&gt;&amp;nbsp;→ pick the repos where you want runners&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-line="127"&gt;⚠️&amp;nbsp;&lt;STRONG&gt;IMPORTANT — Remember these repo names!&lt;/STRONG&gt;&amp;nbsp;The repos you select here are the ONLY repos this token can access. Later in&amp;nbsp;&lt;STRONG&gt;Step 8&lt;/STRONG&gt;, when you configure the KEDA scale rule, you must list these&amp;nbsp;&lt;STRONG&gt;exact same repos&lt;/STRONG&gt;&amp;nbsp;in the&amp;nbsp;repos&amp;nbsp;metadata field. KEDA uses this token to poll GitHub for pending jobs — if a repo isn't included in the token, KEDA can't see its jobs and your runners won't scale for it.&lt;/P&gt;
&lt;P data-line="129"&gt;&lt;STRONG&gt;Example:&lt;/STRONG&gt;&amp;nbsp;If you select&amp;nbsp;my-app&amp;nbsp;and&amp;nbsp;my-api&amp;nbsp;here, your KEDA config must have&amp;nbsp;repos: my-app,my-api.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;OL data-line="131" start="5"&gt;
&lt;LI data-line="131"&gt;Under&amp;nbsp;&lt;STRONG&gt;Permissions&lt;/STRONG&gt;, set the following:&lt;/LI&gt;
&lt;/OL&gt;
&lt;P data-line="133"&gt;&lt;STRONG&gt;Repository permissions&lt;/STRONG&gt;&amp;nbsp;(required):&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Permission&lt;/th&gt;&lt;th&gt;Access Level&lt;/th&gt;&lt;th&gt;Why It's Needed&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Actions&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Read and write&lt;/td&gt;&lt;td&gt;Manage workflow runs and artifacts&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Administration&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Read and write&lt;/td&gt;&lt;td&gt;Register and manage self-hosted runners&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Metadata&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Read-only&lt;/td&gt;&lt;td&gt;&lt;EM&gt;(Auto-selected, required)&lt;/EM&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Workflows&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Read and write&lt;/td&gt;&lt;td&gt;Update GitHub Action workflow files&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P data-line="142"&gt;&lt;STRONG&gt;Organization permissions&lt;/STRONG&gt;&amp;nbsp;(only if using org-level runners):&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Permission&lt;/th&gt;&lt;th&gt;Access Level&lt;/th&gt;&lt;th&gt;Why It's Needed&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Self-hosted runners&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Read and write&lt;/td&gt;&lt;td&gt;Register runners at the org level&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-line="148"&gt;📝&amp;nbsp;&lt;STRONG&gt;For personal accounts (no org):&lt;/STRONG&gt;&amp;nbsp;You only need the&amp;nbsp;&lt;STRONG&gt;Repository permissions&lt;/STRONG&gt;&amp;nbsp;above. Skip the Organization permissions.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;OL data-line="150" start="6"&gt;
&lt;LI data-line="150"&gt;Click&amp;nbsp;&lt;STRONG&gt;Generate token&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="151"&gt;&lt;STRONG&gt;⚠️ IMPORTANT: Copy the token NOW&lt;/STRONG&gt;&amp;nbsp;— you won't be able to see it again!&lt;/LI&gt;
&lt;/OL&gt;
&lt;H3 data-line="153"&gt;Option B: Classic PAT&lt;/H3&gt;
&lt;P data-line="155"&gt;If you prefer a classic token (simpler but broader access):&lt;/P&gt;
&lt;OL data-line="157"&gt;
&lt;LI data-line="157"&gt;Go to&amp;nbsp;&lt;STRONG&gt;Settings&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Developer settings&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Personal access tokens&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Tokens (classic)&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="158"&gt;Click&amp;nbsp;&lt;STRONG&gt;Generate new token (classic)&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="159"&gt;Select these scopes:&lt;/LI&gt;
&lt;/OL&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Scope&lt;/th&gt;&lt;th&gt;Why It's Needed&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;✅&amp;nbsp;repo&lt;/td&gt;&lt;td&gt;Full access to repositories&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;✅&amp;nbsp;workflow&lt;/td&gt;&lt;td&gt;Manage workflows&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;✅&amp;nbsp;admin:org&lt;/td&gt;&lt;td&gt;Required for org-level runners (skip for personal repos)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;OL data-line="167" start="4"&gt;
&lt;LI data-line="167"&gt;Click&amp;nbsp;&lt;STRONG&gt;Generate token&lt;/STRONG&gt;&amp;nbsp;and copy it immediately&lt;/LI&gt;
&lt;/OL&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-line="169"&gt;⚠️&amp;nbsp;&lt;STRONG&gt;Classic tokens give access to ALL repositories&lt;/STRONG&gt;&amp;nbsp;in your account. For better security and to avoid API rate limits, prefer fine-grained tokens scoped to specific repos.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P data-line="171"&gt;Save the token somewhere safe temporarily. We'll store it in Azure Key Vault in the next steps.&lt;/P&gt;
&lt;H3 data-line="172"&gt;Where Will Runners Appear on GitHub?&lt;/H3&gt;
&lt;P data-line="174"&gt;Once runners are deployed and register with GitHub, you can see them here:&lt;/P&gt;
&lt;P data-line="176"&gt;&lt;STRONG&gt;For repository-level runners:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL data-line="177"&gt;
&lt;LI data-line="177"&gt;Go to your repo →&amp;nbsp;&lt;STRONG&gt;Settings&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Actions&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Runners&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="179"&gt;&lt;STRONG&gt;For organization-level runners:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL data-line="180"&gt;
&lt;LI data-line="180"&gt;Go to your org →&amp;nbsp;&lt;STRONG&gt;Settings&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Actions&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Runners&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="182"&gt;Runners&amp;nbsp;&lt;STRONG&gt;automatically register themselves&lt;/STRONG&gt;&amp;nbsp;when the container starts. You'll see them appear as "Idle" or "Active" in the Runners list. You do&amp;nbsp;&lt;STRONG&gt;NOT&lt;/STRONG&gt;&amp;nbsp;need to manually create individual runners on GitHub.&lt;/P&gt;
&lt;H3 data-line="184"&gt;Setting Up Runner Groups (Organization-Level) — Recommended&lt;/H3&gt;
&lt;P data-line="186"&gt;Since this guide is for&amp;nbsp;&lt;STRONG&gt;organization-level runners&lt;/STRONG&gt;, it's recommended to create a&amp;nbsp;&lt;STRONG&gt;Runner Group&lt;/STRONG&gt;&amp;nbsp;on GitHub. Runner Groups let you control&amp;nbsp;&lt;STRONG&gt;which repositories&lt;/STRONG&gt;&amp;nbsp;in your org can use these runners.&lt;/P&gt;
&lt;H4 data-line="188"&gt;Steps to Create a Runner Group:&lt;/H4&gt;
&lt;OL data-line="190"&gt;
&lt;LI data-line="190"&gt;Go to your&amp;nbsp;&lt;STRONG&gt;GitHub Organization&lt;/STRONG&gt;&amp;nbsp;page (e.g.,&amp;nbsp;https://github.com/your-org)&lt;/LI&gt;
&lt;LI data-line="191"&gt;Click&amp;nbsp;&lt;STRONG&gt;Settings&lt;/STRONG&gt;&amp;nbsp;(top menu bar)&lt;/LI&gt;
&lt;LI data-line="192"&gt;In the left sidebar, expand&amp;nbsp;&lt;STRONG&gt;Actions&lt;/STRONG&gt;&amp;nbsp;→ click&amp;nbsp;&lt;STRONG&gt;Runner groups&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="193"&gt;Click&amp;nbsp;&lt;STRONG&gt;New runner group&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="194"&gt;Fill in:&lt;/LI&gt;
&lt;/OL&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Field&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Name&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;container-app-runners&amp;nbsp;(or any descriptive name)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Repository access&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Choose one:&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;td&gt;•&amp;nbsp;&lt;STRONG&gt;All repositories&lt;/STRONG&gt;&amp;nbsp;— any repo in the org can use these runners&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;td&gt;•&amp;nbsp;&lt;STRONG&gt;Selected repositories&lt;/STRONG&gt;&amp;nbsp;— pick specific repos (recommended for control)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Allow public repositories&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Uncheck this for security (unless needed)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Workflow access&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Leave default (all workflows)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-line="205"&gt;⚠️&amp;nbsp;&lt;STRONG&gt;CRITICAL: If ANY of the repositories using these runners are PUBLIC, you MUST check "Allow public repositories".&lt;/STRONG&gt;&amp;nbsp;If this is unchecked, GitHub will silently refuse to dispatch jobs from public repos to runners in this group — the runner will register and show as "Idle", but workflows will stay stuck in "Queued" forever. This is the&amp;nbsp;&lt;STRONG&gt;most common and hardest-to-debug issue&lt;/STRONG&gt;&amp;nbsp;with runner groups.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;OL data-line="207" start="6"&gt;
&lt;LI data-line="207"&gt;Click&amp;nbsp;&lt;STRONG&gt;Create group&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4 data-line="209"&gt;Why Create a Runner Group?&lt;/H4&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Without Runner Group&lt;/th&gt;&lt;th&gt;With Runner Group&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Any repo in the org can use your runners&lt;/td&gt;&lt;td&gt;Only selected repos can use them&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Harder to track which teams use runners&lt;/td&gt;&lt;td&gt;Clear visibility and access control&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Potential security risk for public repos&lt;/td&gt;&lt;td&gt;Can block public repos from using runners&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-line="217"&gt;📝&amp;nbsp;&lt;STRONG&gt;For personal GitHub accounts:&lt;/STRONG&gt;&amp;nbsp;Runner groups are not available. Runners will automatically appear under your repo's&amp;nbsp;&lt;STRONG&gt;Settings → Actions → Runners&lt;/STRONG&gt;. No extra setup needed.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H2 data-line="220"&gt;Step 2: Create Azure Resources (Portal)&lt;/H2&gt;
&lt;P data-line="222"&gt;We'll create all the Azure infrastructure through the Azure Portal. If you prefer the CLI, see the&amp;nbsp;&lt;A href="#appendix-a-cli-commands-for-all-steps" target="_blank" rel="noopener" data-href="#appendix-a-cli-commands-for-all-steps"&gt;CLI alternative&lt;/A&gt;&amp;nbsp;at the end.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-line="224"&gt;⚠️&amp;nbsp;&lt;STRONG&gt;IMPORTANT: Create all Azure resources (Resource Group, ACR, Key Vault, Container Apps Environment, Container App Job) in the SAME region.&lt;/STRONG&gt;&amp;nbsp;Pick one region (e.g.,&amp;nbsp;Central US&amp;nbsp;or&amp;nbsp;West US 2) and use it for everything. Mixing regions can cause connectivity issues and increased latency.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H3 data-line="226"&gt;2.1: Create a Resource Group&lt;/H3&gt;
&lt;OL data-line="228"&gt;
&lt;LI data-line="228"&gt;Go to&amp;nbsp;&lt;A href="https://portal.azure.com/" target="_blank" rel="noopener" data-href="https://portal.azure.com"&gt;Azure Portal&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-line="229"&gt;Search for&amp;nbsp;&lt;STRONG&gt;"Resource groups"&lt;/STRONG&gt;&amp;nbsp;in the top search bar&lt;/LI&gt;
&lt;LI data-line="230"&gt;Click&amp;nbsp;&lt;STRONG&gt;+ Create&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="231"&gt;Fill in:&lt;/LI&gt;
&lt;/OL&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Field&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Subscription&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Select your Azure subscription&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Resource group&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;rg-github-runners&amp;nbsp;(or your preferred name)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Region&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;West US 2&amp;nbsp;(or any region that supports Container Apps)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;OL data-line="239" start="5"&gt;
&lt;LI data-line="239"&gt;Click&amp;nbsp;&lt;STRONG&gt;Review + create&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Create&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;H3 data-line="241"&gt;2.2: Create an Azure Container Registry (ACR)&lt;/H3&gt;
&lt;P data-line="243"&gt;This is where we'll store our runner Docker image.&lt;/P&gt;
&lt;OL data-line="245"&gt;
&lt;LI data-line="245"&gt;Search for&amp;nbsp;&lt;STRONG&gt;"Container registries"&lt;/STRONG&gt;&amp;nbsp;in the Azure Portal&lt;/LI&gt;
&lt;LI data-line="246"&gt;Click&amp;nbsp;&lt;STRONG&gt;+ Create&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="247"&gt;Fill in:&lt;/LI&gt;
&lt;/OL&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Field&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Subscription&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Your subscription&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Resource group&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;rg-github-runners&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Registry name&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;yourregistryname&amp;nbsp;(must be globally unique, lowercase, letters and numbers only)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Location&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Same as your resource group (e.g.,&amp;nbsp;West US 2)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Pricing plan&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;Basic&lt;/STRONG&gt;&amp;nbsp;(sufficient for this guide; use Standard/Premium for production)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;OL data-line="257" start="4"&gt;
&lt;LI data-line="257"&gt;Click&amp;nbsp;&lt;STRONG&gt;Review + create&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Create&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="258"&gt;Once created, note down the&amp;nbsp;&lt;STRONG&gt;Login server&lt;/STRONG&gt;&amp;nbsp;(e.g.,&amp;nbsp;yourregistryname.azurecr.io) — you'll need this later&lt;/LI&gt;
&lt;/OL&gt;
&lt;H3 data-line="260"&gt;2.3: Create an Azure Key Vault&lt;/H3&gt;
&lt;OL data-line="262"&gt;
&lt;LI data-line="262"&gt;Search for&amp;nbsp;&lt;STRONG&gt;"Key vaults"&lt;/STRONG&gt;&amp;nbsp;in the Azure Portal&lt;/LI&gt;
&lt;LI data-line="263"&gt;Click&amp;nbsp;&lt;STRONG&gt;+ Create&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="264"&gt;Fill in:&lt;/LI&gt;
&lt;/OL&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Field&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Subscription&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Your subscription&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Resource group&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;rg-github-runners&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Key vault name&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;kv-github-runners&amp;nbsp;(must be globally unique)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Region&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Same region&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Pricing tier&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;Standard&lt;/STRONG&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;OL data-line="274" start="4"&gt;
&lt;LI data-line="274"&gt;Go to the&amp;nbsp;&lt;STRONG&gt;Access configuration&lt;/STRONG&gt;&amp;nbsp;tab:
&lt;UL data-line="275"&gt;
&lt;LI data-line="275"&gt;Select&amp;nbsp;&lt;STRONG&gt;Azure role-based access control (RBAC)&lt;/STRONG&gt;&amp;nbsp;as the permission model&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI data-line="276"&gt;Click&amp;nbsp;&lt;STRONG&gt;Review + create&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Create&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;H3 data-line="278"&gt;2.4: Store the GitHub PAT in Key Vault&lt;/H3&gt;
&lt;OL data-line="280"&gt;
&lt;LI data-line="280"&gt;Open your newly created Key Vault&lt;/LI&gt;
&lt;LI data-line="281"&gt;First, give yourself permission:
&lt;UL data-line="282"&gt;
&lt;LI data-line="282"&gt;Go to&amp;nbsp;&lt;STRONG&gt;Access Control (IAM)&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;+ Add role assignment&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="283"&gt;Role:&amp;nbsp;&lt;STRONG&gt;Key Vault Secrets Officer&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="284"&gt;Assign to: Your own Azure account&lt;/LI&gt;
&lt;LI data-line="285"&gt;Click&amp;nbsp;&lt;STRONG&gt;Review + assign&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI data-line="286"&gt;Now go to&amp;nbsp;&lt;STRONG&gt;Objects&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Secrets&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;+ Generate/Import&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="287"&gt;Fill in:&lt;/LI&gt;
&lt;/OL&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Field&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Upload options&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Manual&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Name&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;github-pat&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Secret value&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Paste your GitHub PAT from Step 1&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;OL data-line="295" start="5"&gt;
&lt;LI data-line="295"&gt;Click&amp;nbsp;&lt;STRONG&gt;Create&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;H2 data-line="299"&gt;Step 3: Create the Runner Docker Image&lt;/H2&gt;
&lt;P data-line="301"&gt;This is the Docker image that will run as your GitHub Actions runner. We need two files: a&amp;nbsp;Dockerfile&amp;nbsp;and a&amp;nbsp;start.sh&amp;nbsp;script.&lt;/P&gt;
&lt;H3 data-line="303"&gt;3.1: Set Up Your Working Directory&lt;/H3&gt;
&lt;OL data-line="305"&gt;
&lt;LI data-line="305"&gt;Open&amp;nbsp;&lt;STRONG&gt;VS Code&lt;/STRONG&gt;&amp;nbsp;on your local machine&lt;/LI&gt;
&lt;LI data-line="306"&gt;Open the integrated terminal:&amp;nbsp;&lt;STRONG&gt;Ctrl + `&lt;/STRONG&gt;&amp;nbsp;(backtick) or&amp;nbsp;&lt;STRONG&gt;Terminal → New Terminal&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="307"&gt;Create a new folder and navigate into it:&lt;/LI&gt;
&lt;/OL&gt;
&lt;LI-CODE lang=""&gt;mkdir github-runner-image 
cd github-runner-image&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;OL data-line="314" start="4"&gt;
&lt;LI data-line="314"&gt;You'll create two files in this folder:&amp;nbsp;Dockerfile&amp;nbsp;and&amp;nbsp;start.sh&lt;/LI&gt;
&lt;LI data-line="315"&gt;In VS Code, click&amp;nbsp;&lt;STRONG&gt;File → Open Folder&lt;/STRONG&gt;&amp;nbsp;and open the&amp;nbsp;github-runner-image&amp;nbsp;folder (so you can edit files easily)&lt;/LI&gt;
&lt;/OL&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-line="317"&gt;📌&amp;nbsp;&lt;STRONG&gt;All commands in Step 3 and Step 4 should be run from inside this&amp;nbsp;github-runner-image&amp;nbsp;folder.&lt;/STRONG&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H3 data-line="319"&gt;3.2: Choose Your Approach&lt;/H3&gt;
&lt;P data-line="321"&gt;There are two approaches to creating the runner image:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Approach&lt;/th&gt;&lt;th&gt;Best For&lt;/th&gt;&lt;th&gt;Docker Required Locally?&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Option A&lt;/STRONG&gt;: Build locally with Docker&lt;/td&gt;&lt;td&gt;Development/testing&lt;/td&gt;&lt;td&gt;✅ Yes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Option B&lt;/STRONG&gt;: Build remotely with ACR Tasks&lt;/td&gt;&lt;td&gt;Production / no Docker access&lt;/td&gt;&lt;td&gt;❌ No&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P data-line="328"&gt;We'll create the same files for both approaches. The only difference is the&amp;nbsp;&lt;STRONG&gt;build command&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H3 data-line="330"&gt;3.3: Create the&amp;nbsp;start.sh&amp;nbsp;Script&lt;/H3&gt;
&lt;P data-line="332"&gt;This script runs when the container starts. It registers the runner with GitHub and runs a single job; when the runner exits, the container shuts down.&lt;/P&gt;
&lt;P data-line="334"&gt;Create a file named&amp;nbsp;&lt;STRONG&gt;start.sh&lt;/STRONG&gt;:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;#!/bin/bash
set -e

# ────────────────────────────────────────────
# CONFIGURATION
# ────────────────────────────────────────────
# These values are passed as environment variables
# GITHUB_PAT    → Your GitHub Personal Access Token
# GITHUB_OWNER  → Your GitHub org or username
# GITHUB_REPO   → (Optional) Repository name for repo-level runners
# RUNNER_SCOPE  → "org" or "repo"
# RUNNER_LABELS → Comma-separated labels (e.g., "container-app,linux")
# RUNNER_GROUP  → Runner group name (org-level only, default: "Default")
#                  Set this to the runner group you created in Step 1 (e.g., "container-app-runners")
#                  If not set, runners register in GitHub's "Default" group — which means
#                  ANY repo in the org can use them and you lose access control.

RUNNER_SCOPE="${RUNNER_SCOPE:-org}"
RUNNER_LABELS="${RUNNER_LABELS:-container-app}"
RUNNER_GROUP="${RUNNER_GROUP:-Default}"

# ⚠️ IMPORTANT: Always set the RUNNER_GROUP environment variable on your Container App Job
#    to match the runner group you created on GitHub (e.g., "container-app-runners").
#    The "Default" fallback above is only a safety net — do NOT rely on it.

# ────────────────────────────────────────────
# GET REGISTRATION TOKEN
# ────────────────────────────────────────────
if [ "$RUNNER_SCOPE" == "org" ]; then
    echo "🔑 Requesting registration token for organization: $GITHUB_OWNER"
    REG_TOKEN=$(curl -s -X POST \
        -H "Authorization: token $GITHUB_PAT" \
        -H "Accept: application/vnd.github+json" \
        "https://api.github.com/orgs/${GITHUB_OWNER}/actions/runners/registration-token" \
        | jq -r .token)
    RUNNER_URL="https://github.com/${GITHUB_OWNER}"
else
    echo "🔑 Requesting registration token for repository: $GITHUB_OWNER/$GITHUB_REPO"
    REG_TOKEN=$(curl -s -X POST \
        -H "Authorization: token $GITHUB_PAT" \
        -H "Accept: application/vnd.github+json" \
        "https://api.github.com/repos/${GITHUB_OWNER}/${GITHUB_REPO}/actions/runners/registration-token" \
        | jq -r .token)
    RUNNER_URL="https://github.com/${GITHUB_OWNER}/${GITHUB_REPO}"
fi

if [ -z "$REG_TOKEN" ] || [ "$REG_TOKEN" == "null" ]; then
    echo "❌ Failed to get registration token. Check your GITHUB_PAT and permissions."
    exit 1
fi

echo "✅ Registration token obtained successfully"

# ────────────────────────────────────────────
# CONFIGURE RUNNER
# ────────────────────────────────────────────
echo "⚙️ Configuring runner..."
./config.sh --unattended \
    --name "runner-$(hostname)" \
    --url "$RUNNER_URL" \
    --token "$REG_TOKEN" \
    --runnergroup "$RUNNER_GROUP" \
    --ephemeral \
    --labels "$RUNNER_LABELS" \
    --replace

echo "🚀 Starting runner..."
./run.sh&lt;/LI-CODE&gt;
&lt;P data-line="407"&gt;&lt;STRONG&gt;Key flags explained:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL data-line="408"&gt;
&lt;LI data-line="408"&gt;
&lt;BLOCKQUOTE&gt;--ephemeral: Runner processes&amp;nbsp;&lt;STRONG&gt;one job&lt;/STRONG&gt;&amp;nbsp;then exits (container stops)&lt;/BLOCKQUOTE&gt;
&lt;/LI&gt;
&lt;LI data-line="409"&gt;
&lt;BLOCKQUOTE&gt;--runnergroup: Registers the runner in a specific runner group (org-level only)&lt;/BLOCKQUOTE&gt;
&lt;/LI&gt;
&lt;LI data-line="410"&gt;
&lt;BLOCKQUOTE&gt;--replace: Replaces any existing runner with the same name&lt;/BLOCKQUOTE&gt;
&lt;/LI&gt;
&lt;LI data-line="411"&gt;
&lt;BLOCKQUOTE&gt;--unattended: No interactive prompts&lt;/BLOCKQUOTE&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-line="413"&gt;⚠️&amp;nbsp;&lt;STRONG&gt;Do NOT use&amp;nbsp;--disableupdate!&lt;/STRONG&gt;&amp;nbsp;This flag stops the runner from updating itself, and a runner that falls behind the versions GitHub supports can stop receiving jobs: it appears as "Idle" but never picks up work.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
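&lt;P&gt;The script above reads&amp;nbsp;GITHUB_PAT&amp;nbsp;and&amp;nbsp;GITHUB_OWNER&amp;nbsp;without checking that they are set, so a missing value only surfaces later as a failed token request. A minimal sketch of an earlier check is shown here; the function name&amp;nbsp;missing_runner_env&amp;nbsp;is hypothetical and not part of the original script.&lt;/P&gt;

```shell
#!/bin/bash
# Hypothetical helper (not part of the original start.sh): prints the name of
# every required variable that is empty or unset, one per line.
missing_runner_env() {
    local scope="${RUNNER_SCOPE:-org}"
    local required="GITHUB_PAT GITHUB_OWNER"
    # GITHUB_REPO is only needed when registering a repo-level runner.
    if [ "$scope" = "repo" ]; then
        required="$required GITHUB_REPO"
    fi
    local name
    for name in $required; do
        # ${!name} is bash indirect expansion: the value of the variable named by $name.
        if [ -z "${!name:-}" ]; then
            echo "$name"
        fi
    done
}

# Possible wiring near the top of start.sh:
# missing="$(missing_runner_env)"
# if [ -n "$missing" ]; then
#     echo "Missing required environment variables: $missing"
#     exit 1
# fi
```

&lt;P&gt;Failing early this way keeps a misconfigured Container App Job from spending an execution on a token request that cannot succeed.&lt;/P&gt;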
&lt;H3 data-line="415"&gt;3.4: Create the&amp;nbsp;Dockerfile&lt;/H3&gt;
&lt;P data-line="417"&gt;Create a file named&amp;nbsp;&lt;STRONG&gt;Dockerfile&lt;/STRONG&gt;:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;FROM ubuntu:22.04

# Prevent interactive prompts during package installation
ENV DEBIAN_FRONTEND=noninteractive

# Install required dependencies
RUN apt-get update &amp;amp;&amp;amp; apt-get install -y \
    curl \
    git \
    jq \
    ca-certificates \
    unzip \
    wget \
    apt-transport-https \
    software-properties-common \
    &amp;amp;&amp;amp; rm -rf /var/lib/apt/lists/*

# Create a non-root user for the runner (config.sh refuses to run as root by default)
RUN useradd -m runner

# Set up the runner directory
WORKDIR /home/runner/actions-runner

# Download a pinned GitHub Actions Runner release (set by RUNNER_VERSION below)
# Check latest version: curl -s https://api.github.com/repos/actions/runner/releases/latest | jq -r '.tag_name'
ARG RUNNER_VERSION=2.334.0
RUN curl -L -o actions-runner.tar.gz \
    "https://github.com/actions/runner/releases/download/v${RUNNER_VERSION}/actions-runner-linux-x64-${RUNNER_VERSION}.tar.gz" \
    &amp;amp;&amp;amp; tar xzf actions-runner.tar.gz \
    &amp;amp;&amp;amp; rm actions-runner.tar.gz

# Install runner dependencies
RUN ./bin/installdependencies.sh

# Copy the startup script
COPY start.sh .
RUN chmod +x start.sh

# Set ownership to the runner user
RUN chown -R runner:runner /home/runner

# Switch to non-root user
USER runner

ENTRYPOINT ["./start.sh"]&lt;/LI-CODE&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-line="467"&gt;💡&amp;nbsp;&lt;STRONG&gt;Tip:&lt;/STRONG&gt;&amp;nbsp;Check the latest runner version by running:&lt;/P&gt;
&lt;P&gt;curl -s https://api.github.com/repos/actions/runner/releases/latest | jq -r '.tag_name'&lt;/P&gt;
&lt;P data-line="471"&gt;At the time of writing,&amp;nbsp;2.334.0&amp;nbsp;is the latest. Update the&amp;nbsp;ARG RUNNER_VERSION&amp;nbsp;value if a newer version is available.&lt;/P&gt;
&lt;P data-line="473"&gt;⚠️&amp;nbsp;&lt;STRONG&gt;Using a deprecated runner version will cause runners to connect but refuse to pick up jobs.&lt;/STRONG&gt;&amp;nbsp;You'll see the error:&amp;nbsp;&lt;EM&gt;"Runner version vX.X.X is deprecated and cannot receive messages."&lt;/EM&gt; Always use a recent version.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
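&lt;P&gt;To make the version check repeatable, the tag returned by the API call in the tip can be compared with the pinned value. This is a sketch only: the function name&amp;nbsp;runner_is_outdated&amp;nbsp;is hypothetical, and it assumes GNU&amp;nbsp;sort&amp;nbsp;with the&amp;nbsp;-V&amp;nbsp;(version sort) option, as shipped in Ubuntu.&lt;/P&gt;

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) when the pinned version is older than
# the latest release tag. Assumes GNU sort with -V (version sort).
runner_is_outdated() {
    local pinned="$1"
    local latest="${2#v}"   # release tags look like "v2.334.0"; drop the leading "v"
    if [ "$pinned" = "$latest" ]; then
        return 1
    fi
    # The older of the two versions sorts first.
    local oldest
    oldest="$(printf '%s\n%s\n' "$pinned" "$latest" | sort -V | head -n 1)"
    [ "$oldest" = "$pinned" ]
}

# Possible use (needs network access, curl and jq):
# latest="$(curl -s https://api.github.com/repos/actions/runner/releases/latest | jq -r '.tag_name')"
# if runner_is_outdated "2.334.0" "$latest"; then
#     echo "Update ARG RUNNER_VERSION to ${latest#v}"
# fi
```

&lt;P&gt;Version sort is used instead of a plain string comparison so that, for example, 2.9.0 is treated as older than 2.10.0.&lt;/P&gt;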
&lt;H2 data-line="477"&gt;Step 4: Build and Push the Docker Image&lt;/H2&gt;
&lt;P data-line="479"&gt;Now we need to build this image and push it to your Azure Container Registry (ACR).&lt;/P&gt;
&lt;H3 data-line="481"&gt;Option A: Build Locally with Docker (Development)&lt;/H3&gt;
&lt;P data-line="483"&gt;Use this if you have&amp;nbsp;&lt;STRONG&gt;Docker Desktop&lt;/STRONG&gt;&amp;nbsp;installed on your machine.&lt;/P&gt;
&lt;P data-line="485"&gt;&lt;STRONG&gt;Where to run:&lt;/STRONG&gt;&amp;nbsp;In the&amp;nbsp;&lt;STRONG&gt;VS Code terminal&lt;/STRONG&gt;&amp;nbsp;(or any terminal), make sure you're inside the&amp;nbsp;github-runner-image&amp;nbsp;folder where your&amp;nbsp;Dockerfile&amp;nbsp;and&amp;nbsp;start.sh&amp;nbsp;are located.&lt;/P&gt;
&lt;LI-CODE lang=""&gt;# 0. First, log in to Azure (this opens a browser window for authentication)
az login

# 1. Log in to your ACR (replace yourregistryname with your actual ACR name)
az acr login --name yourregistryname

# 2. Build the image
docker build -t yourregistryname.azurecr.io/github-runner:v1 .

# 3. Push the image to ACR
docker push yourregistryname.azurecr.io/github-runner:v1&lt;/LI-CODE&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-line="501"&gt;📌&amp;nbsp;&lt;STRONG&gt;Make sure Docker Desktop is running&lt;/STRONG&gt;&amp;nbsp;before you execute these commands. If you see "Cannot connect to the Docker daemon", start Docker Desktop first.&lt;/P&gt;
&lt;P data-line="503"&gt;🔒&amp;nbsp;&lt;STRONG&gt;Don't have the Azure CLI or Docker on your machine?&lt;/STRONG&gt;&amp;nbsp;Use&amp;nbsp;&lt;STRONG&gt;Azure Cloud Shell&lt;/STRONG&gt;&amp;nbsp;instead. It's a browser-based terminal at&amp;nbsp;&lt;A href="https://shell.azure.com/" target="_blank" rel="noopener" data-href="https://shell.azure.com"&gt;shell.azure.com&lt;/A&gt;&amp;nbsp;that comes pre-authenticated with Azure CLI (no&amp;nbsp;az login&amp;nbsp;needed). Cloud Shell cannot run a Docker daemon, so&amp;nbsp;docker build&amp;nbsp;will not work there; use&amp;nbsp;&lt;STRONG&gt;Option B&lt;/STRONG&gt;&amp;nbsp;below, which needs no Docker at all.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H3 data-line="505"&gt;Option B: Build Remotely with ACR Tasks (Production — No Docker Needed) ⭐&lt;/H3&gt;
&lt;P data-line="507"&gt;&lt;STRONG&gt;This is the recommended approach for production environments&lt;/STRONG&gt;&amp;nbsp;where:&lt;/P&gt;
&lt;UL data-line="508"&gt;
&lt;LI data-line="508"&gt;You don't have Docker installed&lt;/LI&gt;
&lt;LI data-line="509"&gt;Your production environment is fully private / locked down&lt;/LI&gt;
&lt;LI data-line="510"&gt;You want to build images directly in Azure without any local tooling&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="512"&gt;&lt;STRONG&gt;Where to run:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL data-line="513"&gt;
&lt;LI data-line="513"&gt;&lt;STRONG&gt;VS Code terminal&lt;/STRONG&gt;&amp;nbsp;→ Run&amp;nbsp;az login&amp;nbsp;first, then the build command&lt;/LI&gt;
&lt;LI data-line="514"&gt;&lt;STRONG&gt;Azure Cloud Shell&lt;/STRONG&gt;&amp;nbsp;(&lt;A href="https://shell.azure.com/" target="_blank" rel="noopener" data-href="https://shell.azure.com"&gt;shell.azure.com&lt;/A&gt;) → No&amp;nbsp;az login&amp;nbsp;needed, you're already authenticated. Upload your&amp;nbsp;Dockerfile&amp;nbsp;and&amp;nbsp;start.sh&amp;nbsp;files using the&amp;nbsp;&lt;STRONG&gt;Upload&lt;/STRONG&gt;&amp;nbsp;button in Cloud Shell, then run the build command&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="516"&gt;You must be inside the folder where your&amp;nbsp;Dockerfile&amp;nbsp;and&amp;nbsp;start.sh&amp;nbsp;are located.&lt;/P&gt;
&lt;LI-CODE lang=""&gt;# If running from VS Code terminal (skip this line if using Azure Cloud Shell)
az login

# Build directly in ACR — no Docker required!
az acr build --registry yourregistryname --image github-runner:v1 --file Dockerfile .&lt;/LI-CODE&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-line="526"&gt;☝️&amp;nbsp;&lt;STRONG&gt;That's it — one single command.&lt;/STRONG&gt;&amp;nbsp;No Docker install, no Docker daemon, nothing.&lt;/P&gt;
&lt;P data-line="528"&gt;&lt;STRONG&gt;Using Azure Cloud Shell?&lt;/STRONG&gt;&amp;nbsp;To upload files:&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;OL data-line="529"&gt;
&lt;LI data-line="529"&gt;
&lt;BLOCKQUOTE&gt;Open&amp;nbsp;&lt;A href="https://shell.azure.com/" target="_blank" rel="noopener" data-href="https://shell.azure.com"&gt;shell.azure.com&lt;/A&gt;&lt;/BLOCKQUOTE&gt;
&lt;/LI&gt;
&lt;LI data-line="530"&gt;
&lt;BLOCKQUOTE&gt;Click the&amp;nbsp;&lt;STRONG&gt;Upload/Download&lt;/STRONG&gt;&amp;nbsp;button (📁 icon) in the toolbar&lt;/BLOCKQUOTE&gt;
&lt;/LI&gt;
&lt;LI data-line="531"&gt;
&lt;BLOCKQUOTE&gt;Upload&amp;nbsp;Dockerfile&amp;nbsp;and&amp;nbsp;start.sh&lt;/BLOCKQUOTE&gt;
&lt;/LI&gt;
&lt;LI data-line="532"&gt;
&lt;BLOCKQUOTE&gt;They'll land in your home directory (~/). Run az acr build from there.&lt;/BLOCKQUOTE&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;P data-line="534"&gt;The&amp;nbsp;az acr build&amp;nbsp;command:&lt;/P&gt;
&lt;UL data-line="535"&gt;
&lt;LI data-line="535"&gt;Uploads your source code to ACR&lt;/LI&gt;
&lt;LI data-line="536"&gt;Builds the Docker image&amp;nbsp;&lt;STRONG&gt;in Azure&lt;/STRONG&gt;&amp;nbsp;(not on your machine)&lt;/LI&gt;
&lt;LI data-line="537"&gt;Tags and stores it in your registry&lt;/LI&gt;
&lt;LI data-line="538"&gt;Needs no Docker daemon on your machine&lt;/LI&gt;
&lt;/UL&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-line="540"&gt;🔒&amp;nbsp;&lt;STRONG&gt;Production Note:&lt;/STRONG&gt;&amp;nbsp;In locked-down environments where even&amp;nbsp;az acr build&amp;nbsp;isn't possible from your machine, you can:&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;OL data-line="541"&gt;
&lt;LI data-line="541"&gt;
&lt;BLOCKQUOTE&gt;Use&amp;nbsp;&lt;STRONG&gt;Azure Cloud Shell&lt;/STRONG&gt;&amp;nbsp;(browser-based, always has Azure CLI)&lt;/BLOCKQUOTE&gt;
&lt;/LI&gt;
&lt;LI data-line="542"&gt;
&lt;BLOCKQUOTE&gt;Set up an&amp;nbsp;&lt;STRONG&gt;ACR Task&lt;/STRONG&gt;&amp;nbsp;with a&amp;nbsp;&lt;STRONG&gt;Git trigger&lt;/STRONG&gt;&amp;nbsp;— ACR automatically builds when you push to a repo&lt;/BLOCKQUOTE&gt;
&lt;/LI&gt;
&lt;LI data-line="543"&gt;
&lt;BLOCKQUOTE&gt;Use&amp;nbsp;&lt;STRONG&gt;Azure DevOps / GitHub Actions pipeline&lt;/STRONG&gt;&amp;nbsp;to build and push the image&lt;/BLOCKQUOTE&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;U&gt;&lt;EM&gt;&lt;STRONG&gt;Example&lt;/STRONG&gt;&lt;/EM&gt;&lt;/U&gt;: Auto-build from a GitHub repo (single line for Cloud Shell)&lt;/P&gt;
&lt;LI-CODE lang=""&gt;az acr task create --registry yourregistryname --name build-runner-image --image "github-runner:{{.Run.ID}}" --context https://github.com/your-org/your-runner-repo.git --file Dockerfile --git-access-token YOUR_GITHUB_PAT&lt;/LI-CODE&gt;
&lt;H3 data-line="550"&gt;Verify the Image&lt;/H3&gt;
&lt;P data-line="552"&gt;After building, confirm the image exists in your registry:&lt;/P&gt;
&lt;P data-line="554"&gt;&lt;STRONG&gt;Portal:&lt;/STRONG&gt;&amp;nbsp;Go to your ACR →&amp;nbsp;&lt;STRONG&gt;Repositories&lt;/STRONG&gt;&amp;nbsp;→ you should see&amp;nbsp;github-runner&amp;nbsp;listed&lt;/P&gt;
&lt;P data-line="556"&gt;&lt;STRONG&gt;CLI:&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang=""&gt;az acr repository list --name yourregistryname --output table&lt;/LI-CODE&gt;
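&lt;P&gt;Listing repositories confirms that&amp;nbsp;github-runner&amp;nbsp;exists but not that the&amp;nbsp;v1&amp;nbsp;tag was pushed. A small sketch of a tag check is shown here; the helper name&amp;nbsp;has_tag&amp;nbsp;is hypothetical, and the commented&amp;nbsp;az acr repository show-tags&amp;nbsp;call assumes you have access to the registry.&lt;/P&gt;

```shell
#!/bin/bash
# Hypothetical helper: reads one tag per line on stdin and succeeds if the
# wanted tag is among them.
has_tag() {
    local wanted="$1"
    # -x matches the whole line, -F treats the tag as a literal string, -q is silent.
    grep -qxF -- "$wanted"
}

# Possible use (needs the Azure CLI and access to the registry):
# if az acr repository show-tags --name yourregistryname --repository github-runner --output tsv | has_tag v1; then
#     echo "github-runner:v1 is present"
# fi
```

&lt;P&gt;Matching the whole line avoids a false positive where&amp;nbsp;v1&amp;nbsp;would otherwise match a tag such as&amp;nbsp;v10.&lt;/P&gt;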
&lt;H2 data-line="563"&gt;Step 5: Create the Container Apps Environment&lt;/H2&gt;
&lt;P data-line="565"&gt;The Container Apps Environment is the hosting platform for your container jobs. Think of it as the "cluster" where your runners will live.&lt;/P&gt;
&lt;H3 data-line="567"&gt;Steps (Azure Portal):&lt;/H3&gt;
&lt;OL data-line="569"&gt;
&lt;LI data-line="569"&gt;Search for&amp;nbsp;&lt;STRONG&gt;"Container Apps Environment"&lt;/STRONG&gt;&amp;nbsp;in the Azure Portal&lt;/LI&gt;
&lt;LI data-line="570"&gt;Click&amp;nbsp;&lt;STRONG&gt;+ Create&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="571"&gt;Fill in:&lt;/LI&gt;
&lt;/OL&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Tab&lt;/th&gt;&lt;th&gt;Field&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Basics&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Subscription&lt;/td&gt;&lt;td&gt;Your subscription&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;td&gt;Resource group&lt;/td&gt;&lt;td&gt;rg-github-runners&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;td&gt;Environment name&lt;/td&gt;&lt;td&gt;cae-github-runners&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;td&gt;Region&lt;/td&gt;&lt;td&gt;West US 2&amp;nbsp;(same as other resources)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;td&gt;Environment type&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;Consumption only&lt;/STRONG&gt;&amp;nbsp;(or Consumption + Dedicated if you need workload profiles)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Monitoring&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Log Analytics workspace&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;Create new&lt;/STRONG&gt;&amp;nbsp;or select existing&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;OL data-line="582"&gt;
&lt;LI data-line="582"&gt;Leave&amp;nbsp;&lt;STRONG&gt;Networking&lt;/STRONG&gt;&amp;nbsp;as defaults for now (we'll discuss production networking later)&lt;/LI&gt;
&lt;LI data-line="583"&gt;Click&amp;nbsp;&lt;STRONG&gt;Review + create&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Create&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-line="585"&gt;⏳ This takes 1-2 minutes to create.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H2 data-line="589"&gt;Step 6: Create the Container App Job&lt;/H2&gt;
&lt;P data-line="591"&gt;This is the core resource — the Container App Job that will run your GitHub runners.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-line="593"&gt;⚠️&amp;nbsp;&lt;STRONG&gt;IMPORTANT: Complete Step 4 (Build &amp;amp; Push Image) BEFORE this step.&lt;/STRONG&gt;&amp;nbsp;The Container App Job creation form requires you to select a container image from your ACR. If your ACR is empty (no images pushed), the portal will show an error:&amp;nbsp;&lt;EM&gt;"The ACR does not have any images. Please push an image to the ACR and try again."&lt;/EM&gt;&amp;nbsp;So make sure your image is pushed first!&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H3 data-line="595"&gt;Steps (Azure Portal):&lt;/H3&gt;
&lt;OL data-line="597"&gt;
&lt;LI data-line="597"&gt;Search for&amp;nbsp;&lt;STRONG&gt;"Container App Jobs"&lt;/STRONG&gt;&amp;nbsp;in the Azure Portal search bar&lt;/LI&gt;
&lt;LI data-line="598"&gt;Click&amp;nbsp;&lt;STRONG&gt;+ Create Container App Job&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;H3 data-line="600"&gt;Tab 1: Basics&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Field&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;th&gt;Notes&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Subscription&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Your subscription&lt;/td&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Resource group&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;rg-github-runners&lt;/td&gt;&lt;td&gt;Same RG as other resources&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Container app job name&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;github-runner-job&lt;/td&gt;&lt;td&gt;Lowercase, letters, numbers, hyphens&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Region&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;West US 2&lt;/td&gt;&lt;td&gt;Must match your CAE region&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Container Apps Environment&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;cae-github-runners&lt;/td&gt;&lt;td&gt;Select the environment created in Step 5&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P data-line="610"&gt;Click&amp;nbsp;&lt;STRONG&gt;Next: Container &amp;gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;H3 data-line="612"&gt;Tab 2: Container&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Field&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;th&gt;Notes&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Name&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;github-runner&lt;/td&gt;&lt;td&gt;Name of the container within the job&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Image source&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;Azure Container Registry&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Select this radio button&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Registry&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;yourregistryname.azurecr.io&lt;/td&gt;&lt;td&gt;Select your ACR from the dropdown&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Image&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;github-runner&lt;/td&gt;&lt;td&gt;Select the image you pushed&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Image tag&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;v1&lt;/td&gt;&lt;td&gt;The tag you used during build&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Registry authentication&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;Managed identity&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Leave as default&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Managed identity&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;System assigned Identity (environment)&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Leave as default — Azure will auto-assign ACR Pull role&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Command override&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;&lt;EM&gt;(Leave empty)&lt;/EM&gt;&lt;/td&gt;&lt;td&gt;Our Dockerfile already has an ENTRYPOINT&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Arguments override&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;&lt;EM&gt;(Leave empty)&lt;/EM&gt;&lt;/td&gt;&lt;td&gt;Not 
needed&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Workload profile&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Consumption&lt;/td&gt;&lt;td&gt;Default is fine&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;CPU and memory&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;0.5 CPU cores, 1 Gi memory&lt;/td&gt;&lt;td&gt;Increase if your jobs need more resources&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P data-line="628"&gt;💡&amp;nbsp;&lt;STRONG&gt;CPU/Memory guidance:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL data-line="629"&gt;
&lt;LI data-line="629"&gt;0.5 CPU / 1 Gi&amp;nbsp;— Light jobs (linting, simple tests)&lt;/LI&gt;
&lt;LI data-line="630"&gt;1 CPU / 2 Gi&amp;nbsp;— Medium jobs (building apps, running test suites)&lt;/LI&gt;
&lt;LI data-line="631"&gt;2 CPU / 4 Gi&amp;nbsp;— Heavy jobs (compiling large projects, ML workloads)&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4 data-line="633"&gt;Environment Variables&lt;/H4&gt;
&lt;P data-line="635"&gt;Click&amp;nbsp;&lt;STRONG&gt;+ Add&lt;/STRONG&gt;&amp;nbsp;for each variable:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Name&lt;/th&gt;&lt;th&gt;Source&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;GITHUB_OWNER&lt;/td&gt;&lt;td&gt;Manual&lt;/td&gt;&lt;td&gt;Your GitHub org name (e.g.,&amp;nbsp;Quality-Framework)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;RUNNER_SCOPE&lt;/td&gt;&lt;td&gt;Manual&lt;/td&gt;&lt;td&gt;org&amp;nbsp;(or&amp;nbsp;repo&amp;nbsp;for repo-level runners)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;RUNNER_LABELS&lt;/td&gt;&lt;td&gt;Manual&lt;/td&gt;&lt;td&gt;container-app&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;GITHUB_REPO&lt;/td&gt;&lt;td&gt;Manual&lt;/td&gt;&lt;td&gt;Repo names this runner will serve, comma-separated (e.g.,&amp;nbsp;my-app,my-api,my-infra); these should match the repos selected in your PAT (Step 1). Note that&amp;nbsp;start.sh&amp;nbsp;reads this value only when&amp;nbsp;RUNNER_SCOPE&amp;nbsp;is&amp;nbsp;repo, and then it must be a single repository name because it is inserted into the registration URL.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;RUNNER_GROUP&lt;/td&gt;&lt;td&gt;Manual&lt;/td&gt;&lt;td&gt;The GitHub runner group name (e.g.,&amp;nbsp;container-app-runners). Must match the group created in Step 1. If not set, defaults to&amp;nbsp;Default.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;GITHUB_PAT&lt;/td&gt;&lt;td&gt;&lt;EM&gt;(We'll configure this as a secret reference in Step 7)&lt;/EM&gt;&lt;/td&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-line="646"&gt;⚠️&amp;nbsp;&lt;STRONG&gt;Do NOT put the PAT directly as an environment variable value.&lt;/STRONG&gt;&amp;nbsp;We will securely reference it from Key Vault in Step 7.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P data-line="648"&gt;For now, add only&amp;nbsp;GITHUB_OWNER,&amp;nbsp;RUNNER_SCOPE,&amp;nbsp;RUNNER_LABELS,&amp;nbsp;GITHUB_REPO, and&amp;nbsp;RUNNER_GROUP. We'll add&amp;nbsp;GITHUB_PAT&amp;nbsp;after setting up the identity and secret.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-line="650"&gt;⚠️&amp;nbsp;&lt;STRONG&gt;Make sure&amp;nbsp;RUNNER_SCOPE&amp;nbsp;matches&amp;nbsp;runnerScope&amp;nbsp;in the scale rule below.&lt;/STRONG&gt;&amp;nbsp;If you're using org-level runners, both should be&amp;nbsp;org.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H4 data-line="652"&gt;Scale Rule Settings (same page, scroll down)&lt;/H4&gt;
&lt;P data-line="654"&gt;Below the environment variables, you'll see&amp;nbsp;&lt;STRONG&gt;Scale rule settings&lt;/STRONG&gt;:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Field&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;th&gt;Notes&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Min executions&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;Scale to zero when no jobs are pending&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Max executions&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt;Maximum parallel runners (adjust as needed).&amp;nbsp;&lt;STRONG&gt;Do NOT set to 0&lt;/STRONG&gt;&amp;nbsp;— this means unlimited and can cause hundreds of runner containers to spawn!&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Polling interval&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;30&lt;/td&gt;&lt;td&gt;How often (in seconds) KEDA checks for pending jobs&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H4 data-line="662"&gt;Scale Rules — Click&amp;nbsp;&lt;STRONG&gt;+ Add&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P data-line="664"&gt;A side panel opens with the&amp;nbsp;&lt;STRONG&gt;"Add scale rule"&lt;/STRONG&gt;&amp;nbsp;form. Fill in:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Field&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Rule name&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;github-runner-rule&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Custom rule type&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;github-runner&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P data-line="671"&gt;&lt;STRONG&gt;Scale parameters&lt;/STRONG&gt;&amp;nbsp;(key-value pairs — click + Add for each):&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Name&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;th&gt;Notes&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;githubAPIURL&lt;/td&gt;&lt;td&gt;https://api.github.com&lt;/td&gt;&lt;td&gt;Pre-filled by the portal, leave as-is&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;owner&lt;/td&gt;&lt;td&gt;your-github-org&lt;/td&gt;&lt;td&gt;Your GitHub org or username (e.g.,&amp;nbsp;Quality-Framework)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;runnerScope&lt;/td&gt;&lt;td&gt;org&lt;/td&gt;&lt;td&gt;org&amp;nbsp;for org-level,&amp;nbsp;repo&amp;nbsp;for repo-level&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;repos&lt;/td&gt;&lt;td&gt;your-repo-name&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;Must match the repos you selected in your PAT (Step 1).&lt;/STRONG&gt;&amp;nbsp;Comma-separated, no spaces.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;targetWorkflowQueueLength&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;Number of pending jobs needed to trigger one runner&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;labels&lt;/td&gt;&lt;td&gt;container-app&lt;/td&gt;&lt;td&gt;Must match&amp;nbsp;RUNNER_LABELS&amp;nbsp;env var and&amp;nbsp;runs-on&amp;nbsp;in your workflow&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
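&lt;P&gt;To get a feel for how&amp;nbsp;targetWorkflowQueueLength&amp;nbsp;and&amp;nbsp;Max executions&amp;nbsp;interact, the arithmetic can be sketched as below. This is a simplified model and an assumption on our part, not the exact KEDA algorithm: it ignores executions that are already running and the polling delay. The function name&amp;nbsp;planned_executions&amp;nbsp;is hypothetical.&lt;/P&gt;

```shell
#!/bin/bash
# Simplified model (an assumption, not the exact KEDA algorithm):
# executions = min(ceil(pending / target), max_executions)
planned_executions() {
    local pending="$1" target="$2" max="$3"
    # Integer ceiling division.
    local wanted=$(( (pending + target - 1) / target ))
    if [ "$wanted" -gt "$max" ]; then
        wanted="$max"
    fi
    echo "$wanted"
}

planned_executions 3 1 5    # prints 3
planned_executions 12 1 5   # prints 5 (capped by Max executions)
planned_executions 0 1 5    # prints 0 (scale to zero)
```

&lt;P&gt;With the values used in this guide (target 1, max 5), each pending job maps to one runner until five are requested; further jobs wait in the queue.&lt;/P&gt;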
&lt;BLOCKQUOTE&gt;
&lt;P data-line="682"&gt;⚠️&amp;nbsp;&lt;STRONG&gt;CRITICAL: Do NOT skip the&amp;nbsp;labels&amp;nbsp;parameter!&lt;/STRONG&gt;&amp;nbsp;Without it, KEDA cannot match pending jobs to your runner and will always show&amp;nbsp;MetricValue: 0.00&amp;nbsp;— meaning no containers will ever start. This is the most common setup mistake.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
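&lt;P&gt;Because the&amp;nbsp;labels&amp;nbsp;scale parameter has to agree with the&amp;nbsp;RUNNER_LABELS&amp;nbsp;environment variable, it can help to compare the two values before saving. The following is a minimal sketch with hypothetical helper names; it treats both values as comma-separated sets and ignores order and surrounding spaces.&lt;/P&gt;

```shell
#!/bin/bash
# Hypothetical helper: normalizes a comma-separated label list by trimming
# spaces, sorting, and removing duplicates so two lists can be compared.
normalize_labels() {
    echo "$1" | tr ',' '\n' | sed 's/^ *//; s/ *$//' | sort -u | paste -sd ',' -
}

# Succeeds when both lists contain exactly the same labels.
labels_match() {
    [ "$(normalize_labels "$1")" = "$(normalize_labels "$2")" ]
}

# Possible use, with the values from this guide:
# if labels_match "container-app" "container-app"; then
#     echo "RUNNER_LABELS and the scale rule labels agree"
# fi
```

&lt;P&gt;The same comparison can be applied to the&amp;nbsp;runs-on&amp;nbsp;value in your workflow file.&lt;/P&gt;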
&lt;P data-line="684"&gt;&lt;STRONG&gt;Authentication:&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Secret reference&lt;/th&gt;&lt;th&gt;Trigger parameter&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;EM&gt;(Leave empty for now — we'll configure this after creating the job and setting up Key Vault secrets in Step 7)&lt;/EM&gt;&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;personalAccessToken&lt;/STRONG&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-line="690"&gt;💡&amp;nbsp;&lt;STRONG&gt;The portal may let you save the scale rule without a secret reference.&lt;/STRONG&gt;&amp;nbsp;If it blocks you, delete the Authentication row entirely, click&amp;nbsp;&lt;STRONG&gt;Add scale rule&lt;/STRONG&gt;, and we'll add the authentication after the job is created.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P data-line="692"&gt;Click&amp;nbsp;&lt;STRONG&gt;Add scale rule&lt;/STRONG&gt;&amp;nbsp;→ Then click&amp;nbsp;&lt;STRONG&gt;Review + create&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Create&lt;/STRONG&gt;&lt;/P&gt;
&lt;P data-line="694"&gt;⏳ Creation takes about 1 minute.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-line="696"&gt;⚠️&amp;nbsp;&lt;STRONG&gt;First-time creation may fail with an image pull error&lt;/STRONG&gt;&amp;nbsp;if you're creating a new Container Apps Environment at the same time. This happens because the environment's managed identity gets the AcrPull role during deployment, but the image pull can happen before the role assignment propagates. If this occurs, simply&amp;nbsp;&lt;STRONG&gt;Redeploy&lt;/STRONG&gt;&amp;nbsp;or create the Container App Job again. By then the role is already assigned, so the second attempt should succeed.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H2 data-line="700"&gt;Step 7: Configure Managed Identity and Secrets&lt;/H2&gt;
&lt;P data-line="702"&gt;Now we need to:&lt;/P&gt;
&lt;OL data-line="703"&gt;
&lt;LI data-line="703"&gt;Enable Managed Identity on the Container App Job&lt;/LI&gt;
&lt;LI data-line="704"&gt;Grant it access to Key Vault and ACR&lt;/LI&gt;
&lt;LI data-line="705"&gt;Reference the GitHub PAT as a secret&lt;/LI&gt;
&lt;/OL&gt;
&lt;H3 data-line="707"&gt;7.1: Enable System-Assigned Managed Identity&lt;/H3&gt;
&lt;OL data-line="709"&gt;
&lt;LI data-line="709"&gt;Open your Container App Job (github-runner-job)&lt;/LI&gt;
&lt;LI data-line="710"&gt;In the left menu, go to&amp;nbsp;&lt;STRONG&gt;Settings&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Identity&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="711"&gt;Under&amp;nbsp;&lt;STRONG&gt;System assigned&lt;/STRONG&gt;, toggle&amp;nbsp;&lt;STRONG&gt;Status&lt;/STRONG&gt;&amp;nbsp;to&amp;nbsp;&lt;STRONG&gt;On&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="712"&gt;Click&amp;nbsp;&lt;STRONG&gt;Save&lt;/STRONG&gt;&amp;nbsp;→ Click&amp;nbsp;&lt;STRONG&gt;Yes&lt;/STRONG&gt;&amp;nbsp;to confirm&lt;/LI&gt;
&lt;LI data-line="713"&gt;Note the&amp;nbsp;&lt;STRONG&gt;Object ID&lt;/STRONG&gt;&amp;nbsp;that appears — you'll need this&lt;/LI&gt;
&lt;/OL&gt;
&lt;H3 data-line="715"&gt;7.2: Grant Key Vault Access&lt;/H3&gt;
&lt;OL data-line="717"&gt;
&lt;LI data-line="717"&gt;Go to your Key Vault (kv-github-runners)&lt;/LI&gt;
&lt;LI data-line="718"&gt;Go to&amp;nbsp;&lt;STRONG&gt;Access Control (IAM)&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;+ Add role assignment&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="719"&gt;Fill in:&lt;/LI&gt;
&lt;/OL&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Field&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Role&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Key Vault Secrets User&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Assign access to&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Managed identity&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Members&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Search for&amp;nbsp;github-runner-job&amp;nbsp;and select it&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;OL start="4" data-line="727"&gt;
&lt;LI data-line="727"&gt;Click&amp;nbsp;&lt;STRONG&gt;Review + assign&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;H3 data-line="729"&gt;7.3: Grant ACR Pull Access&lt;/H3&gt;
&lt;OL data-line="731"&gt;
&lt;LI data-line="731"&gt;Go to your Container Registry (yourregistryname)&lt;/LI&gt;
&lt;LI data-line="732"&gt;Go to&amp;nbsp;&lt;STRONG&gt;Access Control (IAM)&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;+ Add role assignment&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="733"&gt;Fill in:&lt;/LI&gt;
&lt;/OL&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Field&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Role&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;AcrPull&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Assign access to&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Managed identity&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Members&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Search for&amp;nbsp;github-runner-job&amp;nbsp;and select it&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;OL start="4" data-line="741"&gt;
&lt;LI data-line="741"&gt;Click&amp;nbsp;&lt;STRONG&gt;Review + assign&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
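&lt;P&gt;If you would rather script sections 7.1 to 7.3 than click through the portal, the following is a minimal sketch of the same identity and role setup with the Azure CLI. It assumes the example names used in this guide (github-runner-job, kv-github-runners, rg-github-runners). The run, enable_identity and grant_role helpers and the DRY_RUN switch are hypothetical names introduced here so the commands can be previewed before they change anything.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;#!/bin/sh
# Sketch only: resource names are the example values from this guide.
RESOURCE_GROUP="rg-github-runners"
JOB_NAME="github-runner-job"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# 7.1: enable the system-assigned identity on the Container App Job.
enable_identity() {
  run az containerapp job identity assign --name "$JOB_NAME" --resource-group "$RESOURCE_GROUP" --system-assigned
}

# 7.2 and 7.3: grant one role to the identity.
# $1 = identity object ID, $2 = role name, $3 = resource ID used as the scope.
grant_role() {
  run az role assignment create --assignee-object-id "$1" --assignee-principal-type ServicePrincipal --role "$2" --scope "$3"
}&lt;/LI-CODE&gt;
&lt;P&gt;A typical sequence would be to call enable_identity, read the object ID with az containerapp job show --name github-runner-job --resource-group rg-github-runners --query identity.principalId --output tsv, and then call grant_role twice: once with "Key Vault Secrets User" scoped to the Key Vault resource ID, and once with AcrPull scoped to the registry resource ID.&lt;/P&gt;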
&lt;H3 data-line="743"&gt;7.4: Add GitHub PAT as a Secret Reference&lt;/H3&gt;
&lt;OL data-line="745"&gt;
&lt;LI data-line="745"&gt;Go back to your Container App Job (github-runner-job)&lt;/LI&gt;
&lt;LI data-line="746"&gt;In the left menu, go to&amp;nbsp;&lt;STRONG&gt;Settings&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Secrets&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="747"&gt;Click&amp;nbsp;&lt;STRONG&gt;+ Add&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="748"&gt;Fill in:&lt;/LI&gt;
&lt;/OL&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Field&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Type&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;Key Vault reference&lt;/STRONG&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Key&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;github-pat&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Key Vault secret URL&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Select your Key Vault and the&amp;nbsp;github-pat&amp;nbsp;secret&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Managed Identity&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;System assigned&lt;/STRONG&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;OL start="5" data-line="757"&gt;
&lt;LI data-line="757"&gt;Click&amp;nbsp;&lt;STRONG&gt;Add&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;H3 data-line="759"&gt;7.5: Map the Secret to an Environment Variable&lt;/H3&gt;
&lt;OL data-line="761"&gt;
&lt;LI data-line="761"&gt;Go to&amp;nbsp;&lt;STRONG&gt;Settings&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Containers&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="762"&gt;Click on your container →&amp;nbsp;&lt;STRONG&gt;Edit&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="763"&gt;Go to the&amp;nbsp;&lt;STRONG&gt;Environment variables&lt;/STRONG&gt;&amp;nbsp;tab&lt;/LI&gt;
&lt;LI data-line="764"&gt;Click&amp;nbsp;&lt;STRONG&gt;+ Add&lt;/STRONG&gt;:&lt;/LI&gt;
&lt;/OL&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Name&lt;/th&gt;&lt;th&gt;Source&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;GITHUB_PAT&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;Reference a secret&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;github-pat&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;OL start="5" data-line="770"&gt;
&lt;LI data-line="770"&gt;Click&amp;nbsp;&lt;STRONG&gt;Save&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-line="772"&gt;💡&amp;nbsp;&lt;STRONG&gt;If the Save button is greyed out&lt;/STRONG&gt;, try making a small edit to another field first (e.g., click into a value and click out) to trigger the save state. Alternatively, re-create the container with the correct env vars.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
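&lt;P&gt;Sections 7.4 and 7.5 can also be expressed as two CLI calls. The sketch below only prints them so you can review the values first. It assumes the example names from this guide, and it assumes that az containerapp job secret set accepts the same keyvaultref form that the job create command in Appendix A uses. The kv_secret_ref and print_secret_commands helpers are hypothetical names for illustration.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;#!/bin/sh
# Sketch only: names are the example values from this guide.
RESOURCE_GROUP="rg-github-runners"
JOB_NAME="github-runner-job"
KV_NAME="kv-github-runners"

# Build a secret definition in the "name=keyvaultref:URL,identityref:system" form.
# $1 = Key Vault name, $2 = secret name.
kv_secret_ref() {
  printf '%s=keyvaultref:https://%s.vault.azure.net/secrets/%s,identityref:system\n' "$2" "$1" "$2"
}

# Hypothetical helper: print the two commands instead of running them.
print_secret_commands() {
  # 7.4: register the Key Vault reference as a job secret.
  echo "az containerapp job secret set --name $JOB_NAME --resource-group $RESOURCE_GROUP --secrets \"$(kv_secret_ref "$KV_NAME" github-pat)\""
  # 7.5: expose that secret to the container as GITHUB_PAT.
  echo "az containerapp job update --name $JOB_NAME --resource-group $RESOURCE_GROUP --set-env-vars \"GITHUB_PAT=secretref:github-pat\""
}&lt;/LI-CODE&gt;
&lt;P&gt;Running print_secret_commands shows both commands with the values filled in. Copy them into Cloud Shell once the Key Vault role assignment from section 7.2 is in place.&lt;/P&gt;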
&lt;H2 data-line="776"&gt;Step 8: Configure KEDA Scale Rule (GitHub Runner Scaler)&lt;/H2&gt;
&lt;P data-line="778"&gt;This is where the magic happens. KEDA's GitHub Runner scaler monitors the GitHub Actions API for pending workflow jobs and scales your Container App Job accordingly.&lt;/P&gt;
&lt;H3 data-line="780"&gt;Steps:&lt;/H3&gt;
&lt;OL data-line="782"&gt;
&lt;LI data-line="782"&gt;Open your Container App Job (github-runner-job)&lt;/LI&gt;
&lt;LI data-line="783"&gt;Go to&amp;nbsp;&lt;STRONG&gt;Settings&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Scale&lt;/STRONG&gt;&amp;nbsp;(or&amp;nbsp;&lt;STRONG&gt;Scale and replicas&lt;/STRONG&gt;)&lt;/LI&gt;
&lt;LI data-line="784"&gt;Under&amp;nbsp;&lt;STRONG&gt;Scale rule&lt;/STRONG&gt;, click&amp;nbsp;&lt;STRONG&gt;+ Add&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="785"&gt;Fill in the scale rule:&lt;/LI&gt;
&lt;/OL&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Field&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;th&gt;Notes&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Name&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;github-runner-rule&lt;/td&gt;&lt;td&gt;Any descriptive name&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Type&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;Custom&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Select Custom&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Custom rule type&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;github-runner&lt;/td&gt;&lt;td&gt;This is the KEDA scaler type&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;OL start="5" data-line="793"&gt;
&lt;LI data-line="793"&gt;Under&amp;nbsp;&lt;STRONG&gt;Metadata&lt;/STRONG&gt;, add these key-value pairs:&lt;/LI&gt;
&lt;/OL&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Key&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;th&gt;Notes&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;owner&lt;/td&gt;&lt;td&gt;your-github-org&lt;/td&gt;&lt;td&gt;Your GitHub org or username (e.g.,&amp;nbsp;Quality-Framework)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;repos&lt;/td&gt;&lt;td&gt;repo1,repo2&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;Must match the repos you selected for your PAT (Step 1).&lt;/STRONG&gt;&amp;nbsp;Comma-separated, no spaces. E.g.,&amp;nbsp;qualityframework-demo,qualityframework-bicep. Do NOT leave this empty: KEDA then scans ALL org repos and you are likely to hit GitHub API rate limits.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;runnerScope&lt;/td&gt;&lt;td&gt;org&lt;/td&gt;&lt;td&gt;org&amp;nbsp;for org-level runners,&amp;nbsp;repo&amp;nbsp;for repo-level&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;labels&lt;/td&gt;&lt;td&gt;container-app&lt;/td&gt;&lt;td&gt;Must match the&amp;nbsp;RUNNER_LABELS&amp;nbsp;env var and the&amp;nbsp;runs-on&amp;nbsp;labels in your workflow YAML&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;targetWorkflowQueueLength&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;Number of pending jobs needed to trigger one new runner instance&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;OL start="6" data-line="803"&gt;
&lt;LI data-line="803"&gt;Under&amp;nbsp;&lt;STRONG&gt;Authentication&lt;/STRONG&gt;, add:&lt;/LI&gt;
&lt;/OL&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Key&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Secret reference&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;github-pat&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Trigger parameter&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;personalAccessToken&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;OL start="7" data-line="810"&gt;
&lt;LI data-line="810"&gt;Click&amp;nbsp;&lt;STRONG&gt;Add&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Save&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
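&lt;P&gt;Because a stray space or an empty repos value is easy to miss in the portal, a small pre-flight check can help. The sketch below is a hypothetical helper, not part of KEDA or the Azure CLI. It only tests the format the metadata table asks for: non-empty, comma-separated, no spaces.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;#!/bin/sh
# Hypothetical pre-flight check for the "repos" metadata value.
# Succeeds only when the value is non-empty and contains no spaces.
valid_repos() {
  case "$1" in
    "") return 1 ;;
    *" "*) return 1 ;;
    *) return 0 ;;
  esac
}

# Report on one candidate value in a human-readable form.
check_repos() {
  if valid_repos "$1"; then
    echo "ok: $1"
  else
    echo "invalid repos value: '$1'"
  fi
}&lt;/LI-CODE&gt;
&lt;P&gt;For example, check_repos "qualityframework-demo,qualityframework-bicep" reports ok, while check_repos "repo1, repo2" is flagged because of the space after the comma.&lt;/P&gt;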
&lt;H3 data-line="812"&gt;Understanding the Scale Rule&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Scenario&lt;/th&gt;&lt;th&gt;KEDA Action&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;0 pending jobs with&amp;nbsp;container-app&amp;nbsp;label&lt;/td&gt;&lt;td&gt;0 runners (scale to zero)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;1 pending job&lt;/td&gt;&lt;td&gt;Starts 1 container&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;3 pending jobs&lt;/td&gt;&lt;td&gt;Starts 3 containers (up to max)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Jobs complete&lt;/td&gt;&lt;td&gt;Containers stop, scale back to zero&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
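&lt;P&gt;The table can be read as simple arithmetic: divide the pending jobs by targetWorkflowQueueLength, round up, and cap the result at the job's maximum executions. The sketch below is a simplified model of that table under those assumptions, not KEDA's actual implementation, and desired_runners is a hypothetical name.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;#!/bin/sh
# Simplified model of the scale rule table, not KEDA's real algorithm.
# $1 = pending jobs, $2 = targetWorkflowQueueLength, $3 = max executions.
desired_runners() {
  pending="$1"
  target="$2"
  max="$3"
  # Integer ceiling of pending / target.
  wanted=$(( (pending + target - 1) / target ))
  # Never start more containers than the configured maximum.
  if [ "$wanted" -gt "$max" ]; then
    wanted="$max"
  fi
  echo "$wanted"
}&lt;/LI-CODE&gt;
&lt;P&gt;With a target of 1 and a maximum of 5, desired_runners 3 1 5 prints 3 and desired_runners 8 1 5 prints 5, which matches the "up to max" row.&lt;/P&gt;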
&lt;H2 data-line="823"&gt;Step 9: Test the Setup&lt;/H2&gt;
&lt;H3 data-line="825"&gt;9.1: Create a Test Workflow&lt;/H3&gt;
&lt;P data-line="827"&gt;In any repository within your GitHub organization, create a workflow file:&lt;/P&gt;
&lt;P data-line="829"&gt;&lt;STRONG&gt;File:&lt;/STRONG&gt;&amp;nbsp;.github/workflows/test-container-runner.yml&lt;/P&gt;
&lt;LI-CODE lang=""&gt;name: Test Container App Runner

on:
  workflow_dispatch:    # Allows manual trigger from GitHub UI
  push:
    branches: [main]

jobs:
  test-runner:
    runs-on: [self-hosted, container-app]   # Must match your RUNNER_LABELS
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Print runner info
        run: |
          echo "✅ Hello from Azure Container App Runner!"
          echo "Runner Name: $RUNNER_NAME"
          echo "Runner OS: $RUNNER_OS"
          echo "Workspace: $GITHUB_WORKSPACE"

      - name: Run a simple test
        run: |
          echo "Current directory: $(pwd)"
          echo "Files in repo:"
          ls -la
          echo "System info:"
          uname -a
          echo "Memory:"
          free -h&lt;/LI-CODE&gt;
&lt;H3 data-line="864"&gt;9.2: Trigger the Workflow&lt;/H3&gt;
&lt;OL data-line="866"&gt;
&lt;LI data-line="866"&gt;Go to your repository on GitHub&lt;/LI&gt;
&lt;LI data-line="867"&gt;Click&amp;nbsp;&lt;STRONG&gt;Actions&lt;/STRONG&gt;&amp;nbsp;tab&lt;/LI&gt;
&lt;LI data-line="868"&gt;Select&amp;nbsp;&lt;STRONG&gt;"Test Container App Runner"&lt;/STRONG&gt;&amp;nbsp;from the left sidebar&lt;/LI&gt;
&lt;LI data-line="869"&gt;Click&amp;nbsp;&lt;STRONG&gt;Run workflow&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Run workflow&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;H3 data-line="871"&gt;9.3: Watch It Work&lt;/H3&gt;
&lt;OL data-line="873"&gt;
&lt;LI data-line="873"&gt;In the GitHub Actions tab, you'll see the job show as&amp;nbsp;&lt;STRONG&gt;"Queued"&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="874"&gt;In Azure Portal, go to your Container App Job →&amp;nbsp;&lt;STRONG&gt;Execution history&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-line="875"&gt;Within ~30 seconds (your polling interval), you should see a new execution start&lt;/LI&gt;
&lt;LI data-line="876"&gt;The execution then moves through these stages:
&lt;UL data-line="877"&gt;
&lt;LI data-line="877"&gt;Container starts → Runner registers → Job executes → Container stops&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI data-line="878"&gt;Back in GitHub, the workflow run should show as ✅&amp;nbsp;&lt;STRONG&gt;completed&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
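&lt;P&gt;The same test can be driven from a terminal. The sketch below assumes the GitHub CLI (gh) and the Azure CLI are installed and signed in. REPO is a placeholder, and the run, trigger_workflow and list_executions helpers plus the DRY_RUN switch are hypothetical names added for illustration.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;#!/bin/sh
# Sketch only: REPO is a placeholder; the other names are this guide's examples.
RESOURCE_GROUP="rg-github-runners"
JOB_NAME="github-runner-job"
REPO="your-github-org/your-repo"
WORKFLOW_FILE="test-container-runner.yml"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# 9.2: queue the test workflow on the main branch.
trigger_workflow() {
  run gh workflow run "$WORKFLOW_FILE" --repo "$REPO" --ref main
}

# 9.3: list recent executions of the Container App Job.
list_executions() {
  run az containerapp job execution list --name "$JOB_NAME" --resource-group "$RESOURCE_GROUP" --output table
}&lt;/LI-CODE&gt;
&lt;P&gt;Call trigger_workflow, wait roughly one polling interval, then call list_executions to confirm that a new execution has started.&lt;/P&gt;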
&lt;H3 data-line="880"&gt;Troubleshooting&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Problem&lt;/th&gt;&lt;th&gt;Likely Cause&lt;/th&gt;&lt;th&gt;Fix&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Job stays "Queued" forever&lt;/td&gt;&lt;td&gt;KEDA not detecting jobs&lt;/td&gt;&lt;td&gt;Check scale rule metadata — ensure&amp;nbsp;labels,&amp;nbsp;owner,&amp;nbsp;repos,&amp;nbsp;runnerScope&amp;nbsp;are all filled correctly&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Container starts but workflow doesn't complete&lt;/td&gt;&lt;td&gt;Wrong secret reference or empty env vars&lt;/td&gt;&lt;td&gt;Verify&amp;nbsp;GITHUB_PAT&amp;nbsp;env var points to correct secret, and all env vars have values&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;"Permission denied" errors&lt;/td&gt;&lt;td&gt;PAT missing required scopes&lt;/td&gt;&lt;td&gt;Edit PAT and add missing permissions (Step 1)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Image pull errors on first deploy&lt;/td&gt;&lt;td&gt;ACR access timing issue&lt;/td&gt;&lt;td&gt;Redeploy — the AcrPull role was assigned but hadn't propagated yet&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Runner registers but job doesn't run&lt;/td&gt;&lt;td&gt;Label mismatch&lt;/td&gt;&lt;td&gt;Ensure&amp;nbsp;runs-on&amp;nbsp;labels in workflow match&amp;nbsp;RUNNER_LABELS&amp;nbsp;env var AND&amp;nbsp;labels&amp;nbsp;in KEDA scale rule&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Runner shows "Idle" but job stays queued&lt;/td&gt;&lt;td&gt;--disableupdate&amp;nbsp;flag or wrong runner group&lt;/td&gt;&lt;td&gt;Remove&amp;nbsp;--disableupdate&amp;nbsp;from&amp;nbsp;start.sh&amp;nbsp;and rebuild the image. 
Also verify&amp;nbsp;RUNNER_GROUP&amp;nbsp;env var matches the runner group name on GitHub, and the group has the correct repos assigned&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Runner version deprecated error&lt;/td&gt;&lt;td&gt;Outdated runner binary&lt;/td&gt;&lt;td&gt;Update&amp;nbsp;RUNNER_VERSION&amp;nbsp;in&amp;nbsp;Dockerfile&amp;nbsp;to the latest version and rebuild. Run&amp;nbsp;curl -s https://api.github.com/repos/actions/runner/releases/latest | jq -r '.tag_name'&amp;nbsp;to check&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Hundreds of offline runners spawning&lt;/td&gt;&lt;td&gt;maxExecutions&amp;nbsp;set to 0 (unlimited)&lt;/td&gt;&lt;td&gt;Set&amp;nbsp;maxExecutions&amp;nbsp;to a reasonable limit (e.g., 5 or 10) in the scale rule settings&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Runner idle + job queued forever (public repo)&lt;/td&gt;&lt;td&gt;Runner group blocks public repos&lt;/td&gt;&lt;td&gt;Go to&amp;nbsp;&lt;STRONG&gt;Org Settings → Actions → Runner groups → your group&lt;/STRONG&gt;&amp;nbsp;and check&amp;nbsp;&lt;STRONG&gt;"Allow public repositories"&lt;/STRONG&gt;. Without this, GitHub silently refuses to dispatch jobs from public repos to runners in the group&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
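&lt;P&gt;Several rows in this table come down to a label mismatch. As a quick sanity check, the sketch below tests whether a custom runs-on label appears in a comma-separated RUNNER_LABELS value. has_label is a hypothetical helper and it compares whole labels only, so container does not match container-app.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;#!/bin/sh
# Hypothetical check: is a label present in a comma-separated label list?
# $1 = label to look for, $2 = comma-separated labels (e.g. the RUNNER_LABELS value).
has_label() {
  # Wrap both sides in commas so only whole labels match.
  case ",$2," in
    *",$1,"*) return 0 ;;
    *) return 1 ;;
  esac
}&lt;/LI-CODE&gt;
&lt;P&gt;For example, has_label container-app "linux,container-app" succeeds, while has_label container-apps "container-app" fails. Run the same check against the labels value in the KEDA scale rule.&lt;/P&gt;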
&lt;P data-line="894"&gt;&lt;STRONG&gt;To check container logs:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P data-line="896"&gt;Go to your Container App Job →&amp;nbsp;&lt;STRONG&gt;Monitoring&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG&gt;Logs&lt;/STRONG&gt;&amp;nbsp;and run:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;ContainerAppConsoleLogs_CL
| where ContainerGroupName_s startswith "github-runner-job"
| where TimeGenerated &amp;gt; ago(30m)
| order by TimeGenerated desc
| take 20&lt;/LI-CODE&gt;
&lt;H2 data-line="945"&gt;Production Considerations&lt;/H2&gt;
&lt;H3 data-line="947"&gt;🔒 Networking: Private Environments&lt;/H3&gt;
&lt;P data-line="949"&gt;In production, your Container Apps Environment may be deployed inside a&amp;nbsp;&lt;STRONG&gt;VNet&lt;/STRONG&gt;&amp;nbsp;with no public internet access. Here's how to handle that:&lt;/P&gt;
&lt;H4 data-line="951"&gt;Private ACR Access&lt;/H4&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Container App Job ──(private endpoint)──► ACR&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;OL data-line="957"&gt;
&lt;LI data-line="957"&gt;Create a&amp;nbsp;&lt;STRONG&gt;Private Endpoint&lt;/STRONG&gt;&amp;nbsp;for your ACR&lt;/LI&gt;
&lt;LI data-line="958"&gt;Disable public access on ACR&lt;/LI&gt;
&lt;LI data-line="959"&gt;Ensure your Container Apps Environment is in the same (or peered) VNet&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4 data-line="961"&gt;Private Key Vault Access&lt;/H4&gt;
&lt;P data-line="963"&gt;Same pattern — create a Private Endpoint for Key Vault and disable public access.&lt;/P&gt;
&lt;H4 data-line="965"&gt;GitHub API Access&lt;/H4&gt;
&lt;P data-line="967"&gt;Your runner containers need outbound access to:&lt;/P&gt;
&lt;UL data-line="968"&gt;
&lt;LI data-line="968"&gt;github.com&amp;nbsp;(runner registration)&lt;/LI&gt;
&lt;LI data-line="969"&gt;api.github.com&amp;nbsp;(KEDA polling)&lt;/LI&gt;
&lt;LI data-line="970"&gt;*.actions.githubusercontent.com&amp;nbsp;(downloading actions)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="972"&gt;If using a firewall or NSG, ensure these are allowed.&lt;/P&gt;
&lt;H3 data-line="974"&gt;🔐 Using GitHub App Instead of PAT (Recommended for Production)&lt;/H3&gt;
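&lt;P data-line="976"&gt;PATs are tied to an individual user account, so they stop working when that user leaves the organization or the token expires, and classic PATs in particular carry broad scopes. For production, consider using a&amp;nbsp;&lt;STRONG&gt;GitHub App&lt;/STRONG&gt;:&lt;/P&gt;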
&lt;OL data-line="978"&gt;
&lt;LI data-line="978"&gt;Create a GitHub App in your organization&lt;/LI&gt;
&lt;LI data-line="979"&gt;Grant it&amp;nbsp;Organization Self-hosted runners: Read &amp;amp; Write&amp;nbsp;permissions&lt;/LI&gt;
&lt;LI data-line="980"&gt;Install the app in your organization&lt;/LI&gt;
&lt;LI data-line="981"&gt;Use the App ID and Private Key instead of PAT&lt;/LI&gt;
&lt;/OL&gt;
&lt;P data-line="983"&gt;The KEDA GitHub Runner scaler supports GitHub App authentication natively.&lt;/P&gt;
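&lt;P&gt;As a rough sketch of what changes in the scale rule, the helper below prints the metadata and authentication arguments for GitHub App authentication. The parameter names applicationID, installationID and appKey are what the KEDA GitHub Runner scaler is understood to expect, so verify them against the KEDA documentation for your version. github-app-key is a hypothetical Container Apps secret holding the App's private key, and app_scale_rule_args is a hypothetical name.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;#!/bin/sh
# Sketch only. Parameter names should be verified against the KEDA docs.
# github-app-key is a hypothetical secret that stores the GitHub App private key.
# $1 = GitHub App ID, $2 = installation ID.
app_scale_rule_args() {
  echo "--scale-rule-metadata owner=your-github-org runnerScope=org labels=container-app applicationID=$1 installationID=$2 --scale-rule-auth appKey=github-app-key"
}&lt;/LI-CODE&gt;
&lt;P&gt;The printed arguments would replace the personalAccessToken authentication entry on the job's scale rule, for example when passed to az containerapp job update together with the rule name and type.&lt;/P&gt;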
&lt;H3 data-line="985"&gt;📊 Monitoring and Alerting&lt;/H3&gt;
&lt;P data-line="987"&gt;Set up monitoring for your runners:&lt;/P&gt;
&lt;OL data-line="989"&gt;
&lt;LI data-line="989"&gt;&lt;STRONG&gt;Container App Job Metrics&lt;/STRONG&gt;&amp;nbsp;(Azure Monitor):
&lt;UL data-line="990"&gt;
&lt;LI data-line="990"&gt;Execution count&lt;/LI&gt;
&lt;LI data-line="991"&gt;Execution duration&lt;/LI&gt;
&lt;LI data-line="992"&gt;Failed executions&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI data-line="994"&gt;&lt;STRONG&gt;Alerts&lt;/STRONG&gt;&amp;nbsp;to set up:
&lt;UL data-line="995"&gt;
&lt;LI data-line="995"&gt;Alert when executions fail repeatedly&lt;/LI&gt;
&lt;LI data-line="996"&gt;Alert when execution queue is growing (KEDA can't keep up)&lt;/LI&gt;
&lt;LI data-line="997"&gt;Alert when runner registration fails (PAT expired?)&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI data-line="999"&gt;&lt;STRONG&gt;Log Analytics queries:&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;LI-CODE lang=""&gt;// Find console log lines that mention "error" or "failed"
ContainerAppConsoleLogs_CL
| where ContainerGroupName_s startswith "github-runner-job"
| where Log_s contains "error" or Log_s contains "failed"
| order by TimeGenerated desc&lt;/LI-CODE&gt;
&lt;H3 data-line="1009"&gt;💰 Cost Optimization&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Strategy&lt;/th&gt;&lt;th&gt;Impact&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Scale to zero (min: 0)&lt;/td&gt;&lt;td&gt;No cost when idle&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Right-size CPU/memory&lt;/td&gt;&lt;td&gt;Don't over-provision&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Set reasonable max executions&lt;/td&gt;&lt;td&gt;Prevent runaway costs&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Use Consumption plan&lt;/td&gt;&lt;td&gt;Pay per-second billing&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Set replica timeout&lt;/td&gt;&lt;td&gt;Kill stuck jobs&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3 data-line="1019"&gt;🔄 Keeping the Runner Image Updated&lt;/H3&gt;
&lt;P data-line="1021"&gt;Runner versions get outdated. Set up automated rebuilds:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;# Create a scheduled ACR Task to rebuild weekly (single line for Cloud Shell)
az acr task create --registry yourregistryname --name rebuild-runner-weekly --image github-runner:latest --context https://github.com/your-org/runner-image-repo.git --file Dockerfile --schedule "0 0 * * 0" --git-access-token YOUR_PAT&lt;/LI-CODE&gt;
&lt;H2 data-line="1030"&gt;Appendix A: CLI Commands for All Steps&lt;/H2&gt;
&lt;P data-line="1032"&gt;If you prefer CLI over Portal, here are all the commands.&amp;nbsp;&lt;STRONG&gt;Run these in Azure Cloud Shell or any terminal with Azure CLI.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P data-line="1034"&gt;All commands are written as&amp;nbsp;&lt;STRONG&gt;single lines&lt;/STRONG&gt;&amp;nbsp;so they work directly in Azure Cloud Shell without formatting issues.&lt;/P&gt;
&lt;LI-CODE lang=""&gt;# ─── Set your variables (update these values) ───
RESOURCE_GROUP="rg-github-runners"
LOCATION="westus2"
ACR_NAME="yourregistryname"
KV_NAME="kvgithubrunners"
CAE_NAME="cae-github-runners"
JOB_NAME="github-runner-job"
IMAGE_NAME="github-runner"
IMAGE_TAG="v1"
GITHUB_ORG="your-github-org"

# ─── Step 1: Create Resource Group ───
az group create --name $RESOURCE_GROUP --location $LOCATION

# ─── Step 2: Create ACR ───
az acr create --name $ACR_NAME --resource-group $RESOURCE_GROUP --sku Basic

# ─── Step 3: Create Key Vault ───
az keyvault create --name $KV_NAME --resource-group $RESOURCE_GROUP --location $LOCATION --enable-rbac-authorization

# ─── Step 4: Store PAT in Key Vault ───
az keyvault secret set --vault-name $KV_NAME --name "github-pat" --value "YOUR_PAT_HERE"

# ─── Step 5: Build Image with ACR Tasks (run from the folder with Dockerfile) ───
az acr build --registry $ACR_NAME --image $IMAGE_NAME:$IMAGE_TAG .

# ─── Step 6: Create Container Apps Environment ───
az containerapp env create --name $CAE_NAME --resource-group $RESOURCE_GROUP --location $LOCATION

# ─── Step 7: Create Container App Job (single command) ───
az containerapp job create --name $JOB_NAME --resource-group $RESOURCE_GROUP --environment $CAE_NAME --trigger-type Event --replica-timeout 1800 --replica-retry-limit 1 --replica-completion-count 1 --parallelism 1 --image "$ACR_NAME.azurecr.io/$IMAGE_NAME:$IMAGE_TAG" --cpu "0.5" --memory "1Gi" --min-executions 0 --max-executions 5 --polling-interval 30 --scale-rule-name "github-runner-rule" --scale-rule-type "github-runner" --scale-rule-metadata "owner=$GITHUB_ORG" "runnerScope=org" "labels=container-app" "targetWorkflowQueueLength=1" --scale-rule-auth "personalAccessToken=github-pat" --secrets "github-pat=keyvaultref:https://$KV_NAME.vault.azure.net/secrets/github-pat,identityref:system" --env-vars "GITHUB_PAT=secretref:github-pat" "GITHUB_OWNER=$GITHUB_ORG" "RUNNER_SCOPE=org" "RUNNER_LABELS=container-app" "RUNNER_GROUP=container-app-runners" --registry-server "$ACR_NAME.azurecr.io" --registry-identity "system"&lt;/LI-CODE&gt;
&lt;H2 data-line="1072"&gt;Appendix B: Repo-Level Runner Changes&lt;/H2&gt;
&lt;P data-line="1074"&gt;If you want runners at the&amp;nbsp;&lt;STRONG&gt;repository level&lt;/STRONG&gt;&amp;nbsp;instead of organization level:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Setting&lt;/th&gt;&lt;th&gt;Org-Level&lt;/th&gt;&lt;th&gt;Repo-Level&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;PAT scope&lt;/td&gt;&lt;td&gt;admin:org&lt;/td&gt;&lt;td&gt;repo&amp;nbsp;only&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;RUNNER_SCOPE&amp;nbsp;env var&lt;/td&gt;&lt;td&gt;org&lt;/td&gt;&lt;td&gt;repo&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;GITHUB_REPO&amp;nbsp;env var&lt;/td&gt;&lt;td&gt;Not needed&lt;/td&gt;&lt;td&gt;Required (e.g.,&amp;nbsp;my-repo)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;KEDA&amp;nbsp;runnerScope&lt;/td&gt;&lt;td&gt;org&lt;/td&gt;&lt;td&gt;repo&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;KEDA&amp;nbsp;repos&lt;/td&gt;&lt;td&gt;Optional&lt;/td&gt;&lt;td&gt;Required&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H2 data-line="1086"&gt;Summary&lt;/H2&gt;
&lt;P data-line="1088"&gt;Here's what we built:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Component&lt;/th&gt;&lt;th&gt;What It Does&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Dockerfile + start.sh&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Creates an ephemeral GitHub Actions runner image&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;ACR&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Stores the runner image securely&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Key Vault&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Stores the GitHub PAT securely&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Container Apps Environment&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Provides the hosting platform&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Container App Job&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Runs the runner containers on demand&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;KEDA Scale Rule&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Automatically scales runners based on pending GitHub jobs&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Managed Identity&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Connects everything securely without passwords&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3 data-line="1100"&gt;The Result:&lt;/H3&gt;
&lt;UL data-line="1102"&gt;
&lt;LI&gt;✅&amp;nbsp;&lt;STRONG&gt;Zero cost when idle&lt;/STRONG&gt;: no VMs running 24/7&lt;/LI&gt;
&lt;LI&gt;✅&amp;nbsp;&lt;STRONG&gt;Automatic scaling&lt;/STRONG&gt;: KEDA handles it&lt;/LI&gt;
&lt;LI&gt;✅&amp;nbsp;&lt;STRONG&gt;Ephemeral runners&lt;/STRONG&gt;: clean environment every time&lt;/LI&gt;
&lt;LI&gt;✅&amp;nbsp;&lt;STRONG&gt;Secure&lt;/STRONG&gt;: secrets in Key Vault, Managed Identity for auth&lt;/LI&gt;
&lt;LI&gt;✅&amp;nbsp;&lt;STRONG&gt;No Docker required&lt;/STRONG&gt;: ACR Tasks builds images in the cloud&lt;/LI&gt;
&lt;LI&gt;✅&amp;nbsp;&lt;STRONG&gt;Production ready&lt;/STRONG&gt;: private networking, monitoring, automated image updates&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="1102"&gt;-------------------------------------------------------------------------------------------&lt;/P&gt;
&lt;P data-line="1111"&gt;&lt;STRONG&gt;&lt;EM&gt;Have questions or feedback? Drop a comment below!&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P data-line="1113"&gt;&lt;EM&gt;Tags: Azure, Container Apps, GitHub Actions, KEDA, Self-hosted Runners, DevOps, Serverless&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 04 May 2026 13:03:30 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/running-github-actions-runners-on-azure-container-apps-with-keda/ba-p/4512980</guid>
      <dc:creator>shubhijain</dc:creator>
      <dc:date>2026-05-04T13:03:30Z</dc:date>
    </item>
    <item>
      <title>Modernizing Terraform Pipelines on Azure: OIDC Federation for GitHub Actions and Azure DevOps</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/modernizing-terraform-pipelines-on-azure-oidc-federation-for/ba-p/4516620</link>
      <description>&lt;H3&gt;The secret nobody wants to rotate&lt;/H3&gt;
&lt;P&gt;Most Terraform-on-Azure pipelines we see still authenticate the same way they did three years ago. A long-lived ARM_CLIENT_SECRET sitting in GitHub Actions or Azure DevOps, set once, copied around, and rotated only when something breaks.&lt;/P&gt;
&lt;P&gt;It's one of the most neglected credentials in the cloud, and one of the easiest to leak. A developer screenshots a variable group. A pipeline log echoes a value. A fork inherits a secret. Or the secret simply expires on a Friday evening and takes production deployments with it.&lt;/P&gt;
&lt;P&gt;Workload Identity Federation (WIF) makes this whole class of problem go away. The pipeline mints a short-lived token at runtime, exchanges it for an Azure access token via Microsoft Entra, and never touches a secret. GitHub Actions has supported it since 2021. Azure DevOps service connections went GA with WIF in February 2024. The azurerm Terraform provider has supported it since v3.7.&lt;/P&gt;
&lt;P&gt;This post walks through the pattern end-to-end, for both GitHub Actions and Azure DevOps, the way I've rolled it out across multiple customer estates.&lt;/P&gt;
&lt;H3&gt;How the exchange actually works&lt;/H3&gt;
&lt;P&gt;Before any YAML, it helps to picture what's happening:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;The CI system (GitHub or ADO) signs a short-lived JWT describing&amp;nbsp;&lt;EM&gt;exactly&lt;/EM&gt; what's running: which repo, which branch, which environment, which service connection.&lt;/LI&gt;
&lt;LI&gt;The pipeline sends that JWT to Microsoft Entra ID.&lt;/LI&gt;
&lt;LI&gt;Entra checks it against a&amp;nbsp;&lt;STRONG&gt;federated identity credential&lt;/STRONG&gt; you've configured on a managed identity or app registration. The iss, sub, and aud claims must match case-sensitively.&lt;/LI&gt;
&lt;LI&gt;If it matches, Entra returns a short-lived Azure access token for the job to use.&lt;/LI&gt;
&lt;LI&gt;Terraform uses it. The job ends. The token expires. Nothing persists.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;The token is bound to a specific subject like repo:contoso/platform:environment:prod or sc://contoso/platform/azure-prod. It can't be reused from another repo, branch, or pipeline.&lt;/P&gt;
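&lt;P&gt;To make the matching rule concrete, here is a minimal Python sketch of the comparison described in step 3. It is a simplified model for illustration, not Entra's implementation; the function name credential_matches and the dictionary layout are assumptions made for this example.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# Simplified model of federated credential matching (illustration only).
# Issuer, subject, and audience are compared exactly, including case.

def credential_matches(credential, claims):
    """Return True when the token claims satisfy the federated credential."""
    audience = claims.get("aud")
    # A JWT "aud" claim may be a single string or a list of strings.
    audiences = audience if isinstance(audience, list) else [audience]
    return (
        claims.get("iss") == credential["issuer"]
        and claims.get("sub") == credential["subject"]
        and any(aud in credential["audiences"] for aud in audiences)
    )


prod_credential = {
    "issuer": "https://token.actions.githubusercontent.com",
    "subject": "repo:contoso/platform:environment:prod",
    "audiences": ["api://AzureADTokenExchange"],
}

token_from_prod_job = {
    "iss": "https://token.actions.githubusercontent.com",
    "sub": "repo:contoso/platform:environment:prod",
    "aud": "api://AzureADTokenExchange",
}

# Same repo, different environment: the subject no longer matches.
token_from_nonprod_job = dict(
    token_from_prod_job, sub="repo:contoso/platform:environment:nonprod"
)

print(credential_matches(prod_credential, token_from_prod_job))     # True
print(credential_matches(prod_credential, token_from_nonprod_job))  # False&lt;/LI-CODE&gt;
&lt;P&gt;The second call fails for the same reason a job from another environment would be refused: nothing in the token is wrong, it simply isn't the subject the credential was pinned to.&lt;/P&gt;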
&lt;H3&gt;Recommended Architecture&lt;/H3&gt;
&lt;P&gt;A few choices that usually hold up in production:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Decision&lt;/th&gt;&lt;th&gt;Choice&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Identity type&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;User-assigned managed identity (UAMI)&lt;/STRONG&gt;, not app registration&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Identity granularity&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;One UAMI per environment&lt;/STRONG&gt;&amp;nbsp;(not per pipeline)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Trust scope&lt;/td&gt;&lt;td&gt;Pinned to the&amp;nbsp;&lt;STRONG&gt;environment&lt;/STRONG&gt;&amp;nbsp;claim, not the branch&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;RBAC scope&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;Resource group&lt;/STRONG&gt;, not subscription&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Remote state&lt;/td&gt;&lt;td&gt;OIDC +&amp;nbsp;use_azuread_auth = true, shared key access disabled&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Why UAMIs? They live in your subscription, don't need Application Administrator rights to manage, and follow the lifecycle of the resource group they belong to. Why one per environment? One identity per pipeline explodes into hundreds of identities. One identity per environment maps cleanly to deployment scopes.&lt;/P&gt;
&lt;H3&gt;Part 1 - GitHub Actions&lt;/H3&gt;
&lt;H4&gt;Step 1: Create the identity and federate it&lt;/H4&gt;
&lt;P&gt;Two commands &lt;STRONG&gt;per environment&lt;/STRONG&gt;. That's it.&lt;/P&gt;
&lt;LI-CODE lang="markdown"&gt;az identity create -g rg-platform-identity -n id-tf-prod -l eastus

az identity federated-credential create \
  --name github-prod \
  --identity-name id-tf-prod \
  --resource-group rg-platform-identity \
  --issuer https://token.actions.githubusercontent.com \
  --subject repo:contoso/platform:environment:prod \
  --audiences api://AzureADTokenExchange&lt;/LI-CODE&gt;
&lt;P&gt;Repeat for nonprod. No secret is created anywhere.&lt;/P&gt;
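&lt;P&gt;If you have more than a couple of environments, it can help to generate the per-environment values instead of typing them. The sketch below only assembles the values that the commands above take (name, identity, issuer, subject, audience); the helper names and the github-/id-tf- naming scheme are assumptions carried over from the example, not something the Azure CLI requires.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import json

GITHUB_ISSUER = "https://token.actions.githubusercontent.com"
AUDIENCE = "api://AzureADTokenExchange"


def github_environment_subject(owner, repo, environment):
    """Build the subject GitHub issues for a job that targets an environment."""
    return f"repo:{owner}/{repo}:environment:{environment}"


def federated_credential(owner, repo, environment):
    """Collect the values passed to 'az identity federated-credential create'."""
    return {
        "name": f"github-{environment}",
        "identity_name": f"id-tf-{environment}",
        "issuer": GITHUB_ISSUER,
        "subject": github_environment_subject(owner, repo, environment),
        "audiences": [AUDIENCE],
    }


credentials = [
    federated_credential("contoso", "platform", environment)
    for environment in ("nonprod", "prod")
]
print(json.dumps(credentials, indent=2))&lt;/LI-CODE&gt;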
&lt;H4&gt;Step 2: Wire it up in GitHub&lt;/H4&gt;
&lt;P&gt;In repo&amp;nbsp;&lt;STRONG&gt;Settings → Environments&lt;/STRONG&gt;, create&amp;nbsp;nonprod&amp;nbsp;and&amp;nbsp;prod. On&amp;nbsp;prod, add required reviewers and a branch rule restricting deployments to&amp;nbsp;main. Then add three&amp;nbsp;&lt;STRONG&gt;environment variables&lt;/STRONG&gt; (not secrets - these aren't sensitive): AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_SUBSCRIPTION_ID.&lt;/P&gt;
&lt;P&gt;The workflow itself stays small:&lt;/P&gt;
&lt;LI-CODE lang="yaml"&gt;permissions:
  id-token: write
  contents: read

jobs:
  apply:
    runs-on: ubuntu-latest
    environment: prod
    env:
      ARM_USE_OIDC: "true"
      ARM_CLIENT_ID: ${{ vars.AZURE_CLIENT_ID }}
      ARM_TENANT_ID: ${{ vars.AZURE_TENANT_ID }}
      ARM_SUBSCRIPTION_ID: ${{ vars.AZURE_SUBSCRIPTION_ID }}
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init &amp;amp;&amp;amp; terraform apply -auto-approve&lt;/LI-CODE&gt;
&lt;P&gt;Three things make this secure:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;id-token: write&amp;nbsp;is the only elevated permission, and it doesn't grant write access to anything&amp;nbsp;&lt;EM&gt;in GitHub&lt;/EM&gt;;&amp;nbsp;it just lets the job request a signed JWT.&lt;/LI&gt;
&lt;LI&gt;The&amp;nbsp;environment:&amp;nbsp;line picks the right&amp;nbsp;AZURE_CLIENT_ID&amp;nbsp;&lt;EM&gt;and&lt;/EM&gt;&amp;nbsp;drives the&amp;nbsp;sub&amp;nbsp;claim. The federation refuses anything else.&lt;/LI&gt;
&lt;LI&gt;No azure/login step is needed for Terraform. The azurerm provider automatically reads the OIDC request variables that GitHub injects into the job (ACTIONS_ID_TOKEN_REQUEST_URL and ACTIONS_ID_TOKEN_REQUEST_TOKEN).&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Part 2 - Azure DevOps&lt;/H3&gt;
&lt;P&gt;The model is identical. The mechanics are different.&lt;/P&gt;
&lt;P&gt;ADO offers two creation paths for a WIF service connection:&amp;nbsp;&lt;STRONG&gt;automatic&lt;/STRONG&gt;&amp;nbsp;(it creates an app registration for you) and&amp;nbsp;&lt;STRONG&gt;manual&lt;/STRONG&gt; (you bring your own UAMI). For platform teams, manual + UAMI is almost always the better choice, because it keeps the identity where governance lives.&lt;/P&gt;
&lt;P&gt;The flow is a small dance between the two portals:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;In Azure DevOps, create a new ARM service connection → choose&amp;nbsp;&lt;STRONG&gt;Workload Identity Federation (manual)&lt;/STRONG&gt;&amp;nbsp;→ fill in your UAMI's client ID, tenant ID, and subscription. Save&amp;nbsp;&lt;STRONG&gt;as draft&lt;/STRONG&gt;. ADO shows you an issuer URL and a subject identifier.&lt;/LI&gt;
&lt;LI&gt;In Azure, on the UAMI, add a federated credential using the values ADO showed you. The subject looks like&amp;nbsp;sc://contoso/platform/azure-prod.&lt;/LI&gt;
&lt;LI&gt;Back in ADO, click&amp;nbsp;&lt;STRONG&gt;Verify and save&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;In the pipeline, the service connection only "activates" if a task in the job loads it. The simplest way is the AzureCLI@2 task:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;- task: AzureCLI@2
  inputs:
    azureSubscription: azure-prod   # the WIF service connection
    scriptType: bash
    scriptLocation: inlineScript
    inlineScript: |
      terraform init &amp;amp;&amp;amp; terraform apply -auto-approve
  env:
    ARM_USE_OIDC: "true"
    ARM_CLIENT_ID: $(AZURE_CLIENT_ID)
    ARM_TENANT_ID: $(AZURE_TENANT_ID)
    ARM_SUBSCRIPTION_ID: $(AZURE_SUBSCRIPTION_ID)
    ARM_ADO_PIPELINE_SERVICE_CONNECTION_ID: $(SERVICE_CONNECTION_ID)
    SYSTEM_ACCESSTOKEN: $(System.AccessToken)
    SYSTEM_OIDCREQUESTURI: $(System.OidcRequestUri)&lt;/LI-CODE&gt;
&lt;P&gt;For teams converting dozens of legacy connections, the Azure DevOps team published a&amp;nbsp;&lt;A class="lia-external-url" href="https://devblogs.microsoft.com/devops/workload-identity-federation-for-azure-deployments-is-now-generally-available/" target="_blank" rel="noopener" data-href="https://devblogs.microsoft.com/devops/workload-identity-federation-for-azure-deployments-is-now-generally-available/"&gt;PowerShell helper&lt;/A&gt;&amp;nbsp;that walks every ARM service connection in a project and converts them in place. There's a 7-day rollback window on each connection, which makes the migration genuinely low-risk.&lt;/P&gt;
&lt;H3&gt;Don't forget the state file&lt;/H3&gt;
&lt;P&gt;The Terraform state is your real blast radius. With OIDC, it's almost free to lock it down too. The same UAMI can read and write blob data without the storage account key:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;backend "azurerm" {
  resource_group_name  = "rg-tfstate"
  storage_account_name = "sttfstateprodeastus"
  container_name       = "platform-prod"
  key                  = "platform.tfstate"
  use_oidc             = true
  use_azuread_auth     = true
}&lt;/LI-CODE&gt;
&lt;P&gt;Grant the UAMI&amp;nbsp;Storage Blob Data Contributor&amp;nbsp;on the&amp;nbsp;&lt;STRONG&gt;container&lt;/STRONG&gt;&amp;nbsp;(not the account), disable shared key access on the storage account, and you've removed the last secret in the pipeline.&lt;/P&gt;
&lt;H3&gt;RBAC and break-glass&lt;/H3&gt;
&lt;P&gt;Federation removes a credential, not a privilege. A few habits worth keeping:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Scope role assignments to resource groups&lt;/STRONG&gt;, not subscriptions. The whole point of federation is that scoping is now trivially easy.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Use&amp;nbsp;Role Based Access Control Administrator&lt;/STRONG&gt;&amp;nbsp;instead of&amp;nbsp;User Access Administrator&amp;nbsp;if your Terraform creates role assignments. It's a more recent, narrower role.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Have a documented break-glass.&lt;/STRONG&gt; If GitHub or ADO has a token-service incident, you still need a path to ship a hotfix. A single hardware-key-protected emergency app registration in a separate identity boundary, audited monthly, works well.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Monitor sign-ins.&lt;/STRONG&gt; Every federated exchange shows up in Entra sign-in logs as a service principal sign-in. Pipe these to Sentinel and alert on anomalies like sign-ins outside expected hours, or from IPs outside GitHub's published ranges.&lt;/LI&gt;
&lt;/UL&gt;
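&lt;P&gt;The sign-in checks in the last bullet are simple to express. The following Python sketch shows the idea on simplified records; the field names, the address ranges (reserved documentation ranges), and the expected hours are all placeholders I've assumed for illustration, and in practice the same logic would more likely live in a Sentinel analytics rule.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import ipaddress
from datetime import datetime

# Hypothetical values for illustration: substitute the address ranges your
# CI provider publishes and the hours in which your pipelines normally run.
EXPECTED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]
EXPECTED_HOURS = range(6, 20)  # 06:00 to 19:59


def anomaly_reasons(sign_in):
    """Return the reasons a sign-in record looks unusual (empty list if none)."""
    reasons = []
    address = ipaddress.ip_address(sign_in["ip_address"])
    if not any(address in network for network in EXPECTED_NETWORKS):
        reasons.append("unexpected source IP")
    if datetime.fromisoformat(sign_in["created"]).hour not in EXPECTED_HOURS:
        reasons.append("outside expected hours")
    return reasons


# Simplified records; real Entra sign-in log entries carry many more fields.
sign_ins = [
    {"ip_address": "192.0.2.15", "created": "2026-05-02T10:15:00"},
    {"ip_address": "203.0.113.9", "created": "2026-05-02T23:40:00"},
]
for record in sign_ins:
    print(record["ip_address"], anomaly_reasons(record))&lt;/LI-CODE&gt;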
&lt;H3&gt;The errors you will hit (and what they really mean)&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Symptom&lt;/th&gt;&lt;th&gt;What it actually is&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;AADSTS70021: No matching federated identity record found&lt;/td&gt;&lt;td&gt;Case-sensitive mismatch in&amp;nbsp;iss,&amp;nbsp;sub, or&amp;nbsp;aud. Almost always a trailing slash or a capitalised character&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AADSTS700016: Application not found in directory&lt;/td&gt;&lt;td&gt;Wrong client ID or tenant. Not a federation problem&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;403 on a resource even though token exchange worked&lt;/td&gt;&lt;td&gt;Federation is fine. Your RBAC isn't. Check the exact scope&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Unable to determine OIDC token&amp;nbsp;(ADO)&lt;/td&gt;&lt;td&gt;No task in the job loaded the service connection. Add an&amp;nbsp;AzureCLI@2&amp;nbsp;step&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Works on&amp;nbsp;main, fails on tags&lt;/td&gt;&lt;td&gt;You pinned&amp;nbsp;sub&amp;nbsp;to a branch ref. Add a second federated credential for tags, or move to environment-based scoping&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3&gt;Migrating without a maintenance window&lt;/H3&gt;
&lt;P&gt;You almost never get to do this on a greenfield repo. The order that has worked for me on legacy estates:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Create the new UAMI alongside the old service principal, with the same role assignments.&lt;/LI&gt;
&lt;LI&gt;Federate one canary pipeline. Verify it deploys equivalently.&lt;/LI&gt;
&lt;LI&gt;Cut over pipelines in waves, lowest-risk environment first.&lt;/LI&gt;
&lt;LI&gt;Once a full release cycle passes cleanly, disable the old SP's secret.&lt;/LI&gt;
&lt;LI&gt;Wait another cycle. Then delete the SP entirely.&lt;/LI&gt;
&lt;LI&gt;Add a CI gate that fails any new pipeline introducing ARM_CLIENT_SECRET.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;The old and new auth methods coexist on the same subscription throughout. There's no hard cutover and no maintenance window, just a steady drift toward zero secrets.&lt;/P&gt;
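&lt;P&gt;The CI gate in step 6 doesn't need to be sophisticated. Here's a minimal Python sketch of one: it walks a checkout and reports any pipeline or Terraform file that mentions ARM_CLIENT_SECRET. The file suffixes and function names are assumptions for this example; adjust them to wherever your pipeline definitions live.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import pathlib
import re

SECRET_PATTERN = re.compile(r"ARM_CLIENT_SECRET")
# Assumed set of file types worth scanning; extend as needed.
SCANNED_SUFFIXES = {".yml", ".yaml", ".tf", ".tfvars"}


def find_secret_references(root):
    """Return (path, line number) pairs for every ARM_CLIENT_SECRET reference."""
    hits = []
    for path in sorted(pathlib.Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SCANNED_SUFFIXES:
            continue
        lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
        for number, line in enumerate(lines, start=1):
            if SECRET_PATTERN.search(line):
                hits.append((str(path), number))
    return hits


def main(root="."):
    """Print each finding and return a process exit code (1 if any were found)."""
    hits = find_secret_references(root)
    for path, number in hits:
        print(f"{path}:{number}: ARM_CLIENT_SECRET is not allowed; use OIDC federation")
    return 1 if hits else 0&lt;/LI-CODE&gt;
&lt;P&gt;A pipeline step can end the script with raise SystemExit(main()) so that any finding fails the build.&lt;/P&gt;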
&lt;H3&gt;Wrapping up&lt;/H3&gt;
&lt;P&gt;If you do nothing else after reading this, do one thing: search your CI variable groups for ARM_CLIENT_SECRET. Every result is an outage or a breach waiting to happen.&lt;/P&gt;
&lt;P&gt;Federation is one of those rare changes that's both more secure&amp;nbsp;&lt;EM&gt;and&lt;/EM&gt;&amp;nbsp;less work to operate. Once you've set it up, you stop thinking about credential rotation, secret expiry, and quarterly access reviews for service principals. The pipeline simply runs, and the audit trail is in Entra where it belongs.&lt;/P&gt;
&lt;P&gt;That's a good trade.&lt;/P&gt;</description>
      <pubDate>Sat, 02 May 2026 20:34:19 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/modernizing-terraform-pipelines-on-azure-oidc-federation-for/ba-p/4516620</guid>
      <dc:creator>ssinghkalra</dc:creator>
      <dc:date>2026-05-02T20:34:19Z</dc:date>
    </item>
    <item>
      <title>Automating Azure Naming Standards using API and DevOps Pipelines</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/automating-azure-naming-standards-using-api-and-devops-pipelines/ba-p/4516628</link>
      <description>&lt;H2&gt;Introduction&lt;/H2&gt;
&lt;P&gt;In large Azure environments, one of the most overlooked yet critical governance challenges is &lt;STRONG&gt;resource naming consistency&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;While organizations define naming standards, enforcing them at scale across multiple subscriptions, teams, and pipelines often becomes a manual and inconsistent process.&lt;/P&gt;
&lt;P&gt;In real-world projects, this leads to:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Operational confusion&lt;/LI&gt;
&lt;LI&gt;Difficult resource identification&lt;/LI&gt;
&lt;LI&gt;Reduced traceability&lt;/LI&gt;
&lt;LI&gt;Governance gaps&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;To address this, we implemented an &lt;STRONG&gt;API-driven naming validation approach integrated with Azure DevOps pipelines&lt;/STRONG&gt;, ensuring every resource created follows organizational standards automatically.&lt;/P&gt;
&lt;H2&gt;The Problem: Inconsistent Naming Across Environments&lt;/H2&gt;
&lt;P&gt;In distributed teams and large-scale environments, naming issues commonly arise due to:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Multiple developers creating resources independently&lt;/LI&gt;
&lt;LI&gt;Lack of centralized enforcement&lt;/LI&gt;
&lt;LI&gt;Manual validation during deployments&lt;/LI&gt;
&lt;LI&gt;No integration with CI/CD pipelines&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Example (Before Automation)&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;th&gt;Resource Type&lt;/th&gt;&lt;th&gt;Example Name&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Resource Group&lt;/td&gt;&lt;td&gt;testRG1&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Storage Account&lt;/td&gt;&lt;td&gt;mystorage123&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;VM&lt;/td&gt;&lt;td&gt;vm-prod&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Problems:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;No standard structure&lt;/LI&gt;
&lt;LI&gt;No environment or region context&lt;/LI&gt;
&lt;LI&gt;Hard to manage at scale&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Goal&lt;/H2&gt;
&lt;P&gt;To ensure:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;✅ Standardized naming across all resources&lt;/LI&gt;
&lt;LI&gt;✅ Automated validation during deployments&lt;/LI&gt;
&lt;LI&gt;✅ No manual intervention required&lt;/LI&gt;
&lt;LI&gt;✅ Seamless integration with DevOps workflows&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Solution Overview&lt;/H2&gt;
&lt;P&gt;We implemented a &lt;STRONG&gt;naming enforcement mechanism using:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Azure Naming Tool (or similar API-based naming service)&lt;/LI&gt;
&lt;LI&gt;Azure DevOps Pipelines&lt;/LI&gt;
&lt;LI&gt;Managed Identity for secure authentication&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Architecture Flow&lt;/H2&gt;
&lt;img&gt;&lt;SPAN style="color: rgb(112, 112, 112);" data-mce-style="color: rgb(112, 112, 112);"&gt;Automated &lt;/SPAN&gt;&lt;SPAN style="color: rgb(112, 112, 112);" data-mce-style="color: rgb(112, 112, 112);"&gt;Naming Validation using Naming API, Managed Identity, and DevOps Pipeline&lt;/SPAN&gt;&lt;/img&gt;
&lt;H3&gt;🔍 Solution Flow Explained&lt;/H3&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Developer Commit&lt;/STRONG&gt;&lt;BR /&gt;The process begins when a developer commits code to the repository, triggering the Azure DevOps pipeline.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Azure DevOps Pipeline Execution&lt;/STRONG&gt;&lt;BR /&gt;The pipeline runs deployment scripts as part of the CI/CD process.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Managed Identity Authentication&lt;/STRONG&gt;&lt;BR /&gt;The pipeline uses Managed Identity to securely authenticate and obtain an access token—eliminating the need for storing credentials.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Naming API Invocation&lt;/STRONG&gt;&lt;BR /&gt;A request is sent to the Naming API with resource details such as:
&lt;UL&gt;
&lt;LI&gt;Resource type&lt;/LI&gt;
&lt;LI&gt;Environment&lt;/LI&gt;
&lt;LI&gt;Location&lt;/LI&gt;
&lt;LI&gt;Application name&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Validation &amp;amp; Name Generation&lt;/STRONG&gt;&lt;BR /&gt;The Naming API validates inputs and returns a compliant resource name based on predefined standards.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Deployment Decision&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;If validation succeeds → resources are deployed&lt;/LI&gt;
&lt;LI&gt;If validation fails → deployment is blocked&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Resource Deployment&lt;/STRONG&gt;&lt;BR /&gt;Only validated, compliant resources are provisioned in Azure.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;STRONG&gt;Note:&lt;/STRONG&gt;&lt;BR /&gt;The “Azure Naming API” referenced in this blog represents an implementation pattern rather than a native Azure service.&lt;BR /&gt;Solutions such as Azure Naming Tool, Resource Name Generator, or custom-built APIs can be used to expose naming logic and integrate with DevOps pipelines for automated enforcement.&lt;/P&gt;
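&lt;P&gt;To illustrate what the validation and name generation steps might do behind the API, here is a minimal Python sketch. The convention it encodes (prefix-application-environment-region), the lookup tables, and the function names are all hypothetical and chosen only for this example; a real naming service would load its rules from configuration.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import re

# Hypothetical convention for illustration: prefix-application-environment-region,
# for example rg-app01-prod-eus. Substitute your organization's own standard.
RESOURCE_PREFIXES = {"resourceGroup": "rg", "virtualMachine": "vm"}
REGION_CODES = {"eastus": "eus", "westeurope": "weu"}
ENVIRONMENTS = ("dev", "test", "prod")
NAME_PATTERN = re.compile(r"[a-z]{2,4}-[a-z0-9]{2,12}-(dev|test|prod)-[a-z]{3,4}")


def is_compliant(name):
    """Check an existing name against the convention."""
    return NAME_PATTERN.fullmatch(name) is not None


def generate_name(resource_type, environment, location, application):
    """Return a compliant name, or raise ValueError so the pipeline can stop."""
    if resource_type not in RESOURCE_PREFIXES:
        raise ValueError(f"unsupported resource type: {resource_type}")
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unsupported environment: {environment}")
    if location not in REGION_CODES:
        raise ValueError(f"unsupported location: {location}")
    name = "-".join(
        [RESOURCE_PREFIXES[resource_type], application.lower(), environment, REGION_CODES[location]]
    )
    if not is_compliant(name):
        raise ValueError(f"generated name is not compliant: {name}")
    return name


print(generate_name("resourceGroup", "prod", "eastus", "app01"))  # rg-app01-prod-eus
print(is_compliant("testRG1"))  # False&lt;/LI-CODE&gt;
&lt;P&gt;Raising an error on invalid input is what lets the calling pipeline block the deployment rather than continue with a non-compliant name.&lt;/P&gt;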
&lt;H2&gt;Implementation Details&lt;/H2&gt;
&lt;H4&gt;Authentication using Managed Identity&lt;/H4&gt;
&lt;P&gt;To securely access the Naming API:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Managed Identity is used&lt;/LI&gt;
&lt;LI&gt;No secrets or credentials stored in pipeline&lt;/LI&gt;
&lt;LI&gt;Token retrieved dynamically&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;PowerShell Implementation&lt;/H3&gt;
&lt;P&gt;Below is a simplified version of what was used in implementation:&lt;/P&gt;
&lt;LI-CODE lang="powershell"&gt;# Get access token using Managed Identity
$token = (Get-AzAccessToken -ResourceUrl "api://NamingTool").Token

# Call naming API
$response = Invoke-RestMethod `
    -Uri "https://your-namingtool-api-endpoint/api/naming" `
    -Headers @{ Authorization = "Bearer $token" } `
    -Method POST `
    -Body @{
        resourceType = "resourceGroup"
        environment  = "prod"
        location     = "eastus"
        application  = "app01"
    } | ConvertTo-Json

# Extract generated resource name
$resourceName = $response.name

Write-Output "Generated Name: $resourceName"
&lt;/LI-CODE&gt;
&lt;H2&gt;🔄 Azure DevOps Pipeline Integration&lt;/H2&gt;
&lt;P&gt;Naming validation is integrated directly into the deployment pipeline. A sample pipeline snippet:&lt;/P&gt;
&lt;LI-CODE lang="yaml"&gt;- task: AzureCLI@2
  inputs:
    azureSubscription: 'ServiceConnection'
    scriptType: 'ps'
    scriptLocation: 'inlineScript'
    inlineScript: |
      Write-Output "Calling Naming API"
      .\scripts\Get-ResourceName.ps1&lt;/LI-CODE&gt;
&lt;H3&gt;Key Benefit:&lt;/H3&gt;
&lt;P&gt;👉 Resource names are validated &lt;STRONG&gt;before deployment&lt;/STRONG&gt;, preventing non-compliant resources from being created.&lt;/P&gt;
&lt;H2&gt;Security Considerations&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;Use Managed Identity for API authentication&lt;/LI&gt;
&lt;LI&gt;Avoid storing secrets in pipelines&lt;/LI&gt;
&lt;LI&gt;Ensure API access is restricted and secured&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Extending This Solution&lt;/H2&gt;
&lt;P&gt;This approach can be extended to:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Enforcing tagging standards&lt;/LI&gt;
&lt;LI&gt;Policy validation before deployment&lt;/LI&gt;
&lt;LI&gt;Subscription vending automation&lt;/LI&gt;
&lt;LI&gt;Cost governance controls&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;✨ Final Thoughts&lt;/H2&gt;
&lt;P&gt;Naming standards are often documented—but rarely enforced effectively.&lt;/P&gt;
&lt;P&gt;By integrating API-based naming validation into DevOps pipelines, organizations can move from:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Guidelines → ✅ Automated Enforcement&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;This ensures governance is:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Scalable&lt;/LI&gt;
&lt;LI&gt;Consistent&lt;/LI&gt;
&lt;LI&gt;Developer-friendly&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Tue, 05 May 2026 09:43:42 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/automating-azure-naming-standards-using-api-and-devops-pipelines/ba-p/4516628</guid>
      <dc:creator>sameenamohammed</dc:creator>
      <dc:date>2026-05-05T09:43:42Z</dc:date>
    </item>
    <item>
      <title>Reimagining Azure Governance with Automation &amp; EPAC</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/reimagining-azure-governance-with-automation-epac/ba-p/4516626</link>
      <description>&lt;H3&gt;🧩 The Challenge: Governance at Scale&lt;/H3&gt;
&lt;P&gt;Managing Azure environments manually introduces:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;❌ Policy drift across subscriptions&lt;/LI&gt;
&lt;LI&gt;❌ Inconsistent naming conventions&lt;/LI&gt;
&lt;LI&gt;❌ Delays in compliance enforcement&lt;/LI&gt;
&lt;LI&gt;❌ Human errors in deployments&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;🔍 &lt;STRONG&gt;Insight:&lt;/STRONG&gt; Governance gaps are often not due to lack of policies—but lack of automation.&lt;/P&gt;
&lt;H3&gt;Solution Overview: EPAC&lt;/H3&gt;
&lt;P&gt;We adopted &lt;STRONG&gt;Enterprise Policy as Code (EPAC)&lt;/STRONG&gt; to bring governance into the DevOps workflow.&lt;/P&gt;
&lt;P&gt;EPAC helped us:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Treat policies like code&lt;/LI&gt;
&lt;LI&gt;Automate deployments&lt;/LI&gt;
&lt;LI&gt;Standardize governance&lt;/LI&gt;
&lt;LI&gt;Maintain audit history&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Architecture Overview&lt;/H3&gt;
&lt;P&gt;Our EPAC setup included:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Management Groups hierarchy&lt;/STRONG&gt; for governance scope&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Policy Definitions &amp;amp; Initiatives&lt;/STRONG&gt; (JSON/Bicep)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Azure DevOps Pipeline&lt;/STRONG&gt; for deployment&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Managed Identity&lt;/STRONG&gt; for secure execution&lt;/LI&gt;
&lt;/UL&gt;
&lt;img&gt;A sample implementation flow of EPAC in an enterprise&lt;/img&gt;
&lt;H3&gt;Flow Explanation&lt;/H3&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Azure DevOps Pipeline&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;Triggers CI/CD process&lt;/LI&gt;
&lt;LI&gt;Executes deployment scripts&lt;/LI&gt;
&lt;LI&gt;Authenticates securely using Managed Identity&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;EPAC Framework&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;Stores policies as code&lt;/LI&gt;
&lt;LI&gt;Enables version control and validation&lt;/LI&gt;
&lt;LI&gt;Acts as the central governance engine&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Azure Policy Engine&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;Evaluates resources against defined policies&lt;/LI&gt;
&lt;LI&gt;Enforces compliance automatically&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Target Environments&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;Policies applied across:
&lt;UL&gt;
&lt;LI&gt;Management Groups&lt;/LI&gt;
&lt;LI&gt;Subscriptions&lt;/LI&gt;
&lt;LI&gt;Resource Groups&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H3&gt;Why Policy-as-Code Matters&lt;/H3&gt;
&lt;P&gt;✅ Standardized governance across environments&lt;BR /&gt;✅ Faster onboarding of subscriptions&lt;BR /&gt;✅ Improved audit readiness&lt;BR /&gt;✅ Repeatable and reliable policy deployment&lt;BR /&gt;✅ Seamless DevOps integration&lt;/P&gt;
&lt;H3&gt;Security &amp;amp; Compliance Benefits&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;Enforces &lt;STRONG&gt;least privilege access&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Prevents &lt;STRONG&gt;misconfigured deployments&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Supports &lt;STRONG&gt;continuous compliance&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Aligns with standards like &lt;STRONG&gt;ISO 27001&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Implementation Walkthrough&lt;/H3&gt;
&lt;H4&gt;✅ Step 1: Structuring Policy Repository&lt;/H4&gt;
&lt;img /&gt;
&lt;P&gt;This structure ensured:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Clear separation of concerns&lt;/LI&gt;
&lt;LI&gt;Reusability&lt;/LI&gt;
&lt;LI&gt;Easy onboarding for new contributors&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;✅ Step 2: Defining Policies&lt;/H4&gt;
&lt;P&gt;We created custom policies and reused built-in ones.&lt;/P&gt;
&lt;P&gt;Example: Restrict allowed regions&lt;/P&gt;
&lt;LI-CODE lang="json"&gt;{
  "if": {
    "field": "location",
    "notIn": ["eastus", "westeurope"]
  },
  "then": {
    "effect": "deny"
  }
}&lt;/LI-CODE&gt;
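&lt;P&gt;To show how a rule like this behaves, here is a toy Python evaluator. It is only a sketch under stated assumptions: it models a single "field" condition with "notIn", compares values in lower case, and returns None when the rule does not apply. The Azure Policy engine supports far more conditions and effects than this.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# Toy evaluator for the rule shown above (illustration only).
POLICY_RULE = {
    "if": {"field": "location", "notIn": ["eastus", "westeurope"]},
    "then": {"effect": "deny"},
}


def evaluate(rule, resource):
    """Return the rule's effect when the condition matches, otherwise None."""
    condition = rule["if"]
    value = str(resource.get(condition["field"], "")).lower()
    listed = [item.lower() for item in condition["notIn"]]
    if value not in listed:
        return rule["then"]["effect"]
    return None


print(evaluate(POLICY_RULE, {"type": "resourceGroup", "location": "eastus"}))         # None
print(evaluate(POLICY_RULE, {"type": "resourceGroup", "location": "southeastasia"}))  # deny&lt;/LI-CODE&gt;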
&lt;H4&gt;✅ Step 3: Creating Initiatives&lt;/H4&gt;
&lt;P&gt;Instead of assigning individual policies, we grouped them into &lt;STRONG&gt;initiatives&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Security baseline&lt;/LI&gt;
&lt;LI&gt;Tagging compliance&lt;/LI&gt;
&lt;LI&gt;Cost optimization&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;👉 This reduced duplication and simplified assignments.&lt;/P&gt;
&lt;H4&gt;✅ Step 4: Pipeline Automation&lt;/H4&gt;
&lt;P&gt;We built an Azure DevOps pipeline to:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Validate policy templates&lt;/LI&gt;
&lt;LI&gt;Deploy definitions to management groups&lt;/LI&gt;
&lt;LI&gt;Assign initiatives automatically&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;An example pipeline structure for deploying EPAC policies to a single environment:&lt;/P&gt;
&lt;LI-CODE lang="yaml"&gt;parameters:
  - name: forceDeployment
    displayName: 'Force deployment (ignore change detection)'
    type: boolean
    default: false
  - name: clearAgentCache
    displayName: 'Clear agent container cache (recommended for troubleshooting)'
    type: boolean
    default: true

variables:
  # Pipeline for deploying policies to the Canary/Test environment
  PAC_OUTPUT_FOLDER: ./Output
  PAC_DEFINITIONS_FOLDER: ./Definitions

  # Service connection for the test/canary environment
  serviceConnection: "SC-EPAC-CONTRIBUTOR-TST-001"

  # Environment selector for canary
  pacEnvironmentSelector: canary

# Trigger: deploy only manually or from specific branches
trigger: none

# PR trigger for validation
pr:
  branches:
    include:
    - main
    - feature/*
  paths:
    include:
    - src/IaC/Infrastructure/epac/Definitions/*

pool:
  name: "TST-AgentPool-01"

stages:
  - stage: Plan
    displayName: "Plan Canary Environment"
    jobs:
      - job: Plan
        displayName: "Generate Deployment Plan"
        steps:
          - template: templates/plan.yml
            parameters:
              serviceConnection: $(serviceConnection)
              pacEnvironmentSelector: ${{ variables.pacEnvironmentSelector }}

  - stage: Deploy
    displayName: "Deploy to Canary (Audit Mode)"
    dependsOn: Plan
    condition: and(not(failed()), not(canceled()), or(eq('${{ parameters.forceDeployment }}', 'true'), and(eq('${{ parameters.forceDeployment }}', 'false'), or(eq(dependencies.Plan.outputs['Plan.Plan.deployPolicyChanges'], 'yes'), eq(dependencies.Plan.outputs['Plan.Plan.deployRoleChanges'], 'yes')))))
    variables:
      PAC_INPUT_FOLDER: "$(Pipeline.Workspace)/plans-${{ variables.pacEnvironmentSelector }}"
      localDeployPolicyChanges: $[stageDependencies.Plan.Plan.outputs['Plan.deployPolicyChanges']]
      localDeployRoleChanges: $[stageDependencies.Plan.Plan.outputs['Plan.deployRoleChanges']]
    jobs:
      - deployment: DeployPolicy
        displayName: "Deploy Policy Changes (Audit Mode)"
        environment: PAC-CANARY
        condition: and(not(failed()), not(canceled()), or(eq('${{ parameters.forceDeployment }}', 'true'), and(eq('${{ parameters.forceDeployment }}', 'false'), eq(variables.localDeployPolicyChanges, 'yes'))))
        strategy:
          runOnce:
            deploy:
              steps:
                - template: templates/deploy-policy.yml
                  parameters:
                    serviceConnection: $(serviceConnection)
                    pacEnvironmentSelector: ${{ variables.pacEnvironmentSelector }}
                    forceDeployment: ${{ parameters.forceDeployment }}
                
      - deployment: DeployRoles
        displayName: "Deploy Role Assignments"
        dependsOn: DeployPolicy
        environment: PAC-CANARY
        condition: and(not(failed()), not(canceled()), eq(variables.localDeployRoleChanges, 'yes'))  # Re-enabled for AMBA managed identity
        strategy:
          runOnce:
            deploy:
              steps:
                - template: templates/deploy-roles.yml
                  parameters:
                    serviceConnection: $(serviceConnection)
                    pacEnvironmentSelector: ${{ variables.pacEnvironmentSelector }}

  # Optional stage for post-deployment validation
  - stage: Validate
    displayName: "Validate Canary Deployment"
    dependsOn: Deploy
    condition: and(succeeded(), ne(variables['Build.Reason'], 'PullRequest'))
    jobs:
      - job: ValidateCompliance
        displayName: "Validate Policy Compliance"
        steps:
          - task: PowerShell@2
            displayName: "Check Policy Compliance Status"
            inputs:
              targetType: 'inline'
              script: |
                Write-Host "##[section]Canary deployment completed successfully"
                Write-Host "##[warning]Remember: All policies are in AUDIT mode - monitor compliance dashboard"
                Write-Host "##vso[task.complete result=Succeeded;]Canary validation completed"&lt;/LI-CODE&gt;
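&lt;P&gt;The conditions in the Deploy stage are dense, so here they are restated as a short Python sketch. It mirrors the expressions in the pipeline above but leaves out the not(failed()) and not(canceled()) guards; the function names are ours, introduced only for this illustration.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;def deploy_stage_runs(force_deployment, policy_changes, role_changes):
    """Stage condition: forced, or the plan reported policy or role changes."""
    return force_deployment or policy_changes == "yes" or role_changes == "yes"


def deploy_policy_job_runs(force_deployment, policy_changes):
    """DeployPolicy job condition: forced, or the plan reported policy changes."""
    return force_deployment or policy_changes == "yes"


def deploy_roles_job_runs(role_changes):
    """DeployRoles job condition: only when the plan reported role changes."""
    return role_changes == "yes"


cases = [
    (False, "no", "no"),
    (False, "yes", "no"),
    (False, "no", "yes"),
    (True, "no", "no"),
]
for force, policy, roles in cases:
    print(
        f"force={force} policy={policy} roles={roles}:",
        deploy_stage_runs(force, policy, roles),
        deploy_policy_job_runs(force, policy),
        deploy_roles_job_runs(roles),
    )&lt;/LI-CODE&gt;
&lt;P&gt;One consequence worth noting: forcing a deployment re-runs the policy job, but role assignments are still deployed only when the plan reports role changes.&lt;/P&gt;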
&lt;H4&gt;✅ Step 5: Secure Deployment using Managed Identity&lt;/H4&gt;
&lt;P&gt;We used:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;System-assigned Managed Identity&lt;/LI&gt;
&lt;LI&gt;RBAC roles (Policy Contributor / Reader)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;✅ Benefit:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;No secrets in pipeline&lt;/LI&gt;
&lt;LI&gt;Improved security posture&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;✅ Step 6: Policy Assignment at Scale&lt;/H4&gt;
&lt;P&gt;Policies were assigned at:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Root Management Group&lt;/LI&gt;
&lt;LI&gt;Subscription level (when needed)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This ensured:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Consistent enforcement&lt;/LI&gt;
&lt;LI&gt;Centralized control&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Real Use Cases Implemented&lt;/H3&gt;
&lt;P&gt;Using EPAC, we solved real scenarios:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;🔹 Enforcing &lt;STRONG&gt;naming conventions&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;🔹 Ensuring mandatory &lt;STRONG&gt;resource tagging&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;🔹 Restricting deployment regions&lt;/LI&gt;
&lt;LI&gt;🔹 Enforcing &lt;STRONG&gt;backup policies on disks&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;🔹 Preventing creation of non-compliant resources&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Best Practices&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;Start with baseline policies first&lt;/LI&gt;
&lt;LI&gt;Use initiatives instead of individual assignments&lt;/LI&gt;
&lt;LI&gt;Enable PR-based approvals&lt;/LI&gt;
&lt;LI&gt;Always test policies in lower environments&lt;/LI&gt;
&lt;LI&gt;Maintain clear documentation&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Conclusion&lt;/H3&gt;
&lt;P&gt;Implementing EPAC transformed our governance model from &lt;STRONG&gt;manual and reactive → automated and proactive&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;For teams managing complex Azure environments, EPAC provides:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Scalability&lt;/LI&gt;
&lt;LI&gt;Consistency&lt;/LI&gt;
&lt;LI&gt;Security&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;If you are still managing policies manually, this is the right time to: 👉 Move to &lt;STRONG&gt;Policy as Code&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 02 May 2026 19:26:11 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/reimagining-azure-governance-with-automation-epac/ba-p/4516626</guid>
      <dc:creator>sameenamohammed</dc:creator>
      <dc:date>2026-05-02T19:26:11Z</dc:date>
    </item>
    <item>
      <title>AI‑Accelerated AVM Refactoring: Modernizing Legacy IaC Safely and Swiftly</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/ai-accelerated-avm-refactoring-modernizing-legacy-iac-safely-and/ba-p/4516437</link>
      <description>&lt;H2&gt;Why AVM refactoring is harder in brownfield&lt;/H2&gt;
&lt;P&gt;Greenfield AVM adoption is straightforward: pick the module, deploy, and iterate. Brownfield is different. You’re refactoring &lt;EM&gt;live&lt;/EM&gt; infrastructure, where state, naming conventions, diagnostic settings, and policy baselines already exist.&lt;/P&gt;
&lt;P&gt;In one anonymized engagement, an AI‑assisted audit found that &lt;STRONG&gt;43% of module invocations (~10 modules)&lt;/STRONG&gt; still relied on legacy, non‑AVM wrappers, even though modernization work was already underway.&lt;/P&gt;
&lt;P&gt;That number wasn’t just a KPI. It became the roadmap: which modules to prioritize, where inconsistency risk was highest, and which components needed deeper plan review.&lt;/P&gt;
&lt;H2&gt;The hidden risks you only feel during refactor&lt;/H2&gt;
&lt;P&gt;AVM adoption often introduces more than a module source change. It can also bring:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;New naming patterns (especially around extensions like diagnostics)&lt;/LI&gt;
&lt;LI&gt;More structured configuration objects (maps/objects replacing flat inputs)&lt;/LI&gt;
&lt;LI&gt;Additional ‘helper’ resources (RBAC, wait timers, diagnostics wiring)&lt;/LI&gt;
&lt;LI&gt;Policy‑aligned defaults (encryption, public access disabled, logging enabled)&lt;/LI&gt;
&lt;LI&gt;Provider/version constraint pressure (to meet AVM expectations)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;None of these are bad; in fact, they’re usually improvements. But in a live estate, each change must be checked through one lens: &lt;STRONG&gt;state safety&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2&gt;Where AI actually helps (and where it doesn’t)&lt;/H2&gt;
&lt;P&gt;AI is most valuable when it reduces &lt;EM&gt;mechanical effort&lt;/EM&gt;, the repetitive work that slows teams down, while humans keep ownership of architecture and risk.&lt;/P&gt;
&lt;H3&gt;1) Automated codebase audit (the fastest win)&lt;/H3&gt;
&lt;P&gt;AI can scan a repo and produce an inventory of module usage, versions, and adoption status: direct AVM, AVM‑wrapped, legacy wrappers, and native resources. This turns hours of manual inspection into a structured baseline report.&lt;/P&gt;
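&lt;P&gt;As a rough illustration of what such a baseline audit produces, the Python sketch below scans Terraform text for &lt;CODE&gt;module&lt;/CODE&gt; blocks and buckets each &lt;CODE&gt;source&lt;/CODE&gt; into an adoption category. The category names and matching rules (any source containing &lt;CODE&gt;avm-res-&lt;/CODE&gt; or &lt;CODE&gt;avm-ptn-&lt;/CODE&gt; counts as direct AVM, any relative path counts as a local wrapper) are assumptions made for this example, and the sample module names are hypothetical. A real audit would also follow wrappers and record versions.&lt;/P&gt;

```python
import re
from collections import Counter

# Matches top-level blocks of the form: module "name" { ... }
# The closing brace must sit at the start of a line, so indented
# nested blocks inside the module body are skipped over.
MODULE_RE = re.compile(r'module\s+"([^"]+)"\s*\{(.*?)\n\}', re.DOTALL)
SOURCE_RE = re.compile(r'source\s*=\s*"([^"]+)"')


def classify_source(source: str) -> str:
    """Bucket a module source string. These rules are illustrative assumptions."""
    if "avm-res-" in source or "avm-ptn-" in source:
        return "direct-avm"
    if source.startswith("./") or source.startswith("../"):
        return "local-wrapper"
    return "other"


def audit(terraform_text: str) -> dict:
    """Return {module_name: category} for every module block that has a source."""
    result = {}
    for name, body in MODULE_RE.findall(terraform_text):
        match = SOURCE_RE.search(body)
        if match:
            result[name] = classify_source(match.group(1))
    return result


# Hypothetical sample input, not taken from any real repository.
SAMPLE = '''
module "kv" {
  source  = "Azure/avm-res-keyvault-vault/azurerm"
  version = "0.10.2"
}

module "storage" {
  source = "../modules/legacy-storage"
}
'''

if __name__ == "__main__":
    inventory = audit(SAMPLE)
    print(inventory)
    print(Counter(inventory.values()))
```

&lt;P&gt;Counting the categories gives the kind of adoption percentage quoted earlier, and the per-module mapping shows where to look first.&lt;/P&gt;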
&lt;H3&gt;2) Draft refactor scaffolding (with mandatory human review)&lt;/H3&gt;
&lt;P&gt;Tools like GitHub Copilot can generate first‑pass Terraform refactors, reshaping legacy calls into AVM‑style interfaces and scaffolding optional blocks (diagnostics, identities, RBAC).&lt;/P&gt;
&lt;P&gt;But AI output should be treated as &lt;EM&gt;a proposal&lt;/EM&gt;, not as ground truth. The most dangerous failures aren’t syntax errors; they’re subtle mismatches: a parameter mapped to the wrong field, a default that changes behavior, or an omitted lifecycle constraint.&lt;/P&gt;
&lt;H3&gt;3) Terraform plan diff interpretation (explain what changed)&lt;/H3&gt;
&lt;P&gt;Refactoring to AVM can expand plan output. AI can help summarize large diffs into:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Benign additions (telemetry wiring, diagnostics scaffolding, RBAC helpers)&lt;/LI&gt;
&lt;LI&gt;Behavioral changes requiring sign‑off (network exposure, encryption posture, identity model)&lt;/LI&gt;
&lt;LI&gt;High‑risk actions (destroy/recreate) and the exact resource addresses involved&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This doesn’t replace plan review — it accelerates understanding so reviewers can focus on what truly matters.&lt;/P&gt;
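&lt;P&gt;Part of that triage can be done deterministically before any AI summary is involved. The sketch below assumes the machine-readable plan produced by &lt;CODE&gt;terraform show -json&lt;/CODE&gt;, where each entry in &lt;CODE&gt;resource_changes&lt;/CODE&gt; carries an &lt;CODE&gt;address&lt;/CODE&gt; and a &lt;CODE&gt;change.actions&lt;/CODE&gt; list. The bucket names and the sample resource addresses are made up for illustration.&lt;/P&gt;

```python
import json


def triage_plan(plan: dict) -> dict:
    """Group resource addresses by the kind of action Terraform plans to take."""
    buckets = {"additions": [], "updates": [], "replacements": [], "deletions": []}
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions == ["create"]:
            buckets["additions"].append(rc["address"])
        elif actions == ["update"]:
            buckets["updates"].append(rc["address"])
        elif "delete" in actions and "create" in actions:
            # Either order means destroy and recreate: the high-risk case.
            buckets["replacements"].append(rc["address"])
        elif actions == ["delete"]:
            buckets["deletions"].append(rc["address"])
        # "no-op" and "read" entries are ignored.
    return buckets


# Hypothetical plan fragment for demonstration only.
SAMPLE_PLAN = json.loads("""
{
  "resource_changes": [
    {"address": "module.kv.azurerm_monitor_diagnostic_setting.this",
     "change": {"actions": ["create"]}},
    {"address": "module.kv.azurerm_key_vault.this",
     "change": {"actions": ["update"]}},
    {"address": "module.st.azurerm_storage_account.this",
     "change": {"actions": ["delete", "create"]}},
    {"address": "azurerm_resource_group.rg",
     "change": {"actions": ["no-op"]}}
  ]
}
""")

if __name__ == "__main__":
    print(json.dumps(triage_plan(SAMPLE_PLAN), indent=2))
```

&lt;P&gt;Anything that lands in the replacements or deletions bucket is what a reviewer (or an AI summary) should explain first, with the exact resource address attached.&lt;/P&gt;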
&lt;H3&gt;4) Policy violation translation (from red to ready)&lt;/H3&gt;
&lt;P&gt;When policy gates fail (Azure Policy, Checkov, etc.), AI is great at translating requirements into actionable remediation — and checking whether AVM supports it natively or needs supplementary configuration.&lt;/P&gt;
&lt;H3&gt;5) Repository hygiene enforcement (structure as a hard gate)&lt;/H3&gt;
&lt;P&gt;In multi‑team repos, drift happens: ad‑hoc scripts, local module copies, inconsistent folder patterns. AI can continuously scan for these anti‑patterns and flag deviations early — before they become ‘how we do it now’.&lt;/P&gt;
&lt;H3&gt;6) Specification‑driven development (the future‑proof approach)&lt;/H3&gt;
&lt;P&gt;Microsoft’s AVM guidance now explicitly discusses &lt;STRONG&gt;AI‑assisted IaC solution development&lt;/STRONG&gt;: pairing AVM modules with AI tools to speed delivery while keeping humans in control. In parallel, approaches like Spec Kit promote structured, specification‑driven workflows so requirements and constraints remain the source of truth.&lt;/P&gt;
&lt;H2&gt;The operating model that keeps you safe&lt;/H2&gt;
&lt;P&gt;Here’s the simplest rule I’ve seen work consistently:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;AI drafts. Humans validate. The Terraform plan decides.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;That operating model prevents two extremes: (1) refusing AI because it’s imperfect, and (2) trusting AI output blindly because it sounds confident.&lt;/P&gt;
&lt;H2&gt;A practical AI‑accelerated refactor playbook&lt;/H2&gt;
&lt;P&gt;If you want a repeatable approach that scales across environments, here’s a playbook that balances speed and safety:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Baseline audit: inventory module sources and adoption categories.&lt;/LI&gt;
&lt;LI&gt;Equivalence check: identify AVM‑ready modules vs AVM gaps.&lt;/LI&gt;
&lt;LI&gt;Slice the work: refactor one bounded component first.&lt;/LI&gt;
&lt;LI&gt;Use AI for scaffolding: generate draft code and a migration checklist.&lt;/LI&gt;
&lt;LI&gt;Plan review discipline: categorize additions vs updates vs replacements.&lt;/LI&gt;
&lt;LI&gt;Import decision framework: import where state safety matters; accept in‑place updates only when semantics are unchanged.&lt;/LI&gt;
&lt;LI&gt;Governance gates: enforce structure + policy + plan review before merge.&lt;/LI&gt;
&lt;LI&gt;Iterate: expect multiple cycles — AI should compress cycles, not eliminate them.&lt;/LI&gt;
&lt;/OL&gt;
&lt;H2&gt;Don’t couple refactoring with compliance&lt;/H2&gt;
&lt;P&gt;One more lesson worth calling out: &lt;STRONG&gt;AVM adoption and compliance are related, but not identical.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Treat policy enforcement as a continuous pipeline requirement from Dev through Prod — independent of whether a component is fully AVM‑aligned. This avoids scope creep (“refactor means fix everything”) while still driving the estate toward a no‑surprises posture.&lt;/P&gt;
&lt;H2&gt;What ‘success’ looks like&lt;/H2&gt;
&lt;P&gt;A successful AI‑accelerated AVM refactor typically delivers:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Lower variance between environments&lt;/LI&gt;
&lt;LI&gt;Fewer one‑off wrappers and exceptions&lt;/LI&gt;
&lt;LI&gt;Stronger defaults aligned to policy&lt;/LI&gt;
&lt;LI&gt;A smaller drift surface area&lt;/LI&gt;
&lt;LI&gt;Faster, safer change velocity&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;And the best part? It changes the mindset from ‘IaC as scripts’ to &lt;STRONG&gt;IaC as a governed product&lt;/STRONG&gt;, with standards that hold up in audits and operations.&lt;/P&gt;
&lt;H2&gt;Closing thought&lt;/H2&gt;
&lt;P&gt;AI won’t replace your architectural accountability, and it shouldn’t try to. But it &lt;EM&gt;can&lt;/EM&gt; remove the friction that makes refactoring feel impossible.&lt;/P&gt;
&lt;P&gt;If you’re sitting on a pile of legacy wrapper modules today, consider this: the safest time to modernize is &lt;STRONG&gt;before&lt;/STRONG&gt; the next urgent change lands in your backlog.&lt;/P&gt;
&lt;H2&gt;References (public)&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;Azure Verified Modules (AVM): https://azure.github.io/Azure-Verified-Modules/&lt;/LI&gt;
&lt;LI&gt;AI‑Assisted IaC Solution Development (AVM): https://azure.github.io/Azure-Verified-Modules/experimental/ai-assisted-sol-dev/&lt;/LI&gt;
&lt;LI&gt;Spec Kit (AVM): https://azure.github.io/Azure-Verified-Modules/experimental/ai-assisted-sol-dev/spec-kit/&lt;/LI&gt;
&lt;LI&gt;AVM Telemetry guidance: https://azure.github.io/Azure-Verified-Modules/help-support/telemetry/&lt;/LI&gt;
&lt;LI&gt;Microsoft Learn – Azure Verified Modules overview: https://learn.microsoft.com/en-us/community/content/azure-verified-modules&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Fri, 01 May 2026 12:14:29 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/ai-accelerated-avm-refactoring-modernizing-legacy-iac-safely-and/ba-p/4516437</guid>
      <dc:creator>HimanshuYadav</dc:creator>
      <dc:date>2026-05-01T12:14:29Z</dc:date>
    </item>
    <item>
      <title>🚀 From Drift to Diagnosis: AI‑Powered Root Cause Analysis for Azure Infrastructure</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/from-drift-to-diagnosis-ai-powered-root-cause-analysis-for-azure/ba-p/4515436</link>
      <description>&lt;H2&gt;🌍 A Real-World Scenario&lt;/H2&gt;
&lt;P&gt;During a recent production deployment of an enterprise AI platform, everything looked perfectly aligned from an infrastructure perspective:&lt;/P&gt;
&lt;P&gt;✅ Infrastructure deployed via IaC (Terraform)&lt;BR /&gt;✅ Private endpoints enforced&lt;BR /&gt;✅ Public access disabled for all AI services&lt;/P&gt;
&lt;P&gt;A few hours later, an alert triggered.&lt;/P&gt;
&lt;P&gt;❗ The Azure OpenAI endpoint was publicly accessible.&lt;/P&gt;
&lt;P&gt;This was unexpected — and risky.&lt;/P&gt;
&lt;H3&gt;🔍 What the team did next&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;Ran terraform plan → ✅ Drift detected&lt;/LI&gt;
&lt;LI&gt;Checked Azure Portal → ✅ Configuration mismatch confirmed&lt;/LI&gt;
&lt;LI&gt;Reviewed activity logs → ❓ Multiple changes found, but unclear ownership&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;🚫 The problem&lt;/H3&gt;
&lt;P&gt;Drift detection tools clearly showed:&lt;/P&gt;
&lt;P&gt;“Configuration mismatch”&lt;/P&gt;
&lt;P&gt;But they did NOT answer:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Why was it changed?&lt;/LI&gt;
&lt;LI&gt;Who made the change?&lt;/LI&gt;
&lt;LI&gt;Was this intentional or accidental?&lt;/LI&gt;
&lt;LI&gt;What is the impact?&lt;/LI&gt;
&lt;LI&gt;What should be done next?&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;👉 It took &lt;STRONG&gt;hours of manual investigation&lt;/STRONG&gt; to produce a root cause analysis.&lt;/P&gt;
&lt;H2&gt;💡 The Shift: From Detection to Diagnosis&lt;/H2&gt;
&lt;P&gt;Most tools today stop at detection.&lt;/P&gt;
&lt;P&gt;But what teams really need is:&lt;/P&gt;
&lt;P&gt;✅ A system that explains &lt;STRONG&gt;why drift happened&lt;/STRONG&gt; and &lt;STRONG&gt;what to do next&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;This is where &lt;STRONG&gt;AI-powered drift analysis&lt;/STRONG&gt; becomes powerful.&lt;/P&gt;
&lt;H2&gt;🏗️ Architecture Overview&lt;/H2&gt;
&lt;P&gt;Below is a simple architecture that combines Azure data sources with AI to generate &lt;STRONG&gt;human-readable RCA reports&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;H2&gt;🔧 How It Works (Step-by-Step)&lt;/H2&gt;
&lt;H3&gt;✅ Step 1 — Detect Drift&lt;/H3&gt;
&lt;P&gt;Using standard IaC tools:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Terraform&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 80.0926%; border-width: 1px;"&gt;&lt;colgroup&gt;&lt;col style="width: 100%" /&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&amp;nbsp; terraform plan&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Bicep&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 80.3704%; height: 100px; border-width: 1px;"&gt;&lt;colgroup&gt;&lt;col style="width: 100%" /&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;az deployment group what-if \&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;--resource-group rg-ai \&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;--template-file main.bicep&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3&gt;✅ Step 2 — Capture Actual State&lt;/H3&gt;
&lt;P&gt;Query Azure using Resource Graph:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 81.2037%; height: 38px; border-width: 1px;"&gt;&lt;colgroup&gt;&lt;col style="width: 100%" /&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Resources&lt;/P&gt;
&lt;P&gt;&amp;nbsp;| &lt;SPAN class="lia-text-color-10"&gt;project &lt;SPAN class="lia-text-color-21"&gt;id&lt;/SPAN&gt;&lt;/SPAN&gt;, name, &lt;SPAN class="lia-text-color-10"&gt;type&lt;/SPAN&gt;, location, properties&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3&gt;✅ Step 3 — Add Context (Critical Step)&lt;/H3&gt;
&lt;P&gt;Drift without context is incomplete.&lt;/P&gt;
&lt;P&gt;Use Activity Logs:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 81.4815%; height: 107px; border-width: 1px;"&gt;&lt;colgroup&gt;&lt;col style="width: 100%" /&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;AzureActivity&lt;/P&gt;
&lt;P&gt;&amp;nbsp;| &lt;SPAN class="lia-text-color-10"&gt;where &lt;/SPAN&gt;TimeGenerated &amp;gt; &lt;SPAN class="lia-text-color-7"&gt;ago&lt;/SPAN&gt;(&lt;SPAN class="lia-text-color-11"&gt;24h&lt;/SPAN&gt;)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;| &lt;SPAN class="lia-text-color-10"&gt;project &lt;/SPAN&gt;TimeGenerated, ResourceId, OperationName, Caller&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;👉 This gives:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Who made the change&lt;/LI&gt;
&lt;LI&gt;What operation was executed&lt;/LI&gt;
&lt;LI&gt;When it happened&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;✅ Step 4 — AI-Powered RCA&lt;/H3&gt;
&lt;P&gt;Instead of analyzing raw JSON manually, pass the structured data to an AI model.&lt;/P&gt;
&lt;H3&gt;📥 Input to AI&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 82.2222%; height: 539px; border-width: 1px;"&gt;&lt;colgroup&gt;&lt;col style="width: 100%" /&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;{&lt;/P&gt;
&lt;P&gt;&amp;nbsp; "resource": "openai-endpoint-prod",&lt;/P&gt;
&lt;P&gt;&amp;nbsp; "expected": {&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; "publicNetworkAccess": "Disabled"&lt;/P&gt;
&lt;P&gt;&amp;nbsp; },&lt;/P&gt;
&lt;P&gt;&amp;nbsp; "actual": {&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; "publicNetworkAccess": "Enabled"&lt;/P&gt;
&lt;P&gt;&amp;nbsp; },&lt;/P&gt;
&lt;P&gt;&amp;nbsp; "activityLog": {&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; "caller": "admin@company.com",&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; "operation": "write",&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; "time": "2026-04-28T10:15:00Z"&lt;/P&gt;
&lt;P&gt;&amp;nbsp; }&lt;/P&gt;
&lt;P&gt;}&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
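&lt;P&gt;A payload like the one above can be assembled mechanically from the outputs of steps 1 to 3. The following Python sketch is one possible way to do it, not the implementation behind this article: it keeps only the properties that actually drifted and attaches the most recent write event as a candidate cause. The function name &lt;CODE&gt;build_rca_input&lt;/CODE&gt;, the extra &lt;CODE&gt;sku&lt;/CODE&gt; property, and the second caller are hypothetical; the field names mirror the JSON example.&lt;/P&gt;

```python
import json


def build_rca_input(resource, expected, actual, events):
    """Keep only drifted properties and attach the most recent write event.

    Timestamps are assumed to be ISO 8601 UTC strings in the same format,
    so comparing them as plain strings orders them chronologically.
    """
    drifted = [key for key in expected if actual.get(key) != expected[key]]
    writes = [event for event in events if event["operation"] == "write"]
    latest = max(writes, key=lambda event: event["time"]) if writes else None
    return {
        "resource": resource,
        "expected": {key: expected[key] for key in drifted},
        "actual": {key: actual.get(key) for key in drifted},
        "activityLog": latest,
    }


if __name__ == "__main__":
    payload = build_rca_input(
        "openai-endpoint-prod",
        expected={"publicNetworkAccess": "Disabled", "sku": "S0"},
        actual={"publicNetworkAccess": "Enabled", "sku": "S0"},
        events=[
            {"caller": "pipeline@company.com", "operation": "write",
             "time": "2026-04-27T09:00:00Z"},
            {"caller": "admin@company.com", "operation": "write",
             "time": "2026-04-28T10:15:00Z"},
        ],
    )
    print(json.dumps(payload, indent=2))
```

&lt;P&gt;The latest write is only a candidate: if several callers touched the resource in the window, passing all of their events to the model (or to a human) is safer than picking one.&lt;/P&gt;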
&lt;H3&gt;🤖 AI Output (Human-Readable RCA)&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 82.6852%; height: 444px; border-width: 1px;"&gt;&lt;colgroup&gt;&lt;col style="width: 100%" /&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr style="height: 34.8px;"&gt;&lt;td style="height: 34.8px;"&gt;&amp;nbsp;
&lt;P&gt;&lt;STRONG&gt;Drift Summary:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The OpenAI endpoint has public access enabled, which deviates from the expected secure configuration.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Root Cause:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;A manual configuration change was performed by admin@company.com via Azure Portal.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Impact:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;- Increased exposure to public internet&lt;/P&gt;
&lt;P&gt;- Potential violation of security baseline&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Recommended Action:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;- Revert configuration using IaC deployment&lt;/P&gt;
&lt;P&gt;- Apply Azure Policy to enforce private access&lt;/P&gt;
&lt;P&gt;- Restrict access using RBAC/PIM&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;👉 This replaces manual debugging with &lt;STRONG&gt;instant diagnosis&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2&gt;📊 Drift Digest (Operational View)&lt;/H2&gt;
&lt;P&gt;Instead of reacting to issues, teams can generate a periodic report:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Resource&lt;/th&gt;&lt;th&gt;Drift Type&lt;/th&gt;&lt;th&gt;Risk&lt;/th&gt;&lt;th&gt;Root Cause&lt;/th&gt;&lt;th&gt;Action&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;OpenAI Endpoint&lt;/td&gt;&lt;td&gt;Network Exposure&lt;/td&gt;&lt;td&gt;🔴 High&lt;/td&gt;&lt;td&gt;Portal change&lt;/td&gt;&lt;td&gt;Revert + Policy&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Storage Account&lt;/td&gt;&lt;td&gt;Security Drift&lt;/td&gt;&lt;td&gt;🔴 High&lt;/td&gt;&lt;td&gt;Script update&lt;/td&gt;&lt;td&gt;Validate automation&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Key Vault&lt;/td&gt;&lt;td&gt;RBAC Drift&lt;/td&gt;&lt;td&gt;🔴 Critical&lt;/td&gt;&lt;td&gt;Manual access&lt;/td&gt;&lt;td&gt;Audit roles&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H2&gt;⚡ Real-World Drift Scenarios&lt;/H2&gt;
&lt;P&gt;From enterprise Azure AI implementations:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Private endpoints removed for debugging&lt;/LI&gt;
&lt;LI&gt;Public access enabled temporarily&lt;/LI&gt;
&lt;LI&gt;RBAC permissions added for testing&lt;/LI&gt;
&lt;LI&gt;NSG rules changed for connectivity&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;👉 These changes are common — and easy to miss.&lt;/P&gt;
&lt;H2&gt;✅ Best Practices&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;Always combine:
&lt;UL&gt;
&lt;LI&gt;IaC state&lt;/LI&gt;
&lt;LI&gt;Resource Graph&lt;/LI&gt;
&lt;LI&gt;Activity Logs&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Avoid auto-remediation without validation&lt;/LI&gt;
&lt;LI&gt;Use:
&lt;UL&gt;
&lt;LI&gt;Azure Policy (prevent drift)&lt;/LI&gt;
&lt;LI&gt;RBAC + PIM (limit access)&lt;/LI&gt;
&lt;LI&gt;Resource locks (protect critical resources)&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Generate a &lt;STRONG&gt;weekly drift digest&lt;/STRONG&gt; instead of reactive troubleshooting&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;💡 Key Takeaway&lt;/H2&gt;
&lt;P&gt;Drift detection tells you &lt;STRONG&gt;what changed&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;✅ AI tells you &lt;STRONG&gt;why it changed and what to do&lt;/STRONG&gt;&lt;/P&gt;
&lt;H2&gt;🚀 Looking Ahead&lt;/H2&gt;
&lt;P&gt;This approach opens new possibilities:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;AI-generated incident reports&lt;/LI&gt;
&lt;LI&gt;Drift-aware Copilot assistants&lt;/LI&gt;
&lt;LI&gt;Preventive controls before deployment&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;🔥 Next in Series&lt;/H2&gt;
&lt;P&gt;👉 &lt;EM&gt;AI Change Risk Scoring for Infrastructure Deployments — Predicting failures before they happen&lt;/EM&gt;&lt;/P&gt;
&lt;H2&gt;✍️ Final Thoughts&lt;/H2&gt;
&lt;P&gt;In modern Azure environments, drift is inevitable.&lt;/P&gt;
&lt;P&gt;But with the right combination of:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Observability&lt;/LI&gt;
&lt;LI&gt;Context&lt;/LI&gt;
&lt;LI&gt;Intelligence&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;👉 Drift becomes not a problem, but a source of insight.&lt;/P&gt;</description>
      <pubDate>Thu, 30 Apr 2026 07:08:53 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/from-drift-to-diagnosis-ai-powered-root-cause-analysis-for-azure/ba-p/4515436</guid>
      <dc:creator>Pooja_Pradhan</dc:creator>
      <dc:date>2026-04-30T07:08:53Z</dc:date>
    </item>
    <item>
      <title>From Drift to Self-Healing: Building a Multi-Repo Azure AI Infrastructure You Can Actually Trust</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/from-drift-to-self-healing-building-a-multi-repo-azure-ai/ba-p/4515315</link>
      <description>&lt;H2&gt;A question before we begin&lt;/H2&gt;
&lt;P&gt;Quick — what version of infrastructure is running in your production subscription right now?&lt;/P&gt;
&lt;P&gt;Not what your pipeline last deployed. Not what your tracking spreadsheet says. What's &lt;STRONG&gt;actually running&lt;/STRONG&gt; — right now, this second — and can you prove it?&lt;/P&gt;
&lt;P&gt;If you had to think about it, you're in good company. So did we. That's basically why we ended up building everything I'm about to describe.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2&gt;The Friday night that started all this&lt;/H2&gt;
&lt;P&gt;So here's what happened. Production had a broken NSG rule. One of our engineers woke up, logged into the Azure Portal, edited the rule, hit Save. Done. Went back to bed.&lt;/P&gt;
&lt;P&gt;Fair enough — production was healthy again.&lt;/P&gt;
&lt;P&gt;The problem? Nobody noticed until Monday that someone had changed live infrastructure directly in the portal. No pipeline involved. No Terraform. No tracking. Our deployment registry still showed the old config. For three days, our "source of truth" was just... wrong.&lt;/P&gt;
&lt;P&gt;And that's the thing people don't talk about enough with cloud infrastructure. It's not just Terraform drift. It's anyone making a change through the Portal, CLI, PowerShell, REST API — basically any path that isn't your pipeline. Our system catches all of it now, but I'll get to that.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2&gt;Why this is harder than it sounds&lt;/H2&gt;
&lt;P&gt;Let me paint the picture. A single Azure environment has easily 20+ resources. VNets with subnets and NSGs. Key Vaults. Storage Accounts with private endpoints. Container App Environments. AI services — OpenAI, AI Foundry. DNS zones, route tables, firewalls, managed identities, RBAC bindings tying everything together.&lt;/P&gt;
&lt;P&gt;Each resource has dozens of properties. And each property can be changed through the Portal, CLI, PowerShell, ARM, Bicep, Terraform, or raw REST calls. Any of those changes happen without your pipeline having a clue.&lt;/P&gt;
&lt;P&gt;Now multiply that across lab, non-live, and live. Across multiple subscriptions. Across multiple markets. Yeah.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;
&lt;H2&gt;What we mean by "market"&lt;/H2&gt;
&lt;P&gt;We use the word &lt;STRONG&gt;market&lt;/STRONG&gt; to mean a country or regional business unit. Each one runs in its own Azure region with its own subscription. UK deploys to UK South, Czech to Germany West Central, and so on. Same platform blueprint, different geography, different network space, different secrets, different deployment lifecycle.&lt;/P&gt;
&lt;P&gt;Think of it as completely isolated copies of the same thing.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2&gt;The actual problem we had&lt;/H2&gt;
&lt;P&gt;We run Azure AI infrastructure across multiple markets. Each market has three environments — &lt;STRONG&gt;lab&lt;/STRONG&gt;, &lt;STRONG&gt;non-live&lt;/STRONG&gt;, &lt;STRONG&gt;live&lt;/STRONG&gt;. Each environment has 20+ resources. So we're talking hundreds of resources across dozens of subscriptions, all deployed through Terraform, all expected to stay in sync.&lt;/P&gt;
&lt;P&gt;They don't. Pipelines fail halfway through. Someone fixes something manually in the portal because it's 2 AM and they just want it to work. A registry update fails silently. And slowly, what you think is deployed and what's actually deployed start to diverge. One quiet failure at a time.&lt;/P&gt;
&lt;P&gt;We needed something different. Not just a deployment pipeline — we needed a system that knows what it deployed, can check whether that's still true, and notices when someone changes something outside the pipeline.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2&gt;Can it actually detect portal changes?&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Yes. But there's a nuance.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;When someone edits a resource through the Azure Portal — modifies an NSG rule, changes a Key Vault policy, scales a Container App — the actual Azure resource changes. But Terraform's state file doesn't update because Terraform wasn't involved. The state serial stays the same. The version tracker stays the same. The registry stays the same.&lt;/P&gt;
&lt;P&gt;So the daily reconciliation pipeline, which compares serials and versions, would &lt;STRONG&gt;not&lt;/STRONG&gt; catch this. All three tracking files still agree — they're just all wrong.&lt;/P&gt;
&lt;P&gt;The portal change gets caught the &lt;STRONG&gt;next time the pipeline runs &lt;CODE&gt;terraform plan&lt;/CODE&gt;&lt;/STRONG&gt;. Terraform reads the live resource from Azure, compares it with what the state and the code say it should be, and shows the diff:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;~ resource "azurerm_network_security_rule" "example" {
    ~ access = "Allow" -&amp;gt; "Deny"   # changed outside of Terraform
  }&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Then &lt;CODE&gt;terraform apply&lt;/CODE&gt; reverts the change — brings Azure back in line with the code. That's the self-healing bit.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;So in short:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;The &lt;STRONG&gt;reconciliation pipeline&lt;/STRONG&gt; catches &lt;STRONG&gt;pipeline-level drift&lt;/STRONG&gt; — someone ran terraform outside the pipeline, or a registry write failed.&lt;/LI&gt;
&lt;LI&gt;The next &lt;STRONG&gt;&lt;CODE&gt;terraform plan/apply&lt;/CODE&gt;&lt;/STRONG&gt; catches &lt;STRONG&gt;resource-level drift&lt;/STRONG&gt; — portal changes, CLI changes, anything outside Terraform.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Between the two, nothing stays hidden for long.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2&gt;Three repos, one platform&lt;/H2&gt;
&lt;P&gt;We split things into three repositories. Each does one thing well.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;1. Platform Repo&lt;/H3&gt;
&lt;P&gt;Deploys the foundation — VNets, DNS, Key Vaults, Container App Environments, AI Foundry, managed identities, firewall rules. Everything a market needs before any application team can deploy anything. Triggered by ServiceNow tickets or code merges.&lt;/P&gt;
&lt;H3&gt;2. Modules Repo&lt;/H3&gt;
&lt;P&gt;37 reusable Terraform modules, all built on top of &lt;A href="https://azure.github.io/Azure-Verified-Modules/" target="_blank" rel="noopener"&gt;Azure Verified Modules&lt;/A&gt;. Both the Platform Repo and Use-Case Repo pull from this. The whole point is that if the same Key Vault definition lives in two places, it will diverge within three months. This makes that impossible.&lt;/P&gt;
&lt;P&gt;Everything is version-pinned. No silent updates:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;source = "git::https://...//avm_modules/keyvault?ref=avm-res-keyvault-vault/v0.10.2"&lt;/CODE&gt;&lt;/PRE&gt;
&lt;H3&gt;3. Use-Case Repo&lt;/H3&gt;
&lt;P&gt;This is where application teams deploy their stuff — storage, databases, function apps, AI Search, container apps — on top of what the Platform Repo already set up. The repo has pre-written Terraform for 20+ resources. Teams don't write Terraform. They uncomment what they need. Push to a feature branch, it deploys to lab. Merge to staging, it goes to non-live. Merge to main, it goes to live. Approval gates at each step.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2&gt;The five pipelines&lt;/H2&gt;
&lt;img /&gt;
&lt;P&gt;&lt;STRONG&gt;Release pipeline.&lt;/STRONG&gt; Fires on merge to main or from a ServiceNow trigger. Runs semantic-release, creates a version tag (like &lt;CODE&gt;v7.0.0&lt;/CODE&gt;), then deploys to lab → non-live → live with an approval gate before live.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Feature pipeline.&lt;/STRONG&gt; Lab-only sandbox. Test your changes without touching anything real. Creates soft tags like &lt;CODE&gt;lab-v7.0.0.1&lt;/CODE&gt; so you can track what was deployed without cluttering the main version history.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Rollback pipeline.&lt;/STRONG&gt; Pick any previous tag (say &lt;CODE&gt;v6.1.0&lt;/CODE&gt;) and it restores it. The version tracker marks this as a deliberate rollback so reconciliation doesn't flag it as drift.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Reconciliation pipeline.&lt;/STRONG&gt; Runs at 6 AM every day. Compares what the registry says is deployed against what's actually in the Terraform state. Catches pipeline-level drift — someone ran &lt;CODE&gt;terraform apply&lt;/CODE&gt; outside the pipeline, or a registry write failed. Portal changes get caught on the next plan/apply run instead.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Commitlint pipeline.&lt;/STRONG&gt; Enforces conventional commit messages. That's what makes semantic versioning work — &lt;CODE&gt;fix:&lt;/CODE&gt; bumps patch, &lt;CODE&gt;feat:&lt;/CODE&gt; bumps minor, &lt;CODE&gt;BREAKING CHANGE:&lt;/CODE&gt; bumps major.&lt;/P&gt;
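&lt;P&gt;To make the commit-to-version mapping concrete, here is a small Python sketch of the rule described above. It is an illustration of the idea, not how semantic-release itself is implemented. Treating the &lt;CODE&gt;!&lt;/CODE&gt; shorthand from the Conventional Commits specification as a major bump is an assumption about the release configuration, and the commit messages are invented.&lt;/P&gt;

```python
import re

# Conventional Commits header: type, optional (scope), optional "!", then ": "
HEADER_RE = re.compile(r"^(\w+)(\([^)]*\))?(!)?: ")

ORDER = ["none", "patch", "minor", "major"]


def bump_for(message: str) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for one commit message."""
    header = message.splitlines()[0] if message else ""
    match = HEADER_RE.match(header)
    if "BREAKING CHANGE:" in message or (match and match.group(3)):
        return "major"
    if match and match.group(1) == "feat":
        return "minor"
    if match and match.group(1) == "fix":
        return "patch"
    return "none"


def next_version(current: str, messages: list) -> str:
    """Apply the highest-ranked bump among the messages to a 'vX.Y.Z' tag."""
    bump = max((bump_for(m) for m in messages), key=ORDER.index, default="none")
    major, minor, patch = (int(part) for part in current.lstrip("v").split("."))
    if bump == "major":
        return f"v{major + 1}.0.0"
    if bump == "minor":
        return f"v{major}.{minor + 1}.0"
    if bump == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    return current


if __name__ == "__main__":
    print(next_version("v6.1.0", ["fix: correct NSG priority", "feat(kv): add purge protection"]))
    print(next_version("v6.1.0", ["feat: new subnet layout\n\nBREAKING CHANGE: address space changed"]))
```

&lt;P&gt;This is also why the commit lint has to be a hard gate: a commit whose header does not parse contributes no bump at all, so the change would ship without a new version tag.&lt;/P&gt;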
&lt;HR /&gt;
&lt;H2&gt;The three tracking files&lt;/H2&gt;
&lt;P&gt;This is where most platforms stop too early. We keep three independent records for every deployment.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;1. Terraform State File&lt;/H3&gt;
&lt;P&gt;Terraform's own record of what it manages. Stored in Azure Blob Storage. Has a serial number that increments every time Terraform writes a new version of the state. If someone runs &lt;CODE&gt;terraform apply&lt;/CODE&gt; outside the pipeline, this serial changes but nothing else does.&lt;/P&gt;
&lt;H3&gt;2. Version Tracker&lt;/H3&gt;
&lt;P&gt;A JSON file sitting next to the state file. Written by the pipeline after every successful deploy. Records the version tag, commit SHA, run ID, timestamp. If someone deploys outside the pipeline, this file doesn't update — that's how we spot the mismatch.&lt;/P&gt;
&lt;H3&gt;3. Subscription Registry&lt;/H3&gt;
&lt;P&gt;A central blob container where each market gets a &lt;CODE&gt;subscription.json&lt;/CODE&gt;. Records what the platform thinks is deployed in each environment. But if the blob write fails (network blip), it can be wrong. That's why we cross-check against the other two.&lt;/P&gt;
&lt;P&gt;When all three agree, we're good. When they don't, we know exactly what drifted, where, and why.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2&gt;How drift detection works in practice&lt;/H2&gt;
&lt;P&gt;Every morning at 6, the reconciliation pipeline checks every market, subscription, and environment.&lt;/P&gt;
&lt;P&gt;It uses three tiers:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Tier 1&lt;/STRONG&gt; (best case): Version tracker exists. Compare its serial against the actual state serial and the registry. If anything's off, flag it.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Tier 2&lt;/STRONG&gt; (no tracker): Older environments without a version tracker. Fall back to comparing the registry's recorded serial against the actual state serial.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Tier 3&lt;/STRONG&gt; (no serials at all): Compare timestamps. If the state was modified more than five minutes after the last registry update, flag it.&lt;/P&gt;
&lt;P&gt;When it finds drift, it does &lt;STRONG&gt;not&lt;/STRONG&gt; auto-fix. It generates a report:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;Customer     | Environment | Registry  | Tracker   | State Serial | Status
customer-cz  | non-live    | v6.1.0    | v7.0.0    | 45           | DRIFT
customer-cz  | live        | v7.0.0    | v7.0.0    | 41           | OK
customer-uk  | lab         | v7.0.0    | v7.0.0    | 22           | OK&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;A human reviews and approves before anything gets corrected. No silent overwrites.&lt;/P&gt;
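&lt;P&gt;The daily 6 AM run could be triggered as in the sketch below. It assumes the pipelines run on GitHub Actions, which the job pattern elsewhere in this post suggests. The workflow name is hypothetical, and GitHub evaluates cron schedules in UTC, so the hour would need shifting to hit a local 6 AM.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;# Hypothetical reconciliation trigger (illustrative, not the authors' workflow file)
name: reconciliation
on:
  schedule:
    - cron: "0 6 * * *"   # 06:00 UTC every day
  workflow_dispatch: {}    # also allow a manual run&lt;/CODE&gt;&lt;/PRE&gt;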
&lt;HR /&gt;
&lt;H2&gt;Did it work?&lt;/H2&gt;
&lt;P&gt;Within the first week, the reconciliation pipeline caught something real. Someone had run &lt;CODE&gt;terraform apply&lt;/CODE&gt; manually — outside the pipeline. The state serial changed, the version tracker didn't. Flagged at 6 AM the next morning.&lt;/P&gt;
&lt;P&gt;Separately, the next scheduled deployment caught a portal change. An NSG rule had been modified directly in the Azure Portal. &lt;CODE&gt;terraform plan&lt;/CODE&gt; showed the unexpected diff and &lt;CODE&gt;terraform apply&lt;/CODE&gt; reverted it.&lt;/P&gt;
&lt;P&gt;Between the two mechanisms, nothing went undetected for more than a day.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2&gt;Security&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;No secrets stored in the repo or the pipeline.&lt;/STRONG&gt; Azure login uses OIDC. GitHub App credentials and state backend config are fetched at runtime from Key Vault. Nothing lives in repo secrets or pipeline variables.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Private endpoints on everything.&lt;/STRONG&gt; Storage, Key Vault, AI Search, SQL, PostgreSQL, Container Registry — all deployed with public access disabled.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Quality gates on every PR.&lt;/STRONG&gt; Terraform fmt, validate, Checkov, tfsec, commitlint. Doesn't pass, doesn't merge.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Rollback is a first-class thing.&lt;/STRONG&gt; Restore any previous version. The tracker records it as intentional so reconciliation doesn't freak out.&lt;/P&gt;
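&lt;P&gt;As a rough sketch, the PR quality gates listed above could be wired up as workflow steps like these. The step names, working directory, and the assumption that each tool is already installed on the runner are ours, not taken from the actual pipeline.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;# Hypothetical PR quality-gate steps; tool installation and paths are assumptions
- name: "Terraform fmt"
  run: terraform fmt -check -recursive

- name: "Terraform validate"
  run: |
    terraform init -backend=false   # no state backend needed for validation
    terraform validate

- name: "Checkov"
  run: checkov -d .

- name: "tfsec"
  run: tfsec .

- name: "Commitlint"
  run: npx commitlint --from ${{ github.event.pull_request.base.sha }} --to ${{ github.event.pull_request.head.sha }}&lt;/CODE&gt;&lt;/PRE&gt;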
&lt;P&gt;Here's the pattern every pipeline job follows:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;# Every job:
- name: "Azure Login"
  uses: ./.github/actions/azure-login

- name: "Fetch Secrets from Key Vault"
  id: kv-secrets
  uses: ./.github/actions/keyvault-secrets
  with:
    keyvault_name: ${{ env.KEYVAULT_NAME }}

- name: "Generate GitHub App Token"
  uses: ./.github/actions/github-app-token
  with:
    app_id: ${{ steps.kv-secrets.outputs.app_id }}
    private_key: ${{ steps.kv-secrets.outputs.private_key }}&lt;/CODE&gt;&lt;/PRE&gt;
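&lt;P&gt;The local &lt;CODE&gt;azure-login&lt;/CODE&gt; action isn't shown here. A minimal sketch of what an OIDC login typically needs at the job level, assuming it wraps the public &lt;CODE&gt;azure/login&lt;/CODE&gt; action with a federated credential; the &lt;CODE&gt;env&lt;/CODE&gt; names are placeholders:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;# Sketch only: job-level settings for OIDC login (identifier names are placeholders)
permissions:
  id-token: write   # lets the job request an OIDC token from GitHub
  contents: read

steps:
  - name: "Azure Login (OIDC)"
    uses: azure/login@v2
    with:
      client-id: ${{ env.AZURE_CLIENT_ID }}             # app registration ID, not a secret
      tenant-id: ${{ env.AZURE_TENANT_ID }}
      subscription-id: ${{ env.AZURE_SUBSCRIPTION_ID }}&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;No client secret appears in the workflow: Azure trusts the token GitHub issues for the run, which is what keeps credentials out of repo secrets.&lt;/P&gt;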
&lt;HR /&gt;
&lt;H2&gt;What we'd tell you&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Separate platform from application on day one.&lt;/STRONG&gt; The repo boundary is the governance boundary. Shared repo means shared blast radius.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Pin everything.&lt;/STRONG&gt; Modules, providers, Terraform versions. If today's deploy can produce a different result tomorrow because something upstream changed, you don't have reproducible infrastructure.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Assume every component will fail on its own.&lt;/STRONG&gt; The pipeline will succeed but the registry write will fail. Build verification loops before you need them, not after.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Make the right path the easy path.&lt;/STRONG&gt; If uncommenting a Terraform block is easier than writing one from scratch, people will uncomment. If deploying through a form is easier than asking the platform team, they'll use the form. Design for that.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Understand the two types of drift.&lt;/STRONG&gt; Pipeline drift (someone ran Terraform outside the pipeline) gets caught by metadata comparison. Resource drift (portal changes) gets caught by the next &lt;CODE&gt;terraform plan&lt;/CODE&gt;. You need both.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2&gt;The bottom line&lt;/H2&gt;
&lt;P&gt;We started with: &lt;EM&gt;can you prove what's deployed in production right now?&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Now we can. And the platform checks it every morning at 6 AM.&lt;/P&gt;
&lt;P&gt;Three repos. Five pipelines. 37 modules. SemVer on every deployment. Three tracking files that cross-check each other. One daily alarm clock asking: &lt;EM&gt;is everything still true?&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;It usually is. And when it isn't, we know within a day.&lt;/P&gt;
&lt;HR /&gt;
&lt;P&gt;If you're managing multi-subscription Azure infrastructure and dealing with drift or visibility problems — we'd genuinely like to hear how you're handling it.&lt;/P&gt;</description>
      <pubDate>Mon, 04 May 2026 09:58:14 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/from-drift-to-self-healing-building-a-multi-repo-azure-ai/ba-p/4515315</guid>
      <dc:creator>Valini_Sunthwal</dc:creator>
      <dc:date>2026-05-04T09:58:14Z</dc:date>
    </item>
    <item>
      <title>Operating AI Agents on Azure: Observability with Azure AI Foundry</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/operating-ai-agents-on-azure-observability-with-azure-ai-foundry/ba-p/4515975</link>
      <description>&lt;P&gt;&lt;STRONG&gt;Azure AI Foundry&lt;/STRONG&gt; is Microsoft’s enterprise platform for building, deploying, and operating AI applications and intelligent agents as first‑class Azure &lt;STRONG&gt;workloads&lt;/STRONG&gt;. From an &lt;STRONG&gt;infrastructure perspective&lt;/STRONG&gt;, Foundry acts as a control plane that brings together &lt;STRONG&gt;model hosting&lt;/STRONG&gt;, &lt;STRONG&gt;agent execution&lt;/STRONG&gt;, &lt;STRONG&gt;tooling&lt;/STRONG&gt;, evaluation, and observability under a single, governed environment. It integrates natively with Azure services such as &lt;STRONG&gt;Azure Monitor&lt;/STRONG&gt;, &lt;STRONG&gt;Application Insights&lt;/STRONG&gt;, &lt;STRONG&gt;managed identities&lt;/STRONG&gt;, and role‑based access control, allowing AI workloads to be managed with the same rigor as traditional applications. Azure AI Foundry supports both portal‑driven and code‑driven workflows, enabling platform teams to standardize lifecycle management while supporting diverse development styles. This makes Foundry particularly suited for production environments where governance, auditability, and operational consistency are non‑negotiable.&lt;/P&gt;
&lt;H3&gt;Microsoft Agent Framework (MAF)&lt;/H3&gt;
&lt;P&gt;Microsoft Agent Framework is an Azure‑native framework designed to build and operate intelligent agents directly within Azure AI Foundry. It treats agents as managed workloads, enabling platform teams to deploy, secure, and monitor them through the Foundry control plane. Tracing is enabled automatically without code changes, capturing agent reasoning, tool calls, and multi‑agent interactions in Application Insights.&lt;/P&gt;
&lt;P&gt;From an operational standpoint, MAF offers the strongest alignment with enterprise governance and no‑code observability. SRE teams gain immediate visibility without custom instrumentation, making it the preferred choice where standardization, compliance, and operational simplicity are critical.&lt;/P&gt;
&lt;H3&gt;Semantic Kernel (SK)&lt;/H3&gt;
&lt;P&gt;Semantic Kernel provides a structured way to orchestrate LLMs with plugins, planners, and functions in .NET and Python environments. When connected to Azure AI Foundry, it automatically emits telemetry for prompts, responses, function execution, and token usage using Azure inference connectors.&lt;/P&gt;
&lt;P&gt;Operationally, Semantic Kernel sits in the low‑code category. It offers strong observability with minimal configuration while allowing teams to retain code‑level control. This makes it suitable for teams that need transparency and structure without fully managed agent hosting.&lt;/P&gt;
&lt;H3&gt;LangChain&lt;/H3&gt;
&lt;P&gt;LangChain is a widely adopted open‑source framework for building agent workflows and retrieval‑augmented applications. Azure AI Foundry integrates with LangChain using OpenTelemetry‑based tracers, allowing chain execution, tool calls, and model interactions to be captured in Application Insights.&lt;/P&gt;
&lt;P&gt;From a platform perspective, LangChain tracing requires explicit configuration but delivers consistent observability once enabled. It fits organizations that standardize on OSS tooling while still needing enterprise‑grade monitoring and debugging through Azure Monitor.&lt;/P&gt;
&lt;H3&gt;LangGraph&lt;/H3&gt;
&lt;P&gt;LangGraph extends LangChain by enabling graph‑based, stateful agent orchestration with conditional routing. Azure AI Foundry traces LangGraph workflows using the same OpenTelemetry integration, capturing node transitions, tool usage, and execution paths.&lt;/P&gt;
&lt;P&gt;This model is operationally valuable for complex workflows but introduces higher configuration overhead. It suits advanced teams that need deep insight into non‑linear agent execution while maintaining centralized observability.&lt;/P&gt;
&lt;H3&gt;OpenAI Agent SDK&lt;/H3&gt;
&lt;P&gt;The OpenAI Agent SDK provides low‑level control for building custom agent runtimes. Unlike other frameworks, it does not emit telemetry by default. To integrate with Azure AI Foundry, teams must manually instrument OpenTelemetry spans and export them to Application Insights.&lt;/P&gt;
&lt;P&gt;This approach offers maximum flexibility but shifts observability responsibility to the engineering team. It is best suited for specialized scenarios where bespoke agent execution outweighs operational simplicity.&lt;/P&gt;
&lt;H3&gt;Log Tracing in Azure AI Foundry&lt;/H3&gt;
&lt;P&gt;Log tracing in Azure AI Foundry addresses a core operational gap in agent‑based systems that traditional logging cannot solve. AI agents are non‑linear by nature, often invoking multiple tools, branching based on intermediate reasoning, and coordinating with other agents. Azure AI Foundry uses OpenTelemetry and Azure Monitor Application Insights to capture detailed execution traces that show how requests flow through agents, models, and tools. These traces include timing, errors, token usage, and reasoning paths, enabling teams to debug failures and performance issues with precision. By exposing this telemetry through the Foundry portal and Azure Monitor, platform teams gain a unified view of agent behavior that supports troubleshooting, governance reviews, and production reliability at scale.&lt;/P&gt;
&lt;H3&gt;Depth of Agent Visibility&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table style="width: 100%;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Capability&lt;/th&gt;&lt;th&gt;Microsoft Agent Framework (MAF)&lt;/th&gt;&lt;th&gt;Semantic Kernel (SK)&lt;/th&gt;&lt;th&gt;LangChain&lt;/th&gt;&lt;th&gt;LangGraph&lt;/th&gt;&lt;th&gt;OpenAI Agent SDK&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Agent decision flow visibility&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;You can see the full step‑by‑step path of what the agent decided to do, including why it chose a particular action, which step came first, and how it reached the final answer. This is visible automatically in Azure AI Foundry without adding code.&lt;/td&gt;&lt;td&gt;You can see the main execution path of the agent, including which skill or planner was used, but fine‑grained reasoning steps may be abstracted unless you add extra logging.&lt;/td&gt;&lt;td&gt;You usually see the final chain execution and intermediate steps, but the “why” behind decisions is not always clear unless you manually add tracing.&lt;/td&gt;&lt;td&gt;You can see each node in the graph and which path was taken at runtime, making it easier to understand branching decisions.&lt;/td&gt;&lt;td&gt;You mostly see inputs and outputs. 
Internal reasoning and decision paths are not visible unless you explicitly create and manage custom spans.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Tool invocation tracing&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Every tool call is captured automatically, showing which tool was called, with what input, how long it took, and what result it returned, all correlated to the agent run.&lt;/td&gt;&lt;td&gt;Tool and function calls are tracked, but details depend on how plugins are implemented and may need configuration to capture inputs and outputs clearly.&lt;/td&gt;&lt;td&gt;Tool calls are visible when tracing is enabled, but they may appear as generic steps unless you customize the tracer.&lt;/td&gt;&lt;td&gt;Tool usage is clearly tied to graph nodes, making it easier to identify which step called which tool.&lt;/td&gt;&lt;td&gt;Tool calls are only visible if you explicitly instrument them using OpenTelemetry or custom logging.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Multi‑agent support visibility&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;When multiple agents work together, you can follow a single request as it moves from one agent to another, clearly seeing which agent handled which part of the task.&lt;/td&gt;&lt;td&gt;Multi‑agent flows are possible, but visibility across agents is limited and usually requires manual correlation.&lt;/td&gt;&lt;td&gt;Multi‑agent patterns are supported conceptually, but tracing across agents is not automatic and can be difficult to follow.&lt;/td&gt;&lt;td&gt;Designed for complex workflows, LangGraph allows visibility across multiple agents or components, but still requires configuration.&lt;/td&gt;&lt;td&gt;No built‑in way to trace requests across multiple agents unless you design and implement it yourself.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Automatic span hierarchy&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Traces are automatically structured in a parent‑child hierarchy, such as request → agent → decision 
→ tool → model call, without manual setup.&lt;/td&gt;&lt;td&gt;Most spans are structured correctly by default, but complex scenarios may require tuning.&lt;/td&gt;&lt;td&gt;Span hierarchy is available when using the provided Azure tracer, but understanding relationships may require experience.&lt;/td&gt;&lt;td&gt;Hierarchy is preserved across graph nodes, making execution order clearer.&lt;/td&gt;&lt;td&gt;No automatic hierarchy; you must manually create and link spans to understand execution flow.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3&gt;Tracing Integrations Comparison&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Aspect&lt;/th&gt;&lt;th&gt;Microsoft Agent Framework (MAF)&lt;/th&gt;&lt;th&gt;Semantic Kernel (SK)&lt;/th&gt;&lt;th&gt;LangChain&lt;/th&gt;&lt;th&gt;LangGraph&lt;/th&gt;&lt;th&gt;OpenAI Agent SDK&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Integration type with Azure AI Foundry&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Tracing is built directly into Azure AI Foundry when agents are created or managed in the Foundry portal. The platform automatically enables tracing at the server side.&lt;/td&gt;&lt;td&gt;Tracing works through Azure AI inference connectors that integrate Semantic Kernel with Foundry. Telemetry is emitted automatically once configured.&lt;/td&gt;&lt;td&gt;Tracing is enabled using an OpenTelemetry‑based tracer provided by the langchain-azure-ai package and connected to Foundry.&lt;/td&gt;&lt;td&gt;Uses the same OpenTelemetry tracer as LangChain, extended to support graph‑based execution paths.&lt;/td&gt;&lt;td&gt;No built‑in integration. You must manually add OpenTelemetry instrumentation and export traces to Application Insights.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Code changes required&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;No code changes are required if you use Foundry‑managed agents. Tracing works out of the box.&lt;/td&gt;&lt;td&gt;Minimal code or configuration is required to attach the Azure AI inference connector and enable telemetry.&lt;/td&gt;&lt;td&gt;Low‑code setup. You must add the tracer dependency and configure environment variables.&lt;/td&gt;&lt;td&gt;Low‑code setup similar to LangChain, plus configuration to trace graph nodes.&lt;/td&gt;&lt;td&gt;Pro‑code. 
You must explicitly write tracing logic, define spans, and manage exporters yourself.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;What tracing actually captures&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;You can see agent executions, internal decision steps, tool and MCP calls, model requests, token usage, latency, errors, and multi‑agent interactions in one correlated trace.&lt;/td&gt;&lt;td&gt;You can see prompts, responses, function or plugin calls, token usage, latency, and planner execution, depending on configuration.&lt;/td&gt;&lt;td&gt;You see chain execution steps, tool calls, and LLM interactions, but understanding deeper reasoning may require reading the chain logic.&lt;/td&gt;&lt;td&gt;You see each graph node, execution order, conditional branching, and related tool calls, which helps debug complex workflows.&lt;/td&gt;&lt;td&gt;By default you only see inputs and outputs. Detailed decision steps or tool usage appear only if you manually instrument them.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Multi‑agent tracing support&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Fully supported. You can follow a single request as it moves between multiple agents inside one trace.&lt;/td&gt;&lt;td&gt;Partially supported. Multi‑agent flows are possible but cross‑agent visibility may need manual correlation.&lt;/td&gt;&lt;td&gt;Limited. Multi‑agent patterns exist, but traces across agents are not clearly linked by default.&lt;/td&gt;&lt;td&gt;Better support than LangChain due to graph structure, but still requires deliberate configuration.&lt;/td&gt;&lt;td&gt;Not supported by default. 
You must design and implement cross‑agent tracing manually.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Who should use this approach&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Platform and infrastructure teams that want enterprise‑grade observability with zero instrumentation effort.&lt;/td&gt;&lt;td&gt;Teams building structured AI workflows in .NET or Python that want good visibility with minimal effort.&lt;/td&gt;&lt;td&gt;Open‑source Python teams that want Azure‑grade observability without leaving LangChain.&lt;/td&gt;&lt;td&gt;Teams building complex, stateful, or branching agent workflows that need visibility into execution paths.&lt;/td&gt;&lt;td&gt;Advanced teams that need full control and are comfortable managing observability pipelines themselves.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;For a deeper understanding and hands-on implementation, I highly recommend exploring this documentation: &lt;A href="https://learn.microsoft.com/en-us/agent-framework/agents/observability?pivots=programming-language-python" target="_blank"&gt;Observability | Microsoft Learn&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 29 Apr 2026 19:15:11 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/operating-ai-agents-on-azure-observability-with-azure-ai-foundry/ba-p/4515975</guid>
      <dc:creator>skundapura</dc:creator>
      <dc:date>2026-04-29T19:15:11Z</dc:date>
    </item>
    <item>
      <title>Building an Enterprise-Grade SQL Platform on Kubernetes using Crossplane and Azure PostgreSQL</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/building-an-enterprise-grade-sql-platform-on-kubernetes-using/ba-p/4515635</link>
      <description>&lt;H2&gt;Strategic Overview&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;Build a Kubernetes-native SQL platform using a Crossplane-based database operator to provision Azure PostgreSQL Flexible Server.&lt;/LI&gt;
&lt;LI&gt;Active–Passive, multi-region database architecture using read replicas and manual promotion for failover.&lt;/LI&gt;
&lt;LI&gt;Private networking, DNS abstraction, and virtual endpoints to ensure secure and stable connectivity.&lt;/LI&gt;
&lt;LI&gt;Azure Traffic Manager + DNS failover strategy to enable global routing and minimize manual intervention.&lt;/LI&gt;
&lt;LI&gt;Enterprise-grade HA/DR, with replication, backup, and failover testing workflows.&lt;/LI&gt;
&lt;LI&gt;Observability via Azure Monitor + Datadog for proactive detection (CPU, replication lag, etc.).&lt;/LI&gt;
&lt;LI&gt;Security-first architecture with private endpoints, Azure AD authentication, and no public access.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Problem Statement&lt;/H2&gt;
&lt;P&gt;Modern platform teams struggle to offer database-as-a-service (DBaaS) with the same level of automation, governance, and consistency that exists for stateless workloads in Kubernetes.&lt;/P&gt;
&lt;P&gt;Key gaps:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Database provisioning is still manual, ticket-driven, or portal-based&lt;/LI&gt;
&lt;LI&gt;Lack of standardized HA/DR patterns across teams&lt;/LI&gt;
&lt;LI&gt;Inconsistent networking, DNS, and security configurations&lt;/LI&gt;
&lt;LI&gt;Failover and DR processes require manual intervention and risk downtime&lt;/LI&gt;
&lt;LI&gt;No unified declarative interface for database lifecycle management&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Goals&lt;/H2&gt;
&lt;P&gt;Design and implement a Kubernetes-native, enterprise-grade SQL platform that:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Exposes databases as declarative Kubernetes resources&lt;/LI&gt;
&lt;LI&gt;Automates provisioning via Crossplane using Azure PostgreSQL Flexible Server&lt;/LI&gt;
&lt;LI&gt;Provides built-in HA/DR capabilities across regions&lt;/LI&gt;
&lt;LI&gt;Enables seamless failover through DNS abstraction&lt;/LI&gt;
&lt;LI&gt;Enforces secure, private, and compliant database access patterns&lt;/LI&gt;
&lt;LI&gt;Integrates observability, backup, and operational controls by default&lt;/LI&gt;
&lt;LI&gt;Delivers a self-service experience for developers without compromising governance&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Architecture Overview&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;Crossplane acts as the control plane, translating Kubernetes intent into Azure-managed DB resources&lt;/LI&gt;
&lt;LI&gt;Azure PostgreSQL Flexible Server provides managed HA + replication primitives&lt;/LI&gt;
&lt;LI&gt;Private DNS + Private Endpoints ensure zero public exposure&lt;/LI&gt;
&lt;LI&gt;Traffic Manager enables global abstraction and failover routing&lt;/LI&gt;
&lt;LI&gt;Replica promotion + DNS switch = DR execution model&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Kubernetes-Native Provisioning (Crossplane)&lt;/H2&gt;
&lt;H3&gt;What I built&lt;/H3&gt;
&lt;P&gt;A custom database resource exposed to Kubernetes users:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;kind: XPostgreSQLDatabase&lt;/LI-CODE&gt;
&lt;P&gt;Defines:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Primary and secondary regions&lt;/LI&gt;
&lt;LI&gt;Storage and compute config&lt;/LI&gt;
&lt;LI&gt;Networking + DNS&lt;/LI&gt;
&lt;LI&gt;Security (credentials as secret references)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;From config:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Primary region: eastus2&lt;/LI&gt;
&lt;LI&gt;Secondary region: centralus&lt;/LI&gt;
&lt;LI&gt;Private DNS zone used: testmulti.postgres.database.azure.com&lt;/LI&gt;
&lt;LI&gt;DB size and storage configured declaratively&lt;/LI&gt;
&lt;LI&gt;Credentials managed via Kubernetes Secret&lt;/LI&gt;
&lt;/UL&gt;
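&lt;P&gt;To make the shape concrete, here is a hypothetical &lt;CODE&gt;XPostgreSQLDatabase&lt;/CODE&gt; manifest. The field paths under &lt;CODE&gt;spec&lt;/CODE&gt; mirror the ones the composition patches read from; the API group and version, the resource names, the secondary-region path, and the SKU, storage, and version values are assumptions for illustration.&lt;/P&gt;
&lt;LI-CODE lang="yaml"&gt;# Hypothetical example; anything marked "assumed" does not come from the real config
apiVersion: platform.example.org/v1alpha1   # assumed group/version
kind: XPostgreSQLDatabase
metadata:
  name: orders-db                           # assumed
spec:
  resourceGroup:
    name: rg-orders-db                      # assumed
  regions:
    primary:
      name: eastus2
      cidr: 10.0.0.0/16
    secondary:
      name: centralus                       # assumed field path
  network:
    privateDnsZoneName: testmulti.postgres.database.azure.com
  database:
    size: GP_Standard_D2s_v3                # assumed SKU
    storageGB: 128                          # assumed
    version: "16"                           # assumed&lt;/LI-CODE&gt;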
&lt;H3&gt;Crossplane Foundation&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;Provider configuration uses Azure credentials via Kubernetes secret:&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI-CODE lang="yaml"&gt;apiVersion: azure.m.upbound.io/v1beta1
kind: ClusterProviderConfig
metadata:
  name: default
spec:
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: azure-creds
      key: credentials
---
apiVersion: azure.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: default
spec:
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: azure-creds
      key: credentials&lt;/LI-CODE&gt;
&lt;UL&gt;
&lt;LI&gt;Functions extend composition logic:&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI-CODE lang="yaml"&gt;apiVersion: pkg.crossplane.io/v1beta1
kind: Function
metadata:
  name: function-patch-and-transform
spec:
  package: xpkg.upbound.io/crossplane-contrib/function-patch-and-transform:v0.8.2&lt;/LI-CODE&gt;
&lt;H3&gt;Step-by-Step Implementation&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;1. Define Platform API&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Create an XRD (CompositeResourceDefinition)&lt;/LI&gt;
&lt;LI&gt;Expose database as a Kubernetes primitive (XPostgreSQLDatabase)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;2. Build Composition&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Map Kubernetes resource → Azure PostgreSQL Flexible Server&lt;/LI&gt;
&lt;LI&gt;Create:
&lt;UL&gt;
&lt;LI&gt;Primary server&lt;/LI&gt;
&lt;LI&gt;Replica server (secondary region)&lt;/LI&gt;
&lt;LI&gt;Networking artifacts (Private Endpoint, DNS)&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;3. Provision Database&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Developer applies custom resource&lt;/LI&gt;
&lt;LI&gt;Crossplane:
&lt;UL&gt;
&lt;LI&gt;Calls Azure APIs&lt;/LI&gt;
&lt;LI&gt;Creates full database topology&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
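&lt;P&gt;Step 2 can be pictured as a pipeline-mode Composition that hands a list of resources to &lt;CODE&gt;function-patch-and-transform&lt;/CODE&gt;. This is a skeleton under assumptions: the composite API group and the Composition name are hypothetical, and only one resource entry is shown.&lt;/P&gt;
&lt;LI-CODE lang="yaml"&gt;# Skeleton only; group, version, and names marked "assumed" are illustrative
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: xpostgresqldatabases.platform.example.org   # assumed
spec:
  compositeTypeRef:
    apiVersion: platform.example.org/v1alpha1       # assumed
    kind: XPostgreSQLDatabase
  mode: Pipeline
  pipeline:
    - step: patch-and-transform
      functionRef:
        name: function-patch-and-transform
      input:
        apiVersion: pt.fn.crossplane.io/v1beta1
        kind: Resources
        resources:
          - name: resource-group
            base:
              apiVersion: azure.upbound.io/v1beta1
              kind: ResourceGroup
              spec:
                forProvider:
                  location: eastus2
            patches:
              - type: FromCompositeFieldPath
                fromFieldPath: spec.regions.primary.name
                toFieldPath: spec.forProvider.location&lt;/LI-CODE&gt;
&lt;P&gt;Each &lt;CODE&gt;- name: ...&lt;/CODE&gt; snippet in the sections that follow would sit as another entry under &lt;CODE&gt;resources&lt;/CODE&gt;.&lt;/P&gt;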
&lt;H3&gt;Control-plane prerequisites&lt;/H3&gt;
&lt;H4&gt;1. Install the Crossplane functions&lt;/H4&gt;
&lt;P&gt;The functions.yaml file installs two functions:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;function-patch-and-transform&lt;/LI&gt;
&lt;LI&gt;crossplane-contrib-function-python&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI-CODE lang="yaml"&gt;apiVersion: pkg.crossplane.io/v1beta1
kind: Function
metadata:
  name: function-patch-and-transform
spec:
  package: xpkg.upbound.io/crossplane-contrib/function-patch-and-transform:v0.8.2
---
apiVersion: pkg.crossplane.io/v1beta1
kind: Function
metadata:
  name: crossplane-contrib-function-python
spec:
  package: ghcr.io/crossplane-contrib/function-python:v0.2.0&lt;/LI-CODE&gt;
&lt;H4&gt;2. Configure the Azure provider credentials&lt;/H4&gt;
&lt;P&gt;The provider-config.yaml file defines both a ClusterProviderConfig and a namespaced ProviderConfig, each reading credentials from the azure-creds secret in the crossplane-system namespace.&lt;/P&gt;
&lt;LI-CODE lang="yaml"&gt;apiVersion: azure.m.upbound.io/v1beta1
kind: ClusterProviderConfig
metadata:
  name: default
spec:
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: azure-creds
      key: credentials
---
apiVersion: azure.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: default
  namespace: crossplane-system
spec:
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: azure-creds
      key: credentials&lt;/LI-CODE&gt;
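&lt;P&gt;Both configs expect the &lt;CODE&gt;azure-creds&lt;/CODE&gt; secret to exist already. A minimal sketch of that secret, assuming service principal authentication with the JSON layout the Upbound Azure provider reads; every value below is a placeholder:&lt;/P&gt;
&lt;LI-CODE lang="yaml"&gt;# Placeholder values only; never commit real credentials
apiVersion: v1
kind: Secret
metadata:
  name: azure-creds
  namespace: crossplane-system
type: Opaque
stringData:
  credentials: |
    {
      "clientId": "00000000-0000-0000-0000-000000000000",
      "clientSecret": "REPLACE_ME",
      "subscriptionId": "00000000-0000-0000-0000-000000000000",
      "tenantId": "00000000-0000-0000-0000-000000000000"
    }&lt;/LI-CODE&gt;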
&lt;H2&gt;What Crossplane creates from that XR&lt;/H2&gt;
&lt;H4&gt;1. Resource Group&lt;/H4&gt;
&lt;P&gt;The composition first creates an Azure ResourceGroup, with its location patched from spec.regions.primary.name and its name patched from spec.resourceGroup.name. It also writes the resulting resource group name back into composite status.&lt;/P&gt;
&lt;LI-CODE lang="yaml"&gt;- name: resource-group
  base:
    apiVersion: azure.upbound.io/v1beta1
    kind: ResourceGroup
    spec:
      forProvider:
        location: eastus2
  patches:
    - type: FromCompositeFieldPath
      fromFieldPath: spec.regions.primary.name
      toFieldPath: spec.forProvider.location
    - type: FromCompositeFieldPath
      fromFieldPath: spec.resourceGroup.name
      toFieldPath: metadata.name
    - type: ToCompositeFieldPath
      fromFieldPath: metadata.name
      toFieldPath: status.resourceGroupName&lt;/LI-CODE&gt;
&lt;H4&gt;2. Backup storage resources&lt;/H4&gt;
&lt;P&gt;The managed composition also creates:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;a storage account with accountReplicationType: GRS&lt;/LI&gt;
&lt;LI&gt;a backup container in that account&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI-CODE lang="yaml"&gt;- name: backup-storage-account
  base:
    apiVersion: storage.azure.upbound.io/v1beta2
    kind: Account
    spec:
      forProvider:
        accountTier: Standard
        accountReplicationType: GRS
        sharedAccessKeyEnabled: true
        tags:
          purpose: postgresql-backups
          automation: enabled

- name: backup-container
  base:
    apiVersion: storage.azure.upbound.io/v1beta1
    kind: Container&lt;/LI-CODE&gt;
&lt;H4&gt;3. Private DNS zone&lt;/H4&gt;
&lt;P&gt;A PrivateDNSZone is created, and its external name is patched from spec.network.privateDnsZoneName. The zone name is also written back into composite status.&lt;/P&gt;
&lt;LI-CODE lang="yaml"&gt;- name: private-dns-zone
  base:
    apiVersion: network.azure.upbound.io/v1beta1
    kind: PrivateDNSZone
    metadata:
      annotations:
        crossplane.io/external-name: postgres.database.azure.com
  patches:
    - type: FromCompositeFieldPath
      fromFieldPath: spec.network.privateDnsZoneName
      toFieldPath: metadata.annotations[crossplane.io/external-name]
    - type: ToCompositeFieldPath
      fromFieldPath: metadata.annotations[crossplane.io/external-name]
      toFieldPath: status.dnsZoneName&lt;/LI-CODE&gt;
&lt;H4&gt;4. Primary region network&lt;/H4&gt;
&lt;P&gt;The composition creates a &lt;STRONG&gt;primary virtual network&lt;/STRONG&gt;, &lt;STRONG&gt;primary subnet&lt;/STRONG&gt;, and a &lt;STRONG&gt;Private DNS zone link&lt;/STRONG&gt;. The VNet CIDR comes from spec.regions.primary.cidr. The subnet CIDR is derived from that primary CIDR using a regexp + format transform. The subnet is delegated to Microsoft.DBforPostgreSQL/flexibleServers.&lt;/P&gt;
&lt;LI-CODE lang="yaml"&gt;- name: primary-vnet
  base:
    apiVersion: network.azure.upbound.io/v1beta1
    kind: VirtualNetwork
    metadata:
      labels:
        role: primary
    spec:
      forProvider:
        addressSpace:
          - 10.0.0.0/16
  patches:
    - type: FromCompositeFieldPath
      fromFieldPath: spec.regions.primary.cidr
      toFieldPath: spec.forProvider.addressSpace[0]

- name: primary-subnet
  base:
    apiVersion: network.azure.upbound.io/v1beta1
    kind: Subnet
    metadata:
      labels:
        role: primary
    spec:
      forProvider:
        delegation:
          - name: fs
            serviceDelegation:
              - name: Microsoft.DBforPostgreSQL/flexibleServers&lt;/LI-CODE&gt;
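&lt;P&gt;The subnet CIDR patch is not part of the excerpt above. As a rough illustration of the idea only, the Python sketch below derives a subnet range from a VNet CIDR with the standard ipaddress module. Taking the first /24 block and the function name derive_subnet_cidr are assumptions made for this example; the composition itself uses a regexp and format transform, not Python.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import ipaddress


def derive_subnet_cidr(vnet_cidr, new_prefix=24):
    """Return the first subnet of size new_prefix inside vnet_cidr."""
    # strict=True rejects inputs with host bits set, such as 10.0.0.1/16.
    network = ipaddress.ip_network(vnet_cidr, strict=True)
    # subnets() yields the child networks in ascending address order.
    first_subnet = next(network.subnets(new_prefix=new_prefix))
    return str(first_subnet)


print(derive_subnet_cidr("10.0.0.0/16"))  # prints 10.0.0.0/24&lt;/LI-CODE&gt;
&lt;P&gt;A string-based transform gives the same result for simple cases, but it cannot check that the derived range actually sits inside the VNet range, which is worth validating separately.&lt;/P&gt;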
&lt;H4&gt;5. Primary database server&lt;/H4&gt;
&lt;P&gt;The primary Azure PostgreSQL Flexible Server is created with:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;private access only (publicNetworkAccessEnabled: false)&lt;/LI&gt;
&lt;LI&gt;subnet delegated from the primary subnet&lt;/LI&gt;
&lt;LI&gt;private DNS zone association&lt;/LI&gt;
&lt;LI&gt;admin credentials patched from the composite spec&lt;/LI&gt;
&lt;LI&gt;SKU, storage, version, retention, and backup settings patched from the composite spec&lt;/LI&gt;
&lt;LI&gt;FQDN and server ID written back into composite status&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI-CODE lang="yaml"&gt;- name: primary-server
  base:
    apiVersion: dbforpostgresql.azure.upbound.io/v1beta1
    kind: FlexibleServer
    metadata:
      labels:
        role: primary
        autoscaling: enabled
      annotations:
        management.platform.io/autoscale-enabled: 'true'
        management.platform.io/backup-enabled: 'true'
    spec:
      forProvider:
        publicNetworkAccessEnabled: false
        administratorLogin: psqladmin
        administratorPasswordSecretRef:
          name: ''
          namespace: crossplane-system
          key: password
  patches:
    - type: FromCompositeFieldPath
      fromFieldPath: spec.database.size
      toFieldPath: spec.forProvider.skuName
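    - type: FromCompositeFieldPath
      fromFieldPath: spec.database.storageGB
      toFieldPath: spec.forProvider.storageMb
      transforms:
        # storageGB is in GB, storageMb expects MB
        - type: math
          math:
            type: Multiply
            multiply: 1024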
    - type: FromCompositeFieldPath
      fromFieldPath: spec.database.version
      toFieldPath: spec.forProvider.version
    - type: FromCompositeFieldPath
      fromFieldPath: spec.database.backupRetentionDays
      toFieldPath: spec.forProvider.backupRetentionDays
    - type: FromCompositeFieldPath
      fromFieldPath: spec.database.geoRedundantBackup
      toFieldPath: spec.forProvider.geoRedundantBackupEnabled
    - type: FromCompositeFieldPath
      fromFieldPath: spec.security.adminUsername
      toFieldPath: spec.forProvider.administratorLogin&lt;/LI-CODE&gt;
&lt;H4&gt;6. Secondary region network and replica&lt;/H4&gt;
&lt;P&gt;The composition then creates:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;secondary VNet&lt;/LI&gt;
&lt;LI&gt;secondary subnet&lt;/LI&gt;
&lt;LI&gt;DNS link for the secondary VNet&lt;/LI&gt;
&lt;LI&gt;bidirectional VNet peering&lt;/LI&gt;
&lt;LI&gt;a secondary PostgreSQL Flexible Server with createMode: Replica&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI-CODE lang="yaml"&gt;- name: secondary-server
  base:
    apiVersion: dbforpostgresql.azure.upbound.io/v1beta1
    kind: FlexibleServer
    metadata:
      labels:
        role: secondary
        replica: 'true'
      annotations:
        management.platform.io/failover-candidate: 'true'
        management.platform.io/promotion-priority: '1'
    spec:
      forProvider:
        location: centralus
        createMode: Replica
        sourceServerId: ''
        publicNetworkAccessEnabled: false
  patches:
    - type: CombineFromComposite
      combine:
        variables:
          - fromFieldPath: metadata.name
        strategy: string
        string:
          fmt: /subscriptions/96618111-38e8-48c0-b564-ee5acde49c15/resourceGroups/postgres-crossplane-rg/providers/Microsoft.DBforPostgreSQL/flexibleServers/%s-primary
      toFieldPath: spec.forProvider.sourceServerId&lt;/LI-CODE&gt;
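&lt;P&gt;The CombineFromComposite patch fills the %s placeholder with the composite name. The Python sketch below mirrors that substitution. Unlike the composition, it takes the subscription ID and resource group as parameters; that choice, the function name, and the placeholder values are assumptions for illustration only.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;def build_source_server_id(subscription_id, resource_group, composite_name):
    """Build the resource ID of the primary server for a composite name."""
    template = (
        "/subscriptions/%s/resourceGroups/%s"
        "/providers/Microsoft.DBforPostgreSQL/flexibleServers/%s-primary"
    )
    return template % (subscription_id, resource_group, composite_name)


# Placeholder values, not taken from a real subscription.
print(build_source_server_id(
    "00000000-0000-0000-0000-000000000000", "example-rg", "orders-db"))&lt;/LI-CODE&gt;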
&lt;H4&gt;7. Read/write DNS records&lt;/H4&gt;
&lt;P&gt;The composition creates multiple PrivateDNSCNAMERecord resources for read and write endpoint abstraction. These records are patched from spec.network.privateDnsZoneName, spec.network.writeEndpointName, and spec.network.readEndpointName, and some are annotated with management.platform.io/update-on-failover: 'true'.&lt;/P&gt;
&lt;LI-CODE lang="yaml"&gt;- name: cname-write
  base:
    apiVersion: network.azure.upbound.io/v1beta1
    kind: PrivateDNSCNAMERecord
    metadata:
      annotations:
        management.platform.io/managed-by: failover-script
        management.platform.io/update-on-failover: 'true'
    spec:
      forProvider:
        ttl: 300

- name: cname-read
  base:
    apiVersion: network.azure.upbound.io/v1beta1
    kind: PrivateDNSCNAMERecord
    metadata:
      annotations:
        management.platform.io/managed-by: failover-script
        management.platform.io/update-on-failover: 'true'
    spec:
      forProvider:
        ttl: 300&lt;/LI-CODE&gt;
&lt;H4&gt;8. Management objects inside Kubernetes&lt;/H4&gt;
&lt;P&gt;The managed composition also creates Kubernetes-native control objects through the Kubernetes provider:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;a ConfigMap for management settings&lt;/LI&gt;
&lt;LI&gt;a ServiceAccount&lt;/LI&gt;
&lt;LI&gt;a ClusterRole&lt;/LI&gt;
&lt;LI&gt;a ClusterRoleBinding&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI-CODE lang="yaml"&gt;- name: management-config
  base:
    apiVersion: kubernetes.crossplane.io/v1alpha1
    kind: Object
    spec:
      forProvider:
        manifest:
          apiVersion: v1
          kind: ConfigMap
          data:
            backup-enabled: "true"
            backup-retention-days: "35"
            autoscaling-enabled: "true"
            failover-enabled: "true"

- name: management-clusterrole
  base:
    apiVersion: kubernetes.crossplane.io/v1alpha1
    kind: Object
    spec:
      forProvider:
        manifest:
          apiVersion: rbac.authorization.k8s.io/v1
          kind: ClusterRole&lt;/LI-CODE&gt;
&lt;H2&gt;High Availability (HA) and Disaster Recovery (DR)&lt;/H2&gt;
&lt;H3&gt;High Availability (HA) Strategy&lt;/H3&gt;
&lt;H4&gt;Design Principles&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;Ensure minimal disruption for infrastructure-level failures&lt;/LI&gt;
&lt;LI&gt;Leverage managed Azure HA capabilities&lt;/LI&gt;
&lt;LI&gt;Maintain consistent connectivity through private networking&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Implementation&lt;/H3&gt;
&lt;H4&gt;1. Zone-Redundant High Availability (Primary Region)&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;PostgreSQL Flexible Server supports zone-redundant HA deployment&lt;/LI&gt;
&lt;LI&gt;Primary database instance is replicated synchronously across Availability Zones&lt;/LI&gt;
&lt;LI&gt;Platform configuration enables:
&lt;UL&gt;
&lt;LI&gt;Same-region redundancy&lt;/LI&gt;
&lt;LI&gt;Automatic failover within region (infrastructure-level issues)&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Failover within a region is handled by Azure, but cross-region failover is not automatic.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H4&gt;2. Resource Connectivity (HA Path)&lt;/H4&gt;
&lt;P&gt;Within a region, connectivity follows:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Application → Private DNS → Private Endpoint → PostgreSQL Primary&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;UL&gt;
&lt;LI&gt;Private endpoints connect the database to the VNet&lt;/LI&gt;
&lt;LI&gt;Private DNS ensures internal name resolution&lt;/LI&gt;
&lt;LI&gt;Traffic never leaves the Azure backbone&lt;/LI&gt;
&lt;LI&gt;Public access is disabled&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Disaster Recovery (DR) Strategy&lt;/H3&gt;
&lt;H4&gt;Design Principles&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;Handle regional outages and large-scale failures&lt;/LI&gt;
&lt;LI&gt;Ensure data durability and failover capability&lt;/LI&gt;
&lt;LI&gt;Minimize RPO and RTO impact&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;1. Cross-Region Replication Architecture&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;Secondary PostgreSQL server deployed in paired region&lt;/LI&gt;
&lt;LI&gt;Configured as read replica (asynchronous replication)&lt;/LI&gt;
&lt;LI&gt;Example:
&lt;UL&gt;
&lt;LI&gt;Primary: East US 2&lt;/LI&gt;
&lt;LI&gt;Secondary: Central US&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Primary (Write) ── async replication ──► Replica (Read)&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;UL&gt;
&lt;LI&gt;The replica continuously receives updates but is not writable&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;2. Resource Connectivity (DR Path)&lt;/H4&gt;
&lt;P&gt;Cross-region setup includes:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Global VNet peering + hub-spoke connectivity&lt;/LI&gt;
&lt;LI&gt;Private endpoints in both regions&lt;/LI&gt;
&lt;LI&gt;Shared Private DNS zone&lt;/LI&gt;
&lt;/UL&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;App → DNS → Traffic Manager → Region Endpoint → Private Endpoint → DB&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;UL&gt;
&lt;LI&gt;Cross-region communication uses the Azure backbone&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;3. Failover Process (DR Execution)&lt;/H4&gt;
&lt;P&gt;Azure PostgreSQL does not provide automatic cross-region failover, so DR failover is a controlled, explicit operation.&lt;/P&gt;
&lt;H4&gt;Step-by-step failover:&lt;/H4&gt;
&lt;OL&gt;
&lt;LI&gt;Detect primary region failure&lt;/LI&gt;
&lt;LI&gt;Promote replica to standalone primary&lt;/LI&gt;
&lt;LI&gt;Update DNS / Traffic Manager routing&lt;/LI&gt;
&lt;LI&gt;Redirect application traffic&lt;/LI&gt;
&lt;LI&gt;Validate connectivity and resume operations&lt;/LI&gt;
&lt;/OL&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Replica → Promote → Becomes Primary → Traffic redirected&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
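&lt;P&gt;The failover steps above can be sketched as a small runbook function. This is a minimal sketch, not the script used with this platform: every callable passed in (detect_failure, promote_replica, update_dns, validate) is a hypothetical hook you would implement against your own tooling, and none of them is an Azure or Crossplane API. Routing updates and traffic redirection are folded into the single update_dns hook here.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;def run_failover(detect_failure, promote_replica, update_dns, validate):
    """Run the DR steps in order and return the list of completed steps."""
    completed = []
    if not detect_failure():
        # Primary region is healthy: do nothing, failover stays explicit.
        return completed
    completed.append("detected")

    new_primary = promote_replica()  # replica becomes a standalone primary
    completed.append("promoted")

    update_dns(new_primary)  # repoint the write endpoint record
    completed.append("dns-updated")

    if not validate(new_primary):
        raise RuntimeError("failover validation failed for " + new_primary)
    completed.append("validated")
    return completed


# Stub hooks so the sketch runs without touching any real infrastructure.
steps = run_failover(
    detect_failure=lambda: True,
    promote_replica=lambda: "example-secondary",
    update_dns=lambda target: None,
    validate=lambda target: True,
)
print(steps)&lt;/LI-CODE&gt;
&lt;P&gt;Keeping each step behind its own hook makes it easy to rehearse the sequence with stubs before wiring in real promotion and DNS calls.&lt;/P&gt;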
&lt;H2&gt;Conclusion&lt;/H2&gt;
&lt;P&gt;We modeled the database platform as a Kubernetes-native composite API, XPostgreSQLDatabase, and delegated infrastructure realization to a Crossplane pipeline composition. The composition reads user intent from the composite spec—regions, CIDR ranges, DNS settings, database sizing, retention, and admin secret references—and translates that into Azure resources including a resource group, private DNS zone, regional VNets and subnets, bidirectional peering, a primary Azure PostgreSQL Flexible Server, a cross-region replica, and private DNS CNAME records for read/write abstraction. In the managed variant, the composition also creates Kubernetes-side management artifacts for backup, autoscaling, and failover-related configuration.&lt;/P&gt;
</description>
      <pubDate>Wed, 29 Apr 2026 16:58:37 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/building-an-enterprise-grade-sql-platform-on-kubernetes-using/ba-p/4515635</guid>
      <dc:creator>prabhattomar</dc:creator>
      <dc:date>2026-04-29T16:58:37Z</dc:date>
    </item>
    <item>
      <title>Migrating Splunk Logs to Azure Application Insights on VMs</title>
      <link>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/migrating-splunk-logs-to-azure-application-insights-on-vms/ba-p/4515866</link>
      <description>&lt;H2 data-start="884" data-end="938"&gt;&lt;STRONG data-start="887" data-end="938"&gt;Phase 0 – Understanding the Current Environment&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P data-start="940" data-end="1010"&gt;Before touching any code or agents, map out the existing architecture.&lt;/P&gt;
&lt;P data-start="1012" data-end="1030"&gt;&lt;STRONG data-start="1012" data-end="1030"&gt;Current setup:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL data-start="1032" data-end="1321"&gt;
&lt;LI data-start="1032" data-end="1104"&gt;Applications write logs via&amp;nbsp;&lt;STRONG data-start="1062" data-end="1073"&gt;log4net&lt;/STRONG&gt;&amp;nbsp;→&amp;nbsp;&lt;STRONG data-start="1076" data-end="1095"&gt;local log files&lt;/STRONG&gt;&amp;nbsp;on VMs&lt;/LI&gt;
&lt;LI data-start="1105" data-end="1176"&gt;Splunk Universal Forwarders read logs → send to&amp;nbsp;&lt;STRONG data-start="1155" data-end="1174"&gt;Splunk Indexers&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-start="1177" data-end="1250"&gt;Splunk dashboards, alerts, and integrations with PagerDuty/ServiceNow&lt;/LI&gt;
&lt;LI data-start="1251" data-end="1321"&gt;LogicMonitor monitors VMs for CPU, memory, disk, and uptime (not logs)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="1323" data-end="1343"&gt;&lt;STRONG data-start="1323" data-end="1343"&gt;Developer tasks:&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL data-start="1345" data-end="1687"&gt;
&lt;LI data-start="1345" data-end="1433"&gt;Document all log file locations:&lt;/LI&gt;
&lt;/OL&gt;
&lt;LI-CODE lang=""&gt;grep -r "log4net" /path/to/apps&lt;/LI-CODE&gt;
&lt;P&gt;2. List all Splunk forwarder configurations:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;cat /opt/splunkforwarder/etc/system/local/inputs.conf&lt;/LI-CODE&gt;
&lt;OL start="3" data-start="1345" data-end="1687"&gt;
&lt;LI data-start="1554" data-end="1611"&gt;Record all dashboards, alerts, and their dependencies.&lt;/LI&gt;
&lt;LI data-start="1612" data-end="1687"&gt;Note any compliance requirements (PHI-sensitive logs, retention policy).&lt;/LI&gt;
&lt;/OL&gt;
&lt;P data-start="1691" data-end="1818"&gt;&lt;STRONG data-start="1691" data-end="1710"&gt;Note:&lt;/STRONG&gt;&amp;nbsp;Capture a screenshot of the Splunk dashboard, including example alerts and the log search query, to reference during migration.&lt;/P&gt;
&lt;H2 data-start="1825" data-end="1867"&gt;&lt;STRONG data-start="1828" data-end="1867"&gt;Phase 1 – Prepare Azure Environment&lt;/STRONG&gt;&lt;/H2&gt;
&lt;H3 data-start="1869" data-end="1939"&gt;Step 1.1 – Create Application Insights and Log Analytics Workspace&lt;/H3&gt;
&lt;P data-start="1941" data-end="1963"&gt;&lt;STRONG data-start="1941" data-end="1963"&gt;Developer Actions:&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL data-start="1965" data-end="2402"&gt;
&lt;LI data-start="1965" data-end="2033"&gt;Go to Azure Portal →&amp;nbsp;&lt;STRONG data-start="1989" data-end="2031"&gt;Create Resource → Application Insights&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-start="2034" data-end="2200"&gt;Fill in:
&lt;UL data-start="2049" data-end="2200"&gt;
&lt;LI data-start="2049" data-end="2077"&gt;Name: AppName-Insights&lt;/LI&gt;
&lt;LI data-start="2081" data-end="2119"&gt;Resource Group: Observability-RG&lt;/LI&gt;
&lt;LI data-start="2123" data-end="2154"&gt;Region: closest to your VMs&lt;/LI&gt;
&lt;LI data-start="2158" data-end="2200"&gt;Application Type: .NET, Java, or Other&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI data-start="2201" data-end="2240"&gt;Click&amp;nbsp;&lt;STRONG data-start="2210" data-end="2238"&gt;Review + Create → Create&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-start="2242" data-end="2286"&gt;Navigate to&amp;nbsp;&lt;STRONG data-start="2257" data-end="2284"&gt;Log Analytics Workspace&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-start="2287" data-end="2402"&gt;Create workspace for centralizing logs and metrics
&lt;UL data-start="2346" data-end="2402"&gt;
&lt;LI data-start="2346" data-end="2402"&gt;Note&amp;nbsp;&lt;STRONG data-start="2353" data-end="2369"&gt;Workspace ID&lt;/STRONG&gt;&amp;nbsp;and&amp;nbsp;&lt;STRONG data-start="2374" data-end="2389"&gt;Primary Key&lt;/STRONG&gt; for agents&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H3 data-start="2517" data-end="2580"&gt;Step 1.2 – Retrieve Instrumentation Key / Connection String&lt;/H3&gt;
&lt;OL data-start="2582" data-end="2697"&gt;
&lt;LI data-start="2582" data-end="2638"&gt;Open Application Insights resource →&amp;nbsp;&lt;STRONG data-start="2622" data-end="2636"&gt;Properties&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-start="2639" data-end="2697"&gt;Copy&amp;nbsp;&lt;STRONG data-start="2647" data-end="2670"&gt;Instrumentation Key&lt;/STRONG&gt;&amp;nbsp;or&amp;nbsp;&lt;STRONG data-start="2674" data-end="2695"&gt;Connection String&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
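&lt;P data-start="2701" data-end="2805"&gt;&lt;STRONG data-start="2701" data-end="2720"&gt;Note:&lt;/STRONG&gt; You will use this value later for SDK integration and, optionally, for the log appender configuration. Microsoft recommends the connection string over the instrumentation key for new setups.&lt;/P&gt;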
&lt;H2 data-start="2812" data-end="2871"&gt;&lt;STRONG data-start="2815" data-end="2871"&gt;Phase 2 – File-Based Ingestion Migration&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P data-start="2873" data-end="2986"&gt;Since your apps already write logs to files, we can&amp;nbsp;&lt;STRONG data-start="2925" data-end="2985"&gt;replace Splunk forwarders with Azure Monitor Agent (AMA)&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H3 data-start="2988" data-end="3043"&gt;Step 2.1 – Install Azure Monitor Agent (AMA) on VMs&lt;/H3&gt;
&lt;P data-start="3045" data-end="3060"&gt;&lt;STRONG data-start="3045" data-end="3060"&gt;Windows VM:&lt;/STRONG&gt;&lt;/P&gt;
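&lt;LI-CODE lang=""&gt;# AMA is installed as a VM extension; it does not take a workspace ID or key
az vm extension set --name AzureMonitorWindowsAgent --publisher Microsoft.Azure.Monitor --resource-group &amp;lt;ResourceGroup&amp;gt; --vm-name &amp;lt;VMName&amp;gt; --enable-auto-upgrade true&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;Linux VM:&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang=""&gt;az vm extension set --name AzureMonitorLinuxAgent --publisher Microsoft.Azure.Monitor --resource-group &amp;lt;ResourceGroup&amp;gt; --vm-name &amp;lt;VMName&amp;gt; --enable-auto-upgrade true&lt;/LI-CODE&gt;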
&lt;P&gt;&lt;STRONG&gt;Verify installation:&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang=""&gt;# Windows
Get-Service -Name "AzureMonitorAgent"

# Linux
sudo systemctl status azuremonitoragent&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG data-start="3453" data-end="3472"&gt;Note:&lt;/STRONG&gt;&amp;nbsp;Capture AMA service status to confirm it is running.&lt;/P&gt;
&lt;H3 data-start="3532" data-end="3578"&gt;Step 2.2 – Configure AMA to Read Log Files&lt;/H3&gt;
&lt;OL data-start="3580" data-end="3860"&gt;
&lt;LI data-start="3580" data-end="3662"&gt;Navigate to&amp;nbsp;&lt;STRONG data-start="3595" data-end="3660"&gt;Azure Portal → Monitor → Data Collection Rules&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-start="3663" data-end="3826"&gt;Add&amp;nbsp;&lt;STRONG data-start="3670" data-end="3694"&gt;Data Collection Rule&lt;/STRONG&gt;:
&lt;UL data-start="3699" data-end="3826"&gt;
&lt;LI data-start="3699" data-end="3725"&gt;Select&amp;nbsp;&lt;STRONG data-start="3708" data-end="3723"&gt;Custom Logs&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-start="3729" data-end="3773"&gt;Add path for log4net log files on the VM&lt;/LI&gt;
&lt;LI data-start="3777" data-end="3826"&gt;Map fields like timestamp, log level, message&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI data-start="3828" data-end="3860"&gt;Assign the rule to your VM(s)&lt;/LI&gt;
&lt;/OL&gt;
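&lt;P&gt;Before defining the field mapping, it helps to confirm that your log lines really split into the fields you expect. The Python sketch below does that for one assumed layout (date, level, message, similar to a log4net pattern of %date %level %message). The layout, the regular expression, and the function name parse_line are assumptions for illustration; adjust them to the conversion pattern your appenders actually use.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import re

# Assumed layout: "2026-04-29 11:35:03,123 ERROR Payment service timed out"
LINE_PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"  # timestamp
    r"([A-Z]+)\s+"                                      # log level
    r"(.*)$"                                            # message
)


def parse_line(line):
    """Split one log line into timestamp, level, and message, or return None."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    timestamp, level, message = match.groups()
    return {"timestamp": timestamp, "level": level, "message": message}


print(parse_line("2026-04-29 11:35:03,123 ERROR Payment service timed out"))&lt;/LI-CODE&gt;
&lt;P&gt;Lines that return None, such as stack trace continuations, are the ones most likely to need extra handling in the collection rule.&lt;/P&gt;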
&lt;H3 data-start="4014" data-end="4051"&gt;Step 2.3 – Validate Log Ingestion&lt;/H3&gt;
&lt;OL data-start="4053" data-end="4124"&gt;
&lt;LI data-start="4053" data-end="4101"&gt;Navigate to Application Insights →&amp;nbsp;&lt;STRONG data-start="4091" data-end="4099"&gt;Logs&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-start="4102" data-end="4124"&gt;Run a simple query:&lt;/LI&gt;
&lt;/OL&gt;
&lt;LI-CODE lang=""&gt;traces | order by timestamp desc | limit 50&lt;/LI-CODE&gt;
&lt;P&gt;3. Compare with Splunk logs:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 100%; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;th&gt;Metric&lt;/th&gt;&lt;th&gt;Splunk Count&lt;/th&gt;&lt;th&gt;Azure Count&lt;/th&gt;&lt;th&gt;Status&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Errors last 1h&lt;/td&gt;&lt;td&gt;105&lt;/td&gt;&lt;td&gt;103&lt;/td&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
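&lt;P&gt;Small differences between the two counts are normal while both pipelines run side by side. As a sketch, assuming a tolerance you choose yourself (the 5 percent default and the function name compare_counts are illustrative, not a Splunk or Azure setting), the comparison can be scripted like this:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import math


def compare_counts(splunk_count, azure_count, tolerance=0.05):
    """Compare two event counts taken over the same time window."""
    # isclose treats the counts as matching when they differ by at most
    # tolerance (as a fraction) of the larger count.
    matches = math.isclose(splunk_count, azure_count, rel_tol=tolerance)
    larger = max(splunk_count, azure_count, 1)
    drift_pct = round(abs(splunk_count - azure_count) / larger * 100, 2)
    status = "Match" if matches else "Investigate"
    return {"status": status, "drift_pct": drift_pct}


print(compare_counts(105, 103))&lt;/LI-CODE&gt;
&lt;P&gt;With the sample row above (105 versus 103), the drift is about 1.9 percent, which falls inside the assumed tolerance.&lt;/P&gt;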
&lt;H2 data-start="4433" data-end="4465"&gt;&lt;STRONG data-start="4436" data-end="4465"&gt;Phase 3 – Alert Migration&lt;/STRONG&gt;&lt;/H2&gt;
&lt;H3 data-start="4467" data-end="4523"&gt;Step 3.1 – Map Splunk Alerts to Azure Monitor Alerts&lt;/H3&gt;
&lt;P data-start="4525" data-end="4545"&gt;&lt;STRONG data-start="4525" data-end="4545"&gt;Example Mapping:&lt;/STRONG&gt;&lt;/P&gt;
&lt;table border="1" style="width: 100%; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;th&gt;Splunk Alert&lt;/th&gt;&lt;th&gt;Azure Equivalent&lt;/th&gt;&lt;th&gt;Frequency&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Error threshold &amp;gt; 10&lt;/td&gt;&lt;td&gt;Log query alert (traces | where severityLevel == 3, where 3 = Error)&lt;/td&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P data-start="4756" data-end="4776"&gt;&lt;STRONG data-start="4756" data-end="4776"&gt;Developer Steps:&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL data-start="4778" data-end="5033"&gt;
&lt;LI data-start="4778" data-end="4840"&gt;Navigate to&amp;nbsp;&lt;STRONG data-start="4793" data-end="4838"&gt;Azure Monitor → Alerts → + New alert rule&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI data-start="4841" data-end="4924"&gt;Select&amp;nbsp;&lt;STRONG data-start="4851" data-end="4863"&gt;Resource&lt;/STRONG&gt;&amp;nbsp;→ Application Insights →&amp;nbsp;&lt;STRONG data-start="4889" data-end="4902"&gt;Condition&lt;/STRONG&gt;&amp;nbsp;→ Custom log search&lt;/LI&gt;
&lt;LI data-start="4925" data-end="4991"&gt;Configure&amp;nbsp;&lt;STRONG data-start="4938" data-end="4954"&gt;Action Group&lt;/STRONG&gt;:
&lt;UL data-start="4959" data-end="4991"&gt;
&lt;LI data-start="4959" data-end="4991"&gt;PagerDuty, Email, ServiceNow&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI data-start="4992" data-end="5033"&gt;Set&amp;nbsp;&lt;STRONG data-start="4999" data-end="5031"&gt;Alert Frequency and Severity&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;STRONG data-start="5037" data-end="5056"&gt;Screenshot Tip:&lt;/STRONG&gt;&amp;nbsp;Capture the alert creation screen with the query and action group.&lt;/P&gt;
&lt;H2 data-start="5122" data-end="5188"&gt;&lt;STRONG data-start="5125" data-end="5188"&gt;Phase 4 – Optional SDK-Based Migration (Full Observability)&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P data-start="5190" data-end="5230"&gt;For deep insights, tracing, and metrics.&lt;/P&gt;
&lt;H3 data-start="5232" data-end="5264"&gt;Step 4.1 – .NET Applications&lt;/H3&gt;
&lt;OL data-start="5266" data-end="5281"&gt;
&lt;LI data-start="5266" data-end="5281"&gt;Install SDK:&lt;/LI&gt;
&lt;/OL&gt;
&lt;LI-CODE lang=""&gt;dotnet add package Microsoft.ApplicationInsights.AspNetCore
dotnet add package Microsoft.ApplicationInsights.Log4NetAppender&lt;/LI-CODE&gt;
&lt;P&gt;2. Configure in Program.cs:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;builder.Services.AddApplicationInsightsTelemetry();&lt;/LI-CODE&gt;
&lt;P&gt;3. Integrate log4net with Application Insights:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;&amp;lt;appender name="aiAppender" type="Microsoft.ApplicationInsights.Log4NetAppender.ApplicationInsightsAppender"/&amp;gt;
&amp;lt;root&amp;gt;
  &amp;lt;level value="INFO"/&amp;gt;
  &amp;lt;appender-ref ref="aiAppender"/&amp;gt;
&amp;lt;/root&amp;gt;&lt;/LI-CODE&gt;
&lt;P&gt;4. Add the&amp;nbsp;&lt;STRONG data-start="5773" data-end="5796"&gt;Instrumentation Key&lt;/STRONG&gt; to appsettings.json:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;{
  "ApplicationInsights": {
    "InstrumentationKey": "YOUR_KEY_HERE"
  }
}&lt;/LI-CODE&gt;
&lt;H3 data-start="5916" data-end="5948"&gt;Step 4.2 – Java Applications&lt;/H3&gt;
&lt;OL data-start="5950" data-end="5974"&gt;
&lt;LI data-start="5950" data-end="5974"&gt;Add Maven dependency:&lt;/LI&gt;
&lt;/OL&gt;
&lt;LI-CODE lang=""&gt;&amp;lt;dependency&amp;gt;
  &amp;lt;groupId&amp;gt;com.microsoft.azure&amp;lt;/groupId&amp;gt;
  &amp;lt;artifactId&amp;gt;applicationinsights-web&amp;lt;/artifactId&amp;gt;
  &amp;lt;version&amp;gt;3.4.15&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;&lt;/LI-CODE&gt;
&lt;OL start="2"&gt;
&lt;LI&gt;Configure ApplicationInsights.xml with Instrumentation Key&lt;/LI&gt;
&lt;LI&gt;Enable automatic telemetry: requests, exceptions, dependencies&lt;/LI&gt;
&lt;/OL&gt;
&lt;H3&gt;Step 4.3 – Python Applications&lt;/H3&gt;
&lt;LI-CODE lang=""&gt;from opencensus.ext.azure.log_exporter import AzureLogHandler
import logging

logger = logging.getLogger(__name__)
logger.addHandler(AzureLogHandler(connection_string='InstrumentationKey=YOUR_KEY'))
logger.setLevel(logging.INFO)&lt;/LI-CODE&gt;
&lt;H2 data-start="6561" data-end="6601"&gt;&lt;STRONG data-start="6564" data-end="6601"&gt;Phase 5 – Dual Logging Validation&lt;/STRONG&gt;&lt;/H2&gt;
&lt;UL data-start="6603" data-end="6676"&gt;
&lt;LI data-start="6603" data-end="6649"&gt;Keep Splunk forwarders running temporarily&lt;/LI&gt;
&lt;LI data-start="6650" data-end="6676"&gt;Compare logs and alerts:&lt;/LI&gt;
&lt;/UL&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 100%; border-width: 1px;"&gt;&lt;colgroup&gt;&lt;col style="width: 25.0193%" /&gt;&lt;col style="width: 25.0193%" /&gt;&lt;col style="width: 25.0193%" /&gt;&lt;col style="width: 25.0193%" /&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr&gt;&lt;th&gt;Log Type&lt;/th&gt;&lt;th&gt;Splunk&lt;/th&gt;&lt;th&gt;Azure&lt;/th&gt;&lt;th&gt;Status&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;App Errors&lt;/td&gt;&lt;td&gt;Y&lt;/td&gt;&lt;td&gt;Y&lt;/td&gt;&lt;td&gt;Match&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Info Logs&lt;/td&gt;&lt;td&gt;Y&lt;/td&gt;&lt;td&gt;Y&lt;/td&gt;&lt;td&gt;Match&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;P&gt;&lt;STRONG data-start="6817" data-end="6835"&gt;Developer Tip:&lt;/STRONG&gt;&amp;nbsp;Fix any missing log parsing or field extraction issues.&lt;/P&gt;
&lt;H2 data-start="6898" data-end="6946"&gt;&lt;STRONG data-start="6901" data-end="6946"&gt;Phase 6 – Cutover and Splunk Decommission&lt;/STRONG&gt;&lt;/H2&gt;
&lt;OL data-start="6948" data-end="7095"&gt;
&lt;LI data-start="6948" data-end="6974"&gt;Disable Splunk alerts&lt;/LI&gt;
&lt;LI data-start="6975" data-end="7016"&gt;Gradually stop forwarders on each VM&lt;/LI&gt;
&lt;LI data-start="7017" data-end="7062"&gt;Archive historical Splunk logs if needed&lt;/LI&gt;
&lt;LI data-start="7063" data-end="7095"&gt;Remove Splunk agent from VM&lt;/LI&gt;
&lt;/OL&gt;
&lt;P data-start="7099" data-end="7185"&gt;&lt;STRONG data-start="7099" data-end="7118"&gt;Screenshot Tip:&lt;/STRONG&gt;&amp;nbsp;Capture AMA and App Insights dashboards fully populated with logs.&lt;/P&gt;
&lt;H2 data-start="7192" data-end="7236"&gt;&lt;STRONG data-start="7195" data-end="7236"&gt;Phase 7 – Post-Migration Optimization&lt;/STRONG&gt;&lt;/H2&gt;
&lt;OL data-start="7238" data-end="7313"&gt;
&lt;LI data-start="7238" data-end="7313"&gt;Configure&amp;nbsp;&lt;STRONG data-start="7251" data-end="7263"&gt;Sampling&lt;/STRONG&gt; in Application Insights to reduce ingestion cost:&lt;/LI&gt;
&lt;/OL&gt;
&lt;LI-CODE lang=""&gt;services.Configure&amp;lt;TelemetryConfiguration&amp;gt;(config =&amp;gt;
{
    var chainBuilder = config.DefaultTelemetrySink.TelemetryProcessorChainBuilder;
    chainBuilder.UseSampling(20.0); // keep 20% of telemetry
    chainBuilder.Build();           // apply the processor chain
});&lt;/LI-CODE&gt;
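&lt;P&gt;To build intuition for what a fixed sampling percentage does, the Python sketch below keeps or drops items based on a hash of an operation ID, so all telemetry that shares an ID gets the same decision. This is an illustration only and is not the algorithm the Application Insights SDK uses; the function name keep_item and the bucket scheme are assumptions.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import hashlib


def keep_item(operation_id, sampling_percentage):
    """Decide deterministically whether telemetry for operation_id is kept."""
    digest = hashlib.sha256(operation_id.encode("utf-8")).digest()
    # Map the first 4 bytes of the hash to a bucket in the range 0..9999.
    bucket = int.from_bytes(digest[:4], "big") % 10000
    threshold = int(sampling_percentage * 100)
    # Keep the item when its bucket falls in the first `threshold` buckets.
    return bucket in range(threshold)


kept = sum(keep_item("op-%d" % i, 20.0) for i in range(10000))
print("kept %d of 10000 items" % kept)&lt;/LI-CODE&gt;
&lt;P&gt;Because the decision depends only on the ID, related items are sampled together, which keeps end-to-end traces intact at the cost of slightly uneven volumes.&lt;/P&gt;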
&lt;P&gt;2. Tune dashboards in&amp;nbsp;&lt;STRONG data-start="7516" data-end="7535"&gt;Azure Workbooks&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL start="3"&gt;
&lt;LI&gt;Set &lt;STRONG data-start="7545" data-end="7580"&gt;retention and archival policies&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Remove unused resources to reduce cost&lt;/LI&gt;
&lt;/OL&gt;
&lt;H2 data-start="7633" data-end="7675"&gt;&lt;STRONG data-start="7636" data-end="7675"&gt;Phase 8 – PHI Compliance &amp;amp; Security&lt;/STRONG&gt;&lt;/H2&gt;
&lt;UL data-start="7677" data-end="7852"&gt;
&lt;LI data-start="7677" data-end="7722"&gt;Avoid logging sensitive PHI in plain text&lt;/LI&gt;
&lt;LI data-start="7723" data-end="7762"&gt;Use&amp;nbsp;&lt;STRONG data-start="7729" data-end="7748"&gt;Azure Key Vault&lt;/STRONG&gt;&amp;nbsp;for secrets&lt;/LI&gt;
&lt;LI data-start="7763" data-end="7809"&gt;Enforce&amp;nbsp;&lt;STRONG data-start="7773" data-end="7781"&gt;RBAC&lt;/STRONG&gt;&amp;nbsp;for dashboards and alerts&lt;/LI&gt;
&lt;LI data-start="7810" data-end="7852"&gt;Enable&amp;nbsp;&lt;STRONG data-start="7819" data-end="7841"&gt;encryption at rest&lt;/STRONG&gt; for logs&lt;/LI&gt;
&lt;/UL&gt;
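&lt;P&gt;One way to reduce the risk of PHI reaching log storage is to redact it before the record leaves the application. The sketch below uses a standard Python logging filter. The two patterns (SSN-like numbers and email addresses) and the class name RedactPhiFilter are assumptions for illustration; real PHI detection needs patterns agreed with your compliance team.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import logging
import re

# Assumed patterns for this sketch: US SSN-like numbers and email addresses.
PHI_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
]


class RedactPhiFilter(logging.Filter):
    """Replace matches of PHI_PATTERNS in each log message with [REDACTED]."""

    def filter(self, record):
        message = record.getMessage()
        for pattern in PHI_PATTERNS:
            message = pattern.sub("[REDACTED]", message)
        # Store the redacted text and clear args so it is not formatted again.
        record.msg = message
        record.args = ()
        return True


logger = logging.getLogger("phi-demo")
handler = logging.StreamHandler()
handler.addFilter(RedactPhiFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("Patient %s updated contact to %s", "123-45-6789", "jane@example.com")&lt;/LI-CODE&gt;
&lt;P&gt;Attaching the filter to the handler means every record sent through that handler is redacted, regardless of which part of the application logged it.&lt;/P&gt;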
&lt;H2&gt;&lt;STRONG&gt;Phase 9 – Developer Checklist&lt;/STRONG&gt;&lt;/H2&gt;
&lt;table border="1" style="width: 100%; border-width: 1px;"&gt;&lt;colgroup&gt;&lt;col style="width: 25.0193%" /&gt;&lt;col style="width: 25.0193%" /&gt;&lt;col style="width: 25.0193%" /&gt;&lt;col style="width: 25.0193%" /&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr&gt;&lt;th&gt;Phase&lt;/th&gt;&lt;th&gt;Task&lt;/th&gt;&lt;th&gt;Developer Action&lt;/th&gt;&lt;th&gt;Status&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;Inventory&lt;/td&gt;&lt;td&gt;Document log files, forwarders, dashboards&lt;/td&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;Azure Prep&lt;/td&gt;&lt;td&gt;Create App Insights &amp;amp; Log Analytics&lt;/td&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt;File-based ingestion&lt;/td&gt;&lt;td&gt;Install AMA, configure custom logs&lt;/td&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;3&lt;/td&gt;&lt;td&gt;Alerts&lt;/td&gt;&lt;td&gt;Map and create alerts in Azure Monitor&lt;/td&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;4&lt;/td&gt;&lt;td&gt;SDK integration&lt;/td&gt;&lt;td&gt;Add AI SDK and log4net appender&lt;/td&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt;Validation&lt;/td&gt;&lt;td&gt;Compare Splunk vs Azure logs&lt;/td&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;6&lt;/td&gt;&lt;td&gt;Cutover&lt;/td&gt;&lt;td&gt;Stop Splunk forwarders, archive logs&lt;/td&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;7&lt;/td&gt;&lt;td&gt;Optimization&lt;/td&gt;&lt;td&gt;Sampling, retention, dashboards&lt;/td&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;8&lt;/td&gt;&lt;td&gt;Security&lt;/td&gt;&lt;td&gt;Ensure PHI compliance&lt;/td&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;H2 data-start="8528" data-end="8559"&gt;&lt;STRONG data-start="8531" data-end="8559"&gt;Phase 10 – Key Takeaways&lt;/STRONG&gt;&lt;/H2&gt;
&lt;UL data-start="8561" data-end="8902"&gt;
&lt;LI data-start="8561" data-end="8637"&gt;Decoupled architecture (log4net → file → Splunk) makes migration simpler&lt;/LI&gt;
&lt;LI data-start="8638" data-end="8717"&gt;&lt;STRONG data-start="8640" data-end="8652"&gt;Phase 2:&lt;/STRONG&gt;&amp;nbsp;File-based ingestion → minimal code changes, immediate results&lt;/LI&gt;
&lt;LI data-start="8718" data-end="8806"&gt;&lt;STRONG data-start="8720" data-end="8732"&gt;Phase 4:&lt;/STRONG&gt;&amp;nbsp;SDK instrumentation → full observability (traces, metrics, correlation)&lt;/LI&gt;
&lt;LI data-start="8807" data-end="8850"&gt;Dual logging is critical for validation&lt;/LI&gt;
&lt;LI data-start="8851" data-end="8902"&gt;PHI compliance and alert parity must be ensured&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG data-start="8906" data-end="8930"&gt;Final Developer Tip:&lt;/STRONG&gt; Start small with a single service or VM, validate logs and alerts, then scale up to all applications.&lt;/P&gt;
&lt;/DIV&gt;</description>
      <pubDate>Wed, 29 Apr 2026 11:35:03 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-infrastructure-blog/migrating-splunk-logs-to-azure-application-insights-on-vms/ba-p/4515866</guid>
      <dc:creator>skundapura</dc:creator>
      <dc:date>2026-04-29T11:35:03Z</dc:date>
    </item>
  </channel>
</rss>

