<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Azure topics</title>
    <link>https://techcommunity.microsoft.com/t5/azure/bd-p/Azure</link>
    <description>Azure topics</description>
    <pubDate>Sun, 03 May 2026 19:28:45 GMT</pubDate>
    <dc:creator>Azure</dc:creator>
    <dc:date>2026-05-03T19:28:45Z</dc:date>
    <item>
      <title>Cloud-Native vs. Hybrid for the 2026 Workplace</title>
      <link>https://techcommunity.microsoft.com/t5/azure/cloud-native-vs-hybrid-for-the-2026-workplace/m-p/4516460#M22524</link>
      <description>&lt;P&gt;&lt;STRONG&gt;When to choose Cloud-Native vs. Hybrid for the 2026 Workplace?&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am starting a discussion on the foundational phase of a project. As a Computer Engineer, I believe the most critical decision we face in 2026 is determining exactly when to move to a full cloud model versus maintaining a hybrid infrastructure.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In my view, the decision is not about cost; it is about resiliency and high availability. I would like to exchange views with other engineers on these areas: latency, edge requirements, integration, and agility.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In your experience, what are the deciding factors that make you choose one over the other for a 2026 environment?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm looking for technical architectural insights, not sales approaches.&lt;/P&gt;</description>
      <pubDate>Fri, 01 May 2026 14:54:03 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure/cloud-native-vs-hybrid-for-the-2026-workplace/m-p/4516460#M22524</guid>
      <dc:creator>Gaaleh-Mem</dc:creator>
      <dc:date>2026-05-01T14:54:03Z</dc:date>
    </item>
    <item>
      <title>Azure Automation Hybrid Runbook Worker Supported OS</title>
      <link>https://techcommunity.microsoft.com/t5/azure/azure-automation-hybrid-runbook-worker-supported-os/m-p/4516128#M22519</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;We are currently in the process of updating our environment to Server 2025. Since mainstream support for Server 2022 ends in October this year, we would also like to update our on-premises Azure Automation Hybrid Runbook Worker from Server 2022 to Server 2025.&lt;/P&gt;&lt;P&gt;As far as I can see from the &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/automation/extension-based-hybrid-runbook-worker-install?tabs=windows%2Cps#supported-operating-systems" target="_blank"&gt;documentation&lt;/A&gt;, the OS is only supported up to Server 2022, but not Server 2025. Since the end of mainstream support is approaching, is there any information on official support for Server 2025 for Azure Automation HRWs? Does anyone already have one running successfully on Server 2025?&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Thu, 30 Apr 2026 08:51:30 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure/azure-automation-hybrid-runbook-worker-supported-os/m-p/4516128#M22519</guid>
      <dc:creator>PhilippZiemke</dc:creator>
      <dc:date>2026-04-30T08:51:30Z</dc:date>
    </item>
    <item>
      <title>Patterns for low-code Azure config state snapshot + recovery solution for resource groups</title>
      <link>https://techcommunity.microsoft.com/t5/azure/patterns-for-low-code-azure-config-state-snapshot-recovery/m-p/4516031#M22518</link>
      <description>&lt;P&gt;I’m looking for patterns that capture resource configuration changes over time and support best-effort recovery (redeployment) of resource config state.&lt;/P&gt;&lt;P&gt;I understand that authoritative IaC (Bicep) would be the most mature option; however, I am wondering whether anyone has implemented a solution similar to what I have described above.&lt;/P&gt;&lt;P&gt;Ideally, this would be a low-code, Azure-native solution.&lt;/P&gt;</description>
      <pubDate>Thu, 30 Apr 2026 02:08:17 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure/patterns-for-low-code-azure-config-state-snapshot-recovery/m-p/4516031#M22518</guid>
      <dc:creator>nicksal</dc:creator>
      <dc:date>2026-04-30T02:08:17Z</dc:date>
    </item>
    <item>
      <title>Using GitHub Copilot from Azure Subscription</title>
      <link>https://techcommunity.microsoft.com/t5/azure/using-github-copilot-from-azure-subscription/m-p/4515847#M22514</link>
      <description>&lt;P&gt;Hello,&lt;BR /&gt;I have a question about how GitHub Copilot can be accessed and managed through an Azure subscription. If I get a GitHub Copilot license, how is my Azure subscription linked to GitHub Copilot billing and licensing?&lt;/P&gt;</description>
      <pubDate>Wed, 29 Apr 2026 10:28:33 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure/using-github-copilot-from-azure-subscription/m-p/4515847#M22514</guid>
      <dc:creator>MSOPS1</dc:creator>
      <dc:date>2026-04-29T10:28:33Z</dc:date>
    </item>
    <item>
      <title>MFA required for Global Admin without Conditional Access or PIM enforcement</title>
      <link>https://techcommunity.microsoft.com/t5/azure/mfa-required-for-global-admin-without-conditional-access-or-pim/m-p/4515571#M22511</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I'm analyzing a break-glass account scenario in Microsoft Entra ID and would like to validate a behavior I'm observing.&lt;/P&gt;&lt;P&gt;The account:&lt;/P&gt;&lt;P&gt;Has Global Administrator role (permanent assignment)&lt;BR /&gt;Is excluded from all Conditional Access policies (fully validated)&lt;BR /&gt;Is excluded from Authentication Methods policies and MFA Registration Campaign (fully validated)&lt;BR /&gt;Has no per-user MFA enabled (disabled)&lt;BR /&gt;PIM is not enforcing MFA (role is permanently active, no activation required)&lt;BR /&gt;Security Defaults are disabled&lt;BR /&gt;SSPR is not enforcing MFA&lt;/P&gt;&lt;P&gt;All configurable sources that could require MFA have been reviewed and fully ruled out.&lt;/P&gt;&lt;P&gt;However, when signing into Microsoft Admin Portals (Entra/Azure), MFA is still required and cannot be skipped.&lt;/P&gt;&lt;P&gt;In Sign-in logs:&lt;/P&gt;&lt;P&gt;Conditional Access → Not Applied&lt;BR /&gt;Authentication Details show:&lt;BR /&gt;"MFA required in Azure AD"&lt;BR /&gt;"App requires multifactor authentication"&lt;/P&gt;&lt;P&gt;Additionally, there is a Microsoft-managed policy:&lt;BR /&gt;"Multifactor authentication for admins accessing Microsoft Admin Portals"&lt;BR /&gt;but it is in Report-only mode.&lt;/P&gt;&lt;P&gt;Question:&lt;BR /&gt;Is Microsoft Entra ID enforcing MFA automatically for privileged roles (like Global Administrator) in admin portals, even when no Conditional Access or PIM policy requires it?&lt;/P&gt;&lt;P&gt;And if so, is there any supported way to fully exclude a break-glass account from this behavior?&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Tue, 28 Apr 2026 15:06:56 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure/mfa-required-for-global-admin-without-conditional-access-or-pim/m-p/4515571#M22511</guid>
      <dc:creator>schiachris</dc:creator>
      <dc:date>2026-04-28T15:06:56Z</dc:date>
    </item>
    <item>
      <title>Azure Artifact Signing: SignTool "Access is denied" with active Public Trust profile</title>
      <link>https://techcommunity.microsoft.com/t5/azure/azure-artifact-signing-signtool-quot-access-is-denied-quot-with/m-p/4514758#M22503</link>
      <description>&lt;P&gt;I’m blocked on Azure Artifact Signing for Windows EXE signing.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What is already confirmed:&lt;/P&gt;&lt;P&gt;- Account endpoint: https://wus2.codesigning.azure.net/&lt;/P&gt;&lt;P&gt;- Code signing account: notarios&lt;/P&gt;&lt;P&gt;- Certificate profile: notarios-public-trust (Public Trust, Active)&lt;/P&gt;&lt;P&gt;- Identity validation: Completed&lt;/P&gt;&lt;P&gt;- User object id: 9aa27294-c04d-4aab-a7b2-3a8b10be96f9&lt;/P&gt;&lt;P&gt;- RBAC includes:&lt;/P&gt;&lt;P&gt;- Artifact Signing Identity Verifier&lt;/P&gt;&lt;P&gt;- Artifact Signing Certificate Profile Signer&lt;/P&gt;&lt;P&gt;(also assigned at certificate profile scope)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Signing command (signtool 10.0.26100.0 x64 + dlib):&lt;/P&gt;&lt;P&gt;... sign /v /debug /fd SHA256 /tr http://timestamp.acs.microsoft.com /td SHA256 /dlib "&amp;lt;...&amp;gt;\\Azure.CodeSigning.Dlib.dll" /dmdf "C:\temp\metadata-corr.json" "C:\temp\notarial-app-test.exe"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Error every time:&lt;/P&gt;&lt;P&gt;- SignTool Error: Access is denied.&lt;/P&gt;&lt;P&gt;- Number of files successfully Signed: 0&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I also tested Azure CLI auth and explicit AccessToken in metadata; same result.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;CorrelationId for troubleshooting:&lt;/P&gt;&lt;P&gt;- notarios-20260425-1859&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If anyone from Microsoft can check backend logs for that CorrelationId, I’d appreciate the exact reason and remediation.&lt;/P&gt;</description>
      <pubDate>Sat, 25 Apr 2026 23:21:29 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure/azure-artifact-signing-signtool-quot-access-is-denied-quot-with/m-p/4514758#M22503</guid>
      <dc:creator>samuelRiosLazo</dc:creator>
      <dc:date>2026-04-25T23:21:29Z</dc:date>
    </item>
    <item>
      <title>Azure RBAC Custom Role Best Practices or Common Build Patterns</title>
      <link>https://techcommunity.microsoft.com/t5/azure/azure-rbac-custom-role-best-practices-or-common-build-patterns/m-p/4513098#M22496</link>
      <description>&lt;P&gt;As a platform admin, I want to grant application admins Contributor access while removing their ability to write or delete most Microsoft.Network resource types, with a few exceptions such as Private Endpoints, Network Interfaces, and Application Gateways.&lt;/P&gt;&lt;P&gt;Based on the effective control plane permissions logic, we designed two custom roles. The first role is a duplicate of the Contributor role, but with Microsoft.Network/*/Write and Microsoft.Network/*/Delete added to notActions. The second role adds back specific Microsoft.Network operations using wildcarded resource types, such as Microsoft.Network/networkInterfaces/*.&lt;/P&gt;&lt;P&gt;Application Admin Effective Permissions = Role 1 (Contributor - Microsoft.Network) + Role 2 (for example, Microsoft.Network/networkInterfaces/*, Microsoft.Network/networkSecurityGroups/*, Microsoft.Network/applicationGateways/write, etc.)&lt;/P&gt;&lt;P&gt;I understand that Microsoft RBAC best practices recommend avoiding wildcard (*) operations. However, my team has found that building roles with individual operations is extremely tedious and time-consuming, especially when trying to understand the impact of each operation.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Does anyone have suggestions for a simpler or more maintainable pattern for implementing this type of custom RBAC design?&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 20 Apr 2026 18:40:54 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure/azure-rbac-custom-role-best-practices-or-common-build-patterns/m-p/4513098#M22496</guid>
      <dc:creator>nicksal</dc:creator>
      <dc:date>2026-04-20T18:40:54Z</dc:date>
    </item>
    <item>
      <title>Legacy SSRS reports after upgrading Azure DevOps Server 2020 to 2022 or 25H2</title>
      <link>https://techcommunity.microsoft.com/t5/azure/legacy-ssrs-reports-after-upgrading-azure-devops-server-2020-to/m-p/4512555#M22494</link>
      <description>&lt;P&gt;We are currently planning an upgrade from Azure DevOps Server 2020 to Azure DevOps Server 2022 or 25H2, and one of our biggest concerns is reporting.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We understand that Microsoft’s recommended direction is to move to Power BI based on Analytics / OData. However, for on-prem environments with a large number of existing SSRS reports, rebuilding everything from scratch would require significant time and effort.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Since Warehouse and Analysis Services are no longer available in newer versions, we would like to understand how other on-prem teams are handling legacy SSRS reporting during and after the upgrade.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Have you rebuilt your reports in Power BI, moved to another reporting approach, or found a practical way to keep existing SSRS reports available during the transition?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any real-world experience, lessons learned, or recommended approaches would be greatly appreciated.&lt;/P&gt;</description>
      <pubDate>Sat, 18 Apr 2026 04:24:17 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure/legacy-ssrs-reports-after-upgrading-azure-devops-server-2020-to/m-p/4512555#M22494</guid>
      <dc:creator>fujiwaraH2O</dc:creator>
      <dc:date>2026-04-18T04:24:17Z</dc:date>
    </item>
    <item>
      <title>Excluding break-glass account from MFA Registration Campaign – impact on existing users?</title>
      <link>https://techcommunity.microsoft.com/t5/azure/excluding-break-glass-account-from-mfa-registration-campaign/m-p/4512070#M22492</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm currently reviewing the configuration of a break-glass (emergency access) account in Microsoft Entra ID and I have a question regarding MFA registration enforcement.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We have had an Authentication Methods Registration Campaign enabled for all users for quite some time. We identified that the break-glass account is being required to register MFA due to this configuration.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The account is already excluded from all Conditional Access policies that enforce MFA, so the behavior appears to come specifically from the registration campaign (Microsoft Authenticator requirement).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Our goal is to exclude this break-glass account from the MFA registration requirement, following Microsoft best practices.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My question is:&lt;/P&gt;&lt;P&gt;If we edit the existing registration campaign and add an exclusion (user or group), could this have any impact on users who are already registered?&lt;/P&gt;&lt;P&gt;Specifically, could it re-trigger the registration process or affect existing MFA configurations?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We want to avoid any unintended impact, considering this campaign has been in place for a long time.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Has anyone implemented a similar exclusion for break-glass accounts within an active registration campaign? Any insights or confirmation would be really helpful.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;</description>
      <pubDate>Thu, 16 Apr 2026 14:03:10 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure/excluding-break-glass-account-from-mfa-registration-campaign/m-p/4512070#M22492</guid>
      <dc:creator>schiachris</dc:creator>
      <dc:date>2026-04-16T14:03:10Z</dc:date>
    </item>
    <item>
      <title>Running Commands Across VM Scale Set Instances Without RDP/SSH Using Azure CLI Run Command</title>
      <link>https://techcommunity.microsoft.com/t5/azure/running-commands-across-vm-scale-set-instances-without-rdp-ssh/m-p/4511577#M22490</link>
      <description>&lt;P&gt;If you’ve ever managed an Azure Virtual Machine Scale Set (VMSS), you’ve likely run into this situation:&lt;/P&gt;
&lt;P&gt;You need to validate something across all nodes, such as:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Checking a configuration value&lt;/LI&gt;
&lt;LI&gt;Retrieving logs&lt;/LI&gt;
&lt;LI&gt;Applying a registry change&lt;/LI&gt;
&lt;LI&gt;Confirming runtime settings&lt;/LI&gt;
&lt;LI&gt;Running a quick diagnostic command&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And then you realize:&lt;/P&gt;
&lt;P&gt;You’re not dealing with two or three machines; you’re dealing with 40, 80, or even hundreds of instances.&lt;/P&gt;
&lt;H3&gt;The Traditional Approach (and Its Limitations)&lt;/H3&gt;
&lt;P&gt;Historically, administrators would:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Open RDP connections to Windows nodes&lt;/LI&gt;
&lt;LI&gt;SSH into Linux nodes&lt;/LI&gt;
&lt;LI&gt;Execute commands manually on each instance&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;While this may work for a small number of machines, it does not scale in real‑world environments such as:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Azure Batch (user‑managed pools)&lt;/LI&gt;
&lt;LI&gt;Azure Service Fabric (classic clusters)&lt;/LI&gt;
&lt;LI&gt;VMSS‑based application tiers&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This approach quickly becomes:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Operationally inefficient&lt;/LI&gt;
&lt;LI&gt;Time‑consuming&lt;/LI&gt;
&lt;LI&gt;Sometimes impossible&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Especially when:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;RDP or SSH ports are blocked&lt;/LI&gt;
&lt;LI&gt;Network Security Groups restrict inbound connectivity&lt;/LI&gt;
&lt;LI&gt;Administrative credentials are unavailable&lt;/LI&gt;
&lt;LI&gt;Network configuration issues prevent guest access&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Azure Run Command&lt;/H3&gt;
&lt;P&gt;To address this, Azure provides a built‑in capability to execute commands inside virtual machines through the Azure control plane, without requiring direct guest OS connectivity. This feature is called &lt;STRONG&gt;Run Command&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;You can review the official documentation here:&lt;BR /&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/virtual-machines/linux/run-command" target="_blank"&gt;Run scripts in a Linux VM in Azure using action Run Commands - Azure Virtual Machines | Microsoft Learn&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/virtual-machines/windows/run-command?tabs=portal%2Cpowershellremove" target="_blank"&gt;Run scripts in a Windows VM in Azure using action Run Commands - Azure Virtual Machines | Microsoft Learn&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Run Command&lt;/STRONG&gt; uses the Azure VM Agent installed on the virtual machine to execute PowerShell or shell scripts directly inside the guest OS.&lt;/P&gt;
&lt;P&gt;Because execution happens via the Azure control plane, you can run commands even when:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;RDP or SSH ports are blocked&lt;/LI&gt;
&lt;LI&gt;NSGs restrict inbound access&lt;/LI&gt;
&lt;LI&gt;Administrative user configuration is broken&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;In fact, Run Command is specifically designed to troubleshoot and remediate virtual machines that cannot be accessed through standard remote access methods.&lt;/P&gt;
&lt;H3&gt;Prerequisites &amp;amp; Restrictions&lt;/H3&gt;
&lt;P&gt;Before using Run Command, ensure the following:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;VM Agent installed and in Ready state&lt;/LI&gt;
&lt;LI&gt;Outbound connectivity from the VM to Azure public IP addresses over TCP 443, which is needed to return execution results&lt;/LI&gt;
&lt;/UL&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;If outbound connectivity is blocked, scripts may run successfully but no output will be returned to the caller.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Additional limitations include:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Output limited to the last 4,096 bytes&lt;/LI&gt;
&lt;LI&gt;One script execution at a time per VM&lt;/LI&gt;
&lt;LI&gt;Interactive scripts are not supported&lt;/LI&gt;
&lt;LI&gt;Maximum execution time of 90 minutes&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The full list of restrictions and limitations is available here:&lt;BR /&gt;&lt;A id="lia-url-1776188919871" class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/virtual-machines/windows/run-command?tabs=portal%2Cpowershellremove#restrictions" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/virtual-machines/windows/run-command?tabs=portal%2Cpowershellremove#restrictions&lt;/A&gt;&lt;/P&gt;
&lt;H3&gt;Required Permissions (RBAC)&lt;/H3&gt;
&lt;P&gt;Executing Run Command requires appropriate Azure RBAC permissions.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-color-21 lia-border-style-solid" border="1" style="width: 100%; height: 125px; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr style="height: 35px;"&gt;&lt;td class="lia-indent-padding-left-210px lia-border-color-21" style="height: 35px;"&gt;&lt;SPAN class="lia-text-color-21"&gt;Action&lt;/SPAN&gt;&lt;/td&gt;&lt;td class="lia-indent-padding-left-210px lia-border-color-21" style="height: 35px;"&gt;&lt;SPAN class="lia-text-color-21"&gt;Permission&lt;/SPAN&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 45px;"&gt;&lt;td class="lia-border-color-21" style="height: 45px;"&gt;
&lt;P&gt;List available Run Commands&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21" style="height: 45px;"&gt;
&lt;P&gt;Microsoft.Compute/locations/runCommands/read&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 45px;"&gt;&lt;td class="lia-border-color-21" style="height: 45px;"&gt;
&lt;P&gt;Execute Run Command&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-border-color-21" style="height: 45px;"&gt;
&lt;P&gt;Microsoft.Compute/virtualMachines/runCommand/action&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&lt;STRONG&gt;The execution permission is included in:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Virtual Machine Contributor role (or higher)&lt;/P&gt;
&lt;P&gt;Users without this permission will be unable to execute remote scripts through Run Command.&lt;/P&gt;
&lt;H3&gt;Azure CLI: az vm vs az vmss&lt;/H3&gt;
&lt;P&gt;When using Azure CLI, you’ll encounter two similar‑looking commands that behave very differently.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;az vm run-command invoke&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Used for standalone VMs&lt;/LI&gt;
&lt;LI&gt;Also used for Flexible VM Scale Sets&lt;/LI&gt;
&lt;LI&gt;Targets VMs by name&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;az vmss run-command invoke&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Used only for Uniform VM Scale Sets&lt;/LI&gt;
&lt;LI&gt;Targets instances by numeric instanceId (0, 1, 2, …)&lt;/LI&gt;
&lt;/UL&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Example: az vmss run-command invoke --instance-id &amp;lt;id&amp;gt;&lt;/P&gt;
&lt;P&gt;Unlike standalone VM execution, VMSS instances must be referenced using the parameter "--instance-id" to identify which scale set instance will run the script.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H3&gt;Important: Uniform vs Flexible VM Scale Sets&lt;/H3&gt;
&lt;P&gt;This distinction is critical when automating Run Command execution.&lt;/P&gt;
&lt;H5&gt;Uniform VM Scale Sets&lt;/H5&gt;
&lt;UL&gt;
&lt;LI&gt;Instances are managed as identical replicas&lt;/LI&gt;
&lt;LI&gt;Each instance has a numeric instanceId&lt;/LI&gt;
&lt;LI&gt;Supported by az vmss run-command invoke&lt;/LI&gt;
&lt;/UL&gt;
&lt;H5&gt;Flexible VM Scale Sets&lt;/H5&gt;
&lt;UL&gt;
&lt;LI&gt;Each instance is a first‑class Azure VM resource&lt;/LI&gt;
&lt;LI&gt;Instance identifiers are VM names, not numbers&lt;/LI&gt;
&lt;LI&gt;az vmss run-command invoke is not supported&lt;/LI&gt;
&lt;LI&gt;Must use az vm run-command invoke per VM&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To determine which orchestration mode your VMSS uses:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;az vmss show -g "${RG}" -n "${VMSS}" --query "orchestrationMode" -o tsv&lt;/LI-CODE&gt;
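&lt;P&gt;The result can drive the choice between the two commands. The following is a minimal sketch rather than an official pattern: the function name run_command_for_mode is hypothetical, and it assumes the query returns either Uniform or Flexible.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Hypothetical helper: map an orchestration mode to the CLI command to use
run_command_for_mode() {
  case "$1" in
    Uniform)  echo "az vmss run-command invoke" ;;
    Flexible) echo "az vm run-command invoke" ;;
    *)        echo "unknown orchestration mode: $1"; return 1 ;;
  esac
}

# In practice the argument would come from the az vmss show query above,
# for example: run_command_for_mode "${MODE}"
run_command_for_mode "Flexible"
&lt;/LI-CODE&gt;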
&lt;H3&gt;Windows vs Linux Targets&lt;/H3&gt;
&lt;P&gt;Choose the appropriate command ID based on the guest OS:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Windows VMs → RunPowerShellScript&lt;/LI&gt;
&lt;LI&gt;Linux VMs → RunShellScript&lt;/LI&gt;
&lt;/UL&gt;
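&lt;P&gt;As a rough sketch, this mapping can be wrapped in a small helper. The function name command_id_for_os is hypothetical, and the OS type values Windows and Linux are assumed to match what az vm show reports under storageProfile.osDisk.osType.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Hypothetical helper: map a guest OS type to the Run Command ID
command_id_for_os() {
  case "$1" in
    Windows) echo "RunPowerShellScript" ;;
    Linux)   echo "RunShellScript" ;;
    *)       echo "unsupported OS type: $1"; return 1 ;;
  esac
}

# Assumed lookup (not executed here):
#   az vm show -g "${RG}" -n "${VM}" --query "storageProfile.osDisk.osType" -o tsv
command_id_for_os "Linux"
&lt;/LI-CODE&gt;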
&lt;H3&gt;Example Scenario - Retrieve Hostname From All VMSS Instances&lt;/H3&gt;
&lt;P&gt;The following examples demonstrate how to retrieve the hostname from all VMSS instances using Azure CLI and Bash.&lt;/P&gt;
&lt;H5&gt;Flexible VMSS, Bash (Azure CLI)&lt;/H5&gt;
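&lt;LI-CODE lang="bash"&gt;# Placeholders: replace with your own values
RG="&amp;lt;ResourceGroup&amp;gt;"
VMSS="&amp;lt;VMSSName&amp;gt;"
SUBSCRIPTION_ID="&amp;lt;SubscriptionID&amp;gt;"

az account set --subscription "${SUBSCRIPTION_ID}"

# In Flexible mode each instance is a regular VM, so collect the VM names
VM_NAMES=$(az vmss list-instances \
  -g "${RG}" \
  -n "${VMSS}" \
  --query "[].name" \
  -o tsv)

# Run the script on one VM at a time and print only the returned message
# (use RunPowerShellScript instead of RunShellScript for Windows VMs)
for VM in $VM_NAMES; do
  echo "Running on VM: $VM"

  az vm run-command invoke \
    -g "${RG}" \
    -n "$VM" \
    --command-id RunShellScript \
    --scripts "hostname" \
    --query "value[0].message" \
    -o tsv
done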
&lt;/LI-CODE&gt;
&lt;H5&gt;Uniform VMSS, Bash (Azure CLI)&lt;/H5&gt;
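&lt;LI-CODE lang="bash"&gt;# Placeholders: replace with your own values
RG="&amp;lt;ResourceGroup&amp;gt;"
VMSS="&amp;lt;VMSSName&amp;gt;"
SUBSCRIPTION_ID="&amp;lt;SubscriptionID&amp;gt;"

az account set --subscription "${SUBSCRIPTION_ID}"

# In Uniform mode instances are addressed by numeric instanceId
INSTANCE_IDS=$(az vmss list-instances -g "${RG}" -n "${VMSS}" --query "[].instanceId" -o tsv)

# Run the script on one instance at a time and print only the returned message
# (use RunPowerShellScript instead of RunShellScript for Windows instances)
for ID in $INSTANCE_IDS; do
  echo "Running on instanceId: $ID"

  az vmss run-command invoke \
    -g "${RG}" \
    -n "${VMSS}" \
    --instance-id "$ID" \
    --command-id RunShellScript \
    --scripts "hostname" \
    --query "value[0].message" \
    -o tsv
done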
&lt;/LI-CODE&gt;
&lt;H3&gt;Summary&lt;/H3&gt;
&lt;P&gt;Azure Run Command provides a scalable method to:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Execute diagnostics&lt;/LI&gt;
&lt;LI&gt;Apply configuration changes&lt;/LI&gt;
&lt;LI&gt;Collect logs&lt;/LI&gt;
&lt;LI&gt;Validate runtime settings&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;…across VMSS instances without requiring RDP or SSH connectivity.&lt;/P&gt;
&lt;P&gt;This significantly simplifies operational workflows in large‑scale compute environments such as:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Azure Batch (user‑managed pools)&lt;/LI&gt;
&lt;LI&gt;Azure Service Fabric classic clusters&lt;/LI&gt;
&lt;LI&gt;VMSS‑based application tiers&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 15 Apr 2026 11:46:24 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure/running-commands-across-vm-scale-set-instances-without-rdp-ssh/m-p/4511577#M22490</guid>
      <dc:creator>vdivizinschi</dc:creator>
      <dc:date>2026-04-15T11:46:24Z</dc:date>
    </item>
    <item>
      <title>Excited to share my latest open-source project: KubeCost Guardian</title>
      <link>https://techcommunity.microsoft.com/t5/azure/excited-to-share-my-latest-open-source-project-kubecost-guardian/m-p/4510315#M22489</link>
      <description>&lt;P&gt;After seeing how many DevOps teams struggle with Kubernetes cost visibility on Azure, I built a full-stack cost optimization platform from scratch.&lt;BR /&gt;&lt;BR /&gt;𝗪𝗵𝗮𝘁 𝗶𝘁 𝗱𝗼𝗲𝘀:&lt;BR /&gt;✅ Real-time AKS cluster monitoring via Azure SDK&lt;BR /&gt;✅ Cost breakdown per namespace, node, and pod&lt;BR /&gt;✅ AI-powered recommendations generated from actual cluster state&lt;BR /&gt;✅ One-click optimization actions&lt;BR /&gt;✅ JWT-secured dashboard with full REST API&lt;BR /&gt;&lt;BR /&gt;𝗧𝗲𝗰𝗵 𝗦𝘁𝗮𝗰𝗸:&lt;BR /&gt;- React 18 + TypeScript + Vite&lt;BR /&gt;- Tailwind CSS + shadcn/ui + Recharts&lt;BR /&gt;- Node.js + Express + TypeScript&lt;BR /&gt;- Azure SDK (@azure/arm-containerservice)&lt;BR /&gt;- JWT Authentication + Azure Service Principal&lt;BR /&gt;&lt;BR /&gt;𝗪𝗵𝗮𝘁 𝗺𝗮𝗸𝗲𝘀 𝗶𝘁 𝗱𝗶𝗳𝗳𝗲𝗿𝗲𝗻𝘁:&lt;BR /&gt;Most cost tools show you generic estimates. KubeCost Guardian reads your actual VM size, node count, and cluster configuration to generate recommendations that are specific to your infrastructure, not averages.&lt;BR /&gt;For example, if your cluster has only 2 nodes with no autoscaler enabled, it immediately flags the HA risk and calculates exactly how much you'd save by switching to Spot instances based on your actual VM size.&lt;BR /&gt;&lt;BR /&gt;This project is fully open-source and built for the DevOps community.&lt;BR /&gt;&lt;BR /&gt;⭐ GitHub: https://github.com/HlaliMedAmine/kubecost-guardian&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;BR /&gt;This project represents hours of hard work and passion.&lt;BR /&gt;&lt;BR /&gt;I decided to make it open-source so everyone can benefit from it 🤝. If you find it useful, I’d really appreciate your support.&lt;BR /&gt;&lt;BR /&gt;Your support motivates me to keep building and sharing more powerful projects 👌&lt;BR /&gt;&lt;BR /&gt;More exciting ideas are coming soon… stay tuned! 🔥&lt;/P&gt;</description>
      <pubDate>Fri, 10 Apr 2026 15:16:04 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure/excited-to-share-my-latest-open-source-project-kubecost-guardian/m-p/4510315#M22489</guid>
      <dc:creator>Hlali_Mohamed_Amine</dc:creator>
      <dc:date>2026-04-10T15:16:04Z</dc:date>
    </item>
    <item>
      <title>Pipeline Intelligence is live and open-source: real-time Azure DevOps monitoring powered by AI</title>
      <link>https://techcommunity.microsoft.com/t5/azure/pipeline-intelligence-is-live-and-open-source-real-time-azure/m-p/4510312#M22486</link>
      <description>&lt;P&gt;Every DevOps team I've worked with had the same problem: Slow pipelines. Zero visibility. No idea where to start. So I stopped complaining and built the solution.&lt;BR /&gt;&lt;BR /&gt;⚡ Pipeline Intelligence is a full-stack Azure DevOps monitoring dashboard that:&lt;BR /&gt;&lt;BR /&gt;✅ Connects to your real Azure DevOps organization via REST API&lt;BR /&gt;✅ Detects bottlenecks across all your pipelines automatically&lt;BR /&gt;✅ Calculates exactly how much time your team is wasting per month&lt;BR /&gt;✅ Uses Gemini AI to generate prioritized fixes with ready-to-paste YAML solutions&lt;BR /&gt;✅ Is JWT-secured, Docker-ready, and fully open-source&lt;BR /&gt;&lt;BR /&gt;Tech Stack:&lt;BR /&gt;→ React 18 + Vite + Tailwind CSS&lt;BR /&gt;→ Node.js + Express + Azure DevOps API v7&lt;BR /&gt;→ Google Gemini 1.5 Flash&lt;BR /&gt;→ JWT Authentication + Docker&lt;BR /&gt;&lt;BR /&gt;𝗪𝗵𝗮𝘁 𝗺𝗮𝗸𝗲𝘀 𝗶𝘁 𝗱𝗶𝗳𝗳𝗲𝗿𝗲𝗻𝘁?&lt;BR /&gt;Most tools show you generic estimates.&lt;BR /&gt;Pipeline Intelligence reads your actual cluster config, node count, and pipeline structure and gives you recommendations specific to your infrastructure.&lt;BR /&gt;&lt;BR /&gt;🎯 This year, I set myself a personal challenge:&lt;BR /&gt;Build and open-source a series of production-grade tools exclusively focused on Azure services: tools that solve real problems for real DevOps teams.&lt;BR /&gt;&lt;BR /&gt;This project represents weeks of research, architecture decisions, and late-night debugging sessions. I'm sharing it with the community because I believe great tooling should be accessible to everyone, not locked behind enterprise paywalls.&lt;BR /&gt;&lt;BR /&gt;If this resonates with you, I have one simple ask:&lt;BR /&gt;👉 A like, a comment, or a share takes 3 seconds, but it helps this reach the DevOps engineers who need it most.&lt;BR /&gt;&lt;BR /&gt;Your support is what keeps me building. ❤️&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;GitHub:&amp;nbsp; https://github.com/HlaliMedAmine/pipeline-intelligence&lt;/P&gt;</description>
      <pubDate>Fri, 10 Apr 2026 15:04:34 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure/pipeline-intelligence-is-live-and-open-source-real-time-azure/m-p/4510312#M22486</guid>
      <dc:creator>Hlali_Mohamed_Amine</dc:creator>
      <dc:date>2026-04-10T15:04:34Z</dc:date>
    </item>
    <item>
      <title>Building a Production-Ready Azure Lighthouse Deployment Pipeline with EPAC</title>
      <link>https://techcommunity.microsoft.com/t5/azure/building-a-production-ready-azure-lighthouse-deployment-pipeline/m-p/4509962#M22484</link>
      <description>&lt;P&gt;Recently I worked on an interesting project for an end-to-end Azure Lighthouse implementation.&lt;BR /&gt;What really stood out to me was the combination of Azure Lighthouse, EPAC, DevOps, and workload identity federation.&lt;BR /&gt;The deployment model was so compelling that I decided to build and validate the full solution hands-on in my own personal Azure tenants.&lt;BR /&gt;The result is a detailed article that documents the entire journey, including pipeline design, implementation steps, and the scripts I prepared along the way.&lt;BR /&gt;You can read the full article &lt;A class="lia-external-url" href="https://vakhsha.com/blog/blog-16.html" target="_blank"&gt;here&amp;nbsp;&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 09 Apr 2026 11:24:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure/building-a-production-ready-azure-lighthouse-deployment-pipeline/m-p/4509962#M22484</guid>
      <dc:creator>omidvahedv</dc:creator>
      <dc:date>2026-04-09T11:24:00Z</dc:date>
    </item>
    <item>
      <title>Azure Key Vault Replication: Why Paired Regions Alone Don’t Guarantee Business Continuity</title>
      <link>https://techcommunity.microsoft.com/t5/azure/azure-key-vault-replication-why-paired-regions-alone-don-t/m-p/4508945#M22479</link>
      <description>&lt;P&gt;As customers modernize toward multi‑region architectures in Azure, one question comes up repeatedly:&lt;/P&gt;
&lt;P&gt;“If my region goes down, will Azure Key Vault continue to work without disruption?”&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The short answer:&amp;nbsp; &lt;STRONG&gt;&lt;EM&gt;it depends on what you mean by “work.”&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Azure Key Vault provides strong durability and availability guarantees, but those guarantees are often misunderstood—especially when customers assume paired‑region replication equals full disaster recovery. In reality, &lt;STRONG&gt;Azure Key Vault replication is designed for survivability&lt;/STRONG&gt;, not uninterrupted write access or customer‑controlled failover.&lt;/P&gt;
&lt;P&gt;This post explains:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;How Azure Key Vault replication actually works (per Microsoft Learn)&lt;/LI&gt;
&lt;LI&gt;Why paired‑region failover does not equal business continuity&lt;/LI&gt;
&lt;LI&gt;Two reference architectures that implement &lt;STRONG&gt;true multi‑region Key Vault availability&lt;/STRONG&gt;, with Terraform&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;How Azure Key Vault Replication Works (Per Microsoft Learn)&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Azure Key Vault includes multiple layers of Microsoft‑managed redundancy.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;In‑Region and Zone Resiliency&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Vault contents are replicated within the region.&lt;/LI&gt;
&lt;LI&gt;In regions that support availability zones, Key Vault is zone‑resilient by default.&lt;/LI&gt;
&lt;LI&gt;This protects against localized hardware or zone failures.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Paired‑Region Replication&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;If a Key Vault is deployed in a region with an Azure‑defined paired region, its contents are asynchronously replicated to that paired region.&lt;/LI&gt;
&lt;LI&gt;This replication is automatic and cannot be configured, observed, or tested by customers.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Microsoft‑Managed Regional Failover&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;If Microsoft declares a full regional outage, requests are automatically routed to the paired region.&lt;/LI&gt;
&lt;LI&gt;After failover, the vault operates in read‑only mode:&lt;/LI&gt;
&lt;UL&gt;
&lt;LI&gt;✅ Read secrets, keys, and certificates&lt;/LI&gt;
&lt;LI&gt;✅ Perform cryptographic operations&lt;/LI&gt;
&lt;LI&gt;❌ Create, update, rotate, or delete secrets, keys, or certificates&lt;/LI&gt;
&lt;/UL&gt;
&lt;/UL&gt;
&lt;P&gt;This is a critical distinction.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; Paired‑region replication preserves access — not operational continuity.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Why Paired‑Region Replication Is Not Business Continuity&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;From a reliability and DR perspective, several limitations matter:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Failover is Microsoft‑initiated, not customer‑controlled&lt;/LI&gt;
&lt;LI&gt;No write operations during regional failover&lt;/LI&gt;
&lt;LI&gt;No secret rotation or certificate renewal&lt;/LI&gt;
&lt;LI&gt;No way to test DR&lt;/LI&gt;
&lt;LI&gt;Accidental deletions replicate&lt;/LI&gt;
&lt;LI&gt;No point‑in‑time recovery without backups&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Microsoft Learn explicitly states that critical workloads may require custom multi‑region strategies beyond built‑in replication.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; For many customers, this means Azure Key Vault becomes a single‑region dependency in an otherwise multi‑region application design.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;The Multi‑Region Key Vault Pattern&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The two GitHub repositories below implement a common architectural shift:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; Multiple independent Key Vaults deployed in separate regions, &lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; with customer‑controlled replication and failover.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Instead of relying on invisible platform replication, the vaults become first‑class, region‑scoped resources, aligned with application failover.&lt;/P&gt;
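&lt;P&gt;On the application side, aligning the vaults with application failover usually means the code knows about more than one vault. The following is a minimal sketch of that idea, not code from the repositories: each region is represented by a hypothetical fetch callable (in a real deployment it would wrap a Key Vault client), and the region labels, read_secret, and AllVaultsFailed are illustrative names.&lt;/P&gt;

```python
from typing import Callable, Dict, List


class AllVaultsFailed(Exception):
    """Raised when no regional vault could serve the request."""


def read_secret(
    name: str,
    fetchers: Dict[str, Callable[[str], str]],
    preferred_order: List[str],
) -> str:
    """Try each regional vault in order and return the first successful read.

    `fetchers` maps a region label to a hypothetical callable that returns the
    secret value, or raises when that region is unavailable.
    """
    errors = []
    for region in preferred_order:
        try:
            return fetchers[region](name)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{region}: {exc}")
    raise AllVaultsFailed("; ".join(errors))


# Stand-ins for two independent vaults (assumed region labels).
def primary_down(name: str) -> str:
    raise ConnectionError("region unavailable")


def secondary_up(name: str) -> str:
    return f"value-of-{name}"


demo_fetchers = {"eastus": primary_down, "westus": secondary_up}
demo_value = read_secret("db-password", demo_fetchers, ["eastus", "westus"])
print(demo_value)
```

&lt;P&gt;The caller decides the region order, so failover stays under customer control. Writes would need the same treatment plus the replication step, which this sketch leaves out.&lt;/P&gt;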
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Solution 1: Private, Locked‑Down Multi‑Region Key Vault Replication&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Repository:&lt;/P&gt;
&lt;P&gt;👉 &lt;A href="https://github.com/jclem2000/KeyVault-MultiRegion-Replication-Private" target="_blank" rel="noopener"&gt;https://github.com/jclem2000/KeyVault-MultiRegion-Replication-Private&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Architecture Highlights&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Independent Key Vault per region&lt;/LI&gt;
&lt;LI&gt;Private Endpoints only&lt;/LI&gt;
&lt;LI&gt;No public network exposure&lt;/LI&gt;
&lt;LI&gt;Terraform‑based deployment&lt;/LI&gt;
&lt;LI&gt;Controlled replication using event-based synchronization&lt;/LI&gt;
&lt;/UL&gt;
&lt;img&gt;Private Multi-Region Key Vault Replication&lt;/img&gt;
&lt;P&gt;&lt;STRONG&gt;What This Enables&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;✅ Full read/write access during regional outages&lt;/LI&gt;
&lt;LI&gt;✅ Continued secret rotation and certificate renewal&lt;/LI&gt;
&lt;LI&gt;✅ Customer‑defined failover and RTO&lt;/LI&gt;
&lt;LI&gt;✅ DR testing and validation&lt;/LI&gt;
&lt;LI&gt;✅ Strong alignment with zero‑trust and regulated environments&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Trade‑offs&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Higher operational complexity&lt;/LI&gt;
&lt;LI&gt;Requires automation and application awareness of multiple vaults&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Solution 2: Low‑Cost Public Multi‑Region Key Vault Replication&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Repository:&lt;/P&gt;
&lt;P&gt;👉 &lt;A href="https://github.com/jclem2000/KeyVault-MultiRegion-Replication-Public" target="_blank" rel="noopener"&gt;https://github.com/jclem2000/KeyVault-MultiRegion-Replication-Public&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Architecture Highlights&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Independent Key Vault per region&lt;/LI&gt;
&lt;LI&gt;Public endpoints&lt;/LI&gt;
&lt;LI&gt;Minimal networking dependencies&lt;/LI&gt;
&lt;LI&gt;Terraform‑based&lt;/LI&gt;
&lt;LI&gt;Controlled replication using event-based synchronization&lt;/LI&gt;
&lt;LI&gt;Optimized for simplicity and cost&lt;/LI&gt;
&lt;/UL&gt;
&lt;img&gt;Low-Cost Multi-Region Key Vault Replication&lt;/img&gt;
&lt;P&gt;&lt;STRONG&gt;What This Enables&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;✅ Full read/write availability in any region&lt;/LI&gt;
&lt;LI&gt;✅ Clear and testable DR posture&lt;/LI&gt;
&lt;LI&gt;✅ Lower cost than private endpoint designs&lt;/LI&gt;
&lt;LI&gt;✅ Suitable for many non‑regulated workloads&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Trade‑offs&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Public exposure (mitigated via firewall rules, RBAC, and conditional access)&lt;/LI&gt;
&lt;LI&gt;Not appropriate for all compliance requirements&lt;/LI&gt;
&lt;LI&gt;Requires automation and application awareness of multiple vaults&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Azure Native Replication vs Customer‑Managed Multi‑Region Vaults&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Capability&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Azure Paired Region&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Multi‑Region Vaults&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Read access during outage&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;✅&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;✅&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Write access during outage&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;❌&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;✅&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Secret rotation during outage&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;❌&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;✅&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Customer‑controlled failover&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;❌&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;✅&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;DR testing&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;❌&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;✅&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Isolation from accidental deletion&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;❌&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;✅&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Predictable RTO&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;❌&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;✅&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Azure Key Vault’s native replication optimizes for &lt;STRONG&gt;platform durability&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;The multi‑region pattern optimizes for &lt;STRONG&gt;application continuity&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;When to Use Each Approach&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Paired‑Region Replication Is Often Enough When:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Secrets are mostly static&lt;/LI&gt;
&lt;LI&gt;Read‑only access during outages is acceptable&lt;/LI&gt;
&lt;LI&gt;RTO is flexible&lt;/LI&gt;
&lt;LI&gt;You prefer Microsoft‑managed recovery&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Multi‑Region Vaults Are Recommended When:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Secrets or certificates rotate frequently&lt;/LI&gt;
&lt;LI&gt;Applications must remain writable during outages&lt;/LI&gt;
&lt;LI&gt;Deterministic failover is required&lt;/LI&gt;
&lt;LI&gt;DR testing is mandatory&lt;/LI&gt;
&lt;LI&gt;Regulatory or operational isolation is needed&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Closing Thoughts&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Azure Key Vault behaves exactly as documented on Microsoft Learn—but it’s important to be clear about what those guarantees mean. Paired‑region replication protects your data, &lt;STRONG&gt;not your ability to operate&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;If your application is designed to survive a regional outage, Key Vault must follow the same multi‑region design principles as the application itself.&lt;/P&gt;
&lt;P&gt;The reference architectures above show how to extend Azure’s native durability model into &lt;STRONG&gt;true operational resilience&lt;/STRONG&gt;, without waiting for a platform‑level failover decision.&lt;/P&gt;</description>
      <pubDate>Mon, 06 Apr 2026 19:45:04 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure/azure-key-vault-replication-why-paired-regions-alone-don-t/m-p/4508945#M22479</guid>
      <dc:creator>joclemen</dc:creator>
      <dc:date>2026-04-06T19:45:04Z</dc:date>
    </item>
    <item>
      <title>How on god's green earth do you buy an API key? Bing custom search API.</title>
      <link>https://techcommunity.microsoft.com/t5/azure/how-on-god-s-green-earth-do-you-buy-an-api-key-bing-custom/m-p/4508661#M22478</link>
      <description>&lt;P&gt;I just want to give Microsoft money. I want to buy an API key for the Bing custom search. How do I do this? I get to Production and click "Click to issue paid tier key" and I keep getting the same god-awful&lt;/P&gt;&lt;P&gt;"""&lt;/P&gt;&lt;P&gt;Could not create the marketplace item&lt;/P&gt;&lt;P&gt;Oops!&lt;/P&gt;&lt;P&gt;Could not create the marketplace item&lt;/P&gt;&lt;P&gt;Gallery item is required, no gallery item is provided.&lt;/P&gt;&lt;P&gt;"""&lt;/P&gt;&lt;P&gt;in the Azure marketplace. I just want to spend money. How do I do that?&lt;/P&gt;</description>
      <pubDate>Sat, 04 Apr 2026 14:04:18 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure/how-on-god-s-green-earth-do-you-buy-an-api-key-bing-custom/m-p/4508661#M22478</guid>
      <dc:creator>m3ntosandcoke</dc:creator>
      <dc:date>2026-04-04T14:04:18Z</dc:date>
    </item>
    <item>
      <title>The March 2026 Innovation Challenge Winners</title>
      <link>https://techcommunity.microsoft.com/t5/azure/the-march-2026-innovation-challenge-winners/m-p/4508498#M22477</link>
      <description>&lt;img /&gt;
&lt;P class="lia-clear-both"&gt;For this round of the Innovation Challenge the organizations we sponsor helped over 15,000 developers get the skills it takes to build AI solutions on Azure. This program is grounded in Microsoft’s mission and designed to enable a diverse and qualified community of professional developers coming together to tackle big problems. We helped almost 1,000 people earn Microsoft certifications and Applied Skills credentials, and 300 participated in the invitation only March 2026 Innovation Challenge hackathon. Teams represented &lt;A href="https://shpe.org/" target="_blank" rel="noopener"&gt;SHPE&lt;/A&gt;, &lt;A href="https://womenincloud.com/aichallenge/" target="_blank" rel="noopener"&gt;Women in Cloud&lt;/A&gt;, &lt;A href="https://codigofacilito.com/" target="_blank" rel="noopener"&gt;Código Facilito&lt;/A&gt;, &lt;A href="https://www.dio.me/en" target="_blank" rel="noopener"&gt;DIO&lt;/A&gt;, &lt;A href="https://genspark.net/" target="_blank" rel="noopener"&gt;GenSpark&lt;/A&gt;, &lt;A href="https://www.spaceappschallenge.org/nasa-space-apps-2024/2024-local-events/chicago-il/" target="_blank" rel="noopener"&gt;NASA Space Apps&lt;/A&gt;, &lt;A href="https://www.youtube.com/@projectbluemountainacademia" target="_blank" rel="noopener"&gt;Project Blue Mountain&lt;/A&gt;, and &lt;A href="https://techbridge.org/" target="_blank" rel="noopener"&gt;TechBridge&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;Check out the winning project to meet some of the best AI talent in our community and to get inspired about what we can build together!&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;First place $10,000&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://github.com/dfig777/Cognitive-Load---SHPE-2026-Hackathon" target="_blank" rel="noopener"&gt;Pebble. - AI Cognitive Load Companion&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Pebble. is named after a worry stone: something small and smooth you reach for when the world feels like too much. It's an AI cognitive support companion that turns overwhelming documents, tasks, and information into calm, structured clarity. Built for neurodivergent minds. Useful for everyone.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Second place $5,000&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://github.com/n0va-ctrl/memory-bridge" target="_blank" rel="noopener"&gt;The Living Memory Bridge&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;We believe dementia represents the most extreme form of cognitive overload that exists. It is not just information overload. It is cognitive loss: the gradual erosion of the very tools people use to process the world. Every principle in the brief applies here in its most urgent form: simplified language, adaptive communication, calm and dignity-preserving interactions, personalized memory anchors, and support that meets people exactly where they are.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://github.com/ronaldo719/CRAM-Query-to-Insight-Analytics" target="_blank" rel="noopener"&gt;Query to Insight Analytics CRAM&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;CRAM is a natural language healthcare analytics platform built entirely on Azure that lets clinical and administrative staff query a patient database using plain English, no SQL required. Users type a question like "What are the top 10 conditions among diabetic patients?" and get back a written summary, a data table, and an auto-generated chart in seconds.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Third place $2,500&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://github.com/joannedada/ClearStep" target="_blank" rel="noopener"&gt;ClearStep&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;ClearStep is an action-first AI system designed to reduce decision overload in high-risk or confusing situations. Instead of only detecting risk, it tells users exactly what to do next. The core innovation is architectural: model output is not trusted. Every response is enforced by a validation layer that guarantees structure, corrects model errors, and prevents unsafe or misleading outputs from reaching the user.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://github.com/JordyRHLM/DataTalk_UmsaBrainstorm" target="_blank" rel="noopener"&gt;DataTalk&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Our platform enables seamless data ingestion from Excel, CSV, SharePoint, and OneDrive, processes it through a two-layer analytical pipeline powered by DuckDB, and orchestrates four specialized AI agents that work as a team: understanding intent, reading data structure, generating and self-correcting SQL, and enforcing security and auditability at every step.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://github.com/RAGulatorAPP/RAGulator" target="_blank" rel="noopener"&gt;RAGulator AI Governance Engine&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Advanced, governed, and traceable RAG (Retrieval-Augmented Generation) system for international trade. RAGulator is a 100% functional solution that unifies the Azure intelligence ecosystem to deliver grounded responses with immutable bibliographic citations.&lt;/P&gt;</description>
      <pubDate>Mon, 06 Apr 2026 13:17:23 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure/the-march-2026-innovation-challenge-winners/m-p/4508498#M22477</guid>
      <dc:creator>macalde</dc:creator>
      <dc:date>2026-04-06T13:17:23Z</dc:date>
    </item>
    <item>
      <title>Building Multi-Agent Orchestration Using Microsoft Semantic Kernel: A Complete Step-by-Step Guide</title>
      <link>https://techcommunity.microsoft.com/t5/azure/building-multi-agent-orchestration-using-microsoft-semantic/m-p/4507660#M22475</link>
      <description>&lt;H2&gt;What You Will Build&lt;/H2&gt;
&lt;P class="lia-align-justify"&gt;By the end of this guide, you will have a working multi-agent system where &lt;STRONG&gt;4 specialist AI agents&lt;/STRONG&gt; collaborate to diagnose production issues:&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI&gt;&lt;STRONG&gt;ClientAnalyst&lt;/STRONG&gt; — Analyzes browser, JavaScript, CORS, uploads, and UI symptoms&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;NetworkAnalyst&lt;/STRONG&gt; — Analyzes DNS, TCP/IP, TLS, load balancers, and firewalls&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;ServerAnalyst&lt;/STRONG&gt; — Analyzes backend logs, database, deployments, and resource limits&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Coordinator&lt;/STRONG&gt; — Synthesizes all findings into a root cause report with a prioritized action plan&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify"&gt;These agents don't just run in sequence — they &lt;STRONG&gt;debate, cross-examine, and challenge each other's findings&lt;/STRONG&gt; through a shared conversation, producing a diagnosis that's better than any single agent could achieve alone.&lt;/P&gt;
&lt;H2&gt;Table of Contents&lt;/H2&gt;
&lt;OL&gt;
&lt;LI&gt;Why Multi-Agent? The Problem with Single Agents&lt;/LI&gt;
&lt;LI&gt;Architecture Overview&lt;/LI&gt;
&lt;LI&gt;Understanding the Key SK Components&lt;/LI&gt;
&lt;LI&gt;The Actor Model — How InProcessRuntime Works&lt;/LI&gt;
&lt;LI&gt;Setting Up Your Development Environment&lt;/LI&gt;
&lt;LI&gt;Step-by-Step: Building the Multi-Agent Analyzer&lt;/LI&gt;
&lt;LI&gt;The Agent Interaction Flow — Round by Round&lt;/LI&gt;
&lt;LI&gt;Bugs I Found &amp;amp; Fixed — Lessons Learned&lt;/LI&gt;
&lt;LI&gt;Running with Different AI Providers&lt;/LI&gt;
&lt;LI&gt;What to Build Next&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;&lt;U&gt;&lt;SPAN class="lia-text-color-21"&gt;1. Why Multi-Agent? The Problem with Single Agents&lt;/SPAN&gt;&lt;/U&gt;&lt;/H2&gt;
&lt;P&gt;A single AI agent analyzing a production issue is like having one doctor diagnose everything — they'll catch issues in their specialty but miss cross-domain connections.&lt;/P&gt;
&lt;P&gt;Consider this problem: &lt;STRONG&gt;"Users report 504 Gateway Timeout errors when uploading files larger than 10MB. Started after Friday's deployment. Worse during peak hours."&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;A single agent might say "it's a server timeout" and stop. But the real root cause often spans multiple layers:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;The &lt;STRONG&gt;client&lt;/STRONG&gt; is sending chunked uploads with an incorrect Content-Length header (client-side bug)&lt;/LI&gt;
&lt;LI&gt;The &lt;STRONG&gt;load balancer&lt;/STRONG&gt; has a 30-second timeout that's too short for large uploads (network config)&lt;/LI&gt;
&lt;LI&gt;The &lt;STRONG&gt;server&lt;/STRONG&gt; recently deployed a new request body parser that's 3x slower (server-side regression)&lt;/LI&gt;
&lt;LI&gt;The combination only fails during peak hours because &lt;STRONG&gt;connection pool saturation&lt;/STRONG&gt; amplifies the latency&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;No single perspective catches this. You need specialists who analyze independently, then debate to find the cross-layer causal chain. That's what multi-agent orchestration gives you.&lt;/P&gt;
&lt;H3&gt;The 5 Orchestration Patterns in SK&lt;/H3&gt;
&lt;P&gt;Semantic Kernel provides 5 built-in patterns for agent collaboration:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Sequential&lt;/STRONG&gt;: A → B → C → Done (pipeline; each agent builds on the previous one)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Concurrent&lt;/STRONG&gt;: Task → A, B, and C in parallel → Aggregate (results merged)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Group Chat&lt;/STRONG&gt;: A ↔ B ↔ C ↔ D (rounds, shared history, debate) ← we use this one&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Handoff&lt;/STRONG&gt;: A → (stuck?) → B → (complex?) → Human (escalation with human-in-the-loop)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Magentic&lt;/STRONG&gt;: the LLM picks who speaks next dynamically (AI-driven speaker selection)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We use &lt;STRONG&gt;GroupChatOrchestration&lt;/STRONG&gt; with &lt;STRONG&gt;RoundRobinGroupChatManager&lt;/STRONG&gt; because our problem requires agents to see each other's work, challenge assumptions, and build on each other's analysis across two rounds.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;&lt;U&gt;&lt;SPAN class="lia-text-color-21"&gt;2. Architecture Overview&lt;/SPAN&gt;&lt;/U&gt;&lt;/H2&gt;
&lt;P&gt;Here's the complete architecture of what we're building:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;&lt;U&gt;3. Understanding the Key SK Components&lt;/U&gt;&lt;/H2&gt;
&lt;P&gt;Before we write code, let's understand the 5 components we'll use and the design pattern each implements:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;ChatCompletionAgent — Strategy Pattern&lt;/H3&gt;
&lt;P&gt;The agent definition. Each agent is a combination of:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;name&lt;/STRONG&gt; — unique identifier (used in round-robin ordering)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;instructions&lt;/STRONG&gt; — the persona and rules (this is the prompt engineering)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;service&lt;/STRONG&gt; — which AI provider to call (Strategy Pattern — swap providers without changing agent logic)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;description&lt;/STRONG&gt; — what other agents/tools understand about this agent&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;agent = ChatCompletionAgent(
    name="ClientAnalyst",
    instructions="You are ONLY ClientAnalyst...",
    service=gemini_service,       # ← Strategy: swap to OpenAI with zero changes
    description="Analyzes client-side issues",
)&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;GroupChatOrchestration — Mediator Pattern&lt;/H3&gt;
&lt;P&gt;The orchestration defines HOW agents interact. It's the Mediator — agents don't talk to each other directly. Instead, the orchestration manages a shared ChatHistory and routes messages through the Manager.&lt;/P&gt;
&lt;H3&gt;RoundRobinGroupChatManager — Strategy Pattern&lt;/H3&gt;
&lt;P&gt;The Manager decides WHO speaks next. RoundRobinGroupChatManager cycles through agents in a fixed order. SK also provides AutomaticGroupChatManager where the LLM decides who speaks next.&lt;/P&gt;
&lt;P&gt;max_rounds is the total number of agent turns (messages) in the whole conversation, not a per-agent limit. With 4 agents and max_rounds=8, the manager completes two full cycles, so each agent speaks exactly twice.&lt;/P&gt;
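&lt;P&gt;To make the turn arithmetic concrete: with a fixed speaking order and a cap of 8 total turns, 4 agents each get two turns. The sketch below only simulates that ordering in plain Python. It is not Semantic Kernel's implementation, and round_robin_turns is a hypothetical helper name.&lt;/P&gt;

```python
from collections import Counter
from itertools import cycle, islice


def round_robin_turns(agents, max_rounds):
    """Return the speaking order for a fixed-order chat capped at max_rounds turns."""
    return list(islice(cycle(agents), max_rounds))


agents = ["ClientAnalyst", "NetworkAnalyst", "ServerAnalyst", "Coordinator"]
turns = round_robin_turns(agents, max_rounds=8)
turns_per_agent = Counter(turns)

for index, name in enumerate(turns, start=1):
    print(index, name)
```

&lt;P&gt;If max_rounds is not a multiple of the agent count, the agents at the end of the order get one turn fewer, which is worth checking when the last agent is the one that writes the final report.&lt;/P&gt;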
&lt;H3&gt;InProcessRuntime — Actor Model Abstraction&lt;/H3&gt;
&lt;P&gt;The execution engine. Every agent becomes an "actor" with its own mailbox (message queue). The runtime delivers messages between actors.&lt;/P&gt;
&lt;P&gt;Key properties:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;No shared state&lt;/STRONG&gt; — agents communicate only through messages&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Sequential processing&lt;/STRONG&gt; — each agent processes one message at a time&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Location transparency&lt;/STRONG&gt; — same code works in-process today, distributed tomorrow&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;agent_response_callback — Observer Pattern&lt;/H3&gt;
&lt;P&gt;A function that fires after EVERY agent response. We use it to display each agent's output in real-time with emoji labels and round numbers.&lt;/P&gt;
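&lt;P&gt;As a rough illustration of the observer idea, the sketch below derives a round number from the count of responses seen so far. FakeMessage is a stand-in for whatever message object the framework passes to the callback, and RoundTracker is a hypothetical name; only the name and content fields are assumed.&lt;/P&gt;

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeMessage:
    """Stand-in for the chat message object handed to the callback."""
    name: str
    content: str


@dataclass
class RoundTracker:
    """Callable observer that labels each response with its round number."""
    agent_count: int
    lines: List[str] = field(default_factory=list)

    def __call__(self, message: FakeMessage) -> None:
        turns_so_far = len(self.lines)
        round_number = turns_so_far // self.agent_count + 1
        line = f"[Round {round_number}] {message.name}: {message.content}"
        self.lines.append(line)
        print(line)


tracker = RoundTracker(agent_count=4)
speakers = ["ClientAnalyst", "NetworkAnalyst", "ServerAnalyst", "Coordinator", "ClientAnalyst"]
for speaker in speakers:
    tracker(FakeMessage(name=speaker, content="..."))
```

&lt;P&gt;Keeping the counter inside the callback object means the orchestration itself does not need to know about display concerns.&lt;/P&gt;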
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;&lt;U&gt;4. The Actor Model — How InProcessRuntime Works&lt;/U&gt;&lt;/H2&gt;
&lt;P&gt;The Actor Model is a concurrency pattern where each entity is an isolated "actor" with a private mailbox. Here's what happens inside InProcessRuntime when we run our demo:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang=""&gt;runtime.start()
│
├── Creates internal message loop (asyncio event loop)
│
orchestration.invoke(task="504 timeout...", runtime=runtime)
│
├── Creates Actor[Orchestrator]   → manages overall flow
├── Creates Actor[Manager]        → RoundRobinGroupChatManager
├── Creates Actor[ClientAnalyst]  → mailbox created, waiting
├── Creates Actor[NetworkAnalyst] → mailbox created, waiting
├── Creates Actor[ServerAnalyst]  → mailbox created, waiting
└── Creates Actor[Coordinator]    → mailbox created, waiting

Manager receives "start" message
│
├── Checks turn order: [Client, Network, Server, Coordinator]
├── Sends task to ClientAnalyst mailbox
│     → ClientAnalyst processes: calls LLM → response
│     → Response added to shared ChatHistory
│     → callback fires (displayed in Notebook UI)
│     → Sends "done" back to Manager
│
├── Manager updates: turn_index=1
├── Sends to NetworkAnalyst mailbox
│     → Same flow...
│
├── ... (ServerAnalyst, Coordinator for Round 1)
│
├── Manager checks: messages=4, max_rounds=8 → continue
│
├── Round 2: same cycle with cross-examination
│
└── After message 8: Manager sends "complete"

→ OrchestrationResult resolves
→ result.get() returns final answer

runtime.stop_when_idle()
→ All mailboxes empty → clean shutdown&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In this design, the Actor Model provides:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;No data races inside an actor (each actor processes one message at a time)&lt;/LI&gt;
&lt;LI&gt;No lock-based deadlocks (there are no shared locks to contend for)&lt;/LI&gt;
&lt;LI&gt;No directly shared mutable state (agents communicate only via messages)&lt;/LI&gt;
&lt;/UL&gt;
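&lt;P&gt;To make the one-message-at-a-time idea concrete, here is a minimal sketch of a mailbox actor built on asyncio.Queue. It is an illustration only and not how Semantic Kernel's InProcessRuntime is implemented; MailboxActor and demo are hypothetical names, and asyncio.sleep(0) stands in for the LLM call.&lt;/P&gt;

```python
import asyncio


class MailboxActor:
    """Toy actor: a private mailbox drained by a single worker task."""

    def __init__(self, name):
        self.name = name
        self.mailbox = asyncio.Queue()
        self.handled = []  # private state, touched only by this actor's worker

    async def run(self):
        # One message at a time: the next get() happens only after the
        # current message has been fully handled.
        while True:
            message = await self.mailbox.get()
            if message is None:  # sentinel: stop this actor
                break
            await asyncio.sleep(0)  # stand-in for an LLM call
            self.handled.append(f"{self.name} handled {message}")


async def demo():
    actor = MailboxActor("ClientAnalyst")
    worker = asyncio.create_task(actor.run())
    for message in ["task", "follow-up"]:
        await actor.mailbox.put(message)  # senders never touch actor state
    await actor.mailbox.put(None)
    await worker
    return actor.handled


if __name__ == "__main__":
    # In a notebook cell, use `await demo()` instead of asyncio.run(...)
    print(asyncio.run(demo()))
```

&lt;P&gt;Because only the worker task ever appends to handled, no lock is needed, which is the property the list above relies on.&lt;/P&gt;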
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;&lt;U&gt;5. Setting Up Your Development Environment&lt;/U&gt;&lt;/H2&gt;
&lt;H3&gt;Prerequisites&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Python 3.11 or 3.12&lt;/STRONG&gt; (3.13+ may have compatibility issues with some SK connectors)&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Visual Studio Code&lt;/STRONG&gt; with the Python and Jupyter extensions&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;An API key&lt;/STRONG&gt; from Google AI Studio (free tier) or OpenAI, or a model deployment in Azure AI Foundry&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;Step 1: Install Python&lt;/H3&gt;
&lt;P&gt;Download Python from python.org. On Windows, check "Add Python to PATH" during installation.&lt;/P&gt;
&lt;P&gt;Verify:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;python --version
# Python 3.12.x&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;Step 2: Install VS Code Extensions&lt;/H3&gt;
&lt;P&gt;Open VS Code, go to Extensions (Ctrl+Shift+X), and install:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Python&lt;/STRONG&gt; (by Microsoft) — Python language support&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Jupyter&lt;/STRONG&gt; (by Microsoft) — Notebook support&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Pylance&lt;/STRONG&gt; (by Microsoft) — IntelliSense and type checking&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;Step 3: Create Project Folder&lt;/H3&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang=""&gt;mkdir sk-multiagent-demo
cd sk-multiagent-demo&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Open in VS Code:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;code .&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;Step 4: Create Virtual Environment&lt;/H3&gt;
&lt;P&gt;Open the VS Code terminal (Ctrl+`) and run:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang=""&gt;# Create virtual environment
python -m venv sk-env

# Activate it
# Windows:
sk-env\Scripts\activate
# macOS/Linux:
source sk-env/bin/activate&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You should see (sk-env) in your terminal prompt.&lt;/P&gt;
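&lt;P&gt;If the prompt is ambiguous, you can also confirm the environment from Python itself. The sketch below assumes a standard venv, where sys.prefix differs from sys.base_prefix; in_virtualenv and python_version_ok are hypothetical helper names, and the 3.11/3.12 range comes from the prerequisites above.&lt;/P&gt;

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def python_version_ok(version_info=None):
    """Return True for the Python 3.11 / 3.12 range this tutorial targets."""
    major, minor = (version_info or sys.version_info)[:2]
    return (major, minor) in {(3, 11), (3, 12)}


if __name__ == "__main__":
    print(f"Virtual environment active: {in_virtualenv()}")
    print(f"Python version supported:   {python_version_ok()}")
```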
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;Step 5: Install Semantic Kernel&lt;/H3&gt;
&lt;P&gt;For &lt;STRONG&gt;Google Gemini&lt;/STRONG&gt; (free tier — recommended for getting started):&lt;/P&gt;
&lt;LI-CODE lang=""&gt;pip install "semantic-kernel[google]" python-dotenv ipykernel&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For &lt;STRONG&gt;OpenAI&lt;/STRONG&gt; (paid API key):&lt;/P&gt;
&lt;LI-CODE lang=""&gt;pip install semantic-kernel openai python-dotenv ipykernel&lt;/LI-CODE&gt;
&lt;P&gt;For &lt;STRONG&gt;Azure AI Foundry&lt;/STRONG&gt; (enterprise, Entra ID auth):&lt;/P&gt;
&lt;LI-CODE lang=""&gt;pip install semantic-kernel azure-identity python-dotenv ipykernel&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;Step 6: Register the Jupyter Kernel&lt;/H3&gt;
&lt;LI-CODE lang=""&gt;python -m ipykernel install --user --name=sk-env --display-name="Semantic Kernel (Python 3.12)"&lt;/LI-CODE&gt;
&lt;P&gt;If VS Code already detects the sk-env environment, you can also select it directly from the kernel picker, as shown below:&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;Step 7: Get Your API Key&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Option A — Google Gemini (FREE, recommended for demo):&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Go to &lt;A href="https://aistudio.google.com/apikey" target="_blank" rel="noopener"&gt;https://aistudio.google.com/apikey&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Click "Create API Key"&lt;/LI&gt;
&lt;LI&gt;Copy the key&amp;nbsp;&lt;/LI&gt;
&lt;/OL&gt;
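&lt;P&gt;Free tier limits at the time of writing: 15 requests/minute and 1 million tokens/minute, which is more than enough for this demo. Limits vary by model and can change, so check the current quota in AI Studio.&lt;/P&gt;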
&lt;P&gt;&lt;STRONG&gt;Option B — OpenAI:&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Go to &lt;A href="https://platform.openai.com/api-keys" target="_blank" rel="noopener"&gt;https://platform.openai.com/api-keys&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Create a new key&lt;/LI&gt;
&lt;LI&gt;Copy the key&amp;nbsp;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;STRONG&gt;Option C — Azure AI Foundry:&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Deploy a model in Azure AI Foundry portal&lt;/LI&gt;
&lt;LI&gt;Note the endpoint URL and deployment name&lt;/LI&gt;
&lt;LI&gt;If key-based auth is disabled, authenticate with Microsoft Entra ID using an identity that has permission to call the deployment&lt;/LI&gt;
&lt;/OL&gt;
&lt;H3&gt;Step 8: Create the .env File&lt;/H3&gt;
&lt;P&gt;In your project root, create a file named .env:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For Gemini:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;GOOGLE_AI_API_KEY=AIzaSy...your-key-here
GOOGLE_AI_GEMINI_MODEL_ID=gemini-2.5-flash&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For OpenAI:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;OPENAI_API_KEY=sk-...your-key-here
OPENAI_CHAT_MODEL_ID=gpt-4o&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For Azure AI Foundry:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;AZURE_OPENAI_ENDPOINT=https://your-resource.cognitiveservices.azure.com
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME=gpt-4o
AZURE_OPENAI_API_KEY=your-key&lt;/LI-CODE&gt;
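&lt;P&gt;Before moving on, it can help to check that the variables are actually visible to Python without printing the secrets. The following is a minimal sketch: it reads os.environ, so in the notebook you would call load_dotenv() first. REQUIRED_VARS, missing_vars and mask are hypothetical names, and the Azure entry deliberately omits the API key because key-based auth may be disabled.&lt;/P&gt;

```python
import os

# Variable names taken from the .env examples above
REQUIRED_VARS = {
    "gemini": ["GOOGLE_AI_API_KEY", "GOOGLE_AI_GEMINI_MODEL_ID"],
    "openai": ["OPENAI_API_KEY", "OPENAI_CHAT_MODEL_ID"],
    "azure": ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_CHAT_DEPLOYMENT_NAME"],
}


def missing_vars(provider, env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS[provider] if not env.get(name)]


def mask(value, visible=4):
    """Show only the first few characters so keys never land in notebook output."""
    return value[:visible] + "..." if value else "(not set)"


if __name__ == "__main__":
    for provider in REQUIRED_VARS:
        print(provider, "missing:", missing_vars(provider))
```

&lt;P&gt;An empty list for your provider means the configuration is complete; use mask() if you want to display a key while debugging.&lt;/P&gt;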
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;Step 9: Create the Notebook&lt;/H3&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In VS Code:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Click &lt;STRONG&gt;File &amp;gt; New File&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Save as multi_agent_analyzer.ipynb&lt;/LI&gt;
&lt;LI&gt;In the top-right of the notebook, click &lt;STRONG&gt;Select Kernel&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Choose &lt;STRONG&gt;Semantic Kernel (Python 3.12)&lt;/STRONG&gt; (or your sk-env)&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Your environment is ready. Let's build.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;&lt;U&gt;6. Step-by-Step: Building the Multi-Agent Analyzer&lt;/U&gt;&lt;/H2&gt;
&lt;P&gt;Cell 1: Verify Setup&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import semantic_kernel
print(f"Semantic Kernel version: {semantic_kernel.__version__}")

from semantic_kernel.agents import (
    ChatCompletionAgent,
    GroupChatOrchestration,
    RoundRobinGroupChatManager,
)
from semantic_kernel.agents.runtime import InProcessRuntime
from semantic_kernel.contents import ChatMessageContent
print("All imports successful")&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Cell 2: Load API Key and Create Service&lt;/P&gt;
&lt;P&gt;For Gemini:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import os
from dotenv import load_dotenv
load_dotenv()

from semantic_kernel.connectors.ai.google.google_ai import (
    GoogleAIChatCompletion,
    GoogleAIChatPromptExecutionSettings,
)
from semantic_kernel.contents import ChatHistory

GEMINI_API_KEY = os.getenv("GOOGLE_AI_API_KEY")
GEMINI_MODEL = os.getenv("GOOGLE_AI_GEMINI_MODEL_ID", "gemini-2.5-flash")

service = GoogleAIChatCompletion(
    gemini_model_id=GEMINI_MODEL,
    api_key=GEMINI_API_KEY,
)
print(f"Service created: Gemini {GEMINI_MODEL}")

# Smoke test
settings = GoogleAIChatPromptExecutionSettings()
test_history = ChatHistory(system_message="You are a helpful assistant.")
test_history.add_user_message("Say 'Connected!' and nothing else.")
response = await service.get_chat_message_content(
    chat_history=test_history, settings=settings
)
print(f"Model says: {response.content}")&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For OpenAI:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import os
from dotenv import load_dotenv
load_dotenv()

from semantic_kernel.connectors.ai.open_ai import (
    OpenAIChatCompletion,
    OpenAIChatPromptExecutionSettings,
)
from semantic_kernel.contents import ChatHistory

service = OpenAIChatCompletion(
    ai_model_id=os.getenv("OPENAI_CHAT_MODEL_ID", "gpt-4o"),
)
print(f"Service created: OpenAI {os.getenv('OPENAI_CHAT_MODEL_ID', 'gpt-4o')}")

# Smoke test
settings = OpenAIChatPromptExecutionSettings()
test_history = ChatHistory(system_message="You are a helpful assistant.")
test_history.add_user_message("Say 'Connected!' and nothing else.")
response = await service.get_chat_message_content(
    chat_history=test_history, settings=settings
)
print(f"Model says: {response.content}")&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Cell 3: Define All 4 Agents&lt;/P&gt;
&lt;P&gt;This is the most important cell — the prompt engineering that makes the demo work:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;from semantic_kernel.agents import ChatCompletionAgent

# ═══════════════════════════════════════════════════
# AGENT 1: Client-Side Analyst
# ═══════════════════════════════════════════════════
client_agent = ChatCompletionAgent(
    name="ClientAnalyst",
    description="Analyzes problems from the client-side: browser, JS, CORS, caching, UI symptoms",
    instructions="""You are ONLY **ClientAnalyst**. You must NEVER speak as NetworkAnalyst,
ServerAnalyst, or Coordinator. Every word you write is from ClientAnalyst's perspective only.

You are a senior front-end and client-side diagnostics expert.

When given a problem statement, analyze it EXCLUSIVELY from the client side:

1. **Browser &amp;amp; Rendering**: DOM issues, JavaScript errors, CSS rendering, browser compatibility,
   memory leaks, console errors.
2. **Client-Side Caching**: Stale cache, service worker issues, local storage corruption.
3. **Network from Client View**: CORS errors, preflight failures, request timeouts,
   client-side retry storms, fetch/XHR configuration.
4. **Upload Handling**: File API usage, chunk upload implementation, progress tracking,
   FormData construction, content-type headers.
5. **UI/UX Symptoms**: What the user sees, error messages displayed, loading states.

ROUND 1: Provide your independent analysis. Do NOT reference other agents.
  List your top 3 most likely causes with evidence.
  Every response MUST be at least 200 words.
ROUND 2: You MUST:
  - Reference NetworkAnalyst and ServerAnalyst BY NAME
  - State specifically where you AGREE or DISAGREE with their findings
  - Answer the Coordinator's questions from your perspective
  - Add NEW cross-layer insights you see from the client perspective
  - Do NOT just say 'I agree' — provide substantive technical reasoning

Be specific, evidence-based, and prioritize findings by likelihood.""",
    service=service,
)

# ═══════════════════════════════════════════════════
# AGENT 2: Network Analyst
# ═══════════════════════════════════════════════════
network_agent = ChatCompletionAgent(
    name="NetworkAnalyst",
    description="Analyzes problems from the network side: DNS, TCP, TLS, firewalls, load balancers, latency",
    instructions="""You are ONLY **NetworkAnalyst**. You must NEVER speak as ClientAnalyst,
ServerAnalyst, or Coordinator. Every word you write is from NetworkAnalyst's perspective only.

You are a senior network infrastructure diagnostics expert.

When given a problem statement, analyze it EXCLUSIVELY from the network layer:

1. **DNS &amp;amp; Resolution**: DNS TTL, propagation delays, record misconfigurations.
2. **TCP/IP &amp;amp; Connections**: Connection pooling, keep-alive, TCP window scaling,
   connection resets, SYN floods.
3. **TLS/SSL**: Certificate issues, handshake failures, protocol version mismatches.
4. **Load Balancers &amp;amp; Proxies**: Sticky sessions, health checks, timeout configs,
   request body size limits, proxy buffering.
5. **Firewall &amp;amp; WAF**: Rule blocks, rate limiting, request inspection delays,
   geo-blocking, DDoS protection interference.

ROUND 1: Provide your independent analysis. Do NOT reference other agents.
  List your top 3 most likely causes with evidence.
  Every response MUST be at least 200 words.
ROUND 2: You MUST:
  - Reference ClientAnalyst and ServerAnalyst BY NAME
  - State specifically where you AGREE or DISAGREE with their findings
  - Answer the Coordinator's questions from your perspective
  - Add NEW cross-layer insights you see from the network perspective
  - Do NOT just say 'I am ready to proceed' — provide substantive technical analysis

Be specific, evidence-based, and prioritize findings by likelihood.""",
    service=service,
)

# ═══════════════════════════════════════════════════
# AGENT 3: Server-Side Analyst
# ═══════════════════════════════════════════════════
server_agent = ChatCompletionAgent(
    name="ServerAnalyst",
    description="Analyzes problems from the server side: backend app, database, logs, resources, deployments",
    instructions="""You are ONLY **ServerAnalyst**. You must NEVER speak as ClientAnalyst,
NetworkAnalyst, or Coordinator. Every word you write is from ServerAnalyst's perspective only.

You are a senior backend and infrastructure diagnostics expert.

When given a problem statement, analyze it EXCLUSIVELY from the server side:

1. **Application Server**: Error logs, exception traces, thread pool exhaustion,
   memory leaks, CPU spikes, garbage collection pauses.
2. **Database**: Slow queries, connection pool saturation, lock contention,
   deadlocks, replication lag, query plan changes.
3. **Deployment &amp;amp; Config**: Recent deployments, configuration changes, feature flags,
   environment variable mismatches, rollback candidates.
4. **Resource Limits**: File upload size limits, request body limits, disk space,
   temporary file cleanup, storage quotas.
5. **External Dependencies**: Upstream API timeouts, third-party service degradation,
   queue backlogs, cache (Redis/Memcached) issues.

ROUND 1: Provide your independent analysis. Do NOT reference other agents.
  List your top 3 most likely causes with evidence.
  Every response MUST be at least 200 words.
ROUND 2: You MUST:
  - Reference ClientAnalyst and NetworkAnalyst BY NAME
  - State specifically where you AGREE or DISAGREE with their findings
  - Answer the Coordinator's questions from your perspective
  - Add NEW cross-layer insights you see from the server perspective
  - Do NOT just say 'I agree' — provide substantive technical reasoning

Be specific, evidence-based, and prioritize findings by likelihood.""",
    service=service,
)

# ═══════════════════════════════════════════════════
# AGENT 4: Coordinator
# ═══════════════════════════════════════════════════
coordinator_agent = ChatCompletionAgent(
    name="Coordinator",
    description="Synthesizes all specialist analyses into a final root cause report with prioritized action plan",
    instructions="""You are ONLY **Coordinator**. You must NEVER speak as ClientAnalyst,
NetworkAnalyst, or ServerAnalyst. You synthesize — you do NOT do domain-specific analysis.

You are the lead engineer who synthesizes the team's findings.

═══ ROUND 1 BEHAVIOR (your first turn, message 4) ═══
Keep this SHORT — maximum 300 words.
  - Note 2-3 KEY PATTERNS across the three analyses
  - Identify where specialists AGREE (high-confidence)
  - Identify where they CONTRADICT (needs resolution)
  - Ask 2-3 SPECIFIC QUESTIONS for Round 2

  Round 1 MUST NOT: assign tasks, create action plans, write reports,
   or tell agents what to take lead on. Observation + questions ONLY.

═══ ROUND 2 BEHAVIOR (your final turn, message 8) ═══
Keep this FOCUSED — maximum 800 words. Produce a structured report:

1. **Root Cause** (1 paragraph): The #1 most likely cause with causal chain
   across layers. Reference specific findings from each specialist.

2. **Confidence** (short list):
   - HIGH: Areas where all 3 agreed
   - MEDIUM: Areas where 2 of 3 agreed
   - LOW: Disagreements needing investigation

3. **Action Plan** (numbered, max 6 items): For each:
   - What to do (specific)
   - Owner (Client/Network/Server team)
   - Time estimate

4. **Quick Wins vs Long-term** (2 short lists)

Do NOT repeat what specialists already said verbatim. Synthesize, don't echo.""",
    service=service,
)

# ═══════════════════════════════════════════════════
# All 4 agents — order = RoundRobin order
# ═══════════════════════════════════════════════════
agents = [client_agent, network_agent, server_agent, coordinator_agent]

print(f"{len(agents)} agents created:")
for i, a in enumerate(agents, 1):
    print(f"  {i}. {a.name}: {a.description[:60]}...")
print(f"\nRoundRobin order: {' → '.join(a.name for a in agents)}")&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Cell 4: Run the Analysis&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;from semantic_kernel.agents import GroupChatOrchestration, RoundRobinGroupChatManager
from semantic_kernel.agents.runtime import InProcessRuntime
from semantic_kernel.contents import ChatMessageContent
from IPython.display import display, Markdown

# ╔══════════════════════════════════════════════════════════╗
# ║   EDIT YOUR PROBLEM STATEMENT HERE                       ║
# ╚══════════════════════════════════════════════════════════╝

PROBLEM = """
Users are reporting intermittent 504 Gateway Timeout errors when trying
to upload files larger than 10MB through our web application. The issue
started after last Friday's deployment and seems worse during peak hours
(2-5 PM EST). Some users also report that smaller file uploads work fine
but the progress bar freezes at 85% for large files before timing out.
"""

# ════════════════════════════════════════════════════════════

agent_responses = []

def agent_response_callback(message: ChatMessageContent) -&amp;gt; None:
    name = message.name or "Unknown"
    content = message.content or ""
    agent_responses.append({"agent": name, "content": content})
    emoji = {
        "ClientAnalyst": "🖥️", "NetworkAnalyst": "🌐",
        "ServerAnalyst": "⚙️", "Coordinator": "🎯"
    }.get(name, "🔹")
    round_num = (len(agent_responses) - 1) // len(agents) + 1
    display(Markdown(
        f"---\n### {emoji} {name} (Message {len(agent_responses)}, Round {round_num})\n\n{content}"
    ))

MAX_ROUNDS = 8  # 4 agents × 2 rounds = 8 messages exactly

task = f"""## Problem Statement

{PROBLEM.strip()}

## Discussion Rules

You are in a GROUP DISCUSSION with 4 members. You can see ALL previous messages.
There are exactly 2 rounds.

### ROUND 1 (Messages 1-4): Independent Analysis
- ClientAnalyst, NetworkAnalyst, ServerAnalyst: Analyze from YOUR domain only.
  Give your top 3 most likely causes with evidence and reasoning.
- Coordinator: Note patterns across the 3 analyses. Ask 2-3 specific questions.
  Do NOT assign tasks yet.

### ROUND 2 (Messages 5-8): Cross-Examination &amp;amp; Final Report
- ClientAnalyst, NetworkAnalyst, ServerAnalyst: You MUST reference the OTHER
  specialists BY NAME. State where you agree, disagree, or have new insights.
  Answer the Coordinator's questions. Provide SUBSTANTIVE analysis.
- Coordinator: Produce the FINAL structured report: root cause, confidence levels,
  prioritized action plan with owners and time estimates.

IMPORTANT: Each agent speaks as THEMSELVES only. Never impersonate another agent."""

display(Markdown(f"## Problem Statement\n\n{PROBLEM.strip()}"))
display(Markdown(f"---\n## Discussion Starting: {len(agents)} agents, {MAX_ROUNDS} messages ({MAX_ROUNDS // len(agents)} rounds)\n"))

# Build and run
orchestration = GroupChatOrchestration(
    members=agents,
    manager=RoundRobinGroupChatManager(max_rounds=MAX_ROUNDS),
    agent_response_callback=agent_response_callback,
)

runtime = InProcessRuntime()
runtime.start()

result = await orchestration.invoke(task=task, runtime=runtime)
final_result = await result.get(timeout=300)
await runtime.stop_when_idle()

display(Markdown(f"---\n## FINAL CONCLUSION\n\n{final_result}"))&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Cell 5: Statistics and Validation&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;print("═" * 55)
print("  ANALYSIS STATISTICS")
print("═" * 55)

emojis = {"ClientAnalyst": "🖥️", "NetworkAnalyst": "🌐",
          "ServerAnalyst": "⚙️", "Coordinator": "🎯"}

agent_counts = {}
agent_chars = {}
for r in agent_responses:
    agent_counts[r["agent"]] = agent_counts.get(r["agent"], 0) + 1
    agent_chars[r["agent"]] = agent_chars.get(r["agent"], 0) + len(r["content"])

for agent, count in agent_counts.items():
    em = emojis.get(agent, "🔹")
    chars = agent_chars.get(agent, 0)
    avg = chars // count if count else 0
    print(f"  {em} {agent}: {count} msg(s), ~{chars:,} chars (avg {avg:,}/msg)")

print(f"\n  Total messages: {len(agent_responses)}")
total_chars = sum(len(r['content']) for r in agent_responses)
print(f"  Total analysis: ~{total_chars:,} characters")

# Validation
print(f"\n  Validation:")

import re
identity_issues = []
for r in agent_responses:
    other_agents = [a.name for a in agents if a.name != r["agent"]]
    for other in other_agents:
        pattern = rf'(?i)as {re.escape(other)}[,:]?\s+I\b'
        if re.search(pattern, r["content"][:300]):
            identity_issues.append(f"{r['agent']} impersonated {other}")

if identity_issues:
    print(f"  Identity confusion: {identity_issues}")
else:
    print(f"  No identity confusion detected")

thin = [r for r in agent_responses if len(r["content"].strip()) &amp;lt; 100]
if thin:
    for t in thin:
        print(f"   Thin response from {t['agent']}")
else:
    print(f"  All responses are substantive")&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Cell 6: Save Report&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;from datetime import datetime

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"analysis_report_{timestamp}.md"

with open(filename, "w", encoding="utf-8") as f:
    f.write(f"# Problem Analysis Report\n\n")
    f.write(f"**Generated:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
    f.write(f"**Agents:** {', '.join(a.name for a in agents)}\n")
    f.write(f"**Messages:** {MAX_ROUNDS} ({MAX_ROUNDS // len(agents)} rounds)\n\n---\n\n")
    f.write(f"## Problem Statement\n\n{PROBLEM.strip()}\n\n---\n\n")
    for i, r in enumerate(agent_responses, 1):
        em = emojis.get(r['agent'], '🔹')
        round_num = (i - 1) // len(agents) + 1
        f.write(f"### {em} {r['agent']} (Message {i}, Round {round_num})\n\n")
        f.write(f"{r['content']}\n\n---\n\n")
    f.write(f"## Final Conclusion\n\n{final_result}\n")

print(f"Report saved to: {filename}")&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;&lt;U&gt;7. The Agent Interaction Flow — Round by Round&lt;/U&gt;&lt;/H2&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here's what actually happens during the 8-message orchestration:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;Round 1: Independent Analysis (Messages 1-4)&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 100%; height: 291.649px; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr style="height: 79.1319px;"&gt;&lt;th style="height: 79.1319px;"&gt;
Msg
&lt;/th&gt;&lt;th style="height: 79.1319px;"&gt;Agent&lt;/th&gt;&lt;th style="height: 79.1319px;"&gt;What They See&lt;/th&gt;&lt;th style="height: 79.1319px;"&gt;What They Do&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr style="height: 35.1215px;"&gt;&lt;td style="height: 35.1215px;"&gt;1&lt;/td&gt;&lt;td style="height: 35.1215px;"&gt;ClientAnalyst&lt;/td&gt;&lt;td style="height: 35.1215px;"&gt;Problem statement only&lt;/td&gt;&lt;td style="height: 35.1215px;"&gt;Analyzes from client perspective: upload chunking, progress bar freezing at 85%, CORS, content-type headers&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 59.1319px;"&gt;&lt;td style="height: 59.1319px;"&gt;2&lt;/td&gt;&lt;td style="height: 59.1319px;"&gt;NetworkAnalyst&lt;/td&gt;&lt;td style="height: 59.1319px;"&gt;Problem + ClientAnalyst's analysis&lt;/td&gt;&lt;td style="height: 59.1319px;"&gt;Gives INDEPENDENT analysis despite seeing msg 1: load balancer timeouts, proxy body size limits, TCP window scaling&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 59.1319px;"&gt;&lt;td style="height: 59.1319px;"&gt;3&lt;/td&gt;&lt;td style="height: 59.1319px;"&gt;ServerAnalyst&lt;/td&gt;&lt;td style="height: 59.1319px;"&gt;Problem + msgs 1-2&lt;/td&gt;&lt;td style="height: 59.1319px;"&gt;Gives INDEPENDENT analysis: recent deployment regression, request body parser, thread pool exhaustion, disk space&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 59.1319px;"&gt;&lt;td style="height: 59.1319px;"&gt;4&lt;/td&gt;&lt;td style="height: 59.1319px;"&gt;Coordinator&lt;/td&gt;&lt;td style="height: 59.1319px;"&gt;Problem + msgs 1-3&lt;/td&gt;&lt;td style="height: 59.1319px;"&gt;Observes patterns: "All three mention timeout configuration. ClientAnalyst and NetworkAnalyst both point to body size. 
Question: Was the deployment a backend-only change or did it include infra?"&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;Round 2: Cross-Examination (Messages 5-8)&lt;/H3&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Msg&lt;/th&gt;&lt;th&gt;Agent&lt;/th&gt;&lt;th&gt;What They Do&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt;ClientAnalyst&lt;/td&gt;&lt;td&gt;"I agree with NetworkAnalyst that the load balancer timeout is likely a factor — the 85% freeze point matches the 30-second LB timeout for a 10MB upload on our average upload speed. However, I disagree with ServerAnalyst about thread pool exhaustion because the UI shows a clean 504, not a connection reset."&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;6&lt;/td&gt;&lt;td&gt;NetworkAnalyst&lt;/td&gt;&lt;td&gt;"ServerAnalyst's point about the recent deployment is critical — if the new request parser is 3x slower, that would push uploads past the LB timeout. I can confirm the LB has a 30s idle timeout. The fix is both: increase LB timeout AND optimize the parser."&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;7&lt;/td&gt;&lt;td&gt;ServerAnalyst&lt;/td&gt;&lt;td&gt;"Responding to Coordinator's question: The deployment was backend-only — a new multipart parser using streaming instead of buffered reads. ClientAnalyst is correct that the 504 is from the LB, not the app. The app itself returns 200 after 45 seconds, but the LB kills the connection at 30."&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;8&lt;/td&gt;&lt;td&gt;Coordinator&lt;/td&gt;&lt;td&gt;Produces final structured report with root cause: "The backend deployment introduced a slower multipart parser (45s vs 15s for 10MB). The load balancer's 30s timeout kills the connection at ~85% progress. Fix: immediate — increase LB timeout to 120s. Short-term — optimize parser. 
Long-term — implement chunked uploads with progress resumption."&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Notice:&lt;/STRONG&gt; The Round 2 analysis is dramatically better than Round 1. Agents reference each other by name, build on each other's findings, and the Coordinator can synthesize a cross-layer causal chain that no single agent could have produced.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For the test run below, I adjusted the problem statement slightly to target Azure Web Apps. The results shown are from a run using Google Gemini:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;&lt;img /&gt;&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;&lt;U&gt;8. Bugs I Found &amp;amp; Fixed — Lessons Learned&lt;/U&gt;&lt;/H2&gt;
&lt;P&gt;Building this demo taught me several important lessons about multi-agent systems:&lt;/P&gt;
&lt;H3&gt;Bug 1: Agents Speaking Only Once&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Symptom:&lt;/STRONG&gt; Only 4 messages instead of 8.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Root cause:&lt;/STRONG&gt; The agents list was missing the Coordinator. It was defined in a separate cell and wasn't included in the members list.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Fix:&lt;/STRONG&gt; All 4 agents must be in the same list passed to GroupChatOrchestration.&lt;/P&gt;
&lt;H3&gt;Bug 2: NetworkAnalyst Says "I'm Ready to Proceed"&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Symptom:&lt;/STRONG&gt; NetworkAnalyst's Round 2 response was just "I'm ready to proceed with the analysis" — no actual content.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Root cause:&lt;/STRONG&gt; The Coordinator's Round 1 message was assigning tasks ("NetworkAnalyst, please check the load balancer config"), and the agent was acknowledging the assignment instead of analyzing.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Fix:&lt;/STRONG&gt; Added explicit constraint to Coordinator: "Round 1 MUST NOT assign tasks — observation + questions ONLY."&lt;/P&gt;
&lt;H3&gt;Bug 3: ServerAnalyst Says "As NetworkAnalyst, I..."&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Symptom:&lt;/STRONG&gt; ServerAnalyst's response started with "As NetworkAnalyst, I believe..."&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Root cause:&lt;/STRONG&gt; LLM identity bleeding. When agents share ChatHistory, the LLM sometimes loses track of which agent it is currently playing. In my testing, this was especially common with Gemini.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Fix:&lt;/STRONG&gt; Identity anchoring at the very top of every agent's instructions: "You are ONLY &lt;STRONG&gt;ServerAnalyst&lt;/STRONG&gt;. You must NEVER speak as ClientAnalyst, NetworkAnalyst, or Coordinator."&lt;/P&gt;
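&lt;P&gt;Writing the anchor by hand for every agent is easy to get wrong when the team changes. As a sketch, the preamble can be generated from the list of agent names; identity_anchor is a hypothetical helper, not part of Semantic Kernel, and you would prepend its result to each agent's instructions yourself.&lt;/P&gt;

```python
AGENT_NAMES = ["ClientAnalyst", "NetworkAnalyst", "ServerAnalyst", "Coordinator"]


def identity_anchor(name, all_names=AGENT_NAMES):
    """Build the 'You are ONLY ...' preamble for one agent."""
    others = [n for n in all_names if n != name]
    if not others:
        return f"You are ONLY **{name}**."
    if len(others) == 1:
        other_text = others[0]
    else:
        other_text = ", ".join(others[:-1]) + ", or " + others[-1]
    return f"You are ONLY **{name}**. You must NEVER speak as {other_text}."


if __name__ == "__main__":
    for agent_name in AGENT_NAMES:
        print(identity_anchor(agent_name))
```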
&lt;H3&gt;Bug 4: Gemini Gives Thin/Empty Responses&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Symptom:&lt;/STRONG&gt; Some agents responded with just one sentence or "I concur."&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Root cause:&lt;/STRONG&gt; In my runs, Gemini 2.5 Flash was more concise than GPT-4o by default. Without explicit length requirements, it takes shortcuts.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Fix:&lt;/STRONG&gt; Added "Every response MUST be at least 200 words" and "Answer the Coordinator's questions" to every specialist's instructions.&lt;/P&gt;
&lt;H3&gt;Bug 5: Coordinator's Report is 18K Characters&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Symptom:&lt;/STRONG&gt; The Coordinator's Round 2 response was absurdly long — repeating everything every specialist said.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Fix:&lt;/STRONG&gt; Added word limits: "Round 1 max 300 words, Round 2 max 800 words" and "Synthesize, don't echo."&lt;/P&gt;
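&lt;P&gt;Prompt-level limits are only requests, so it is worth checking them after a run. The sketch below applies the 200-word minimum for specialists and the 300/800-word maximums for the Coordinator to a single response. check_limits and word_count are hypothetical helpers, and counting words by splitting on whitespace is an approximation.&lt;/P&gt;

```python
def word_count(text):
    """Approximate word count: whitespace-separated tokens."""
    return len(text.split())


def check_limits(agent, message_number, content, agent_count=4):
    """Return a list of problems for one response; an empty list means OK."""
    words = word_count(content)
    round_num = (message_number - 1) // agent_count + 1
    problems = []
    if agent == "Coordinator":
        limit = 300 if round_num == 1 else 800
        over = max(0, words - limit)
        if over:
            problems.append(f"{agent} is {over} words over the {limit}-word limit")
    else:
        short = max(0, 200 - words)
        if short:
            problems.append(f"{agent} is {short} words under the 200-word minimum")
    return problems


if __name__ == "__main__":
    print(check_limits("ClientAnalyst", 1, "I concur."))
```

&lt;P&gt;In the notebook, this could be called for each entry in agent_responses, using the entry's position in the list as message_number.&lt;/P&gt;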
&lt;H3&gt;Bug 6: MAX_ROUNDS Math&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Symptom:&lt;/STRONG&gt; With MAX_ROUNDS=9, ClientAnalyst spoke a 3rd time after the Coordinator's final report — breaking the clean 2-round structure.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Fix:&lt;/STRONG&gt; max_rounds counts individual agent turns, not full passes through the agent list, so MAX_ROUNDS must equal (number of agents × number of rounds). For 4 agents × 2 rounds, that is 8.&lt;/P&gt;
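&lt;P&gt;The arithmetic is easy to check in plain Python. This sketch assumes a strict round-robin order and reuses the round formula from the callback in Cell 4; max_rounds_for, speaker_for and round_for are hypothetical helper names.&lt;/P&gt;

```python
AGENT_NAMES = ["ClientAnalyst", "NetworkAnalyst", "ServerAnalyst", "Coordinator"]


def max_rounds_for(agent_count, full_rounds):
    """Total agent turns needed for `full_rounds` complete passes."""
    return agent_count * full_rounds


def speaker_for(message_number, names=AGENT_NAMES):
    """Name of the agent that speaks at a 1-based message number."""
    return names[(message_number - 1) % len(names)]


def round_for(message_number, agent_count=len(AGENT_NAMES)):
    """1-based round number, same formula as the callback in Cell 4."""
    return (message_number - 1) // agent_count + 1


if __name__ == "__main__":
    for n in range(1, 10):
        print(n, speaker_for(n), "round", round_for(n))
```

&lt;P&gt;Message 9 maps back to ClientAnalyst in a third round, which is exactly the stray turn described in the symptom.&lt;/P&gt;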
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;&lt;U&gt;9. Running with Different AI Providers&lt;/U&gt;&lt;/H2&gt;
&lt;P&gt;The beauty of SK's Strategy Pattern is that you change only the service object to switch providers. Everything else (agents, orchestration, callbacks, validation) stays identical.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Gemini setup:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import os

from semantic_kernel.connectors.ai.google.google_ai import GoogleAIChatCompletion

service = GoogleAIChatCompletion(
    gemini_model_id="gemini-2.5-flash",
    api_key=os.getenv("GOOGLE_AI_API_KEY"),
)&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;OpenAI setup:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import os

from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion

service = OpenAIChatCompletion(
    ai_model_id="gpt-4o",
    api_key=os.getenv("OPEN_AI_API_KEY"),
)
&lt;/LI-CODE&gt;
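&lt;P&gt;One way to keep the provider switch in a single place is a small factory function. This is only a sketch: build_chat_service and the AI_PROVIDER environment variable are hypothetical names, and the constructor arguments are copied from the two snippets above. Imports are deferred so that only the selected connector needs to be installed:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import os


def build_chat_service(provider):
    """Return a chat completion service for the named provider."""
    if provider == "gemini":
        from semantic_kernel.connectors.ai.google.google_ai import GoogleAIChatCompletion

        return GoogleAIChatCompletion(
            gemini_model_id="gemini-2.5-flash",
            api_key=os.getenv("GOOGLE_AI_API_KEY"),
        )
    if provider == "openai":
        from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion

        return OpenAIChatCompletion(
            ai_model_id="gpt-4o",
            api_key=os.getenv("OPEN_AI_API_KEY"),
        )
    raise ValueError(f"Unknown provider: {provider!r}")


# AI_PROVIDER is a hypothetical environment variable used only in this sketch.
# service = build_chat_service(os.getenv("AI_PROVIDER", "gemini"))&lt;/LI-CODE&gt;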
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;&lt;U&gt;10. What to Build Next&lt;/U&gt;&lt;/H2&gt;
&lt;H3&gt;Add Plugins to Agents&lt;/H3&gt;
&lt;P&gt;Give agents real tools, not just LLM reasoning. Looks exciting, right? ;)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;class NetworkDiagnosticPlugin:
    (description="Pings a host and returns latency")
    def ping(self, host: str) -&amp;gt; str:
        result = subprocess.run(["ping", "-c", "3", host], capture_output=True, text=True)
        return result.stdout

class LogSearchPlugin:
    (description="Searches server logs for error patterns")
    def search_logs(self, pattern: str, hours: int = 1) -&amp;gt; str:
        # Query your log aggregator (Splunk, ELK, Azure Monitor)
        return query_logs(pattern, hours)&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;Add Filters for Governance&lt;/H3&gt;
&lt;P&gt;Intercept every agent call for PII redaction and audit logging:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;.filter(filter_type=FilterTypes.FUNCTION_INVOCATION)
async def audit_filter(context, next):
    print(f"[AUDIT] {context.function.name} called by agent")
    await next(context)
    print(f"[AUDIT] {context.function.name} returned")&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;Try Different Orchestration Patterns&lt;/H3&gt;
&lt;P&gt;Replace GroupChat with Sequential for a pipeline approach:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;# Instead of debate, each agent builds on the previous
orchestration = SequentialOrchestration(
    members=[client_agent, network_agent, server_agent, coordinator_agent]
)&lt;/LI-CODE&gt;
&lt;P&gt;Or Concurrent for parallel analysis:&lt;/P&gt;
&lt;LI-CODE lang=""&gt;# All specialists analyze simultaneously, Coordinator aggregates
orchestration = ConcurrentOrchestration(
    members=[client_agent, network_agent, server_agent]
)&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;Deploy to Azure&lt;/H3&gt;
&lt;P&gt;Move from InProcessRuntime to Azure Container Apps for production scaling. The agent code doesn't change — only the runtime.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;&lt;EM&gt;Summary&lt;/EM&gt;&lt;/H2&gt;
&lt;P class="lia-align-justify"&gt;The key insight from building this demo: &lt;STRONG&gt;multi-agent systems produce better results than single agents&lt;/STRONG&gt; not because each agent is smarter, but because the debate structure forces cross-domain thinking that a single prompt can never achieve. The Coordinator's final report consistently identifies causal chains that span client, network, and server layers — exactly the kind of insight that production incident response teams need.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;Semantic Kernel makes this possible&lt;/STRONG&gt; with clean separation of concerns: agents define WHAT to analyze, orchestration defines HOW they interact, the manager defines WHO speaks when, the runtime handles WHERE it executes, and callbacks let you OBSERVE everything. Each piece is independently swappable — that's the power of SK from Microsoft.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Resources:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;GitHub: github.com/microsoft/semantic-kernel&lt;/LI&gt;
&lt;LI&gt;Docs: learn.microsoft.com/semantic-kernel&lt;/LI&gt;
&lt;LI&gt;Orchestration Patterns: learn.microsoft.com/semantic-kernel/frameworks/agent/agent-orchestration&lt;/LI&gt;
&lt;LI&gt;Discord: aka.ms/sk/discord&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H4&gt;Disclaimer:&lt;/H4&gt;
&lt;P&gt;The sample scripts provided in this article are provided AS IS without warranty of any kind. The author is not responsible for any issues, damages, or problems that may arise from using these scripts. Users should thoroughly test any implementation in their environment before deploying to production. Azure services and APIs may change over time, which could affect the functionality of the provided scripts. Always refer to the latest Azure documentation for the most up-to-date information.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks for reading this blog! I hope you found it helpful and informative for building AI agents with SK (Semantic Kernel) 😀&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 01 Apr 2026 11:04:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure/building-multi-agent-orchestration-using-microsoft-semantic/m-p/4507660#M22475</guid>
      <dc:creator>ani_ms_emea</dc:creator>
      <dc:date>2026-04-01T11:04:00Z</dc:date>
    </item>
    <item>
      <title>Recovering our Default Azure Directory</title>
      <link>https://techcommunity.microsoft.com/t5/azure/recovering-our-default-azure-directory/m-p/4507364#M22474</link>
      <description>&lt;P&gt;Hello, everyone, relative newcomer to Azure here.&amp;nbsp; I'm dealing with an inherited situation and, to add to the fun, I've just discovered my organization only has a Basic support plan, so no access to Azure technical support.&amp;nbsp; I'm hoping some knowledgeable souls on here are in a charitable mood and will point me in the right direction.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We're having problems getting to our DNS subscription because it's locked away behind an Azure directory to which we don't seem to have access, and I'm not quite sure this is completely an Azure problem.&amp;nbsp; I was able to get into this directory around a year ago, but I access it so seldom that I'm not sure when this changed.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We have two Azure directories.&amp;nbsp; One is our "regular" directory, named for our organization, and it's linked (not sure of the terminology here) to our domain.&amp;nbsp; Let's call it This.Domain.com.&amp;nbsp; There are no subscriptions in this directory.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The other is named "Default Directory" and it's linked to an onmicrosoft domain -- let's call it OldAdminThisDomain.onmicrosoft.com.&amp;nbsp; When I try to switch to this directory I'm prompted to log in, then I'm hit with the MFA prompt.&amp;nbsp; This is normally not a problem, but it's as if the MFA was set up for a different account with the same email address.&amp;nbsp; By contrast, I can log into both the regular Azure directory and the 365 admin page with no problem -- I type in my email address (email address removed for privacy reasons), MFA comes up, and I have several authentication methods to choose from:&amp;nbsp; Microsoft's MFA app, SMS, email, YubiKey, phone, etc., and all these options work.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When trying to log into the Azure Default Directory, however, the MFA offers only the Microsoft Authenticator app or Use a Verification Code (which also goes through the Microsoft Authenticator app), and neither option yields any prompt on my phone.&amp;nbsp; I seem to recall I effectively had two different "accounts" that somehow used the same email address but had different MFA setups, but again this was around a year and 3 phones ago so I don't have a solid memory of what was happening.&amp;nbsp; I am also aware that, while this should not be possible, there have been several cases where multiple Microsoft accounts were somehow created using the same email address.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So this is where I am.&amp;nbsp; Ideally we could merge the two Azure directories so that we combine the accessibility of the "regular" directory with the subscription(s?) that are in the Default Directory.&amp;nbsp; Barring that, I would have to somehow get the (suspected) two Microsoft accounts based on the same email address (removed for privacy reasons) corrected.&amp;nbsp; Any help would be greatly appreciated.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks to all in advance&lt;/P&gt;</description>
      <pubDate>Tue, 31 Mar 2026 15:55:10 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure/recovering-our-default-azure-directory/m-p/4507364#M22474</guid>
      <dc:creator>GMDAL</dc:creator>
      <dc:date>2026-03-31T15:55:10Z</dc:date>
    </item>
    <item>
      <title>Docker Engine v29 on Linux: Why data-root No Longer Prevents OS Disk Growth (and How to Fix It)</title>
      <link>https://techcommunity.microsoft.com/t5/azure/docker-engine-v29-on-linux-why-data-root-no-longer-prevents-os/m-p/4504862#M22466</link>
      <description>&lt;H3&gt;Scope&lt;/H3&gt;
&lt;P&gt;Applies to Linux hosts only&lt;/P&gt;
&lt;P&gt;Does not apply to Windows or Docker Desktop&lt;/P&gt;
&lt;H2&gt;Problem Summary&lt;/H2&gt;
&lt;P&gt;After upgrading to &lt;STRONG&gt;Docker Engine v29&lt;/STRONG&gt; or reimaging Linux nodes with this version, you may observe &lt;STRONG&gt;unexpected growth on the OS disk&lt;/STRONG&gt;, even when Docker is configured with a custom data-root pointing to a mounted data disk.&lt;/P&gt;
&lt;P&gt;This commonly affects cloud environments (VMSS, Azure Batch, self‑managed Linux VMs) where the OS disk is intentionally kept small and container data is expected to reside on a separate data disk.&lt;/P&gt;
&lt;H2&gt;What Changed in Docker Engine v29 (Linux)&lt;/H2&gt;
&lt;P&gt;Starting with &lt;STRONG&gt;Docker Engine 29.0&lt;/STRONG&gt;, &lt;STRONG&gt;containerd’s image store becomes the default storage backend on fresh installations&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;Docker explicitly documents this behavior:&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;“The containerd image store is the default storage backend for Docker Engine 29.0 and later on fresh installations.”&lt;/EM&gt;&lt;BR /&gt;&lt;A href="https://docs.docker.com/engine/storage/containerd/" data-tabster="{&amp;quot;restorer&amp;quot;:{&amp;quot;type&amp;quot;:1}}" target="_blank"&gt;Docker containerd image store documentation&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Key points on Linux:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Docker now delegates &lt;STRONG&gt;image and snapshot storage&lt;/STRONG&gt; to containerd&lt;/LI&gt;
&lt;LI&gt;containerd uses its own content store and snapshotters&lt;/LI&gt;
&lt;LI&gt;Docker’s traditional data-root setting &lt;STRONG&gt;no longer controls all container storage&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Docker Engine v29 was released on &lt;STRONG&gt;11 November 2025&lt;/STRONG&gt;, and this behavior is &lt;STRONG&gt;by design&lt;/STRONG&gt;, not a regression.&lt;/P&gt;
&lt;H2&gt;Where Disk Usage Goes on Linux&lt;/H2&gt;
&lt;P&gt;Docker’s daemon documentation clarifies the split:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Legacy storage (pre‑v29 or upgraded installs):&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;All data under /var/lib/docker&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Docker Engine v29 (containerd image store enabled):&lt;/STRONG&gt;
&lt;UL&gt;
&lt;LI&gt;Images &amp;amp; snapshots → /var/lib/containerd&lt;/LI&gt;
&lt;LI&gt;Other Docker data (volumes, configs, metadata) → /var/lib/docker&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Crucially:&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;“The data-root option does not affect image and container data stored in /var/lib/containerd when using the containerd image store.”&lt;/EM&gt;&lt;BR /&gt;&lt;A href="https://docs.docker.com/engine/daemon/" data-tabster="{&amp;quot;restorer&amp;quot;:{&amp;quot;type&amp;quot;:1}}" target="_blank"&gt;Docker daemon data directory documentation&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;This explains why OS disk usage continues to grow even when data-root is set to a data disk.&lt;/P&gt;
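&lt;P&gt;To see how the split plays out on a given host, the two directories can be measured separately. The following is a minimal sketch, not an official Docker tool; dir_size_bytes is an illustrative helper, and it should be run as root so that all subdirectories are readable:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import os


def dir_size_bytes(path):
    """Sum file sizes under path; symlinks count as the link itself, not the target."""
    total = 0
    # Unreadable or missing directories are skipped instead of raising
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                continue
    return total


if __name__ == "__main__":
    for store in ("/var/lib/docker", "/var/lib/containerd"):
        print(f"{store}: {dir_size_bytes(store) / 1024**2:.1f} MiB")&lt;/LI-CODE&gt;
&lt;P&gt;The totals are apparent file sizes, so they can differ from what du or df report, but a large /var/lib/containerd figure is enough to confirm where the growth is coming from.&lt;/P&gt;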
&lt;H2&gt;Why the Old Configuration Worked Before&lt;/H2&gt;
&lt;P&gt;On earlier Docker versions, Docker fully managed image and snapshot storage. Configuring:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;{&lt;/P&gt;
&lt;P&gt;"data-root": "/mnt/docker-data"&lt;/P&gt;
&lt;P&gt;}&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;was sufficient to redirect &lt;STRONG&gt;all container storage&lt;/STRONG&gt; off the OS disk.&lt;/P&gt;
&lt;P&gt;With Docker Engine v29:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;containerd owns image and snapshot storage&lt;/LI&gt;
&lt;LI&gt;data-root only affects Docker‑managed data&lt;/LI&gt;
&lt;LI&gt;OS disk growth after upgrades or reimages is &lt;STRONG&gt;expected behavior&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This aligns fully with Docker’s documented design changes.&lt;/P&gt;
&lt;H2&gt;Linux Workaround: Redirect containerd Storage&lt;/H2&gt;
&lt;P&gt;To restore the intended behavior on Linux, with &lt;STRONG&gt;both Docker and containerd storage on the mounted data disk&lt;/STRONG&gt;, containerd’s storage path must also be redirected.&lt;/P&gt;
&lt;P&gt;A practical workaround is to relocate /var/lib/containerd using a symbolic link.&lt;/P&gt;
&lt;H3&gt;Example (Linux)&lt;/H3&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;sudo systemctl stop docker.socket docker containerd || true;&lt;/P&gt;
&lt;P&gt;sudo mkdir -p /mnt/docker-data /mnt/containerd;&lt;/P&gt;
&lt;P&gt;sudo rm -rf /var/lib/containerd;&lt;/P&gt;
&lt;P&gt;sudo ln -s /mnt/containerd /var/lib/containerd;&lt;/P&gt;
&lt;P&gt;echo "{\"data-root\": \"/mnt/docker-data\"}" | sudo tee /etc/docker/daemon.json;&lt;/P&gt;
&lt;P&gt;sudo systemctl daemon-reload;&lt;/P&gt;
&lt;P&gt;sudo systemctl start containerd docker&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H2&gt;What This Does&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;Stops Docker and containerd&lt;/LI&gt;
&lt;LI&gt;Creates container storage directories on the mounted data disk&lt;/LI&gt;
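&lt;LI&gt;Deletes the existing /var/lib/containerd (any images already stored there must be pulled again) and replaces it with a symbolic link to /mnt/containerd&lt;/LI&gt;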
&lt;LI&gt;Keeps Docker’s data-root at /mnt/docker-data&lt;/LI&gt;
&lt;LI&gt;Restarts services with a unified storage layout&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This workaround is effective because it &lt;STRONG&gt;explicitly accounts for containerd‑managed paths introduced in Docker Engine v29&lt;/STRONG&gt;, restoring the behavior that existed prior to the change.&lt;/P&gt;
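&lt;P&gt;After the services restart, it is worth confirming that the redirect actually took effect. The sketch below assumes the paths used in the example above; is_redirected and on_different_device are illustrative helper names, not part of Docker or containerd:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import os


def is_redirected(link_path, expected_target):
    """True if link_path is a symlink that resolves to expected_target."""
    return os.path.islink(link_path) and (
        os.path.realpath(link_path) == os.path.realpath(expected_target)
    )


def on_different_device(path_a, path_b):
    """True if the two paths live on different filesystems (device IDs differ)."""
    return os.stat(path_a).st_dev != os.stat(path_b).st_dev


# Example, using the paths from the workaround:
# is_redirected("/var/lib/containerd", "/mnt/containerd")
# on_different_device("/mnt/containerd", "/")&lt;/LI-CODE&gt;
&lt;P&gt;The second check matters because a symbolic link that points at a directory still located on the OS disk would pass the first check without moving any data off that disk.&lt;/P&gt;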
&lt;H2&gt;Key Takeaways&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;Docker Engine v29 introduces a &lt;STRONG&gt;fundamental storage architecture change on Linux&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;data-root alone is no longer sufficient&lt;/LI&gt;
&lt;LI&gt;OS disk growth after upgrades or reimages is &lt;STRONG&gt;expected&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;containerd storage must also be redirected&lt;/LI&gt;
&lt;LI&gt;The workaround aligns with Docker’s official documentation and design&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;References&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;Docker daemon data directory&lt;BR /&gt;https://docs.docker.com/engine/daemon/&lt;/LI&gt;
&lt;LI&gt;containerd image store (Docker Engine v29)&lt;BR /&gt;https://docs.docker.com/engine/storage/containerd/&lt;/LI&gt;
&lt;LI&gt;Docker Engine v29 release notes&lt;BR /&gt;https://docs.docker.com/engine/release-notes/29/&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Mon, 23 Mar 2026 18:17:37 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure/docker-engine-v29-on-linux-why-data-root-no-longer-prevents-os/m-p/4504862#M22466</guid>
      <dc:creator>vdivizinschi</dc:creator>
      <dc:date>2026-03-23T18:17:37Z</dc:date>
    </item>
    <item>
      <title>AI-102 Develop computer vision solutions in Azure (deprecated)</title>
      <link>https://techcommunity.microsoft.com/t5/azure/ai-102-develop-computer-vision-solutions-in-azure-deprecated/m-p/4504431#M22465</link>
      <description>&lt;P&gt;I have my AI-102 certification exam next week, but Microsoft Learn shows the following:&lt;BR /&gt;Develop computer vision solutions in Azure (deprecated)&lt;BR /&gt;Does that mean that section won't be covered on the exam?&lt;/P&gt;</description>
      <pubDate>Sat, 21 Mar 2026 22:05:03 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure/ai-102-develop-computer-vision-solutions-in-azure-deprecated/m-p/4504431#M22465</guid>
      <dc:creator>AnSoMo28</dc:creator>
      <dc:date>2026-03-21T22:05:03Z</dc:date>
    </item>
  </channel>
</rss>

