Recent Discussions
Dynamic hostpool scaling not working
We have set up an AVD dynamic host pool for testing. The scaling plan correctly ensures that a host is created when needed. However, the host is not removed again, even after the ramp-down. We observe that the total sessions counter gets stuck. If I log in with a user and then log out properly, the current sessions in the host pool overview are updated quickly. But if I then go to Manage, Session Hosts, the total sessions on that host remain at 1. Only when I put the host in drain mode are the actual sessions updated. Even then, hosts are not removed. Has anyone seen this before?

1 View, 0 likes, 0 Comments

Dynamic hostpool sessions not updating
We have created a dynamic host pool in a test environment. We see that new hosts are being created based on the scaling plan. However, these are not being deleted afterwards. When we look at the status, we see that there are no active sessions, but when we zoom in on the session hosts, it shows that there is a session on two of the three hosts. The latter is incorrect, but it is likely the reason why scaling down is not taking place. Does anyone recognize this? Is there possibly a solution for this?

Small addition: If I log in with a user and then log out properly, the current sessions in the host pool overview are updated quickly. However, if I then go to Manage, Session Hosts, the total sessions on that host remain at 1. Only when I put the host in drain mode are the actual sessions updated.

5 Views, 0 likes, 0 Comments

Azure RBAC Custom Role Best Practices or Common Build Patterns
As a platform admin, I want to grant application admins Contributor access while removing their ability to write or delete most Microsoft.Network resource types, with a few exceptions such as Private Endpoints, Network Interfaces, and Application Gateways.

Based on the effective control plane permissions logic, we designed two custom roles. The first role is a duplicate of the Contributor role, but with Microsoft.Network/*/Write and Microsoft.Network/*/Delete added to notActions. The second role adds back specific Microsoft.Network operations using wildcarded resource types, such as Microsoft.Network/networkInterfaces/*.

Application Admin Effective Permissions = Role 1 (Contributor - Microsoft.Network) + Role 2 (for example, Microsoft.Network/networkInterfaces/*, Microsoft.Network/networkSecurityGroups/*, Microsoft.Network/applicationGateways/write, etc.)

I understand that Microsoft RBAC best practices recommend avoiding wildcard (*) operations. However, my team has found that building roles with individual operations is extremely tedious and time-consuming, especially when trying to understand the impact of each operation. Does anyone have suggestions for a simpler or more maintainable pattern for implementing this type of custom RBAC design?

47 Views, 0 likes, 1 Comment

AKS on AzureLocal: KMSv1 -> KMSv2
Hey, quick question on AKS Arc: we're running moc-kms-plugin:0.2.172-official on an Arc-enabled AKS cluster on Azure Local and currently have KMSv1=true as a feature gate to keep encryption at rest working. KMSv1 is deprecated in 1.28+ and we want to migrate to KMSv2 before it gets removed. Since moc-kms-plugin is a Microsoft-managed component, we can't just swap it out ourselves.

A few questions:
- Does version 0.2.172 already support the KMSv2 gRPC API, or is that coming in a later release?
- Is there a supported migration path for AKS Arc specifically, or does this come automatically through a platform update?
- Any docs or internal guidance you can point us to?

Thanks!

11 Views, 0 likes, 0 Comments

Legacy SSRS reports after upgrading Azure DevOps Server 2020 to 2022 or 25H2
We are currently planning an upgrade from Azure DevOps Server 2020 to Azure DevOps Server 2022 or 25H2, and one of our biggest concerns is reporting. We understand that Microsoft's recommended direction is to move to Power BI based on Analytics / OData. However, for on-prem environments with a large number of existing SSRS reports, rebuilding everything from scratch would require significant time and effort.

Since Warehouse and Analysis Services are no longer available in newer versions, we would like to understand how other on-prem teams are handling legacy SSRS reporting during and after the upgrade. Have you rebuilt your reports in Power BI, moved to another reporting approach, or found a practical way to keep existing SSRS reports available during the transition? Any real-world experience, lessons learned, or recommended approaches would be greatly appreciated.

28 Views, 0 likes, 0 Comments

AVD Environment- FSLogix Profile Login Failure – Write Protected Error
Hi,

We are currently facing an issue with FSLogix user profiles in our environment and would appreciate your assistance in identifying and resolving the problem.

Issue Description:
Users are unable to log in successfully, and we are encountering the following error message: "No Create access → The media is write protected."

Environment Details:
- Session Hosts: Microsoft Entra joined
- Users: Hybrid identities
- Profile Storage: Azure File Share
- Authentication Method: Identity-based access using Microsoft Entra Kerberos

Configuration Details:
- We have assigned the FSLogix user group the role "Storage File Data SMB Share Contributor" on the Azure file share.
- The registry entry for the Kerberos ticket has also been created.
- NTFS permissions have been configured via the Azure Portal (Manage Access), granting Modify permissions to the FSLogix profile users on the file share folder.
- We can see that user profiles and corresponding VHDX files are being created successfully during login attempts.

Problem Statement:
Despite the successful creation of profiles and VHDX files, users are still unable to log in, and the error mentioned above persists.

We would like your guidance on:
- Possible causes for the "write protected" error despite correct role and NTFS permissions.
- Any additional configurations or validations required for FSLogix with Entra Kerberos authentication.
- Recommended troubleshooting steps or logs we should review to isolate the issue.

Please let us know if you need any additional logs, screenshots, or configuration details from our end. Looking forward to your support.

Best regards,
Ravi Yadav

7 Views, 0 likes, 0 Comments

Running Commands Across VM Scale Set Instances Without RDP/SSH Using Azure CLI Run Command
If you've ever managed an Azure Virtual Machine Scale Set (VMSS), you've likely run into this situation: you need to validate something across all nodes, such as:
- Checking a configuration value
- Retrieving logs
- Applying a registry change
- Confirming runtime settings
- Running a quick diagnostic command

And then you realize you're not dealing with two or three machines; you're dealing with 40, 80, or even hundreds of instances.

The Traditional Approach (and Its Limitations)

Historically, administrators would:
- Open RDP connections to Windows nodes
- SSH into Linux nodes
- Execute commands manually on each instance

This may work for a small number of machines, but in real-world environments such as:
- Azure Batch (user-managed pools)
- Azure Service Fabric (classic clusters)
- VMSS-based application tiers

this approach quickly becomes:
- Operationally inefficient
- Time-consuming
- Sometimes impossible

Especially when:
- RDP or SSH ports are blocked
- Network Security Groups restrict inbound connectivity
- Administrative credentials are unavailable
- Network configuration issues prevent guest access

Azure Run Command

To address this, Azure provides a built-in capability to execute commands inside virtual machines through the Azure control plane, without requiring direct guest OS connectivity. This feature is called Run Command.

You can review the official documentation here:
- Run scripts in a Linux VM in Azure using action Run Commands - Azure Virtual Machines | Microsoft Learn
- Run scripts in a Windows VM in Azure using action Run Commands - Azure Virtual Machines | Microsoft Learn

Run Command uses the Azure VM Agent installed on the virtual machine to execute PowerShell or shell scripts directly inside the guest OS.
Because execution happens via the Azure control plane, you can run commands even when:
- RDP or SSH ports are blocked
- NSGs restrict inbound access
- Administrative user configuration is broken

In fact, Run Command is specifically designed to troubleshoot and remediate virtual machines that cannot be accessed through standard remote access methods.

Prerequisites & Restrictions

Before using Run Command, ensure the following:
- VM Agent installed and in Ready state
- Outbound connectivity from the VM to Azure public IPs over TCP 443 to return execution results

If outbound connectivity is blocked, scripts may run successfully but no output will be returned to the caller.

Additional limitations include:
- Output limited to the last 4,096 bytes
- One script execution at a time per VM
- Interactive scripts are not supported
- Maximum execution time of 90 minutes

The full list of restrictions and limitations is available here: https://learn.microsoft.com/en-us/azure/virtual-machines/windows/run-command?tabs=portal%2Cpowershellremove#restrictions

Required Permissions (RBAC)

Executing Run Command requires appropriate Azure RBAC permissions.

| Action | Permission |
|---|---|
| List available Run Commands | Microsoft.Compute/locations/runCommands/read |
| Execute Run Command | Microsoft.Compute/virtualMachines/runCommand/action |

The execution permission is included in the Virtual Machine Contributor role (or higher). Users without this permission will be unable to execute remote scripts through Run Command.

Azure CLI: az vm vs az vmss

When using Azure CLI, you'll encounter two similar-looking commands that behave very differently.
az vm run-command invoke
- Used for standalone VMs
- Also used for Flexible VM Scale Sets
- Targets VMs by name

az vmss run-command invoke
- Used only for Uniform VM Scale Sets
- Targets instances by numeric instanceId (0, 1, 2, …)

Example: az vmss run-command invoke --instance-id <id>

Unlike standalone VM execution, VMSS instances must be referenced using the parameter "--instance-id" to identify which scale set instance will run the script.

Important: Uniform vs Flexible VM Scale Sets

This distinction is critical when automating Run Command execution.

Uniform VM Scale Sets
- Instances are managed as identical replicas
- Each instance has a numeric instanceId
- Supported by az vmss run-command invoke

Flexible VM Scale Sets
- Each instance is a first-class Azure VM resource
- Instance identifiers are VM names, not numbers
- az vmss run-command invoke is not supported
- Must use az vm run-command invoke per VM

To determine which orchestration mode your VMSS uses:

    az vmss show -g "${RG}" -n "${VMSS}" --query "orchestrationMode" -o tsv

Windows vs Linux Targets

Choose the appropriate command ID based on the guest OS:
- Windows VMs → RunPowerShellScript
- Linux VMs → RunShellScript

Example Scenario - Retrieve Hostname From All VMSS Instances

The following examples demonstrate how to retrieve the hostname from all VMSS instances using Azure CLI and Bash.
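Before those per-mode loops, the two selection rules above (orchestration mode decides the command group, guest OS decides the command ID) can be made explicit in a small helper. This is a minimal sketch, not part of the original article: the function names `run_command_group` and `command_id_for_os` are hypothetical, and the input strings are assumed to be the values `Uniform`/`Flexible` returned by the `orchestrationMode` query and a plain `Windows`/`Linux` label that you supply yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not from the article).

# Map the orchestration mode to the Azure CLI command group to use.
run_command_group() {
  case "$1" in
    Uniform)  echo "vmss" ;;   # az vmss run-command invoke --instance-id <id>
    Flexible) echo "vm" ;;     # az vm run-command invoke -n <vm-name>
    *) echo "unknown orchestration mode: $1" >&2; return 1 ;;
  esac
}

# Map the guest OS to the Run Command ID.
command_id_for_os() {
  case "$1" in
    Windows) echo "RunPowerShellScript" ;;
    Linux)   echo "RunShellScript" ;;
    *) echo "unknown OS type: $1" >&2; return 1 ;;
  esac
}

# Show which invocation a Flexible Linux scale set would need.
echo "az $(run_command_group Flexible) run-command invoke --command-id $(command_id_for_os Linux)"
```

Feeding the output of the `az vmss show ... --query "orchestrationMode"` command into `run_command_group` lets one script pick the right loop; failing on an unknown value avoids silently running the wrong command against a scale set.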
Flexible VMSS, Bash (Azure CLI)

    RG="<ResourceGroup>"
    VMSS="<VMSSName>"
    SUBSCRIPTION_ID="<SubscriptionID>"

    az account set --subscription "${SUBSCRIPTION_ID}"

    # Flexible scale sets expose each instance as a regular VM, so collect VM names.
    VM_NAMES=$(az vmss list-instances \
      -g "${RG}" \
      -n "${VMSS}" \
      --query "[].name" \
      -o tsv)

    for VM in $VM_NAMES; do
      echo "Running on VM: $VM"
      az vm run-command invoke \
        -g "${RG}" \
        -n "$VM" \
        --command-id RunShellScript \
        --scripts "hostname" \
        --query "value[0].message" \
        -o tsv
    done

Uniform VMSS, Bash (Azure CLI)

    RG="<ResourceGroup>"
    VMSS="<VMSSName>"
    SUBSCRIPTION_ID="<SubscriptionID>"

    az account set --subscription "${SUBSCRIPTION_ID}"

    # Uniform scale sets address instances by numeric instanceId.
    INSTANCE_IDS=$(az vmss list-instances -g "${RG}" -n "${VMSS}" --query "[].instanceId" -o tsv)

    for ID in $INSTANCE_IDS; do
      echo "Running on instanceId: $ID"
      az vmss run-command invoke \
        -g "${RG}" \
        -n "${VMSS}" \
        --instance-id "$ID" \
        --command-id RunShellScript \
        --scripts "hostname" \
        --query "value[0].message" \
        -o tsv
    done

Summary

Azure Run Command provides a scalable method to:
- Execute diagnostics
- Apply configuration changes
- Collect logs
- Validate runtime settings

…across VMSS instances without requiring RDP or SSH connectivity. This significantly simplifies operational workflows in large-scale compute environments such as:
- Azure Batch (user-managed pools)
- Azure Service Fabric classic clusters
- VMSS-based application tiers

39 Views, 0 likes, 0 Comments

[Architecture Pattern] Scaling Sync-over-Async Edge Gateways by Bypassing Service Bus Sessions
Hi everyone,

I wanted to share an architectural pattern and an open-source implementation we recently built to solve a major scaling bottleneck at the edge: bridging legacy synchronous HTTP clients to long-running asynchronous AI workers.

The Problem: Stateful Bottlenecks at the Edge

When dealing with slow AI generation tasks (e.g., 45+ seconds), standard REST APIs will drop the connection, resulting in 504 Gateway Timeouts. The standard integration pattern here is Sync-over-Async. The Gateway accepts the HTTP request, drops a message onto Azure Service Bus, waits for the worker to reply, and maps the reply back to the open HTTP connection.

However, the default approach is to use Service Bus Sessions for request-reply correlation. At scale, this introduces severe limitations:
1. Stateful Gateways: The Gateway pod must request an exclusive lock on the session. It becomes tightly coupled to that specific request.
2. Horizontal Elasticity is Broken: If a reply arrives, it must go to the specific pod holding the lock. Other idle pods cannot assist.
3. Hard Limits: A traffic spike easily exhausts the namespace concurrent session limits (especially on the Standard tier).

The Solution: Stateless Filtered Topics

To achieve true horizontal scale, the API Gateway layer must be 100% stateless. We bypassed Sessions entirely by pushing the routing logic down to the broker using a Filtered Topic Pattern.

How it works:
1. The Gateway injects a CorrelationId property (e.g., Instance-A-Req-1) into the outbound request.
2. Instead of locking a session, the Gateway spins up a lightweight, dynamic subscription on a shared Reply Topic with a SQL Filter: CorrelationId = 'Instance-A-Req-1'.
3. The AI worker processes the task and drops the reply onto the shared topic with the same property.
4. The Azure Service Bus broker evaluates the SQL filter and pushes the message directly to the correct Gateway pod.

No session locks. No implicit instance affinity. Complete horizontal scalability.
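Step 2 of the flow above, a per-request subscription with a SQL filter, can be sketched with the Azure CLI. This is only an illustration of the shape of the pattern, not the starter's implementation (the post describes a Spring Boot starter that manages subscriptions programmatically). The function names, the rule name `by-correlation`, and the `PT5M` idle timeout are assumptions made for this example, and `CorrelationId` is assumed to be an application property as in the post.

```shell
#!/usr/bin/env bash
# Illustrative sketch; all names and values here are hypothetical.

# Build the SQL filter for one request. Single quotes in the value are doubled
# so the value cannot terminate the SQL string literal early.
build_reply_filter() {
  local correlation_id="$1"
  local q="'"
  local escaped="${correlation_id//$q/$q$q}"
  printf "CorrelationId = '%s'" "$escaped"
}

# Create a short-lived subscription on the shared reply topic plus a filter rule.
# Not invoked here; requires a logged-in Azure CLI and an existing namespace/topic.
create_reply_subscription() {
  local rg="$1" ns="$2" topic="$3" sub="$4" correlation_id="$5"
  # Assumption: let the broker remove the subscription after 5 idle minutes.
  az servicebus topic subscription create \
    --resource-group "$rg" --namespace-name "$ns" \
    --topic-name "$topic" --name "$sub" \
    --auto-delete-on-idle PT5M
  az servicebus topic subscription rule create \
    --resource-group "$rg" --namespace-name "$ns" \
    --topic-name "$topic" --subscription-name "$sub" \
    --name by-correlation \
    --filter-sql-expression "$(build_reply_filter "$correlation_id")"
  # Check whether a default catch-all rule remains on the new subscription;
  # if it does, it has to be removed for the filter to take effect.
}

# Print the filter that would be attached for the example request ID.
build_reply_filter "Instance-A-Req-1"; echo
```

Building the filter string in one place keeps the quoting rule testable on its own, separate from the calls that need a live namespace.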
If a pod crashes, its temporary subscription simply drops, which prevents locked poison messages.

Open Source Implementation

Implementing dynamic Service Bus Administration clients and receiver lifecycles is complex, so I abstracted this pattern into a Spring Boot starter for the community. It handles all the dynamic subscription and routing logic under the hood, allowing developers to execute highly scalable Sync-over-Async flows with a single line of code returning a CompletableFuture.

GitHub Repository: https://github.com/ShivamSaluja/sentinel-servicebus-starter
Full Technical Write-up: https://dev.to/shivamsaluja/sync-over-async-bypassing-azure-service-bus-session-limits-for-ai-workloads-269d

I would love to hear from other architects in this hub. Have you run into similar session exhaustion limits when building Edge API Gateways? Have you adopted similar stateless broker-side routing, or do you rely on sticky sessions at your load balancers?

35 Views, 0 likes, 0 Comments

School Technology Project
We are doing a research assignment on new technologies applied to business management, because we are developing a software project for the dentistry sector, and I would like to ask the experts: which technologies are considered the "gold standard" or essential to apply in 2026, and which of them have you already used?

9 Views, 0 likes, 0 Comments

Excited to share my latest open-source project: KubeCost Guardian
After seeing how many DevOps teams struggle with Kubernetes cost visibility on Azure, I built a full-stack cost optimization platform from scratch.

What it does:
✅ Real-time AKS cluster monitoring via Azure SDK
✅ Cost breakdown per namespace, node, and pod
✅ AI-powered recommendations generated from actual cluster state
✅ One-click optimization actions
✅ JWT-secured dashboard with full REST API

Tech Stack:
- React 18 + TypeScript + Vite
- Tailwind CSS + shadcn/ui + Recharts
- Node.js + Express + TypeScript
- Azure SDK (@azure/arm-containerservice)
- JWT Authentication + Azure Service Principal

What makes it different:
Most cost tools show you generic estimates. KubeCost Guardian reads your actual VM size, node count, and cluster configuration to generate recommendations that are specific to your infrastructure, not averages. For example, if your cluster has only 2 nodes with no autoscaler enabled, it immediately flags the HA risk and calculates exactly how much you'd save by switching to Spot instances based on your actual VM size.

This project is fully open-source and built for the DevOps community.

⭐ GitHub: https://github.com/HlaliMedAmine/kubecost-guardian

This project represents hours of hard work and passion. I decided to make it open-source so everyone can benefit from it 🤝. If you find it useful, I'd really appreciate your support. Your support motivates me to keep building and sharing more powerful projects 👌. More exciting ideas are coming soon… stay tuned! 🔥

Pipeline Intelligence is live and open-source: real-time Azure DevOps monitoring powered by AI.
Every DevOps team I've worked with had the same problem: Slow pipelines. Zero visibility. No idea where to start. So I stopped complaining and built the solution.

⚡ Pipeline Intelligence is a full-stack Azure DevOps monitoring dashboard that:
✅ Connects to your real Azure DevOps organization via REST API
✅ Detects bottlenecks across all your pipelines automatically
✅ Calculates exactly how much time your team is wasting per month
✅ Uses Gemini AI to generate prioritized fixes with ready-to-paste YAML solutions
✅ Is JWT-secured, Docker-ready, and fully open-source

Tech Stack:
→ React 18 + Vite + Tailwind CSS
→ Node.js + Express + Azure DevOps API v7
→ Google Gemini 1.5 Flash
→ JWT Authentication + Docker

What makes it different?
Most tools show you generic estimates. Pipeline Intelligence reads your actual cluster config, node count, and pipeline structure and gives you recommendations specific to your infrastructure.

🎯 This year, I set myself a personal challenge: build and open-source a series of production-grade tools exclusively focused on Azure services, tools that solve real problems for real DevOps teams.

This project represents weeks of research, architecture decisions, and late-night debugging sessions. I'm sharing it with the community because I believe great tooling should be accessible to everyone, not locked behind enterprise paywalls.

If this resonates with you, I have one simple ask:
👉 A like, a comment, or a share takes 3 seconds, but it helps this reach the DevOps engineers who need it most. Your support is what keeps me building. ❤️

GitHub: https://github.com/HlaliMedAmine/pipeline-intelligence

Azure VMs host (platform) metrics (not guest metrics) to the log analytics workspace ?
Hi Team,

Can someone help me with how to send Azure VM host (platform) metrics (not guest metrics) to a Log Analytics workspace?

Some years ago I used to do this by clicking on "Diagnostic Settings", but now when I go to the "Diagnostic Settings" tab it asks me to enable guest-level monitoring (I don't want guest-level metrics) and points to a Storage Account. I don't see the option to send these metrics to a Log Analytics workspace.

I have around 500 Azure VMs whose host (platform) metrics (not guest metrics) I want to send to the Log Analytics workspace.

41 Views, 0 likes, 1 Comment

Building a Production-Ready Azure Lighthouse Deployment Pipeline with EPAC
Recently I worked on an interesting project for an end-to-end Azure Lighthouse implementation. What really stood out to me was the combination of Azure Lighthouse, EPAC, DevOps, and workload identity federation. The deployment model was so compelling that I decided to build and validate the full solution hands-on in my own personal Azure tenants.

The result is a detailed article that documents the entire journey, including pipeline design, implementation steps, and the scripts I prepared along the way. You can read the full article here

87 Views, 0 likes, 1 Comment

Azure Key Vault Replication: Why Paired Regions Alone Don’t Guarantee Business Continuity
As customers modernize toward multi-region architectures in Azure, one question comes up repeatedly: "If my region goes down, will Azure Key Vault continue to work without disruption?"

The short answer: it depends on what you mean by "work."

Azure Key Vault provides strong durability and availability guarantees, but those guarantees are often misunderstood, especially when customers assume paired-region replication equals full disaster recovery. In reality, Azure Key Vault replication is designed for survivability, not uninterrupted write access or customer-controlled failover.

This post explains:
- How Azure Key Vault replication actually works (per Microsoft Learn)
- Why paired-region failover does not equal business continuity
- Two reference architectures that implement true multi-region Key Vault availability, with Terraform

How Azure Key Vault Replication Works (Per Microsoft Learn)

Azure Key Vault includes multiple layers of Microsoft-managed redundancy.

In-Region and Zone Resiliency
- Vault contents are replicated within the region.
- In regions that support availability zones, Key Vault is zone-resilient by default.
- This protects against localized hardware or zone failures.

Paired-Region Replication
- If a Key Vault is deployed in a region with an Azure-defined paired region, its contents are asynchronously replicated to that paired region.
- This replication is automatic and cannot be configured, observed, or tested by customers.

Microsoft-Managed Regional Failover
- If Microsoft declares a full regional outage, requests are automatically routed to the paired region.
- After failover, the vault operates in read-only mode:
  ✅ Read secrets, keys, and certificates
  ✅ Perform cryptographic operations
  ❌ Create, update, rotate, or delete secrets, keys, or certificates

This is a critical distinction. Paired-region replication preserves access, not operational continuity.
Why Paired-Region Replication Is Not Business Continuity

From a reliability and DR perspective, several limitations matter:
- Failover is Microsoft-initiated, not customer-controlled
- No write operations during regional failover
- No secret rotation or certificate renewal
- No way to test DR
- Accidental deletions replicate
- No point-in-time recovery without backups

Microsoft Learn explicitly states that critical workloads may require custom multi-region strategies beyond built-in replication. For many customers, this means Azure Key Vault becomes a single-region dependency in an otherwise multi-region application design.

The Multi-Region Key Vault Pattern

The two GitHub repositories below implement a common architectural shift: multiple independent Key Vaults deployed in separate regions, with customer-controlled replication and failover. Instead of relying on invisible platform replication, the vaults become first-class, region-scoped resources, aligned with application failover.

Solution 1: Private, Locked-Down Multi-Region Key Vault Replication

Repository: 👉 https://github.com/jclem2000/KeyVault-MultiRegion-Replication-Private

Architecture Highlights
- Independent Key Vault per region
- Private Endpoints only
- No public network exposure
- Terraform-based deployment
- Controlled replication using event-based synchronization

What This Enables
✅ Full read/write access during regional outages
✅ Continued secret rotation and certificate renewal
✅ Customer-defined failover and RTO
✅ DR testing and validation
✅ Strong alignment with zero-trust and regulated environments

Trade-offs
- Higher operational complexity
- Requires automation and application awareness of multiple vaults

Solution 2: Low-Cost Public Multi-Region Key Vault Replication

Repository: 👉 https://github.com/jclem2000/KeyVault-MultiRegion-Replication-Public

Architecture Highlights
- Independent Key Vault per region
- Public endpoints
- Minimal networking dependencies
- Terraform-based
- Controlled replication using event-based synchronization
- Optimized for simplicity and cost

What This Enables
✅ Full read/write availability in any region
✅ Clear and testable DR posture
✅ Lower cost than private endpoint designs
✅ Suitable for many non-regulated workloads

Trade-offs
- Public exposure (mitigated via firewall rules, RBAC, and conditional access)
- Not appropriate for all compliance requirements
- Requires automation and application awareness of multiple vaults

Azure Native Replication vs Customer-Managed Multi-Region Vaults

| Capability | Azure Paired Region | Multi-Region Vaults |
|---|---|---|
| Read access during outage | ✅ | ✅ |
| Write access during outage | ❌ | ✅ |
| Secret rotation during outage | ❌ | ✅ |
| Customer-controlled failover | ❌ | ✅ |
| DR testing | ❌ | ✅ |
| Isolation from accidental deletion | ❌ | ✅ |
| Predictable RTO | ❌ | ✅ |

Azure Key Vault's native replication optimizes for platform durability. The multi-region pattern optimizes for application continuity.

When to Use Each Approach

Paired-Region Replication Is Often Enough When:
- Secrets are mostly static
- Read-only access during outages is acceptable
- RTO is flexible
- You prefer Microsoft-managed recovery

Multi-Region Vaults Are Recommended When:
- Secrets or certificates rotate frequently
- Applications must remain writable during outages
- Deterministic failover is required
- DR testing is mandatory
- Regulatory or operational isolation is needed

Closing Thoughts

Azure Key Vault behaves exactly as documented on Microsoft Learn, but it's important to be clear about what those guarantees mean. Paired-region replication protects your data, not your ability to operate. If your application is designed to survive a regional outage, Key Vault must follow the same multi-region design principles as the application itself.

The reference architectures above show how to extend Azure's native durability model into true operational resilience, without waiting for a platform-level failover decision.

191 Views, 0 likes, 0 Comments

The March 2026 Innovation Challenge Winners
For this round of the Innovation Challenge, the organizations we sponsor helped over 15,000 developers get the skills it takes to build AI solutions on Azure. This program is grounded in Microsoft's mission and designed to enable a diverse and qualified community of professional developers coming together to tackle big problems. We helped almost 1,000 people earn Microsoft certifications and Applied Skills credentials, and 300 participated in the invitation-only March 2026 Innovation Challenge hackathon. Teams represented SHPE, Women in Cloud, Código Facilito, DIO, GenSpark, NASA Space Apps, Project Blue Mountain, and TechBridge. Check out the winning projects to meet some of the best AI talent in our community and to get inspired about what we can build together!

First place: $10,000

Pebble. - AI Cognitive Load Companion
Pebble. is named after a worry stone: something small and smooth you reach for when the world feels like too much. It's an AI cognitive support companion that turns overwhelming documents, tasks, and information into calm, structured clarity. Built for neurodivergent minds. Useful for everyone.

Second place: $5,000

The Living Memory Bridge
We believe dementia represents the most extreme form of cognitive overload that exists. It is not just information overload. It is cognitive loss: the gradual erosion of the very tools people use to process the world. Every principle in the brief applies here in its most urgent form: simplified language, adaptive communication, calm and dignity-preserving interactions, personalized memory anchors, and support that meets people exactly where they are.

Query to Insight Analytics CRAM
CRAM is a natural language healthcare analytics platform built entirely on Azure that lets clinical and administrative staff query a patient database using plain English, no SQL required. Users type a question like "What are the top 10 conditions among diabetic patients?" and get back a written summary, a data table, and an auto-generated chart in seconds.

Third place: $2,500

ClearStep
ClearStep is an action-first AI system designed to reduce decision overload in high-risk or confusing situations. Instead of only detecting risk, it tells users exactly what to do next. The core innovation is architectural: model output is not trusted. Every response is enforced by a validation layer that guarantees structure, corrects model errors, and prevents unsafe or misleading outputs from reaching the user.

DataTalk
Our platform enables seamless data ingestion from Excel, CSV, SharePoint, and OneDrive, processes it through a two-layer analytical pipeline powered by DuckDB, and orchestrates four specialized AI agents that work as a team: understanding intent, reading data structure, generating and self-correcting SQL, and enforcing security and auditability at every step.

RAGulator AI Governance Engine
Advanced, governed, and traceable RAG (Retrieval-Augmented Generation) system for international trade. RAGulator is a 100% functional solution that unifies the Azure intelligence ecosystem to deliver grounded responses with immutable bibliographic citations.

1.1K Views, 3 likes, 0 Comments

How on god's green earth do you buy an API key? Bing custom search API.
I just want to give Microsoft money. I want to buy an API key for the Bing custom search. How do I do this? I get to Production and click "Click to issue paid tier key" and I keep getting the same god-awful error in the Azure marketplace:

"""
Could not create the marketplace item
Oops! Could not create the marketplace item
Gallery item is required, no gallery item is provided.
"""

I just want to spend money. How do I do that?

16 Views, 0 likes, 0 Comments

RC4 Deprecating by April
I'm reviewing our Seamless SSO setup and noticed from the Kerberos event logs that the AzureADSSOAcc account is still using RC4 (encryption type 0x17). I have a few questions regarding this:
- Why does AzureADSSOAcc still default to RC4 instead of AES, even when the domain supports AES?
- With Microsoft disabling RC4 (April updates), will AzureADSSOAcc automatically switch to AES?
- If it does not switch automatically, what is the recommended way to force it to use AES?
- Is running Update-AzureADSSOForest (key rotation) sufficient, and does it cause any downtime or impact to Seamless SSO?

I want to make sure we transition to AES safely without breaking SSO for users. Any guidance or real-world experience would be appreciated.

494 Views, 0 likes, 2 Comments

Graphic issue on single session host personal avd
We recently deployed single-session hosts from an Azure gallery image (Windows 11 25H2 Enterprise + Microsoft 365 Apps), and random users are facing a graphics issue on AVD: the screen is completely covered by blue lines, so nothing on the display is visible. How can we resolve this?

75 Views, 0 likes, 2 Comments

Uninstalling Remote Desktop client closes users' Windows App connections
We have our users working from Windows App now to meet the 3/27 out-of-support date. We are beginning to uninstall the Remote Desktop client from their laptops and are finding that the uninstall closes active Windows App connections. That is less than ideal. We are looking for a way around that, and wondered whether others had seen the same.

85 Views, 0 likes, 2 Comments

Detecting ACI IP Drift and Auto-Updating Private DNS (A + PTR) with Event Grid + Azure Functions
Solution Author Aditya_AzureNinja , Chiragsharma30 Solution Version v1.0 TL;DR Azure Container Instances (ACI) container groups can be recreated/updated over time and may receive new private IPs, which can cause DNS mismatches if forward and reverse records aren’t updated. This post shares an event-driven pattern that detects ACI IP drift and automatically reconciles Private DNS A (forward) and PTR (reverse) records using Event Grid + Azure Functions. Key requirement: Event delivery is at-least-once, so the solution must be idempotent. Problem statement In hub-and-spoke environments using per-spoke Private DNS zones for isolation, ACI workloads created/updated/deleted over time can receive new private IPs. We need to ensure: Forward lookup: aci-name.<spoke-zone> (A record) → current ACI private IP Reverse lookup: IP → aci-name.<spoke-zone> (PTR record) Two constraints drive this design: Azure Private DNS auto-registration is VM-only and does not create PTR records, so ACI needs explicit A/PTR record management. Reverse DNS is scoped to the VNet (reverse zone must be linked to the querying VNet, otherwise reverse lookup returns NXDOMAIN). Design principle: This solution was designed with the following non‑negotiable engineering goals: Event‑driven DNS updates must be triggered directly from resource lifecycle events, not polling or scheduled jobs. Container creation, restart, and deletion are the only reliable sources of truth for IP changes in ACI. Idempotent Azure Event Grid delivers events with at‑least‑once semantics. The system must safely process duplicate events without creating conflicting DNS records or failing on retries. Stateless The automation must not rely on in‑memory or persisted state to determine correctness. DNS itself is treated as the baseline state, allowing functions to scale, restart, and replay events without drift or dependency on prior executions. Clear failure modes DNS reconciliation failures must be explicit and observable. 
If DNS updates fail, the function invocation must fail loudly so the issue is visible, alertable, and actionable, never silently ignored.

Components
- Event Grid subscriptions (filtered to ACI container group lifecycle events)
- Azure Function App (Python) with System Assigned Managed Identity
- Private DNS forward zone (A records)
- Private DNS reverse zone (PTR records)

Supporting infra (typical):
- Storage account (function artifacts / operational needs)
- Application Insights + Log Analytics (observability)

Event-driven flow
- ACI container group is created/updated/deleted.
- Event Grid emits a lifecycle event (delivery can be repeated).
- Function is triggered and reads the current ACI private IP.
- Function reconciles DNS:
  - Upsert A record to current IP
  - Upsert PTR record to FQDN
  - Remove stale PTR(s) for hostname/IP as needed
- Function logs reconciliation outcome (updated vs no-op).

Architecture overview (INFRA)
This follows the “Event-driven registration” approach: Event Grid → Azure Function that reconciles DNS on ACI lifecycle events.

RBAC at a glance (Managed Identity)

| Role | Scope | Purpose |
| --- | --- | --- |
| Storage Blob Data Owner | Function App deployment storage account | Access function artifacts and operational blobs (required because shared key access is disabled). |
| Reader | Each ACI workload resource group | Read container group state and determine the current private IP. |
| Private DNS Zone Contributor | Private DNS forward zone(s) | Create, update, and delete A records for ACI hostnames. |
| Private DNS Zone Contributor | Private DNS reverse zone(s) | Create, update, and clean up PTR records for ACI IPs. |
| Monitoring Metrics Publisher (optional) | Data Collection Rule (DCR) | Upload structured IP-drift events to Log Analytics via the ingestion API. |

Architecture overview (APP)
Event-Driven DNS Reconciliation for Azure Container Instances

1. Event contract: what the function receives
Azure Event Grid delivers events using a consistent envelope (Event Grid schema).
Each event includes, at a minimum:
- topic
- subject
- id
- eventType
- eventTime
- data
- dataVersion
- metadataVersion

In Azure Functions, the Event Grid trigger binding is the recommended way to receive these events directly.

Why the subject field matters
The subject field typically contains the ARM resource ID path of the affected resource. This solution relies on subject to:
- verify that the event is for an ACI container group (Microsoft.ContainerInstance/containerGroups)
- extract the subscription ID, resource group name, and container group name

Using subject avoids dependence on publisher-specific payload fields and keeps parsing fast, deterministic, and resilient.

2. Subscription design: filter hard, process little
The solution follows a strict runbook pattern:
- subscribe only to ARM lifecycle events
- filter aggressively so only ACI container groups are included
- trigger reconciliation only on meaningful state transitions

Recommended Event Grid event types
- Microsoft.Resources.ResourceWriteSuccess (create / update / stop state changes)
- Microsoft.Resources.ResourceDeleteSuccess (container group deletion)
- Microsoft.Resources.ResourceActionSuccess (optional; restart / start / stop actions, environment-dependent)

This keeps the Function App simple, predictable, and low-noise.

3. Application design: two functions, one contract
The application is intentionally split into authoritative mutation and read-only validation.

Component A: DNS Reconciler (authoritative writer)
A thin Python v2 model wrapper:
- receives the Event Grid event
- validates this is an ACI container group event
- parses identifiers from the ARM subject
- resolves DNS configuration from a JSON mapping (environment variable)
- delegates DNS mutation to a deterministic worker script

DNS changes are not implemented inline in Python.
Instead, the function:
- constructs a controlled set of environment variables
- invokes a worker script (/bin/bash) via subprocess
- streams stdout/stderr into function logs
- treats non-zero exit codes as hard failures

This thin wrapper + deterministic worker pattern isolates DNS correctness logic while keeping the event handler stable and testable.

Component B: IP Drift Tracker (stateless observer)
The drift tracker is a read-only, stateless validator designed for correctness monitoring. It:
- parses identifiers from the event subject
- exits early on delete events (nothing to validate)
- reads the live ACI private IP using the Azure SDK
- reads the current DNS A record baseline
- compares live vs DNS state and emits drift telemetry

Core comparison logic
- No DNS record exists → emit first_seen
- DNS record matches live IP → emit no_change
- DNS record differs from live IP → emit drift_detected (old/new IP)

Optionally, drift events can be shipped to Log Analytics using DCR-based ingestion.

4. DNS Reconciler: execution flow

Step 1: Early filtering
Reject any event whose subject does not contain Microsoft.ContainerInstance/containerGroups. This avoids unnecessary processing and ensures strict contract enforcement.

Step 2: ARM subject parsing
The function splits the subject path and extracts:
- resource group
- container group name

This approach is fast, robust, and avoids publisher-specific schema dependencies.

Step 3: Zone configuration resolution
DNS configuration is resolved from a JSON map stored in an environment variable. If no matching configuration exists for the resource group, the function logs the condition and exits without error.

Why this matters
This keeps the solution multi-environment without duplicating deployments. Only configuration changes, not code, are required.
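
Steps 1 to 3 of the reconciler flow can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the case-insensitive matching, and the shape of the JSON map (keyed by resource group name, with hypothetical `forward_zone` / `reverse_zones` fields) are all assumptions made for the example.

```python
import json
from typing import Optional

ACI_MARKER = "Microsoft.ContainerInstance/containerGroups"


def parse_aci_subject(subject: str) -> Optional[dict]:
    """Extract identifiers from an ARM resource ID path.

    Returns None when the subject is not an ACI container group
    (Step 1: early filtering). Matching is case-insensitive here,
    which is an assumption of this sketch.
    """
    if ACI_MARKER.lower() not in subject.lower():
        return None
    parts = [p for p in subject.split("/") if p]
    lowered = [p.lower() for p in parts]
    try:
        # Step 2: each identifier follows its well-known path segment.
        return {
            "subscription_id": parts[lowered.index("subscriptions") + 1],
            "resource_group": parts[lowered.index("resourcegroups") + 1],
            "container_group": parts[lowered.index("containergroups") + 1],
        }
    except (ValueError, IndexError):
        return None  # malformed path: treat as "not for us"


def resolve_zone_config(resource_group: str, raw_map: str) -> Optional[dict]:
    """Step 3: look up DNS settings for a resource group in a JSON map.

    raw_map is the JSON text read from an environment variable
    (the variable name and map layout are hypothetical).
    Returns None when no entry exists, so the caller can log and exit cleanly.
    """
    zone_map = json.loads(raw_map or "{}")
    by_rg = {key.lower(): value for key, value in zone_map.items()}
    return by_rg.get(resource_group.lower())
```

A caller would typically return early when either function yields None, which matches the "log the condition, exit without error" behaviour described in Step 3.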
Step 4: Delegation to worker logic
The function constructs a deterministic runtime context and invokes the worker:
- forward zone name
- reverse zone name(s)
- container group name
- current private IP
- TTL and execution flags

The worker performs reconciliation and exits with explicit success or failure.

5. What “reconciliation” actually means
Reconciliation follows clear, idempotent semantics.

Create / Update events
- Upsert A record
  - if record exists and matches current IP → no-op
  - else → create or overwrite with new IP
- Upsert PTR record
  - compute PTR name using IP octets and reverse zone alignment
  - create or overwrite PTR to hostname.<forward-zone>

Delete events
- delete the A record for the hostname
- scan PTR record sets:
  - remove targets matching the hostname
  - delete record set if empty

All operations are safe to repeat.

6. Why IP drift tracking is separate
DNS reconciliation enforces correctness at event time, but drift can still occur due to:
- manual DNS edits
- partial failures
- delete / recreate race conditions
- unexpected redeployments or restarts

The drift tracker exists as a continuous correctness validator, not as a repair mechanism. This separation keeps responsibilities clear:
- Reconciler → fixes state
- Drift tracker → observes and reports state

7. Observability: correctness vs runtime health
There is an important distinction:

Runtime health
- container crashes
- image pull failures
- restarts
- platform events (visible in standard ACI / Container logs)

DNS correctness
- A record != live IP
- missing PTR records
- stale reverse mappings

The IP Drift Tracker provides this correctness layer, which complements rather than replaces runtime monitoring.

8. Engineering constraints that shape the design

At-least-once delivery → idempotency
Event Grid delivery must be treated as at-least-once. Every reconciliation action is safe to execute multiple times.
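
To make the idempotent upsert semantics from section 5 concrete, here is a small sketch of the two pure decisions involved: choosing the A record action, and computing the PTR record name relative to a reverse zone. The function names and return values are hypothetical, and the post's actual worker script may compute these differently. Only the standard-library `ipaddress` module is used.

```python
import ipaddress
from typing import Sequence


def plan_a_record(existing_ips: Sequence[str], current_ip: str) -> str:
    """Decide what to do with the A record for a hostname.

    Replaying the same event after the change has been applied yields
    "noop", which is what makes the upsert safe under at-least-once delivery.
    """
    if not existing_ips:
        return "create"
    if list(existing_ips) == [current_ip]:
        return "noop"
    return "overwrite"


def ptr_record_name(ip: str, reverse_zone: str) -> str:
    """Return the PTR record name relative to the given reverse zone.

    Example: 10.1.2.3 has the full reverse name 3.2.1.10.in-addr.arpa,
    so inside the zone 1.10.in-addr.arpa the relative name is "3.2".
    """
    full_name = ipaddress.ip_address(ip).reverse_pointer
    suffix = "." + reverse_zone.rstrip(".").lower()
    if not full_name.endswith(suffix):
        raise ValueError(f"{ip} does not belong to reverse zone {reverse_zone}")
    return full_name[: -len(suffix)]
```

Keeping these decisions as pure functions, separate from the DNS API calls, makes the "safe to repeat" property easy to unit test.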

Explicit failure behavior
If the worker script returns a non-zero exit code:
- the function invocation fails
- the failure is visible and alertable
- incorrect DNS does not silently persist
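
The "thin wrapper + worker" hand-off and its fail-loud behaviour could look something like the sketch below. It is an illustration under assumptions: `run_worker`, the environment allowlist, and the timeout are hypothetical, and output is logged after the process exits, where the post describes streaming it into the logs. Only standard-library calls (`subprocess.run`, `logging`) are used.

```python
import logging
import os
import subprocess
from typing import Mapping, Sequence

# Variables passed through from the host; everything else comes from `context`.
PASSTHROUGH_VARS = ("PATH", "HOME", "LANG")


def run_worker(command: Sequence[str], context: Mapping[str, str],
               timeout_s: int = 120) -> str:
    """Run the worker with a controlled environment and fail on non-zero exit.

    command would be something like ["/bin/bash", "worker.sh"] (hypothetical path).
    context holds the per-event values (zone names, container group, IP, TTL).
    """
    env = {k: os.environ[k] for k in PASSTHROUGH_VARS if k in os.environ}
    env.update(context)

    result = subprocess.run(
        list(command), env=env, capture_output=True, text=True, timeout=timeout_s
    )

    # Surface worker output in the function logs.
    for line in result.stdout.splitlines():
        logging.info("worker: %s", line)
    for line in result.stderr.splitlines():
        logging.warning("worker: %s", line)

    if result.returncode != 0:
        # Raising makes the invocation fail, so the failure is visible and alertable.
        raise RuntimeError(f"DNS worker failed with exit code {result.returncode}")
    return result.stdout
```

Raising from the handler, as opposed to logging and returning, is what lets the platform record the invocation as failed.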
Events
in 18 hours
Join our upcoming live webcast for a transparent discussion about this recent Azure service incident — led by our engineering teams.
Network degradation within East US AZ-02
Tracking ID: DG_Z-S08...
Thursday, Apr 23, 2026, 09:30 AM PDT, Online
0 likes
2 Attendees
0 Comments
Recent Blogs
- 6 MIN READ: Azure is evolving to better support secure-by-default cloud architectures. Starting with API version 2025-07-01 (released after March 31, 2026), newly created virtual networks now default to u... (Apr 22, 2026 | 57 Views | 0 likes | 0 Comments)
- Co-authors: Jie Su, Abhinav Dua, Mukthar Ahmed, Dhruv Joshi. In a previous post, we shared how Azure Automated VM Recovery works to minimize virtual machine downtime through a three-stage approach: ... (Apr 22, 2026 | 108 Views | 0 likes | 0 Comments)