Forum Widgets
Latest Discussions
Detecting ACI IP Drift and Auto-Updating Private DNS (A + PTR) with Event Grid + Azure Functions
Solution Author Aditya_AzureNinja , Chiragsharma30 Solution Version v1.0 TL;DR Azure Container Instances (ACI) container groups can be recreated/updated over time and may receive new private IPs, which can cause DNS mismatches if forward and reverse records aren’t updated. This post shares an event-driven pattern that detects ACI IP drift and automatically reconciles Private DNS A (forward) and PTR (reverse) records using Event Grid + Azure Functions. Key requirement: Event delivery is at-least-once, so the solution must be idempotent. Problem statement In hub-and-spoke environments using per-spoke Private DNS zones for isolation, ACI workloads created/updated/deleted over time can receive new private IPs. We need to ensure: Forward lookup: aci-name.<spoke-zone> (A record) → current ACI private IP Reverse lookup: IP → aci-name.<spoke-zone> (PTR record) Two constraints drive this design: Azure Private DNS auto-registration is VM-only and does not create PTR records, so ACI needs explicit A/PTR record management. Reverse DNS is scoped to the VNet (reverse zone must be linked to the querying VNet, otherwise reverse lookup returns NXDOMAIN). Design principle: This solution was designed with the following non‑negotiable engineering goals: Event‑driven DNS updates must be triggered directly from resource lifecycle events, not polling or scheduled jobs. Container creation, restart, and deletion are the only reliable sources of truth for IP changes in ACI. Idempotent Azure Event Grid delivers events with at‑least‑once semantics. The system must safely process duplicate events without creating conflicting DNS records or failing on retries. Stateless The automation must not rely on in‑memory or persisted state to determine correctness. DNS itself is treated as the baseline state, allowing functions to scale, restart, and replay events without drift or dependency on prior executions. Clear failure modes DNS reconciliation failures must be explicit and observable. If DNS updates fail, the function invocation must fail loudly so the issue is visible, alertable, and actionable—never silently ignored. Components Event Grid subscriptions (filtered to ACI container group lifecycle events) Azure Function App (Python) with System Assigned Managed Identity Private DNS forward zone (A records) Private DNS reverse zone (PTR records) Supporting infra (typical): Storage account (function artifacts / operational needs) Application Insights + Log Analytics (observability) Event-driven flow ACI container group is created/updated/deleted. Event Grid emits a lifecycle event (delivery can be repeated). Function is triggered and reads the current ACI private IP. Function reconciles DNS: Upsert A record to current IP Upsert PTR record to FQDN Remove stale PTR(s) for hostname/IP as needed Function logs reconciliation outcome (updated vs no-op). Architecture overview (INFRA) This follows the“Event-driven registration” approach: Event Grid → Azure Function that reconciles DNS on ACI lifecycle events. RBAC at a glance (Managed Identity) Role Scope Purpose Storage Blob Data Owner Function App deployment storage account Access function artifacts and operational blobs (required because shared key access is disabled). Reader Each ACI workload resource group Read container group state and determine the current private IP. Private DNS Zone Contributor Private DNS forward zone(s) Create, update, and delete A records for ACI hostnames. Private DNS Zone Contributor Private DNS reverse zone(s) Create, update, and clean up PTR records for ACI IPs. Monitoring Metrics Publisher (optional) Data Collection Rule (DCR) Upload structured IP‑drift events to Log Analytics via the ingestion API. --- --- Architecture overview (APP) Event‑Driven DNS Reconciliation for Azure Container Instances 1. Event contract: what the function receives Azure Event Grid delivers events using a consistent envelope (Event Grid schema). Each event includes, at a minimum: topic subject id eventType eventTime data dataVersion metadataVersion In Azure Functions, the Event Grid trigger binding is the recommended way to receive these events directly. Why the subject field matters The subject field typically contains the ARM resource ID path of the affected resource. This solution relies on subject to: verify that the event is for an ACI container group (Microsoft.ContainerInstance/containerGroups) extract: subscription ID resource group name container group name Using subject avoids dependence on publisher‑specific payload fields and keeps parsing fast, deterministic, and resilient. 2. Subscription design: filter hard, process little The solution follows a strict runbook pattern: subscribe only to ARM lifecycle events filter aggressively so only ACI container groups are included trigger reconciliation only on meaningful state transitions Recommended Event Grid event types Microsoft.Resources.ResourceWriteSuccess (create / update / stop state changes) Microsoft.Resources.ResourceDeleteSuccess (container group deletion) Microsoft.Resources.ResourceActionSuccess (optional) (restart / start / stop actions, environment‑dependent) This keeps the Function App simple, predictable, and low‑noise. 3. Application design: two functions, one contract The application is intentionally split into authoritative mutation and read‑only validation. Component A — DNS Reconciler (authoritative writer) A thin Python v2 model wrapper: receives the Event Grid event validates this is an ACI container group event parses identifiers from the ARM subject resolves DNS configuration from a JSON mapping (environment variable) delegates DNS mutation to a deterministic worker script DNS changes are not implemented inline in Python. Instead, the function: constructs a controlled set of environment variables invokes a worker script (/bin/bash) via subprocess streams stdout/stderr into function logs treats non‑zero exit codes as hard failures This thin wrapper + deterministic worker pattern isolates DNS correctness logic while keeping the event handler stable and testable. Component B — IP Drift Tracker (stateless observer) The drift tracker is a read‑only, stateless validator designed for correctness monitoring. It: parses identifiers from the event subject exits early on delete events (nothing to validate) reads the live ACI private IP using the Azure SDK reads the current DNS A record baseline compares live vs DNS state and emits drift telemetry Core comparison logic No DNS record exists → emit first_seen DNS record matches live IP → emit no_change DNS record differs from live IP → emit drift_detected (old/new IP) Optionally, drift events can be shipped to Log Analytics using DCR‑based ingestion. 4. DNS Reconciler: execution flow Step 1 — Early filtering Reject any event whose subject does not contain: Microsoft.ContainerInstance/containerGroups. This avoids unnecessary processing and ensures strict contract enforcement. Step 2 — ARM subject parsing The function splits the subject path and extracts: resource group container group name This approach is fast, robust, and avoids publisher‑specific schema dependencies. Step 3 — Zone configuration resolution DNS configuration is resolved from a JSON map stored in an environment variable. If no matching configuration exists for the resource group: the function logs the condition exits without error Why this matters This keeps the solution multi‑environment without duplicating deployments. Only configuration changes — not code — are required. Step 4 — Delegation to worker logic The function constructs a deterministic runtime context and invokes the worker: forward zone name reverse zone name(s) container group name current private IP TTL and execution flags The worker performs reconciliation and exits with explicit success or failure. 5. What “reconciliation” actually means Reconciliation follows clear, idempotent semantics. Create / Update events Upsert A record if record exists and matches current IP → no‑op else → create or overwrite with new IP Upsert PTR record compute PTR name using IP octets and reverse zone alignment create or overwrite PTR to hostname.<forward-zone> Delete events delete the A record for the hostname scan PTR record sets: remove targets matching the hostname delete record set if empty All operations are safe to repeat. 6. Why IP drift tracking is separate DNS reconciliation enforces correctness at event time, but drift can still occur due to: manual DNS edits partial failures delete / recreate race conditions unexpected redeployments or restarts The drift tracker exists as a continuous correctness validator, not as a repair mechanism. This separation keeps responsibilities clear: Reconciler → fixes state Drift tracker → observes and reports state 7. Observability: correctness vs runtime health There is an important distinction: Runtime health container crashes image pull failures restarts platform events (visible in standard ACI / Container logs) DNS correctness A record != live IP missing PTR records stale reverse mappings The IP Drift Tracker provides this correctness layer, which complements — not replaces — runtime monitoring. 8. Engineering constraints that shape the design At‑least‑once delivery → idempotency Event Grid delivery must be treated as at‑least‑once. Every reconciliation action is safe to execute multiple times. Explicit failure behavior If the worker script returns a non‑zero exit code: the function invocation fails the failure is visible and alertable incorrect DNS does not silently persistChiragsharma30Apr 01, 2026Microsoft128Views2likes0CommentsHelp wanted: Refresh articles in Azure Architecture Center (AAC)
I’m the Project Manager for architecture review boards (ARBs) in the Azure Architecture Center (AAC). We’re looking for subject matter experts to help us improve the freshness of the AAC, Cloud Adoption Framework (CAF), and Well-Architected Framework (WAF) repos. This opportunity is currently limited to Microsoft employees only. As an ARB member, your main focus is to review, update, and maintain content to meet quarterly freshness targets. Your involvement directly impacts the quality, relevance, and direction of Azure Patterns & Practices content across AAC, CAF, and WAF. The content in these repos reaches almost 900,000 unique readers per month, so your time investment has a big, global impact. The expected commitment is 4-6 hours per month, including attendance at weekly or bi-weekly sync meetings. Become an ARB member to gain: Increased visibility and credibility as a subject‑matter expert by contributing to Microsoft‑authored guidance used by customers and partners worldwide. Broader internal reach and networking without changing roles or teams. Attribution on Microsoft Learn articles that you own. Opportunity to take on expanded roles over time (for example, owning a set of articles, mentoring contributors, or helping shape ARB direction). We’re recruiting new members across several ARBs. Our highest needs are in the Web ARB, Containers ARB, and Data & Analytics ARB: The Web ARB focuses on modern web application architecture on Azure—App Service and PaaS web apps, APIs and API Management, ingress and networking (Application Gateway, Front Door, DNS), security and identity, and designing for reliability, scalability, and disaster recovery. The Containers ARB focuses on containerized and Kubernetes‑based architectures—AKS design and operations, networking and ingress, security and identity, scalability, and reliability for production container platforms. The Data & Analytics ARB focuses on data platform and analytics architectures—data ingestion and integration, analytics and reporting, streaming and real‑time scenarios, data security and governance, and designing scalable, reliable data solutions on Azure. We’re also looking for people to take ownership of other articles across AAC, CAF, and WAF. These articles span many areas, including application and solution architectures, containers and compute, networking and security, governance and observability, data and integration, and reliability and operational best practices. You don’t need to know everything—deep expertise in one or two areas and an interest in keeping Azure architecture guidance accurate and current is what matters most. Please reply to this post if you’re interested in becoming an ARB member, and I’ll follow up with next steps. If you prefer, you can email me at v-jodimartis@microsoft.com. Thanks! 🙂jodimartisMar 17, 2026Microsoft46Views0likes0CommentsAdmin‑On‑Behalf‑Of issue when purchasing subscription
Hello everyone! I want to reach out to you on the internet and ask if anyone has the same issue as we do when creating PAYG Azure subscriptions in a customer's tenant, in which we have delegated access via GDAP through PartnerCenter. It is a bit AI formatted question. When an Azure NCE subscription is created for a customer via an Indirect Provider portal, the CSP Admin Agent (foreign principal) is not automatically assigned Owner on the subscription. As a result: AOBO (Admin‑On‑Behalf‑Of) does not activate The subscription is invisible to the partner when accessing Azure via Partner Center service links The partner cannot manage and deploy to a subscription they just provided This breaks the expected delegated administration flow. Expected Behavior For CSP‑created Azure subscriptions: The CSP Admin Agent group should automatically receive Owner (or equivalent) on the subscription AOBO should work immediately, without customer involvement The partner should be able to see the subscription in Azure Portal and deploy resources Actual Behavior Observed For Azure NCE subscriptions created via an Indirect Provider: No RBAC assignment is created for the foreign AdminAgent group The subscription is visible only to users inside the customer tenant Partner Center role (Admin Agent foreign group) is present, but without Azure RBAC. Required Customer Workaround For each new Azure NCE subscription, the customer must: Sign in as Global Admin Use “Elevate access to manage all Azure subscriptions and management groups” Assign themselves Owner on the subscription Manually assign Owner to the partner’s foreign AdminAgent group Only after this does AOBO start working. Example Partner tries to access the subscription: https://portal.azure.com/#@customer.onmicrosoft.com/resource/subscriptions/<subscription-id>/overview But there is no subscription visible "None of the entries matched the given filter" https://learn.microsoft.com/en-us/azure/role-based-access-control/elevate-access-global-admin?tabs=azure-portal%2Centra-audit-logs#step-1-elevate-access-for-a-global-administrator from the customer's global admin. and manual RBAC fix in Cloud console: az role assignment create \ --assignee-object-id "<AdminAgent-Foreign-Group-ObjectId>" \ --role "Owner" \ --scope "/subscriptions/<subscription-id>" \ --assignee-principal-type "ForeignGroup" After this, AOBO works as expected for delegated administrators (foreign user accounts). Why This Is a Problem Partners sell Azure subscriptions that they cannot access Forces resources from customers to involvement from customers Breaks delegated administration principles For Indirect CSPs managing many tenants, this is a decent operational blocker. Key Question to Microsoft / Community Does anyone else struggle with this? Is this behavior by design for Azure NCE + Indirect CSP? Am I missing some point of view on why not to do it in the suggested way?ivokoFeb 11, 2026Copper Contributor71Views0likes0Comments[Design Pattern] Handling race conditions and state in serverless data pipelines
Hello community, I recently faced a tricky data engineering challenge involving a lot of Parquet files (about 2 million records) that needed to be ingested, transformed, and split into different entities. The hard part wasn't the volume, but the logic. We needed to generate globally unique, sequential IDs for specific columns while keeping the execution time under two hours. We were restricted to using only Azure Functions, ADF, and Storage. This created a conflict: we needed parallel processing to meet the time limit, but parallel processing usually breaks sequential ID generation due to race conditions on the counters. I documented the three architecture patterns we tested to solve this: Sequential processing with ADF (Safe, but failed the 2-hour time limit). 2. Parallel processing with external locking/e-tags on Table Storage (Too complex and we still hit issues with inserts). 3. A "Fan-Out/Fan-In" pattern using Azure Durable Functions and Durable Entities. We ended up going with Durable Entities. Since they act as stateful actors, they allowed us to handle the ID counter state sequentially in memory while the heavy lifting (transformation) ran in parallel. It solved the race condition issue without killing performance. I wrote a detailed breakdown of the logic and trade-offs here if anyone is interested in the implementation details: https://medium.com/@yahiachames/data-ingestion-pipeline-a-data-engineers-dilemma-and-azure-solutions-7c4b36f11351 I am curious if others have used Durable Entities for this kind of ETL work, or if you usually rely on an external database sequence to handle ID generation in serverless setups? Thanks, ChameseddineChameseddineDec 15, 2025Copper Contributor125Views0likes1CommentUnderstanding Azure AD Tenants, Users, Groups, and Roles: A Practical Guide
As cloud adoption continues to shape modern IT infrastructures, Microsoft Azure Active Directory (Azure AD)—now part of Microsoft Entra ID—has become one of the most essential identity and access management (IAM) solutions for organizations. Whether you’re setting up a brand-new cloud environment or managing a hybrid workforce, understanding how Azure AD tenants, users, groups, and roles work is fundamental to keeping your environment secure, organized, and scalable. This guide breaks down each of these components in simple, practical terms, helping you gain the confidence to manage Azure identity services effectively. https://dellenny.com/understanding-azure-ad-tenants-users-groups-and-roles-a-practical-guide/216Views0likes0CommentsHow to Implement Azure AD Conditional Access Policies Step-by-Step
In today’s cloud-first world, identity is the new security perimeter. With employees logging in from different devices, locations, and networks, traditional access control is no longer enough. This is where Azure AD (now Microsoft Entra ID) Conditional Access comes in. It allows organizations to enforce automated decision-making about who can access what, under which conditions, and using which devices. If you’ve ever wondered how to configure Conditional Access the right way, without breaking user access or causing downtime, this guide walks you through the process https://dellenny.com/how-to-implement-azure-ad-conditional-access-policies-step-by-step/124Views0likes0CommentsManaging Azure AD Identity Protection: Detecting and Mitigating Risky Sign-ins
In today’s digital landscape, securing user identities is more critical than ever. Organizations leveraging cloud services, especially Microsoft Azure, face an increasing number of identity-based threats, including account compromise, phishing attacks, and unauthorized access. Azure Active Directory (Azure AD) Identity Protection provides a robust set of tools to help IT teams detect, investigate, and mitigate risky sign-ins effectively. In this blog, we’ll explore how to manage Azure AD Identity Protection, detect risky sign-ins, and implement strategies to minimize security risks. https://dellenny.com/managing-azure-ad-identity-protection-detecting-and-mitigating-risky-sign-ins/87Views0likes0CommentsAzure Enterprise-Scale Landing Zone Building a Future
In today’s fast-paced digital landscape, enterprises are under constant pressure to innovate, scale efficiently, and maintain governance and security across their cloud environments. Microsoft Azure’s Enterprise-ScalezahidtimonNov 14, 2025Copper Contributor158Views1like0CommentsThe Role of a Software Architect in Modern Teams
In today’s fast-moving technology landscape, the role of a software architect is more important — and more nuanced — than ever before. Far from being just the “technical visionary,” modern software architects serve as bridge builders between technology, business goals, and people. They ensure that software systems are scalable, reliable, and aligned with the long-term vision of the organization. Let’s explore how this role has evolved and why it’s so critical in modern teams https://dellenny.com/the-role-of-a-software-architect-in-modern-teams/83Views0likes0CommentsWhat Is Software Architecture? A Practical Definition
If you’ve ever worked on a software project that grew beyond a few files, you’ve likely run into a question that every developer eventually faces: How should this be structured? That’s where software architecture comes in. https://dellenny.com/what-is-software-architecture-a-practical-definition/74Views0likes0Comments
Tags
- azure12 Topics
- Architecture6 Topics
- Site Recovery2 Topics
- application gateway2 Topics
- AGIC2 Topics
- security1 Topic
- nsg1 Topic
- best practices1 Topic
- routing1 Topic
- Azure Remote Connection1 Topic