Forum Discussion

Apr 01, 2026

Detecting ACI IP Drift and Auto-Updating Private DNS (A + PTR) with Event Grid + Azure Functions

Solution Author

Aditya_AzureNinja, Chiragsharma30

Solution Version v1.0

TL;DR

Azure Container Instances (ACI) container groups can be recreated/updated over time and may receive new private IPs, which can cause DNS mismatches if forward and reverse records aren’t updated. This post shares an event-driven pattern that detects ACI IP drift and automatically reconciles Private DNS A (forward) and PTR (reverse) records using Event Grid + Azure Functions.


Key requirement: Event delivery is at-least-once, so the solution must be idempotent.

Problem statement

In hub-and-spoke environments using per-spoke Private DNS zones for isolation, ACI workloads created/updated/deleted over time can receive new private IPs. We need to ensure:

  • Forward lookup: aci-name.<spoke-zone> (A record) → current ACI private IP
  • Reverse lookup: IP → aci-name.<spoke-zone> (PTR record)

Two constraints drive this design:

  • Azure Private DNS auto-registration is VM-only and does not create PTR records, so ACI needs explicit A/PTR record management.
  • Reverse DNS is scoped to the VNet (reverse zone must be linked to the querying VNet, otherwise reverse lookup returns NXDOMAIN).

 

Design principles

This solution was designed with the following non‑negotiable engineering goals:

  • Event‑driven
    DNS updates must be triggered directly from resource lifecycle events, not polling or scheduled jobs. Container creation, restart, and deletion are the only reliable sources of truth for IP changes in ACI.
  • Idempotent
    Azure Event Grid delivers events with at‑least‑once semantics. The system must safely process duplicate events without creating conflicting DNS records or failing on retries. 
  • Stateless
    The automation must not rely on in‑memory or persisted state to determine correctness. DNS itself is treated as the baseline state, allowing functions to scale, restart, and replay events without drift or dependency on prior executions. 
  • Clear failure modes
    DNS reconciliation failures must be explicit and observable. If DNS updates fail, the function invocation must fail loudly so the issue is visible, alertable, and actionable—never silently ignored.

Components

  • Event Grid subscriptions (filtered to ACI container group lifecycle events)
  • Azure Function App (Python) with System Assigned Managed Identity
  • Private DNS forward zone (A records)
  • Private DNS reverse zone (PTR records)
  • Supporting infra (typical):
    • Storage account (function artifacts / operational needs)
    • Application Insights + Log Analytics (observability)

Event-driven flow

  1. ACI container group is created/updated/deleted.
  2. Event Grid emits a lifecycle event (delivery can be repeated).
  3. Function is triggered and reads the current ACI private IP.
  4. Function reconciles DNS:
    • Upsert A record to current IP
    • Upsert PTR record to FQDN
    • Remove stale PTR(s) for hostname/IP as needed
  5. Function logs reconciliation outcome (updated vs no-op).
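The decision in step 4 can be sketched as a pure function: given the event type, the live ACI IP, and what DNS currently answers, return the operations to perform. Function and operation names here are illustrative, not the solution's actual code.

```python
from typing import List, Optional


def plan_reconciliation(
    event_type: str, live_ip: Optional[str], dns_ip: Optional[str]
) -> List[str]:
    """Decide which DNS operations one ACI lifecycle event implies."""
    if event_type == "Microsoft.Resources.ResourceDeleteSuccess":
        # Deletion: remove the forward record and clean up reverse entries.
        return ["delete_a", "cleanup_ptr"]
    if live_ip is None:
        # Container group has no private IP yet (e.g. still provisioning).
        return []
    if dns_ip == live_ip:
        # Duplicate delivery or unchanged IP: idempotent no-op.
        return []
    # New or changed IP: upsert both forward and reverse records.
    return ["upsert_a", "upsert_ptr"]
```

Because delivery is at-least-once, the no-op branch is what makes replayed events safe: a second delivery of the same event finds DNS already correct and does nothing.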

Architecture overview (INFRA)

This follows the “Event-driven registration” approach: Event Grid → Azure Function that reconciles DNS on ACI lifecycle events.

RBAC at a glance (Managed Identity)

| Role | Scope | Purpose |
| --- | --- | --- |
| Storage Blob Data Owner | Function App deployment storage account | Access function artifacts and operational blobs (required because shared key access is disabled). |
| Reader | Each ACI workload resource group | Read container group state and determine the current private IP. |
| Private DNS Zone Contributor | Private DNS forward zone(s) | Create, update, and delete A records for ACI hostnames. |
| Private DNS Zone Contributor | Private DNS reverse zone(s) | Create, update, and clean up PTR records for ACI IPs. |
| Monitoring Metrics Publisher (optional) | Data Collection Rule (DCR) | Upload structured IP‑drift events to Log Analytics via the ingestion API. |

 


 

 

Architecture overview (APP)

Event‑Driven DNS Reconciliation for Azure Container Instances

1. Event contract: what the function receives

Azure Event Grid delivers events using a consistent envelope (Event Grid schema). Each event includes, at a minimum:

  • topic
  • subject
  • id
  • eventType
  • eventTime
  • data
  • dataVersion
  • metadataVersion

In Azure Functions, the Event Grid trigger binding is the recommended way to receive these events directly.

Why the subject field matters

The subject field typically contains the ARM resource ID path of the affected resource.
This solution relies on subject to:

  • verify that the event is for an ACI container group
    (Microsoft.ContainerInstance/containerGroups)
  • extract:
    • subscription ID
    • resource group name
    • container group name

Using subject avoids dependence on publisher‑specific payload fields and keeps parsing fast, deterministic, and resilient.
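Because the subject is an ARM resource ID with alternating key/value segments, the extraction can be a few lines of string handling. This is a sketch under that assumption; the function name is illustrative.

```python
from typing import Dict

ACI_TYPE = "Microsoft.ContainerInstance/containerGroups"


def parse_aci_subject(subject: str) -> Dict[str, str]:
    """Extract subscription ID, resource group, and container group name
    from an ARM resource ID subject. Raises ValueError for non-ACI events."""
    if ACI_TYPE not in subject:
        raise ValueError("not an ACI container group event")
    # ARM IDs alternate key/value segments:
    # subscriptions/<sub>/resourceGroups/<rg>/providers/<ns>/<type>/<name>
    parts = subject.strip("/").split("/")
    return {
        "subscription_id": parts[parts.index("subscriptions") + 1],
        "resource_group": parts[parts.index("resourceGroups") + 1],
        "container_group": parts[-1],
    }
```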

2. Subscription design: filter hard, process little

The solution follows a strict runbook pattern:

  • subscribe only to ARM lifecycle events
  • filter aggressively so only ACI container groups are included
  • trigger reconciliation only on meaningful state transitions

Recommended Event Grid event types

  • Microsoft.Resources.ResourceWriteSuccess
    (create / update / stop state changes)
  • Microsoft.Resources.ResourceDeleteSuccess
    (container group deletion)
  • Microsoft.Resources.ResourceActionSuccess (optional)
    (restart / start / stop actions, environment‑dependent)

This keeps the Function App simple, predictable, and low‑noise.
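Even with subscription-side filters in place, a defence-in-depth guard inside the function mirroring the same rules keeps behavior predictable if filters drift. A minimal sketch (names are illustrative):

```python
# Event types this solution reacts to; ResourceActionSuccess is optional
# and environment-dependent, as noted above.
HANDLED_EVENT_TYPES = {
    "Microsoft.Resources.ResourceWriteSuccess",
    "Microsoft.Resources.ResourceDeleteSuccess",
    "Microsoft.Resources.ResourceActionSuccess",
}
ACI_TYPE = "Microsoft.ContainerInstance/containerGroups"


def should_process(event: dict) -> bool:
    """In-function guard mirroring the Event Grid subscription filters."""
    return (
        event.get("eventType") in HANDLED_EVENT_TYPES
        and ACI_TYPE in event.get("subject", "")
    )
```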

3. Application design: two functions, one contract

The application is intentionally split into authoritative mutation and read‑only validation.

Component A — DNS Reconciler (authoritative writer)

A thin Python v2 model wrapper:

  • receives the Event Grid event
  • validates this is an ACI container group event
  • parses identifiers from the ARM subject
  • resolves DNS configuration from a JSON mapping (environment variable)
  • delegates DNS mutation to a deterministic worker script

DNS changes are not implemented inline in Python. Instead, the function:

  • constructs a controlled set of environment variables
  • invokes a worker script (/bin/bash) via subprocess
  • streams stdout/stderr into function logs
  • treats non‑zero exit codes as hard failures

This thin wrapper + deterministic worker pattern isolates DNS correctness logic while keeping the event handler stable and testable.

Component B — IP Drift Tracker (stateless observer)

The drift tracker is a read‑only, stateless validator designed for correctness monitoring.

It:

  • parses identifiers from the event subject
  • exits early on delete events (nothing to validate)
  • reads the live ACI private IP using the Azure SDK
  • reads the current DNS A record baseline
  • compares live vs DNS state and emits drift telemetry

Core comparison logic

  • No DNS record exists → emit first_seen
  • DNS record matches live IP → emit no_change
  • DNS record differs from live IP → emit drift_detected (old/new IP)
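The comparison above is small enough to express directly; this sketch returns a telemetry-shaped dict (field names are illustrative):

```python
from typing import Optional


def classify_drift(live_ip: str, dns_ip: Optional[str]) -> dict:
    """Map the live-vs-DNS comparison onto a drift telemetry event."""
    if dns_ip is None:
        # No baseline yet: first observation of this container group.
        return {"state": "first_seen", "new_ip": live_ip}
    if dns_ip == live_ip:
        return {"state": "no_change", "ip": live_ip}
    # Baseline disagrees with reality: report old and new IPs.
    return {"state": "drift_detected", "old_ip": dns_ip, "new_ip": live_ip}
```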

Optionally, drift events can be shipped to Log Analytics using DCR‑based ingestion.

4. DNS Reconciler: execution flow

Step 1 — Early filtering

Reject any event whose subject does not contain: Microsoft.ContainerInstance/containerGroups.

This avoids unnecessary processing and ensures strict contract enforcement.

Step 2 — ARM subject parsing

The function splits the subject path and extracts:

  • resource group
  • container group name

This approach is fast, robust, and avoids publisher‑specific schema dependencies.

Step 3 — Zone configuration resolution

DNS configuration is resolved from a JSON map stored in an environment variable.

If no matching configuration exists for the resource group:

  • the function logs the condition
  • exits without error

Why this matters
This keeps the solution multi‑environment without duplicating deployments.
Only configuration changes — not code — are required.
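A sketch of this lookup, assuming an environment variable named `DNS_ZONE_MAP` (the actual variable and schema may differ):

```python
import json
import logging
import os
from typing import Optional


def resolve_zone_config(resource_group: str) -> Optional[dict]:
    """Look up per-resource-group DNS zone config from a JSON env var.

    Returns None (after logging) when the resource group is not managed;
    that is a normal skip condition, not an error."""
    mapping = json.loads(os.environ.get("DNS_ZONE_MAP", "{}"))
    config = mapping.get(resource_group)
    if config is None:
        logging.info("no DNS zone config for %s; skipping event", resource_group)
    return config
```

Adding a new spoke then means adding one JSON entry to the app setting, with no code change or redeployment.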

Step 4 — Delegation to worker logic

The function constructs a deterministic runtime context and invokes the worker:

  • forward zone name
  • reverse zone name(s)
  • container group name
  • current private IP
  • TTL and execution flags

The worker performs reconciliation and exits with explicit success or failure.

5. What “reconciliation” actually means

Reconciliation follows clear, idempotent semantics.

Create / Update events

  • Upsert A record
    • if record exists and matches current IP → no‑op
    • else → create or overwrite with new IP
  • Upsert PTR record
    • compute PTR name using IP octets and reverse zone alignment
    • create or overwrite PTR to hostname.<forward-zone>
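The PTR name computation in the upsert step can be sketched as pure string arithmetic over the IPv4 octets: reverse them, append `in-addr.arpa`, then strip the reverse zone suffix to get the record's relative name. A minimal IPv4-only sketch:

```python
def ptr_relative_name(ip: str, reverse_zone: str) -> str:
    """Compute a PTR record's relative name inside a reverse zone.

    Example: ip "10.1.2.4" in zone "1.10.in-addr.arpa" -> "4.2"."""
    # Full reverse FQDN: octets reversed, then the in-addr.arpa suffix.
    fqdn = ".".join(reversed(ip.split("."))) + ".in-addr.arpa"
    suffix = "." + reverse_zone
    if not fqdn.endswith(suffix):
        # The reverse zone linked to the VNet does not cover this IP.
        raise ValueError(f"{ip} is not covered by zone {reverse_zone}")
    # Relative name is whatever precedes the zone suffix.
    return fqdn[: -len(suffix)]
```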

Delete events

  • delete the A record for the hostname
  • scan PTR record sets:
    • remove targets matching the hostname
    • delete record set if empty

All operations are safe to repeat.
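The delete-path PTR scan above can be separated into a pure planning step, which keeps it trivially repeatable: running the plan against already-cleaned record sets yields nothing to do. A sketch (names and the record-set shape are illustrative):

```python
from typing import Dict, List, Tuple


def plan_ptr_cleanup(
    record_sets: Dict[str, List[str]], target_fqdn: str
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Scan PTR record sets (name -> target FQDNs) for a hostname.

    Returns (sets to rewrite with the target removed,
             sets to delete because they would become empty)."""
    to_update: Dict[str, List[str]] = {}
    to_delete: List[str] = []
    for name, targets in record_sets.items():
        if target_fqdn not in targets:
            continue
        remaining = [t for t in targets if t != target_fqdn]
        if remaining:
            to_update[name] = remaining
        else:
            to_delete.append(name)
    return to_update, to_delete
```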

6. Why IP drift tracking is separate

DNS reconciliation enforces correctness at event time, but drift can still occur due to:

  • manual DNS edits
  • partial failures
  • delete / recreate race conditions
  • unexpected redeployments or restarts

The drift tracker exists as a continuous correctness validator, not as a repair mechanism.

This separation keeps responsibilities clear:

  • Reconciler → fixes state
  • Drift tracker → observes and reports state

7. Observability: correctness vs runtime health

There is an important distinction:

Runtime health

  • container crashes
  • image pull failures
  • restarts
  • platform events
    (visible in standard ACI / Container logs)

DNS correctness

  • A record != live IP
  • missing PTR records
  • stale reverse mappings

The IP Drift Tracker provides this correctness layer, which complements — not replaces — runtime monitoring.

8. Engineering constraints that shape the design

At‑least‑once delivery → idempotency

Event Grid delivery must be treated as at‑least‑once.
Every reconciliation action is safe to execute multiple times.

Explicit failure behavior

If the worker script returns a non‑zero exit code:

  • the function invocation fails
  • the failure is visible and alertable
  • incorrect DNS does not silently persist

 
