Co-authors: Avi Sharma & Diwas Sedai - Security Researchers, Microsoft Defender for Cloud
In the not-so-distant past, breach timelines were measured in hours or days. That is still the norm, but times are changing, and quickly.
The breach we reconstructed from AKS telemetry took 86 seconds. The window from the moment a malicious package landed inside an inference pod to the second identity credentials left the cluster was shorter than the time it takes to reheat cold coffee in a microwave. No CVE. No container escape. Just trust, abused at runtime.
NOTE: When you reach the end of this blog post, please see S.T.A.R. Episode 7 where we’ll slow things down and walk you through everything you need to know.
The signal nobody was watching
Runtime pip installs in Kubernetes pods are noise. They happen thousands of times a day across production ML clusters: frameworks pulling dependencies on startup, notebooks rebuilding environments, inference servers patching themselves mid-session.
Security teams learned long ago to tune them out.
That normalization is load-bearing infrastructure for a specific attack class. When a compromised package executes inside a running inference container, the install event looks identical to legitimate framework behavior. The difference is what happens 86 seconds later.
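To make the overlap concrete, here is a minimal sketch of the runtime install pattern itself. The helper name and flags are illustrative; the point is that a framework self-healing a missing dependency and a malicious package pulling in its next stage produce the same process event.

```python
import subprocess
import sys

def runtime_install(package: str, dry_run: bool = False) -> list:
    """Install a dependency from inside a running pod.

    Legitimate ML frameworks issue exactly this call to self-heal
    missing dependencies at startup; a compromised package fetching
    its payload looks identical at the process-event level.
    """
    cmd = [sys.executable, "-m", "pip", "install", "--quiet", package]
    if not dry_run:
        subprocess.check_call(cmd)  # spawns `python -m pip install ...`
    return cmd
```

From the node's process telemetry, both cases are just a Python process spawning pip; nothing in the event distinguishes intent.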
The Azure Instance Metadata Service (IMDS) call that follows isn't anomalous either: AI pods have legitimate business reasons to call IMDS, and managed identity is the whole point. The problem is that by the time a compromised pod reaches for those credentials, the telemetry shows a routine cloud API call, not an intrusion. Microsoft Threat Intelligence has tracked this pattern directly, workload identity abuse as an attacker path into cloud resources, and the numbers back it up: 51% of workload identities observed in production were completely inactive. It's an attack surface that's 'invisible' because nobody is watching it.
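To ground that, the credential fetch at the center of this timeline is the standard IMDS token request. The sketch below only builds the request rather than sending it; the endpoint, API version, and `Metadata: true` header are the documented Azure values, while the helper function itself is ours.

```python
from urllib.parse import urlencode

IMDS_TOKEN_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"

def imds_token_request(resource: str = "https://management.azure.com/"):
    """Build the managed-identity token request any AKS pod process can make.

    No secret is required: the link-local address authenticates the caller
    purely by network position inside the VM.
    """
    query = urlencode({"api-version": "2018-02-01", "resource": resource})
    url = f"{IMDS_TOKEN_ENDPOINT}?{query}"
    headers = {"Metadata": "true"}  # required header; blocks naive SSRF relays
    return url, headers
```

A legitimate inference pod fetching a token for blob storage and a compromised pod fetching control-plane credentials issue byte-for-byte the same request shape; only the resource parameter and what happens next differ.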
Three credentials, one compromised pod
What makes AI inference pods uniquely valuable to attackers isn't just the managed identity; it's the credential density. A single compromised inference pod in a well-configured AKS cluster can surface:
- In-cluster credentials: service account tokens scoped to internal APIs and other workloads
- CI/CD pipeline credentials: artifact registries, model repos, and build systems often trusted by the pod at deploy time
- Azure control plane access: via the managed identity provisioned for legitimate Azure API calls
Three credential domains from a single pod. Traditional K8s threat models assume lateral movement requires multiple steps. AI workloads upend that. The Microsoft Kubernetes threat matrix specifically maps compromised cloud credentials as a path to full cluster takeover, and the same logic applies in reverse when the pod itself is the initial foothold.
The supply chain angle most teams miss
The delivery mechanism in reconstructed incidents isn't a zero-day. It's your CI/CD pipeline working exactly as designed. A compromised CI identity. A model artifact modified in transit and pushed with the same tag, signed with a legitimate credential. The deployment pipeline—correctly—promotes it through staging. Code executes. The clock starts.
The AI framework layer is equally exposed. A recent case study on the LangGrinch vulnerability (CVE-2025-68664) in LangChain Core is a good reference point here: a flaw in how AI orchestration frameworks deserialize state let attacker-controlled input get reconstructed as a trusted object. The exploit path didn't require a container escape or a misconfiguration—just a trusted framework doing what it was designed to do.
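The underlying pattern is easy to sketch. The toy deserializer below is not the LangChain code path, only an illustration of the class of flaw: serialized state that names a class gets reconstructed as a live, trusted object, so whoever controls the state controls what gets instantiated.

```python
import importlib
import json

def load_state(blob: str):
    """Toy state deserializer exhibiting the classic flaw.

    It reconstructs whatever class the serialized state names. With
    attacker-controlled input, the state can name something like
    subprocess.Popen, at which point instantiation is code execution.
    """
    state = json.loads(blob)
    module = importlib.import_module(state["module"])  # attacker picks the module
    cls = getattr(module, state["cls"])               # attacker picks the class
    return cls(*state.get("args", []))                # instantiation = execution

# Benign demo: the same machinery happily builds any reachable class.
obj = load_state('{"module": "collections", "cls": "Counter", "args": [["a", "a", "b"]]}')
```

The fix in every framework that has hit this is the same: an allowlist of reconstructable types, not a blocklist of dangerous ones.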
This is what supply chain attacks against AI workloads look like at the telemetry layer: automation doing its job, against you. Detection needs to happen before the artifact lands, or the runtime install is already background noise by the time an analyst looks.
What the data shows
Across production AKS telemetry, the patterns are consistent:
- GPU nodes running privileged by architectural necessity—not misconfiguration—mean container compromise frequently equals node access without any escape technique.
- Model files (.pt, .pkl, .safetensors) are loaded without scanning in almost every production pipeline we've analyzed.
- IMDS calls from application containers are present in legitimate workloads and attacker-controlled workloads alike—the endpoint and headers are indistinguishable.
The model file risk is underappreciated. Microsoft's AI security research has long flagged Python serialization threats in ML model files—and the broader Azure AI Foundry model security work specifically calls out supply chain backdoors and arbitrary code execution as things it actively scans for. The gap is that these controls exist at the platform layer. Most organizations deploying custom models or fine-tuned artifacts to AKS aren't inheriting those protections automatically.
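The serialization threat is concrete enough to demonstrate in a few lines. A .pkl (or pickle-backed .pt) model file is a program, not data: any object defining `__reduce__` runs an arbitrary callable at load time. Here is a sketch with a deliberately benign payload:

```python
import pickle

class BackdooredModel:
    """Stand-in for a poisoned model artifact."""
    def __reduce__(self):
        # Whatever callable we return here executes during pickle.loads -
        # i.e., the moment the "model" is loaded, before any inference runs.
        return (eval, ("6 * 7",))  # benign payload; real ones spawn shells

blob = pickle.dumps(BackdooredModel())   # what gets shipped as model.pkl
result = pickle.loads(blob)              # "loading the model" runs the payload
```

This load-time execution property is exactly why the .safetensors format exists: it stores raw tensors and runs no code on load.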
The gap isn't in detection capability. It's in what we've decided counts as signal. AI workloads normalized behaviors that are red flags in every other context, and detection stacks largely followed suit.
See the full kill-chain—recorded on March 9
I presented this research live with Diwas Sedai at the Microsoft Defender Experts S.T.A.R. Forum Episode 7 on March 9 at 10AM ET.
We walked through:
- The full kill-chain reconstruction with real AKS telemetry
- KQL hunting queries you can run in your environment that same day
- The detection gaps and how to close them
Please see the full story: S.T.A.R. Forum Episode 7 – The Runtime Reality Check: From Poisoned Packages to AI Workloads as Adversaries
Please see the previous S.T.A.R. Forum Episodes