Microsoft Security Experts Blog

The invisible attack surface: hunting AI threats in Defender XDR

Raae_
Microsoft
Nov 11, 2025

Co-author: Alexander Chavarria

As organizations embed AI across their business, the same technology that drives productivity also introduces a new class of risk: prompts that can be manipulated, data that can be leaked, and AI systems that can be tricked into doing things they shouldn’t. Attackers are already testing these boundaries, and defenders need visibility into how AI is being used - not just where it’s deployed.

Microsoft Defender for Cloud now brings that visibility into the hunt. Its AI threat protection detects prompt injection, sensitive data exposure, and misuse of credentials in real time, correlating those signals with endpoint, identity, and cloud telemetry through Microsoft Defender XDR. The result is a single, searchable surface for investigating how both people and AI-driven systems behave under pressure.

As of 2025, Defender for AI is fully integrated into Microsoft Defender for Cloud, extending protection to AI models, prompts, and datasets across Azure AI workloads. This makes Defender for Cloud the central platform for securing enterprise AI environments. Meanwhile, Microsoft Defender Experts continues expanding across Defender XDR, offering 24/7 human-led monitoring and investigation, with full active coverage for servers within Defender for Cloud today.

For threat hunters, this evolution isn’t theoretical; it’s tactical. The same curiosity and precision that uncover lateral movement or data exfiltration now apply to AI misuse. In this post, we’ll walk through practical KQL hunts to surface suspicious AI activity, from abnormal model usage patterns to subtle signs of data exfiltration that traditional detections might miss.

The AI attack surface: old playbook, new players

Attackers aren’t reinventing the wheel; they’re repurposing it.
The top risks map neatly to the OWASP Top 10 for LLM Applications:

  • Prompt injection (LLM01) – Manipulating model logic through crafted inputs or poisoned context
  • Sensitive information disclosure (LLM06) – AI returning confidential data due to mis-scoped access
  • Shadow AI usage – Employees using external copilots with corporate data
  • Wallet abuse – API tokens or service principals driving massive, unintended consumption

It’s not about new telemetry; correlation is what matters. Defender surfaces these risks by tying AI alerts from Defender for Cloud to real user behavior across your XDR environment.
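To make that concrete, here is a minimal sketch that pivots from Defender for Cloud AI alerts to the affected user's recent sign-in activity. AlertInfo, AlertEvidence, and IdentityLogonEvents are standard advanced hunting tables; the Title keywords are assumptions to tune to the alert names you actually see in your tenant.

// Sketch: join Defender for Cloud AI alerts to recent sign-in activity.
// The Title filter below is illustrative -- adjust to your tenant's alerts.
AlertInfo
| where Timestamp > ago(7d)
| where ServiceSource == "Microsoft Defender for Cloud"
| where Title has_any ("prompt injection", "sensitive data", "AI")
| join kind=inner (
    AlertEvidence
    | where EntityType == "User"
    | project AlertId, AccountUpn
) on AlertId
| join kind=leftouter (
    IdentityLogonEvents
    | where Timestamp > ago(7d)
    | summarize Logons = count(), IPs = make_set(IPAddress, 20) by AccountUpn
) on AccountUpn
| project Timestamp, Title, Severity, AccountUpn, Logons, IPs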

Threat hunting: from AI alerts to insight

Forget slide decks. These are practical, production-ready hunting patterns using real Defender data tables.

1. Shadow AI exfiltration detection

This hunt looks for Office apps and browsers sending data to external AI endpoints, correlated with sensitive-file activity (a leading exfiltration path today).

(
DeviceNetworkEvents
| where RemoteUrl has_any (dynamic(["openai.com","anthropic.com","claude.ai","cohere.ai","chatgpt.com","gemini.google.com","huggingface.co","perplexity.ai"]))
| where InitiatingProcessFileName in~ (dynamic(["EXCEL.EXE","WINWORD.EXE","OUTLOOK.EXE","POWERPNT.EXE","ONENOTE.EXE"]))
    or InitiatingProcessFileName in~ (dynamic(["chrome.exe","msedge.exe","firefox.exe","brave.exe"]))
// tostring() is needed before toupper(); toint() keeps IsOffice summarizable with max()
| extend Device = toupper(tostring(split(DeviceName, ".")[0])),
         IsOffice = toint(InitiatingProcessFileName in~ (dynamic(["EXCEL.EXE","WINWORD.EXE","OUTLOOK.EXE","POWERPNT.EXE","ONENOTE.EXE"])))
| summarize Connections = count(), IsOffice = max(IsOffice), AITime = max(Timestamp)
    by Device, User = InitiatingProcessAccountName
)
| join kind=inner (
    DeviceFileEvents
    | where ActionType in~ ("FileCopied","FileCreated","FileModified","FileRenamed")
    | extend Device = toupper(tostring(split(DeviceName, ".")[0])),
             Lower = tolower(strcat(FolderPath, FileName))
    | extend HeuristicFlag = case(
        Lower has_any ("password","credential","secret","api_key") or Lower endswith ".key" or Lower endswith ".pem", "Credential",
        Lower has_any ("confidential","restricted","classified","sensitive"), "Classified",
        Lower has_any ("ssn","salary","payroll"), "PII",
        Lower has_any ("finance","hr","legal","executive"), "OrgSensitive",
        "Other"
      ),
      LabelFlag = case(
        SensitivityLabel has "Highly Confidential", "Classified",
        SensitivityLabel has "Confidential", "Sensitive",
        SensitivityLabel has "Internal", "Internal",
        isnotempty(SensitivityLabel), "Labeled",
        "Unlabeled"
      )
    | where HeuristicFlag != "Other" or LabelFlag in ("Classified","Sensitive","Internal","Labeled")
    | summarize
          Files = count(),
          HeuristicCount = countif(HeuristicFlag != "Other"),
          DLPCount = countif(isnotempty(SensitivityLabel)),
          Types = make_set_if(HeuristicFlag, HeuristicFlag != "Other"),
          Labels = make_set_if(SensitivityLabel, isnotempty(SensitivityLabel)),
          FileTime = max(Timestamp)
        by Device, User = InitiatingProcessAccountName
) on Device, User
| extend Delta = datetime_diff('minute', AITime, FileTime)
| where abs(Delta) <= 240
| extend Priority = case(
    IsOffice == 1, "Critical",
    Labels has_any ("Highly Confidential","Confidential") or Types has "Credential" or Types has "Classified", "High",
    Files >= 20, "High",
    "Medium"
)
// explicit rank so Critical sorts above High and Medium (an alphabetic sort on Priority would not)
| extend PriorityOrder = case(Priority == "Critical", 1, Priority == "High", 2, 3)
| order by PriorityOrder asc, Files desc
| project Priority, Device, User, Connections, Files, HeuristicCount, DLPCount, Types, Labels, Delta

 

Why it works: Correlates outbound AI traffic with sensitive file access.
Action: Block the key, review DLP coverage, fix workflow gaps.
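When this hunt fires, the summarized row hides the underlying evidence, so a drill-down helps before you act. A sketch, assuming hypothetical device and account values ("WKSTN-042", "jdoe") that you would replace with the flagged pair:

// Drill-down sketch for a flagged Device/User pair from hunt #1.
// "WKSTN-042" and "jdoe" are placeholders -- substitute your results.
let SuspectDevice = "WKSTN-042";
let SuspectUser   = "jdoe";
union
    (DeviceNetworkEvents
     | where toupper(tostring(split(DeviceName, ".")[0])) == toupper(SuspectDevice)
     | where InitiatingProcessAccountName =~ SuspectUser
     | where RemoteUrl has_any ("openai.com","anthropic.com","claude.ai","chatgpt.com")
     | project Timestamp, Evidence = RemoteUrl, Detail = InitiatingProcessFileName),
    (DeviceFileEvents
     | where toupper(tostring(split(DeviceName, ".")[0])) == toupper(SuspectDevice)
     | where InitiatingProcessAccountName =~ SuspectUser
     | project Timestamp, Evidence = strcat(FolderPath, FileName), Detail = ActionType)
| order by Timestamp asc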

2. Anomalous consumption patterns

Off-hours Azure OpenAI activity isn’t necessarily productivity; it might be unsanctioned automation or exfiltration.

// Azure OpenAI & LLM Off-Hours Detection - PER USER TIMEZONE
// DISCLAIMER: Time zone detection is approximate, based on behavioral inference.
// Validate per user/device when high-risk anomalies are flagged.
// If authoritative time zone data (e.g., Entra sign-in or mailbox settings) is available, prefer that source.
let MinRequestsThreshold = 500;
let MinTokensThreshold = 20000;
let OffHoursStart = 21;
let OffHoursEnd = 5;
// Infer each user's offset from their peak activity hour, assuming the peak lands around 14:00 local time
let UserTimezones = CloudAppEvents
| where Timestamp > ago(60d)
| where Application has_any ("OpenAI", "Azure OpenAI", "ChatGPT", "Claude", "Gemini", "Anthropic", "Perplexity", "Microsoft 365 Copilot")
| extend HourUTC = datetime_part("Hour", Timestamp)
| summarize ActivityByHour = count() by AccountDisplayName, HourUTC
| summarize arg_max(ActivityByHour, HourUTC) by AccountDisplayName
| extend TimezoneOffset = iff((HourUTC - 14 + 24) % 24 > 12, (HourUTC - 14 + 24) % 24 - 24, (HourUTC - 14 + 24) % 24)
| project AccountDisplayName, TimezoneOffset;
CloudAppEvents
| where Timestamp > ago(30d)
| where Application has_any ("OpenAI", "Azure OpenAI", "ChatGPT", "Claude", "Gemini", "Anthropic", "Perplexity", "Microsoft 365 Copilot")
| extend
    HourUTC = datetime_part("Hour", Timestamp),
    DayUTC = toint(dayofweek(Timestamp) / 1d),  // dayofweek() returns a timespan; divide by 1d before toint()
    Tokens = toint(RawEventData.totalTokens)
| join kind=leftouter (UserTimezones) on AccountDisplayName
| extend TZ = coalesce(TimezoneOffset, 0)
| extend HourLocal = (HourUTC + TZ + 24) % 24
| extend DayLocal = (DayUTC + iff(HourUTC + TZ >= 24, 1, iff(HourUTC + TZ < 0, -1, 0)) + 7) % 7
| extend IsAnomalous = (DayLocal in (0, 6)) or (HourLocal >= OffHoursStart or HourLocal < OffHoursEnd)
| where IsAnomalous
| extend IsWeekend = DayLocal in (0, 6), IsOffHours = HourLocal >= OffHoursStart or HourLocal < OffHoursEnd
| summarize
    Requests = count(),
    TokensUsed = sum(Tokens),
    WeekendRequests = countif(IsWeekend),
    LateNightRequests = countif(IsOffHours and not(IsWeekend)),
    LocalHours = make_set(HourLocal),
    LocalDays = make_set(DayLocal),
    Applications = make_set(Application),
    ActionTypes = make_set(ActionType),
    FirstSeen = min(Timestamp),
    LastSeen = max(Timestamp),
    DetectedTZ = take_any(TZ)
    by AccountDisplayName, IPAddress
| where Requests >= MinRequestsThreshold or TokensUsed >= MinTokensThreshold
| extend
    UserTimezone = case(
        DetectedTZ == 0, "UTC/GMT",
        DetectedTZ == -5, "EST (UTC-5)",
        DetectedTZ == -4, "EDT (UTC-4)",
        DetectedTZ == -6, "CST (UTC-6)",
        DetectedTZ == -7, "MST (UTC-7)",
        DetectedTZ == -8, "PST (UTC-8)",
        DetectedTZ == 1, "CET (UTC+1)",
        DetectedTZ == 8, "CST China (UTC+8)",
        DetectedTZ == 9, "JST Japan (UTC+9)",
        DetectedTZ > 0, strcat("UTC+", DetectedTZ),
        strcat("UTC", DetectedTZ)
    )
| extend
    ThreatPattern = case(
        array_length(Applications) > 1, "Multiple LLM Services",
        WeekendRequests > LateNightRequests * 2, "Weekend Automation",
        LateNightRequests > WeekendRequests * 2, "Late-Night Automation",
        Requests > 500, "High-Volume Script",
        "Unusual Off-Hours Activity"
    )
| extend
    RiskScore = case(
        Requests > 1000 and TokensUsed > 100000, 100,
        Requests > 500 and WeekendRequests > 100, 95,
        TokensUsed > 50000 or Requests > 200, 85,
        WeekendRequests > 100, 80,
        Requests > 100 or TokensUsed > 20000, 70,
        60
    )
| extend
    RiskLevel = case(
        RiskScore >= 90, "Critical",
        RiskScore >= 75, "High",
        RiskScore >= 60, "Medium",
        "Low"
    )
| project AccountDisplayName, IPAddress, RiskLevel, RiskScore, ThreatPattern, Requests, TokensUsed,
          WeekendRequests, LateNightRequests, Applications, UserTimezone, LocalHours, LocalDays,
          ActionTypes, FirstSeen, LastSeen
| sort by RiskScore desc, Requests desc

 

Why it works: Humans sleep. Scripts don’t. Temporal anomalies often expose automation faster than generic anomaly models do.

Action: Check grounding sources, confirm the IP, disable keys or service principals.
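Before disabling anything, a quick pivot into identity sign-ins can show whether a flagged address is a known egress point or something new. A sketch, with a placeholder IP standing in for the one from your results:

// Sketch: check whether a flagged IP from hunt #2 appears in sign-in telemetry.
let SuspectIP = "203.0.113.42";  // placeholder -- use the IP from your results
IdentityLogonEvents
| where Timestamp > ago(30d)
| where IPAddress == SuspectIP
| summarize Logons = count(), Accounts = make_set(AccountUpn, 20),
            Apps = make_set(Application, 20),
            FirstSeen = min(Timestamp), LastSeen = max(Timestamp)
    by IPAddress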

3. Bot-like behavior hunt

This hunt separates sanctioned automation from possible account compromise, surfacing bot-like behavior early.

// ---- Tunables (adjust if needed) ----
let LookbackDays     = 7d;
let MinEvents        = 3;     // ignore trivial users
let RPM_AutoThresh   = 50.0;  // requests/hour threshold that smells like a bot
let MaxIPs_Auto      = 1;     // single IP suggests fixed worker
let MaxApps_Auto     = 1;     // single app suggests fixed worker
let MaxUAs_Auto      = 2;     // very few UAs over lookback
let MaxHighTokPct    = 5.0;   // % of requests over 4k tokens still considered benign
CloudAppEvents
| where Timestamp > ago(LookbackDays)
| where Application has_any ("OpenAI", "Azure OpenAI", "Microsoft 365 Copilot Chat")
| extend User = tolower(AccountDisplayName)
| extend raw = todynamic(RawEventData)
| extend Tokens = toint(coalesce(raw.totalTokens, raw.total_tokens, raw.usage_total_tokens))
| summarize
    TotalRequests     = count(),
    HighTokenRequests = countif(Tokens > 4000),
    AvgTokens         = avg(Tokens),
    MaxTokens         = max(Tokens),
    UniqueIPs         = dcount(IPAddress),
    IPs               = make_set(IPAddress, 50),
    UniqueApps        = dcount(Application),
    Apps              = make_set(Application, 20),
    UniqueUAs         = dcount(UserAgent),
    FirstRequest      = min(Timestamp),
    LastRequest       = max(Timestamp)
  by User
| where TotalRequests >= MinEvents
| extend _dur = toreal(datetime_diff('hour', LastRequest, FirstRequest))
| extend DurationHours = iif(_dur <= 0, 1.0, _dur)
| extend RequestsPerHour = TotalRequests / DurationHours
| extend HighTokenRatio  = (HighTokenRequests * 100.0) / TotalRequests
// ---- Heuristic: derive likely automation (no lists/regex) ----
| extend IsLikelyAutomation =
    (UniqueIPs <= MaxIPs_Auto and
     UniqueApps <= MaxApps_Auto and
     UniqueUAs  <= MaxUAs_Auto and
     RequestsPerHour >= RPM_AutoThresh and
     HighTokenRatio <= MaxHighTokPct)
// ---- Techniques & risk ----
| extend
    IsRapidFire    = RequestsPerHour > 20,
    IsHighVolume   = TotalRequests > 50,
    IsTokenAbuse   = HighTokenRatio > 30,
    IsMultiService = UniqueApps > 1,
    IsMultiIP      = UniqueIPs > 2,
    IsEscalating   = DurationHours < 24 and TotalRequests > 10
| where IsRapidFire or IsHighVolume or IsTokenAbuse or IsMultiService or IsMultiIP or IsEscalating
| extend TechniqueCount = toint(IsRapidFire) + toint(IsHighVolume) + toint(IsTokenAbuse) + toint(IsMultiService) + toint(IsMultiIP) + toint(IsEscalating)
| extend Risk = case(
    // not() instead of comparing a bool to 0
    IsLikelyAutomation and UniqueIPs == 1 and UniqueApps == 1 and not(IsTokenAbuse), "Low - Likely Automation",
    TechniqueCount >= 4, "Critical - Multi-Vector Behavior",
    TechniqueCount >= 3, "High - Attack Pattern",
    TechniqueCount >= 2, "Medium - Anomalous Behavior",
    "Low"
)
// Custom sort: Critical > High > Medium > Low - Likely Automation > Low
| extend RiskOrder = case(
    Risk startswith "Critical", 1,
    Risk startswith "High",     2,
    Risk startswith "Medium",   3,
    Risk == "Low - Likely Automation", 4,
    5
)
| project Risk, User, TotalRequests, RequestsPerHour, TechniqueCount, IsLikelyAutomation,
          IsRapidFire, IsHighVolume, IsTokenAbuse, IsMultiIP, IsMultiService, IsEscalating,
          UniqueIPs, IPs, UniqueApps, UniqueUAs, HighTokenRatio, DurationHours,
          FirstRequest, LastRequest, RiskOrder
| sort by RiskOrder asc, TotalRequests desc

Why it works: Automation-like request patterns can indicate either sanctioned scripts or early-stage compromise; surfacing them enables proactive detection before alerts fire.

Action: Investigate flagged accounts immediately to confirm intent and mitigate potential AI misuse.
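A useful first step is replaying the raw events behind a flagged account; request timing, user agents, and token counts usually make intent obvious. A sketch, with a placeholder account name:

// Sketch: drill into the raw events behind a user flagged by hunt #3.
let SuspectUser = "jdoe";  // placeholder -- use the flagged AccountDisplayName
CloudAppEvents
| where Timestamp > ago(7d)
| where tolower(AccountDisplayName) == tolower(SuspectUser)
| where Application has_any ("OpenAI", "Azure OpenAI", "Microsoft 365 Copilot Chat")
| project Timestamp, Application, ActionType, IPAddress, UserAgent,
          Tokens = toint(todynamic(RawEventData).totalTokens)
| order by Timestamp asc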

Operational lessons that scale beyond the lab

  • Custom detections > Ad hoc hunts – Turn query #1 into a scheduled detection (a sketch follows this list). Shadow AI isn’t a one-off behavior.
  • Security Copilot ≠ search bar – Use it for triage context, not hunting logic.
  • Set quotas, treat them like controls – Token budgets and rate limits are as critical as firewalls for AI workloads.
  • Defender for Cloud Apps – Block risky generative AI apps while letting sanctioned copilots run.
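Here is a minimal sketch of what that scheduled detection could look like. Custom detection rules in Defender XDR require Timestamp, ReportId, and an entity column such as DeviceId in the results, so this variant keeps them instead of summarizing them away; the domain list mirrors query #1 and should be tuned to your environment.

// Sketch: hunt #1 trimmed into a shape a custom detection rule accepts.
DeviceNetworkEvents
| where Timestamp > ago(1h)
| where RemoteUrl has_any ("openai.com","anthropic.com","claude.ai","chatgpt.com","perplexity.ai")
| where InitiatingProcessFileName in~ ("EXCEL.EXE","WINWORD.EXE","OUTLOOK.EXE","POWERPNT.EXE","ONENOTE.EXE")
| project Timestamp, ReportId, DeviceId, DeviceName, RemoteUrl,
          InitiatingProcessFileName, InitiatingProcessAccountName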

Getting started with threat hunting for AI workloads

Before you run these hunts at scale, make sure your environment is instrumented for cognitive visibility. That means insight into how your AI models are being used and what data they reason over, not just how much compute they consume.

Traditional telemetry shows process, network, and authentication events. Cognitive visibility adds prompts, model responses, grounding sources, and token behavior, giving analysts the context that explains why an AI acted the way it did.

Defender for AI Services integrates with Defender for Cloud to provide that visibility layer, but the right configuration turns data collection into situational awareness.

  1. Enable the AI services plan – Make sure Defender for AI Services is enabled at the subscription level. This activates continuous monitoring for Azure OpenAI, AI Foundry, and other managed AI workloads (see Microsoft Learn).
  2. Enable user prompt evidence – Turn on prompt capture for Defender for AI alerts. Seeing the exact input and model response during an attack is the difference between speculation and evidence (see Microsoft Learn).
  3. Validate your schema – Always test KQL queries in your workspace. Field names and event structures can differ across tenants and tiers, especially in CloudAuditEvents and AlertEvidence (a quick schema check follows this list).
  4. Use Security Copilot for acceleration – Let Copilot translate natural-language hypotheses into KQL, then fine-tune the logic yourself. It is the fastest way to scale your hunts without losing precision (see Microsoft Learn).
  5. Monitor both sides of the equation – Hunt for both AI-specific risks such as prompt injection, model abuse, or token sprawl, and traditional threats that target AI systems such as compromised credentials, exposed storage, or lateral movement through service principals.
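For step 3, getschema is a quick way to verify the columns these hunts rely on; the column names below are simply the fields the queries in this post touch.

// Sketch: confirm the columns used in this post exist in your tenant.
union
    (CloudAppEvents | getschema | extend Table = "CloudAppEvents"),
    (DeviceFileEvents | getschema | extend Table = "DeviceFileEvents")
| where ColumnName in ("RawEventData", "UserAgent", "IPAddress", "SensitivityLabel")
| project Table, ColumnName, ColumnType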

Visibility is only as strong as the context you capture. The sooner you enable these settings, the sooner your SOC can understand why your models behave the way they do, not just what they did.

Final thoughts: from prompts to protections

As AI becomes part of core infrastructure, its telemetry must become part of your SOC’s muscle memory. The same principles that power endpoint or identity defense (visibility, correlation, and anomaly detection) now apply to model inference, token usage, and data grounding.

Defender for Cloud and Defender XDR give you that continuity: alerts flow where your analysts already work, and your hunting logic evolves without a separate stack.

Protecting AI isn’t about chasing every model. It’s about extending proven security discipline to the systems that now think alongside you.

Found a better pattern? Post it. The threat surface is new, but the hunt discipline isn’t.
