Co-author: Alexander Chavarria
As organizations embed AI across their business, the same technology that drives productivity also introduces a new class of risk: prompts that can be manipulated, data that can be leaked, and AI systems that can be tricked into doing things they shouldn’t. Attackers are already testing these boundaries, and defenders need visibility into how AI is being used, not just where it’s deployed.
Microsoft Defender for Cloud now brings that visibility into the hunt. Its AI threat protection detects prompt injection, sensitive data exposure, and misuse of credentials in real time, correlating those signals with endpoint, identity, and cloud telemetry through Microsoft Defender XDR. The result is a single, searchable surface for investigating how both people and AI-driven systems behave under pressure.
As of 2025, Defender for AI is fully integrated into Microsoft Defender for Cloud, extending protection to AI models, prompts, and datasets across Azure AI workloads. This makes Defender for Cloud the central platform for securing enterprise AI environments. Meanwhile, Microsoft Defender Experts continues expanding across Defender XDR, offering 24/7 human-led monitoring and investigation, with full active coverage for servers within Defender for Cloud today.
For threat hunters, this evolution isn’t theoretical; it’s tactical. The same curiosity and precision that uncover lateral movement or data exfiltration now apply to AI misuse. In this post, we’ll walk through practical KQL hunts to surface suspicious AI activity, from abnormal model usage patterns to subtle signs of data exfiltration that traditional detections might miss.
The AI attack surface: old playbook, new players
Attackers aren’t reinventing the wheel; they’re repurposing it.
The top risks map neatly to the OWASP Top 10 for LLM Applications, plus two operational risks that travel with them:
- Prompt injection (LLM01) – Manipulating model logic through crafted inputs or poisoned context
- Sensitive data disclosure (LLM06) – AI returning confidential data due to mis-scoped access
- Shadow AI usage – Employees using external copilots with corporate data
- Wallet abuse – API tokens or service principals driving massive, unintended consumption
The telemetry isn’t new; the correlation is. Defender surfaces these risks by tying AI alerts from Defender for Cloud to real user behavior across your XDR environment.
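As a minimal sketch of what that correlation looks like in advanced hunting (the title keywords below are illustrative; adjust them to the AI alert names that actually fire in your tenant):
// Pivot from Defender for Cloud AI alerts to the users behind them,
// then enrich with recent identity activity.
AlertInfo
| where ServiceSource == "Microsoft Defender for Cloud"
| where Title has_any ("prompt", "jailbreak", "AI")   // illustrative keywords
| join kind=inner (AlertEvidence | where isnotempty(AccountUpn)) on AlertId
| join kind=leftouter (
    // IdentityLogonEvents requires Defender for Identity; drop this join if it isn't deployed
    IdentityLogonEvents
    | summarize LastLogon = max(Timestamp), LogonIPs = make_set(IPAddress, 10) by AccountUpn
  ) on AccountUpn
| project Timestamp, Title, Severity, AccountUpn, DeviceName, LastLogon, LogonIPs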
Threat hunting: from AI alerts to insight
Forget slide decks. These are practical, production-ready hunting patterns using real Defender data tables.
1. Shadow AI exfiltration detection
Office apps sending data to external AI endpoints (one of the most common exfiltration paths today).
(
DeviceNetworkEvents
| where RemoteUrl has_any (dynamic(["openai.com","anthropic.com","claude.ai","cohere.ai","chatgpt.com","gemini.google.com","huggingface.co","perplexity.ai"]))
| where InitiatingProcessFileName in~ (dynamic(["EXCEL.EXE","WINWORD.EXE","OUTLOOK.EXE","POWERPNT.EXE","ONENOTE.EXE"]))
or InitiatingProcessFileName in~ (dynamic(["chrome.exe","msedge.exe","firefox.exe","brave.exe"]))
| extend Device = toupper(tostring(split(DeviceName, ".")[0])),
IsOffice = toint(InitiatingProcessFileName in~ (dynamic(["EXCEL.EXE","WINWORD.EXE","OUTLOOK.EXE","POWERPNT.EXE","ONENOTE.EXE"])))
| summarize Connections = count(), IsOffice = max(IsOffice), AITime = max(Timestamp)
by Device, User = InitiatingProcessAccountName
)
| join kind=inner (
DeviceFileEvents
| where ActionType in~ ("FileCopied","FileCreated","FileModified","FileRenamed")
| extend Device = toupper(tostring(split(DeviceName, ".")[0])),
Lower = tolower(strcat(FolderPath, FileName))
| extend HeuristicFlag = case(
Lower has_any ("password","credential","secret","api_key") or Lower endswith ".key" or Lower endswith ".pem", "Credential",
Lower has_any ("confidential","restricted","classified","sensitive"), "Classified",
Lower has_any ("ssn","salary","payroll"), "PII",
Lower has_any ("finance","hr","legal","executive"), "OrgSensitive",
"Other"
),
LabelFlag = case(
SensitivityLabel has "Highly Confidential", "Classified",
SensitivityLabel has "Confidential", "Sensitive",
SensitivityLabel has "Internal", "Internal",
isnotempty(SensitivityLabel), "Labeled",
"Unlabeled"
)
| where HeuristicFlag != "Other" or LabelFlag in ("Classified","Sensitive","Internal","Labeled")
| summarize
Files = count(),
HeuristicCount = countif(HeuristicFlag != "Other"),
DLPCount = countif(isnotempty(SensitivityLabel)),
Types = make_set_if(HeuristicFlag, HeuristicFlag != "Other"),
Labels = make_set_if(SensitivityLabel, isnotempty(SensitivityLabel)),
FileTime = max(Timestamp)
by Device, User = InitiatingProcessAccountName
) on Device, User
| extend Delta = datetime_diff('minute', AITime, FileTime)
| where abs(Delta) <= 240
| extend Priority = case(
IsOffice == 1, "Critical",
Labels has_any ("Highly Confidential","Confidential") or Types has "Credential" or Types has "Classified", "High",
Files >= 20, "High",
"Medium"
)
| project Priority, Device, User, Connections, Files, HeuristicCount, DLPCount, Types, Labels, Delta
| order by Priority desc, Files desc
Why it works: It correlates outbound AI traffic with sensitive file activity on the same device and account within a four-hour window.
Action: Block the destination app, review DLP coverage, fix workflow gaps.
2. Anomalous consumption patterns
Off-hours Azure OpenAI activity isn’t necessarily productivity; it might be unsanctioned automation or exfiltration.
// Azure OpenAI & LLM Off-Hours Detection - PER USER TIMEZONE
// DISCLAIMER: Time zone detection is approximate, based on behavioral inference.
// Validate per user/device when high-risk anomalies are flagged.
// If authoritative time zone data (e.g., Entra sign-in or mailbox settings) is available, prefer that source.
let MinRequestsThreshold = 500;
let MinTokensThreshold = 20000;
let OffHoursStart = 21;
let OffHoursEnd = 5;
let UserTimezones = CloudAppEvents
| where Timestamp > ago(60d)
| where Application has_any ("OpenAI", "Azure OpenAI", "ChatGPT", "Claude", "Gemini", "Anthropic", "Perplexity", "Microsoft 365 Copilot")
| extend HourUTC = datetime_part("Hour", Timestamp)
| summarize ActivityByHour = count() by AccountDisplayName, HourUTC
| summarize arg_max(ActivityByHour, HourUTC) by AccountDisplayName
// Assumes peak usage lands around 14:00 (2 PM) local time: offset = 14 - peak UTC hour, normalized to [-12, +12]
| extend TimezoneOffset = iff((14 - HourUTC + 24) % 24 > 12, (14 - HourUTC + 24) % 24 - 24, (14 - HourUTC + 24) % 24)
| project AccountDisplayName, TimezoneOffset;
CloudAppEvents
| where Timestamp > ago(30d)
| where Application has_any ("OpenAI", "Azure OpenAI", "ChatGPT", "Claude", "Gemini", "Anthropic", "Perplexity", "Microsoft 365 Copilot")
| extend
HourUTC = datetime_part("Hour", Timestamp),
DayUTC = toint(dayofweek(Timestamp) / 1d),
Tokens = toint(RawEventData.totalTokens)
| join kind=leftouter (UserTimezones) on AccountDisplayName
| extend TZ = coalesce(TimezoneOffset, 0)
| extend HourLocal = (HourUTC + TZ + 24) % 24
| extend DayLocal = (DayUTC + iff(HourUTC + TZ >= 24, 1, iff(HourUTC + TZ < 0, -1, 0)) + 7) % 7
| extend IsAnomalous = (DayLocal in (0, 6)) or (HourLocal >= OffHoursStart or HourLocal < OffHoursEnd)
| where IsAnomalous
| extend IsWeekend = DayLocal in (0, 6), IsOffHours = HourLocal >= OffHoursStart or HourLocal < OffHoursEnd
| summarize
Requests = count(),
TokensUsed = sum(Tokens),
WeekendRequests = countif(IsWeekend),
LateNightRequests = countif(IsOffHours and not(IsWeekend)),
LocalHours = make_set(HourLocal),
LocalDays = make_set(DayLocal),
Applications = make_set(Application),
ActionTypes = make_set(ActionType),
FirstSeen = min(Timestamp),
LastSeen = max(Timestamp),
DetectedTZ = take_any(TZ)
by AccountDisplayName, IPAddress
| where Requests >= MinRequestsThreshold or TokensUsed >= MinTokensThreshold
| extend
UserTimezone = case(
DetectedTZ == 0, "UTC/GMT",
DetectedTZ == -5, "EST (UTC-5)",
DetectedTZ == -4, "EDT (UTC-4)",
DetectedTZ == -6, "CST (UTC-6)",
DetectedTZ == -7, "MST (UTC-7)",
DetectedTZ == -8, "PST (UTC-8)",
DetectedTZ == 1, "CET (UTC+1)",
DetectedTZ == 8, "CST China (UTC+8)",
DetectedTZ == 9, "JST Japan (UTC+9)",
DetectedTZ > 0, strcat("UTC+", DetectedTZ),
strcat("UTC", DetectedTZ)
)
| extend
ThreatPattern = case(
array_length(Applications) > 1, "Multiple LLM Services",
WeekendRequests > LateNightRequests * 2, "Weekend Automation",
LateNightRequests > WeekendRequests * 2, "Late-Night Automation",
Requests > 500, "High-Volume Script",
"Unusual Off-Hours Activity"
)
| extend
RiskScore = case(
Requests > 1000 and TokensUsed > 100000, 100,
Requests > 500 and WeekendRequests > 100, 95,
TokensUsed > 50000 or Requests > 200, 85,
WeekendRequests > 100, 80,
Requests > 100 or TokensUsed > 20000, 70,
60
)
| extend
RiskLevel = case(
RiskScore >= 90, "Critical",
RiskScore >= 75, "High",
RiskScore >= 60, "Medium",
"Low"
)
| project
AccountDisplayName,
IPAddress,
RiskLevel,
RiskScore,
ThreatPattern,
Requests,
TokensUsed,
WeekendRequests,
LateNightRequests,
Applications,
UserTimezone,
LocalHours,
LocalDays,
ActionTypes,
FirstSeen,
LastSeen
| sort by RiskScore desc, Requests desc
Why it works: Humans sleep. Scripts don’t. Temporal anomalies expose automation faster than anomaly models.
Action: Check grounding sources, confirm the IP, disable keys or service principals.
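Confirming the IP can be as simple as pivoting it through sign-in telemetry before touching keys. A minimal sketch, assuming the AADSignInEventsBeta table is available in your tenant (availability depends on licensing; IdentityLogonEvents is an alternative) and using a placeholder IP:
// Pivot one flagged IP across sign-in telemetry to see what else it touched.
let SuspectIP = "203.0.113.42";   // placeholder: substitute an IP surfaced by the hunt above
AADSignInEventsBeta
| where Timestamp > ago(30d)
| where IPAddress == SuspectIP
| summarize SignIns = count(),
    Accounts = make_set(AccountUpn, 20),
    Apps = make_set(Application, 20),
    FirstSeen = min(Timestamp),
    LastSeen = max(Timestamp)
    by IPAddress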
3. Bot-like behavior hunt
Separates sanctioned automation from possible compromise, enabling earlier detection.
// ---- Tunables (adjust if needed) ----
let LookbackDays = 7d;
let MinEvents = 3; // ignore trivial users
let RPH_AutoThresh = 50.0; // requests/hour threshold that suggests bot-like cadence
let MaxIPs_Auto = 1; // single IP suggests fixed worker
let MaxApps_Auto = 1; // single app suggests fixed worker
let MaxUAs_Auto = 2; // very few UAs over lookback
let MaxHighTokPct = 5.0; // % of requests over 4k tokens still considered benign
CloudAppEvents
| where Timestamp > ago(LookbackDays)
| where Application has_any ("OpenAI", "Azure OpenAI", "Microsoft 365 Copilot Chat")
| extend User = tolower(AccountDisplayName)
| extend raw = todynamic(RawEventData)
| extend Tokens = toint(coalesce(raw.totalTokens, raw.total_tokens, raw.usage_total_tokens))
| summarize
TotalRequests = count(),
HighTokenRequests = countif(Tokens > 4000),
AvgTokens = avg(Tokens),
MaxTokens = max(Tokens),
UniqueIPs = dcount(IPAddress),
IPs = make_set(IPAddress, 50),
UniqueApps = dcount(Application),
Apps = make_set(Application, 20),
UniqueUAs = dcount(UserAgent),
FirstRequest = min(Timestamp),
LastRequest = max(Timestamp)
by User
| where TotalRequests >= MinEvents
| extend _dur = toreal(datetime_diff('hour', LastRequest, FirstRequest))
| extend DurationHours = iif(_dur <= 0, 1.0, _dur)
| extend RequestsPerHour = TotalRequests / DurationHours
| extend HighTokenRatio = (HighTokenRequests * 100.0) / TotalRequests
// ---- Heuristic: derive likely automation (no lists/regex) ----
| extend IsLikelyAutomation =
(UniqueIPs <= MaxIPs_Auto and
UniqueApps <= MaxApps_Auto and
UniqueUAs <= MaxUAs_Auto and
RequestsPerHour >= RPH_AutoThresh and
HighTokenRatio <= MaxHighTokPct)
// ---- Techniques & risk ----
| extend
IsRapidFire = RequestsPerHour > 20,
IsHighVolume = TotalRequests > 50,
IsTokenAbuse = HighTokenRatio > 30,
IsMultiService = UniqueApps > 1,
IsMultiIP = UniqueIPs > 2,
IsEscalating = DurationHours < 24 and TotalRequests > 10
| where IsRapidFire or IsHighVolume or IsTokenAbuse or IsMultiService or IsMultiIP or IsEscalating
| extend TechniqueCount = toint(IsRapidFire) + toint(IsHighVolume) + toint(IsTokenAbuse) + toint(IsMultiService) + toint(IsMultiIP) + toint(IsEscalating)
| extend Risk = case(
IsLikelyAutomation and UniqueIPs == 1 and UniqueApps == 1 and not(IsTokenAbuse), "Low - Likely Automation",
TechniqueCount >= 4, "Critical - Multi-Vector Behavior",
TechniqueCount >= 3, "High - Attack Pattern",
TechniqueCount >= 2, "Medium - Anomalous Behavior",
"Low"
)
// Custom sort: Critical > High > Medium > Low - Likely Automation > Low
| extend RiskOrder = case(
Risk startswith "Critical", 1,
Risk startswith "High", 2,
Risk startswith "Medium", 3,
Risk == "Low - Likely Automation", 4,
5
)
| project Risk, User, TotalRequests, RequestsPerHour, TechniqueCount, IsLikelyAutomation,
IsRapidFire, IsHighVolume, IsTokenAbuse, IsMultiIP, IsMultiService, IsEscalating,
UniqueIPs, IPs, UniqueApps, UniqueUAs, HighTokenRatio, DurationHours,
FirstRequest, LastRequest, RiskOrder
| sort by RiskOrder asc, TotalRequests desc
Why it works: It surfaces automation-like patterns that could indicate either sanctioned scripts or early-stage compromise, enabling proactive detection before alerts fire.
Action: Investigate flagged accounts immediately to confirm intent and mitigate potential AI misuse.
Operational lessons that scale beyond the lab
- Custom detections > Ad hoc hunts – Turn query #1 into a scheduled detection (see the detection-ready sketch after this list). Shadow AI isn’t a one-off behavior.
- Security Copilot ≠ search bar – Use it for triage context, not hunting logic.
- Set quotas, treat them like controls – Token budgets and rate limits are as critical as firewalls for AI workloads.
- Defender for Cloud Apps – Block risky generative AI apps while letting sanctioned copilots run.
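For the first bullet, the main change is shape, not logic: custom detection rules in Defender XDR require the query results to include Timestamp, ReportId, and an entity column such as DeviceId. A minimal, trimmed sketch of hunt #1 in detection-ready form (domain and process lists shortened for brevity):
// Detection-ready variant of hunt #1: keep Timestamp, ReportId, and DeviceId
// through the summarize instead of projecting them away.
DeviceNetworkEvents
| where Timestamp > ago(1h)
| where RemoteUrl has_any ("openai.com", "claude.ai", "chatgpt.com", "gemini.google.com")
| where InitiatingProcessFileName in~ ("EXCEL.EXE", "WINWORD.EXE", "OUTLOOK.EXE", "POWERPNT.EXE")
| summarize arg_max(Timestamp, ReportId, DeviceId, RemoteUrl)
    by DeviceName, User = InitiatingProcessAccountName, InitiatingProcessFileName
| project Timestamp, ReportId, DeviceId, DeviceName, User, InitiatingProcessFileName, RemoteUrl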
Getting started with threat hunting for AI workloads
Before you run these hunts at scale, make sure your environment is instrumented for cognitive visibility. That means insight into how your AI models are being used and what data they reason over, not just how much compute they consume.
Traditional telemetry shows process, network, and authentication events. Cognitive visibility adds prompts, model responses, grounding sources, and token behavior, giving analysts the context that explains why an AI acted the way it did.
Defender for AI Services integrates with Defender for Cloud to provide that visibility layer, but the right configuration turns data collection into situational awareness.
- Enable the AI services plan – Make sure Defender for AI Services is enabled at the subscription level. This activates continuous monitoring for Azure OpenAI, AI Foundry, and other managed AI workloads. Microsoft Learn →
- Enable user prompt evidence – Turn on prompt capture for Defender for AI alerts. Seeing the exact input and model response during an attack is the difference between speculation and evidence. Microsoft Learn →
- Validate your schema – Always test KQL queries in your workspace. Field names and event structures can differ across tenants and tiers, especially in CloudAuditEvents and AlertEvidence (a quick schema check is sketched after this list).
- Use Security Copilot for acceleration – Let Copilot translate natural language hypotheses into KQL, then fine-tune the logic yourself. It is the fastest way to scale your hunts without losing precision. Microsoft Learn →
- Monitor both sides of the equation – Hunt for both AI-specific risks such as prompt injection, model abuse, or token sprawl, and traditional threats that target AI systems such as compromised credentials, exposed storage, or lateral movement through service principals.
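The schema check above doesn’t need anything fancy. A quick sketch using getschema to confirm that the columns these hunts rely on actually exist in your workspace:
// List the columns the hunts in this post depend on, with their types.
CloudAppEvents
| getschema
| where ColumnName in ("RawEventData", "UserAgent", "IPAddress", "ActionType", "AccountDisplayName")
| project ColumnName, ColumnType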
Visibility is only as strong as the context you capture. The sooner you enable these settings, the sooner your SOC can understand why your models behave the way they do, not just what they did.
Final thoughts: from prompts to protections
As AI becomes part of core infrastructure, its telemetry must become part of your SOC’s muscle memory. The same principles that power endpoint or identity defense (visibility, correlation, and anomaly detection) now apply to model inference, token usage, and data grounding.
Defender for Cloud and Defender XDR give you that continuity: alerts flow where your analysts already work, and your hunting logic evolves without a separate stack.
Protecting AI isn’t about chasing every model. It’s about extending proven security discipline to the systems that now think alongside you.
Further Reading
- Defender for Cloud AI Threat Protection
- Advanced Hunting in Microsoft Defender XDR
- OWASP Top 10 for LLM Applications
Found a better pattern? Post it. The threat surface is new, but the hunt discipline isn’t.