In Part 1, we identified the security blind spots in GenAI workloads. In Part 2, we built security into your code with structured logging, user context tracking, and defensive programming patterns. Your AKS-hosted application is now emitting rich, security-relevant JSON logs flowing through Container Insights to Azure Log Analytics. Now it's time to turn those logs into actionable security intelligence with Microsoft Sentinel. This post focuses on what you should detect and why it matters—not the mechanics of setting up Sentinel. We'll cover the analytics rules that catch GenAI-specific threats, the correlation patterns that detect sophisticated attacks, and the workbooks that give your SOC visibility into your GenAI security posture.
Why Sentinel for GenAI Security Observability?
Before diving into detection rules, let's address why Microsoft Sentinel is uniquely positioned for GenAI security operations—especially compared to traditional or non-native SIEMs.
Native Azure Integration: Zero ETL Overhead
The problem with external SIEMs: To monitor your GenAI workloads with a third-party SIEM, you need to:
- Configure log forwarding from Log Analytics to external systems
- Set up data connectors or agents for Azure OpenAI audit logs
- Create custom parsers for Azure-specific log schemas
- Maintain authentication and network connectivity between Azure and your SIEM
- Pay data egress costs for logs leaving Azure
The Sentinel advantage: Your logs are already in Azure. Sentinel connects directly to:
- Log Analytics workspace - Where your Container Insights logs already flow
- Azure OpenAI audit logs - Available once diagnostic settings send them to the workspace
- Azure AD sign-in logs - Instant correlation with identity events
- Defender for Cloud alerts - Platform-level AI threat detection included
- Threat intelligence feeds - Microsoft's global threat data built-in
- Microsoft Defender XDR - AI-driven cybersecurity that unifies threat detection and response across endpoints, email, identities, cloud apps and Sentinel
There's no cross-platform data movement, no ETL pipelines, and no extra log-shipping hop. Your GenAI security data is queryable in near real time, subject only to normal Log Analytics ingestion latency.
KQL: Built for Complex Correlation at Scale
Why this matters for GenAI: Detecting sophisticated AI attacks requires correlating:
- Application logs (your code from Part 2)
- Azure OpenAI service logs (API calls, token usage, throttling)
- Identity signals (who authenticated, from where)
- Threat intelligence (known malicious IPs)
- Defender for Cloud alerts (platform-level anomalies)
KQL's advantage: Kusto Query Language is designed for this. You can:
- Join across multiple data sources in a single query
- Parse nested JSON (like your structured logs) natively
- Use time-series analysis functions for anomaly detection and behavior patterns
- Aggregate millions of events in seconds
- Extract entities (users, IPs, sessions) automatically for investigation graphs
Example: Correlating your app logs with Azure AD sign-ins and Defender alerts takes a single KQL query. In a traditional SIEM, this might require custom scripts and data normalization, and run significantly slower.
User Security Context Flows Natively
Remember the user_security_context you pass in extra_body from Part 2? That context:
- Automatically appears in Azure OpenAI's audit logs
- Flows into Defender for Cloud AI alerts
- Is queryable in Sentinel without custom parsing
- Maps to the same identity schema as Azure AD logs
With external SIEMs: You'd need to:
- Extract user context from your application logs
- Separately ingest Azure OpenAI logs
- Write correlation logic to match them
- Maintain entity resolution across different data sources
With Sentinel: It just works. The end_user_id, source_ip, and application_name are already normalized across Azure services.
Built-In AI Threat Detection
Sentinel includes pre-built detections for cloud and AI workloads:
- Azure OpenAI anomalous access patterns (out of the box)
- Unusual token consumption (built-in analytics templates)
- Geographic anomalies (using Azure's global IP intelligence)
- Impossible travel detection (cross-referencing sign-ins with AI API calls)
- Microsoft Defender XDR (correlation with endpoint, email, cloud app signals)
These aren't generic "high volume" alerts—they're tuned for Azure AI services by Microsoft's security research team. You can use them as-is or customize them with your application-specific context.
Entity Behavior Analytics (UEBA)
Sentinel's UEBA automatically builds baselines for:
- Normal request volumes per user
- Typical request patterns per application
- Expected geographic access locations
- Standard model usage patterns
Then it surfaces anomalies:
- "User_12345 normally makes 10 requests/day, suddenly made 500 in an hour"
- "Application_A typically uses GPT-3.5, suddenly switched to GPT-4 exclusively"
- "User authenticated from Seattle, made AI requests from Moscow 10 minutes later"
This behavior modeling happens automatically—no custom ML model training required. Traditional SIEMs would require you to build this logic yourself.
The Bottom Line
For GenAI security on Azure:
- Sentinel reduces time-to-detection because data is already there
- Correlation is simpler because everything speaks the same language
- Investigation is faster because entities are automatically linked
- Cost is lower because you're not paying data egress fees
- Maintenance is minimal because connectors are native
If your GenAI workloads are on Azure, using anything other than Sentinel means fighting against the platform instead of leveraging it.
From Logs to Intelligence: The Complete Picture
Your structured logs from Part 2 are flowing into Log Analytics. Here's what they look like:
{
"timestamp": "2025-10-21T14:32:17.234Z",
"level": "INFO",
"message": "LLM Request Received",
"request_id": "a7c3e9f1-4b2d-4a8e-9c1f-3e5d7a9b2c4f",
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"prompt_hash": "d3b07384d113edec49eaa6238ad5ff00",
"security_check_passed": "PASS",
"source_ip": "203.0.113.42",
"end_user_id": "user_550e8400",
"application_name": "AOAI-Customer-Support-Bot",
"model_deployment": "gpt-4-turbo"
}
Because our application, “AOAI-Customer-Support-Bot”, runs on Azure Kubernetes Service (AKS), these logs land in the ContainerLogV2 table, where the JSON payload is exposed through the dynamic LogMessage column.
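Part 2 covered how the application emits these records. To recap their shape, here is a minimal Python sketch that builds one such record and stores a hash of the prompt instead of the prompt itself. The helper names (`hash_prompt`, `build_log_record`) are hypothetical. SHA-256 is also an assumption: the 32-character `prompt_hash` in the sample above suggests Part 2 may use a different digest.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone


def hash_prompt(prompt: str) -> str:
    """Return a SHA-256 hex digest of the prompt (assumed hashing scheme)."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def build_log_record(prompt, *, session_id, end_user_id, source_ip,
                     application_name, model_deployment, check_passed):
    """Build a JSON-serialisable record with the same fields as the sample log."""
    timestamp = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    return {
        "timestamp": timestamp.replace("+00:00", "Z"),
        "level": "INFO",
        "message": "LLM Request Received",
        "request_id": str(uuid.uuid4()),
        "session_id": session_id,
        "prompt_hash": hash_prompt(prompt),  # the raw prompt text is not logged
        "security_check_passed": "PASS" if check_passed else "FAIL",
        "source_ip": source_ip,
        "end_user_id": end_user_id,
        "application_name": application_name,
        "model_deployment": model_deployment,
    }


if __name__ == "__main__":
    record = build_log_record(
        "How do I reset my password?",
        session_id="550e8400-e29b-41d4-a716-446655440000",
        end_user_id="user_550e8400",
        source_ip="203.0.113.42",
        application_name="AOAI-Customer-Support-Bot",
        model_deployment="gpt-4-turbo",
        check_passed=True,
    )
    # Container Insights collects stdout; one JSON object per line keeps each record parseable.
    print(json.dumps(record))
```

Hashing lets Rule 5 group identical prompts across users without keeping sensitive prompt text in the log.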
Steps to set up AKS to stream logs to Sentinel/Log Analytics
- In the Azure portal, navigate to your AKS cluster, then to Monitoring -> Insights
- Select Monitor Settings
- Under Container Logs
- Select the Sentinel-enabled Log Analytics workspace
- Select Logs and events
- Check the ‘Enable ContainerLogV2’ and ‘Enable Syslog collection’ options
For more details, see Kubernetes monitoring in Azure Monitor (Microsoft Learn).
Critical Analytics Rules: What to Detect and Why
Rule 1: Prompt Injection Attack Detection
Why it matters: Prompt injection is the GenAI equivalent of SQL injection. Attackers try to manipulate the model by overriding system instructions. Multiple attempts indicate intentional malicious behavior.
What to detect: 3+ prompt injection attempts within 10 minutes from the same IP
let timeframe = 1d;
let threshold = 3;
AlertEvidence
| where TimeGenerated >= ago(timeframe) and EntityType == "Ip"
| where DetectionSource == "Microsoft Defender for AI Services"
| where Title contains "jailbreak" or Title contains "prompt injection"
| summarize count() by bin(TimeGenerated, 10m), RemoteIP
| where count_ >= threshold
What the SOC sees:
- User identity attempting injection
- Source IP and geographic location
- Sample prompts for investigation
- Frequency indicating automation vs. manual attempts
Severity: High (these are actual attempts to bypass security)
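The rule counts alerts per source IP inside fixed time buckets, which is what KQL `bin()` produces. Here is a minimal Python sketch of that tumbling-window logic. It assumes the 10-minute window and threshold of 3 from the "What to detect" line, and it takes alerts as `(timestamp, remote_ip, title)` tuples. Both function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def bin_time(ts: datetime, size: timedelta) -> datetime:
    """Floor a timestamp to the start of its fixed-size window (similar to KQL bin())."""
    epoch = datetime(1970, 1, 1, tzinfo=ts.tzinfo)
    return ts - ((ts - epoch) % size)


def flag_injection_ips(alerts, window=timedelta(minutes=10), threshold=3):
    """Count jailbreak/prompt-injection alerts per (window start, IP).

    `alerts` is an iterable of (timestamp, remote_ip, title) tuples.
    Only buckets whose count reaches `threshold` are returned.
    """
    counts = Counter()
    for ts, ip, title in alerts:
        lowered = title.lower()  # case-insensitive, like KQL `contains`
        if "jailbreak" in lowered or "prompt injection" in lowered:
            counts[(bin_time(ts, window), ip)] += 1
    return {key: n for key, n in counts.items() if n >= threshold}
```

Tumbling windows can split a burst across two buckets. For example, two attempts at 14:09 and two at 14:11 never reach 3 in one bucket. A sliding window avoids that at the cost of more computation.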
Rule 2: Content Safety Filter Violations
Why it matters: When Azure AI Content Safety blocks a request, it means harmful content (violence, hate speech, etc.) was detected. Multiple violations indicate intentional abuse or a compromised account.
What to detect: Users with 3+ content safety violations in any 1-hour window over the last 24 hours.
let timeframe = 1d;
let threshold = 3;
ContainerLogV2
| where TimeGenerated >= ago(timeframe)
| where isnotempty(LogMessage.end_user_id)
| where LogMessage.security_check_passed == "FAIL"
| extend source_ip=tostring(LogMessage.source_ip)
| extend end_user_id=tostring(LogMessage.end_user_id)
| extend session_id=tostring(LogMessage.session_id)
| extend application_name = tostring(LogMessage.application_name)
| extend security_check_passed = tostring (LogMessage.security_check_passed)
| summarize count() by bin(TimeGenerated, 1h),source_ip,end_user_id,session_id,Computer,application_name,security_check_passed
| where count_ >= threshold
What the SOC sees:
- Severity based on violation count
- Time span showing if it's persistent vs. isolated
- Prompt samples (first 80 chars) for context
- Session ID for conversation history review
Severity: High (these are actual harmful content attempts)
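The `tostring(LogMessage.x)` extractions in the query pull fields out of the parsed JSON payload. Here is a minimal Python sketch of the same filtering and hourly grouping. It also keeps the first 80 characters of each prompt, as the "What the SOC sees" list describes. The input shape (dicts with `TimeGenerated` and a `LogMessage` dict) and the `full_prompt_sample` field are assumptions. The grouping is simplified to (hour, user, session).

```python
from collections import defaultdict
from datetime import datetime


def hour_bin(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def content_safety_violators(rows, threshold=3, sample_len=80):
    """Group FAIL records by (hour, user, session); keep groups at or above threshold."""
    groups = defaultdict(list)
    for row in rows:
        msg = row.get("LogMessage") or {}
        user = msg.get("end_user_id")
        # Mirrors isnotempty(end_user_id) and security_check_passed == "FAIL".
        if not user or msg.get("security_check_passed") != "FAIL":
            continue
        key = (hour_bin(row["TimeGenerated"]), user, msg.get("session_id", ""))
        # Keep only a short prompt prefix for analyst context.
        groups[key].append(str(msg.get("full_prompt_sample", ""))[:sample_len])
    return {
        key: {"count": len(samples), "samples": samples}
        for key, samples in groups.items()
        if len(samples) >= threshold
    }
```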
Rule 3: Rate Limit Abuse
Why it matters: Persistent rate limit violations indicate automated attacks, credential stuffing, or attempts to overwhelm the system. Legitimate users who hit rate limits don't retry 10+ times in minutes.
What to detect: Users blocked by rate limiter 5+ times in 10 minutes
let timeframe = 1h;
let threshold = 5;
AzureDiagnostics
| where TimeGenerated >= ago(timeframe)
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| where OperationName == "Completions" or OperationName contains "ChatCompletions"
| extend tokensUsed = todouble(parse_json(properties_s).usage.total_tokens)
// AzureDiagnostics carries no end_user_id, so the caller IP stands in for the user here
| summarize totalTokens = sum(tokensUsed), requests = count(), rateLimitErrors = countif(httpstatuscode_s == "429") by bin(TimeGenerated, 10m), CallerIPAddress
| where rateLimitErrors >= threshold
What the SOC sees:
- Whether it's a bot (immediate retries) or human (gradual retries)
- Duration of attack
- Which application is targeted
- Correlation with other security events from same user/IP
Severity: Medium (nuisance attack, possible reconnaissance)
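One item in the list above is telling a bot (immediate retries) from a human (gradual retries). Here is a minimal Python sketch of one way to do that for a single caller's HTTP 429 timestamps. It uses a sliding 10-minute window rather than the fixed bins in the query. The 2-second median-gap cutoff is a hypothetical, untuned value.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Iterable, Optional


def classify_rate_limit_burst(
    timestamps: Iterable[datetime],
    window: timedelta = timedelta(minutes=10),
    threshold: int = 5,
    bot_gap_seconds: float = 2.0,  # hypothetical cutoff, tune per application
) -> Optional[str]:
    """Classify one caller's HTTP 429 timestamps as 'bot', 'human', or None.

    None means no `window`-long span holds `threshold` or more 429s.
    Otherwise the first such burst is labelled 'bot' when the median gap
    between retries is at most `bot_gap_seconds`, else 'human'.
    """
    if threshold < 2:
        raise ValueError("threshold must be at least 2 to measure retry gaps")
    ts = sorted(timestamps)
    start = 0
    for end in range(len(ts)):
        # Advance the window start until the span fits inside `window` (sliding window).
        while ts[end] - ts[start] > window:
            start += 1
        if end - start + 1 >= threshold:
            burst = ts[start:end + 1]
            gaps = [(b - a).total_seconds() for a, b in zip(burst, burst[1:])]
            return "bot" if median(gaps) <= bot_gap_seconds else "human"
    return None
```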
Rule 4: Anomalous Source IP for User
Why it matters: A user suddenly accessing from a new country or VPN could indicate account compromise. This is especially critical for privileged accounts or after-hours access.
What to detect: User accessing from an IP never seen in the last 7 days
let lookback = 7d;
let recent = 1h;
let baseline =
IdentityLogonEvents
| where Timestamp between (ago(lookback + recent) .. ago(recent))
| where isnotempty(IPAddress)
| summarize knownIPs = make_set(IPAddress) by AccountUpn;
ContainerLogV2
| where TimeGenerated >= ago(recent)
| where isnotempty(LogMessage.source_ip)
| extend source_ip=tostring(LogMessage.source_ip)
| extend end_user_id=tostring(LogMessage.end_user_id)
| extend session_id=tostring(LogMessage.session_id)
| extend application_name = tostring(LogMessage.application_name)
| extend security_check_passed = tostring (LogMessage.security_check_passed)
| extend full_prompt_sample = tostring (LogMessage.full_prompt_sample)
| lookup kind=leftouter baseline on $left.end_user_id == $right.AccountUpn
| where isnull(knownIPs) or not(set_has_element(knownIPs, source_ip))
| project TimeGenerated, source_ip, end_user_id, session_id, Computer, application_name, security_check_passed, full_prompt_sample
What the SOC sees:
- User identity and new IP address
- Geographic location change
- Whether suspicious prompts accompanied the new IP
- Timing (after-hours access is higher risk)
Severity: Medium (possible account compromise or reconnaissance)
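The rule builds a per-user set of known IPs from the lookback period, then flags recent requests from IPs outside that set. Here is a minimal Python sketch of the same set logic. It assumes `end_user_id` holds the same identifier (the UPN) as the identity baseline. The function names are hypothetical.

```python
def build_baseline(logons):
    """Map each user to the set of IPs seen during the lookback period.

    `logons` is an iterable of (user, ip) pairs. Empty IPs are skipped,
    mirroring isnotempty(IPAddress).
    """
    baseline = {}
    for user, ip in logons:
        if ip:
            baseline.setdefault(user, set()).add(ip)
    return baseline


def events_from_new_ips(baseline, recent_events):
    """Return recent events whose source_ip is not in the user's baseline.

    Users with no baseline at all are returned too, matching isnull(knownIPs).
    """
    flagged = []
    for event in recent_events:
        known = baseline.get(event["end_user_id"])
        if known is None or event["source_ip"] not in known:
            flagged.append(event)
    return flagged
```

Users without any baseline are flagged by design. That is the safe default, but it can be noisy for newly onboarded users.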
Rule 5: Coordinated Attack - Same Prompt from Multiple Users
Why it matters: When 5+ users send identical prompts, it indicates a bot network, credential stuffing, or organized attack campaign. This is not normal user behavior.
What to detect: Same prompt hash from 5+ different users within 1 hour
let timeframe = 1h;
let threshold = 5;
ContainerLogV2
| where TimeGenerated >= ago(timeframe)
| where isnotempty(LogMessage.prompt_hash)
| where isnotempty(LogMessage.end_user_id)
| extend source_ip=tostring(LogMessage.source_ip)
| extend end_user_id=tostring(LogMessage.end_user_id)
| extend prompt_hash=tostring(LogMessage.prompt_hash)
| extend application_name = tostring(LogMessage.application_name)
| extend security_check_passed = tostring (LogMessage.security_check_passed)
| project TimeGenerated, prompt_hash, source_ip, end_user_id, application_name, security_check_passed
| summarize
DistinctUsers = dcount(end_user_id),
Attempts = count(),
Users = make_set(end_user_id, 100),
IpAddress = make_set(source_ip, 100)
by prompt_hash, bin(TimeGenerated, 1h)
| where DistinctUsers >= threshold
What the SOC sees:
- Attack pattern (single attacker with stolen accounts vs. botnet)
- List of compromised user accounts
- Source IPs for blocking
- Prompt sample to understand attack goal
Severity: High (indicates organized attack)
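The key signal here is the number of distinct users per prompt hash, the `dcount(end_user_id)` in the query, rather than raw request volume. Here is a minimal Python sketch of that aggregation per hour. It assumes events arrive as dicts with `time`, `prompt_hash`, `end_user_id`, and `source_ip` keys, and it caps the user and IP lists at 100 to echo `make_set(..., 100)`.

```python
from collections import Counter, defaultdict
from datetime import datetime


def hour_bin(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def coordinated_prompts(events, threshold=5, max_items=100):
    """Find prompt hashes sent by `threshold`+ distinct users within the same hour."""
    users = defaultdict(set)
    ips = defaultdict(set)
    attempts = Counter()
    for ev in events:
        if not ev.get("prompt_hash") or not ev.get("end_user_id"):
            continue
        key = (ev["prompt_hash"], hour_bin(ev["time"]))
        users[key].add(ev["end_user_id"])
        ips[key].add(ev.get("source_ip"))
        attempts[key] += 1
    return {
        key: {
            "distinct_users": len(users[key]),
            "attempts": attempts[key],
            # Cap list sizes, similar to make_set(..., 100).
            "users": sorted(users[key])[:max_items],
            "ips": sorted(ip for ip in ips[key] if ip)[:max_items],
        }
        for key in users
        if len(users[key]) >= threshold
    }
```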
Rule 6: Malicious model detected
Why it matters: Model serialization attacks can lead to serious compromise. When Defender for Cloud model scanning identifies issues with a custom or open-source model in an Azure ML workspace or registry, or hosted in Foundry, the cause may or may not be a user oversight.
What to detect: Model scan findings from Defender for Cloud, and whether the flagged model is actively in use.
What the SOC sees:
- Malicious model
- Applications leveraging the model
- Source IPs and users that accessed the model
Severity: Medium (can be user oversight)
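No query is given for this rule. The "is it being actively used" half can be sketched as a join between flagged models and the `model_deployment` field in your application logs. Here is a minimal Python version. It assumes you can map each scan finding to a deployment name, and the `flagged_models` input is hypothetical. The actual Defender for Cloud model-scan output schema is not covered here.

```python
def flagged_model_exposure(flagged_models, app_events):
    """Summarise which applications, source IPs and users touched a flagged model.

    flagged_models: set of model deployment names (hypothetical input derived
    from Defender for Cloud model-scan findings).
    app_events: parsed app-log dicts with model_deployment, application_name,
    source_ip and end_user_id keys.
    """
    exposure = {}
    for event in app_events:
        model = event.get("model_deployment")
        if model not in flagged_models:
            continue
        entry = exposure.setdefault(
            model, {"applications": set(), "source_ips": set(), "users": set()}
        )
        entry["applications"].add(event.get("application_name"))
        entry["source_ips"].add(event.get("source_ip"))
        entry["users"].add(event.get("end_user_id"))
    return exposure
```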
Advanced Correlation: Connecting the Dots
The power of Sentinel is correlating your application logs with other security signals. Here are the most valuable correlations:
Correlation 1: Failed GenAI Requests + Failed Sign-Ins = Compromised Account
Why: An account that shows both authentication failures and malicious AI prompts within a 1-hour window is likely compromised.
let timeframe = 1h;
ContainerLogV2
| where TimeGenerated >= ago(timeframe)
| where isnotempty(LogMessage.source_ip)
| extend source_ip=tostring(LogMessage.source_ip)
| extend end_user_id=tostring(LogMessage.end_user_id)
| extend session_id=tostring(LogMessage.session_id)
| extend application_name = tostring(LogMessage.application_name)
| extend security_check_passed = tostring (LogMessage.security_check_passed)
| extend full_prompt_sample = tostring (LogMessage.full_prompt_sample)
| extend prompt_hash = tostring(LogMessage.prompt_hash)
| extend message = tostring (LogMessage.message)
| where security_check_passed == "FAIL" or message contains "WARNING"
| join kind=inner (
SigninLogs
| where TimeGenerated >= ago(timeframe)
| where ResultType != "0" // "0" means success; any other value indicates a failure
| project TimeGenerated, UserPrincipalName, ResultType, ResultDescription, IPAddress, Location, AppDisplayName
) on $left.end_user_id == $right.UserPrincipalName
| project TimeGenerated, source_ip, end_user_id, application_name, full_prompt_sample, prompt_hash, message, security_check_passed
Severity: High (High probability of compromise)
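Joining on the user alone does not guarantee that the two failures happened close together. Here is a minimal Python sketch that checks the 1-hour proximity for each pair explicitly, pairing each app failure with its nearest failed sign-in. It assumes `end_user_id` holds the same UPN that SigninLogs records. The input dict keys and function name are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def correlate_failures(app_failures, signin_failures, window=timedelta(hours=1)):
    """Return app-log failures that have a failed sign-in for the same user within `window`.

    app_failures: iterable of dicts with `end_user_id` and `time` (datetime).
    signin_failures: iterable of dicts with `user_principal_name` and `time`.
    Each returned dict is the app failure plus the nearest sign-in failure time.
    """
    signins_by_user = defaultdict(list)
    for signin in signin_failures:
        signins_by_user[signin["user_principal_name"]].append(signin["time"])

    matches = []
    for failure in app_failures:
        times = signins_by_user.get(failure["end_user_id"], [])
        # Pick the sign-in failure closest in time to this app failure.
        nearest = min(times, key=lambda t: abs(failure["time"] - t), default=None)
        if nearest is not None and abs(failure["time"] - nearest) <= window:
            matches.append({**failure, "signin_failure_time": nearest})
    return matches


if __name__ == "__main__":
    now = datetime(2025, 10, 21, 14, 0)
    print(correlate_failures(
        [{"end_user_id": "alice@example.com", "time": now}],
        [{"user_principal_name": "alice@example.com", "time": now - timedelta(minutes=20)}],
    ))
```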
Correlation 2: Application Logs + Defender for Cloud AI Alerts
Why: Defender for Cloud AI Threat Protection detects platform-level threats (unusual API patterns, data exfiltration attempts). When both your code and the platform flag the same application, confidence is very high.
let timeframe = 1h;
ContainerLogV2
| where TimeGenerated >= ago(timeframe)
| where isnotempty(LogMessage.source_ip)
| extend source_ip=tostring(LogMessage.source_ip)
| extend end_user_id=tostring(LogMessage.end_user_id)
| extend session_id=tostring(LogMessage.session_id)
| extend application_name = tostring(LogMessage.application_name)
| extend security_check_passed = tostring (LogMessage.security_check_passed)
| extend full_prompt_sample = tostring (LogMessage.full_prompt_sample)
| extend message = tostring (LogMessage.message)
| where security_check_passed == "FAIL" or message contains "WARNING"
| join kind=inner (
AlertEvidence
| where TimeGenerated >= ago(timeframe) and AdditionalFields.Asset == "true"
| where DetectionSource == "Microsoft Defender for AI Services"
| project TimeGenerated, Title, CloudResource
) on $left.application_name == $right.CloudResource
| project TimeGenerated, application_name, end_user_id, source_ip, Title
Severity: Critical (Multi-layer detection)
Correlation 3: Source IP + Threat Intelligence Feeds
Why: If requests come from known malicious IPs (C2 servers, VPN exit nodes used in attacks), treat them as high priority even if behavior seems normal.
//This rule correlates GenAI app activity with Microsoft Threat Intelligence feed available in Sentinel and Microsoft XDR for malicious IP IOCs
let timeframe = 10m;
ContainerLogV2
| where TimeGenerated >= ago(timeframe)
| where isnotempty(LogMessage.source_ip)
| extend source_ip=tostring(LogMessage.source_ip)
| extend end_user_id=tostring(LogMessage.end_user_id)
| extend session_id=tostring(LogMessage.session_id)
| extend application_name = tostring(LogMessage.application_name)
| extend security_check_passed = tostring (LogMessage.security_check_passed)
| extend full_prompt_sample = tostring (LogMessage.full_prompt_sample)
| join kind=inner (
ThreatIntelIndicators
| where IsActive == true
| where ObservableKey startswith "ipv4-addr" or ObservableKey startswith "network-traffic"
| project IndicatorIP = ObservableValue
) on $left.source_ip == $right.IndicatorIP
| project TimeGenerated, source_ip, end_user_id, application_name, full_prompt_sample, security_check_passed
Severity: High (Known bad actor)
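The join above requires an exact string match between `source_ip` and the indicator value, so an IP inside a listed range would be missed. Here is a minimal Python sketch of a range-aware match using the standard `ipaddress` module. It assumes some indicator values are CIDR ranges; whether your feed contains ranges depends on the feed. The function names are hypothetical.

```python
import ipaddress


def build_indicator_networks(indicator_values):
    """Parse TI indicator values (single IPs or CIDR ranges) into network objects.

    Malformed values are skipped rather than raising.
    """
    networks = []
    for value in indicator_values:
        try:
            # strict=False accepts host bits set, e.g. "198.51.100.7/24".
            networks.append(ipaddress.ip_network(value.strip(), strict=False))
        except ValueError:
            continue
    return networks


def match_indicator(source_ip, networks):
    """Return the first indicator network containing source_ip, else None."""
    try:
        address = ipaddress.ip_address(source_ip)
    except ValueError:
        return None
    for network in networks:
        # Membership is simply False when the IP versions differ (IPv4 vs IPv6).
        if address in network:
            return network
    return None
```

A single IP parses to a /32 (or /128) network, so exact matches still work.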
Workbooks: What Your SOC Needs to See
Executive Dashboard: GenAI Security Health
Purpose: Leadership wants to know: "Are we secure?" Answer with metrics.
Key visualizations:
- Security Status Tiles (24 hours):
  - Total Requests
  - Success Rate
  - Blocked Threats (self-detected + Content Safety + Threat Protection for AI)
  - Rate Limit Violations
  - Model Security Score (red team evaluation status of the currently deployed model)
ContainerLogV2
| where TimeGenerated > ago (1d)
| extend security_check_passed = tostring (LogMessage.security_check_passed)
| summarize SuccessCount=countif(security_check_passed == "PASS"), FailedCount=countif(security_check_passed == "FAIL") by bin(TimeGenerated, 1h)
| extend TotalRequests = SuccessCount + FailedCount
| extend SuccessRate = todouble(SuccessCount)/todouble(TotalRequests) * 100
| order by SuccessRate
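The tile query divides SuccessCount by TotalRequests. An hour that contains log lines but no PASS/FAIL values would make that a division by zero. Here is a minimal Python sketch of the same hourly rate with an explicit guard. It takes events as `(timestamp, security_check_passed)` pairs, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hour_bin(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def hourly_success_rate(events):
    """Return {hour: success percentage, or None when the hour has no PASS/FAIL}.

    Every event opens its hour bin (like summarize ... by bin()), but only
    PASS/FAIL values are counted, so a bin can end up with zero requests.
    """
    bins = defaultdict(lambda: {"PASS": 0, "FAIL": 0})
    for ts, status in events:
        counts = bins[hour_bin(ts)]
        if status in counts:
            counts[status] += 1
    rates = {}
    for hour, counts in sorted(bins.items()):
        total = counts["PASS"] + counts["FAIL"]
        # Report None instead of dividing by zero.
        rates[hour] = round(100.0 * counts["PASS"] / total, 2) if total else None
    return rates
```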
1. Trend Chart: Pass vs. Fail Over Time
- Shows if attack volume is increasing
- Identifies attack time windows
- Validates that defenses are working
ContainerLogV2
| where TimeGenerated > ago (14d)
| extend security_check_passed = tostring (LogMessage.security_check_passed)
| summarize SuccessCount=countif(security_check_passed == "PASS"), FailedCount=countif(security_check_passed == "FAIL") by bin(TimeGenerated, 1d)
| render timechart
2. Top 10 Users by Security Events
- Bar chart of users with most failures
ContainerLogV2
| where TimeGenerated > ago (1d)
| where isnotempty(LogMessage.end_user_id)
| extend end_user_id=tostring(LogMessage.end_user_id)
| extend security_check_passed = tostring (LogMessage.security_check_passed)
| where security_check_passed == "FAIL"
| summarize FailureCount = count() by end_user_id
| top 10 by FailureCount
| render barchart
- Applications with most failures
ContainerLogV2
| where TimeGenerated > ago (1d)
| where isnotempty(LogMessage.application_name)
| extend application_name=tostring(LogMessage.application_name)
| extend security_check_passed = tostring (LogMessage.security_check_passed)
| where security_check_passed == "FAIL"
| summarize FailureCount = count() by application_name
| top 10 by FailureCount
| render barchart
3. Geographic Threat Map
- Where are attacks originating?
- Useful for geo-blocking decisions
ContainerLogV2
| where TimeGenerated > ago (1d)
| where isnotempty(LogMessage.application_name)
| extend application_name=tostring(LogMessage.application_name)
| extend source_ip=tostring(LogMessage.source_ip)
| extend security_check_passed = tostring (LogMessage.security_check_passed)
| where security_check_passed == "FAIL"
| extend GeoInfo = geo_info_from_ip_address(source_ip)
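| project source_ip, Country = tostring(GeoInfo.country), City = tostring(GeoInfo.city), Latitude = toreal(GeoInfo.latitude), Longitude = toreal(GeoInfo.longitude)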
Analyst Deep-Dive: User Behavior Analysis
Purpose: SOC analyst investigating a specific user or session
Key components:
1. User Activity Timeline
- Every request from the user in time order
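ContainerLogV2
| where isnotempty(LogMessage.end_user_id)
| project TimeGenerated, source_ip = tostring(LogMessage.source_ip), end_user_id = tostring(LogMessage.end_user_id), session_id = tostring(LogMessage.session_id), Computer, application_name = tostring(LogMessage.application_name), request_id = tostring(LogMessage.request_id), message = tostring(LogMessage.message), security_check_passed = tostring(LogMessage.security_check_passed), full_prompt_sample = tostring(LogMessage.full_prompt_sample)
| order by end_user_id asc, TimeGenerated asc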
- Color-coded by security status
AlertInfo
| where DetectionSource == "Microsoft Defender for AI Services"
| project TimeGenerated, AlertId, Title, Category, Severity,
SeverityColor = case(
Severity == "High", "🔴 High",
Severity == "Medium", "🟠 Medium",
Severity == "Low", "🟢 Low",
"⚪ Unknown"
)
2. Session Analysis Table
- All sessions for the user
ContainerLogV2
| where TimeGenerated > ago (1d)
| where isnotempty(LogMessage.end_user_id)
| extend end_user_id=tostring(LogMessage.end_user_id)
| where end_user_id == "<username>" // Replace with actual username
| extend application_name=tostring(LogMessage.application_name)
| extend source_ip=tostring(LogMessage.source_ip)
| extend session_id=tostring(LogMessage.session_id)
| extend security_check_passed = tostring (LogMessage.security_check_passed)
| project TimeGenerated, session_id, end_user_id, application_name, security_check_passed
- Failed requests per session
ContainerLogV2
| where TimeGenerated > ago (1d)
| extend security_check_passed = tostring (LogMessage.security_check_passed)
| where security_check_passed == "FAIL"
| extend end_user_id=tostring(LogMessage.end_user_id)
| extend session_id=tostring(LogMessage.session_id)
| summarize FailedRequests = count() by end_user_id, session_id
| order by FailedRequests
- Session duration
ContainerLogV2
| where TimeGenerated > ago (1d)
| where isnotempty(LogMessage.session_id)
| extend security_check_passed = tostring (LogMessage.security_check_passed)
| where security_check_passed == "PASS"
| extend end_user_id=tostring(LogMessage.end_user_id)
| extend session_id=tostring(LogMessage.session_id)
| extend application_name=tostring(LogMessage.application_name)
| extend source_ip=tostring(LogMessage.source_ip)
| summarize Start=min(TimeGenerated), End=max(TimeGenerated), count() by end_user_id, session_id, source_ip, application_name
| extend DurationSeconds = datetime_diff("second", End, Start)
3. Prompt Pattern Detection
- Unique prompts by hash
- Frequency of each pattern
- Detect if user is fuzzing/testing boundaries
Sample query for user investigation:
ContainerLogV2
| where TimeGenerated > ago (14d)
| where isnotempty(LogMessage.prompt_hash)
| where isnotempty(LogMessage.full_prompt_sample)
| extend prompt_hash=tostring(LogMessage.prompt_hash)
| extend full_prompt_sample=tostring(LogMessage.full_prompt_sample)
| extend application_name=tostring(LogMessage.application_name)
| summarize count() by prompt_hash, full_prompt_sample
| order by count_
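The query above counts how often each prompt hash appears. Fuzzing usually shows up as the opposite pattern: one user trying many different prompts, a large share of which fail the security check. Here is a minimal Python sketch of that per-user view. The thresholds (20 distinct prompts, 50% failures) are hypothetical, untuned defaults.

```python
from collections import defaultdict


def fuzzing_candidates(events, min_distinct_prompts=20, min_fail_ratio=0.5):
    """Flag users who try many distinct prompts and often fail the security check.

    events: iterable of dicts with end_user_id, prompt_hash and
    security_check_passed keys. Thresholds are illustrative, not tuned.
    """
    hashes = defaultdict(set)
    totals = defaultdict(int)
    fails = defaultdict(int)
    for ev in events:
        user = ev.get("end_user_id")
        if not user or not ev.get("prompt_hash"):
            continue
        hashes[user].add(ev["prompt_hash"])
        totals[user] += 1
        if ev.get("security_check_passed") == "FAIL":
            fails[user] += 1
    result = {}
    for user, seen in hashes.items():
        ratio = fails[user] / totals[user]
        if len(seen) >= min_distinct_prompts and ratio >= min_fail_ratio:
            result[user] = {"distinct_prompts": len(seen), "fail_ratio": round(ratio, 2)}
    return result
```

A user who repeats one blocked prompt many times is not flagged here. That case is covered by the repeat-count view above.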
Threat Hunting Dashboard: Proactive Detection
Purpose: Find threats before they trigger alerts
Key queries:
1. Suspicious Keywords in Prompts (e.g. Ignore, Disregard, system prompt, instructions, DAN, jailbreak, pretend, roleplay)
let suspicious_prompts = externaldata (content_policy:int, content_policy_name:string, q_id:int, question:string)
[ @"https://raw.githubusercontent.com/verazuo/jailbreak_llms/refs/heads/main/data/forbidden_question/forbidden_question_set.csv"] with (format="csv", ignoreFirstRecord=true);
ContainerLogV2
| where TimeGenerated > ago (14d)
| where isnotempty(LogMessage.full_prompt_sample)
| extend full_prompt_sample=tostring(LogMessage.full_prompt_sample)
| where full_prompt_sample in ((suspicious_prompts | project question))
| extend end_user_id=tostring(LogMessage.end_user_id)
| extend session_id=tostring(LogMessage.session_id)
| extend application_name=tostring(LogMessage.application_name)
| extend source_ip=tostring(LogMessage.source_ip)
| project TimeGenerated, session_id, end_user_id, source_ip, application_name, full_prompt_sample
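The query above looks for exact matches against an external question set, so rephrased prompts slip through. Here is a minimal Python sketch of a keyword pass over the terms listed in the heading. It matches "DAN" case-sensitively so that ordinary names like "Dan" are not flagged. The patterns are illustrative. Words like "instructions" also appear in benign prompts, so treat hits as hunting leads rather than alerts.

```python
import re

# Case-insensitive phrases taken from the keyword list in the heading.
_PHRASES = re.compile(
    r"\b(ignore|disregard|system prompt|instructions?|jailbreak|pretend|role-?play)\b",
    re.IGNORECASE,
)
# "DAN" is matched case-sensitively so ordinary words like "Dan" are not flagged.
_DAN = re.compile(r"\bDAN\b")


def suspicious_terms(prompt: str) -> set:
    """Return the suspicious keywords found in a prompt (lowercased, except DAN)."""
    found = {m.group(0).lower() for m in _PHRASES.finditer(prompt)}
    found.update(m.group(0) for m in _DAN.finditer(prompt))
    return found
```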
2. High-Volume Anomalies
Detect a single IP or user sending an unusually high number of requests. This assumes Foundry projects use Azure AD (Microsoft Entra ID) authentication rather than API keys, so callers can be attributed.
//50+ requests in 1 hour
let timeframe = 1h;
let threshold = 50;
AzureDiagnostics
| where TimeGenerated >= ago(timeframe)
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| where OperationName == "Completions" or OperationName contains "ChatCompletions"
| extend tokensUsed = todouble(parse_json(properties_s).usage.total_tokens)
| summarize totalTokens = sum(tokensUsed), requests = count() by bin(TimeGenerated, 1h), CallerIPAddress
| where requests >= threshold
3. Rare Failures (Novel Attack Detection)
Rare failures might indicate zero-day prompts or new attack techniques
//10 or more failures in 24 hours
ContainerLogV2
| where TimeGenerated >= ago (24h)
| where isnotempty(LogMessage.security_check_passed)
| extend security_check_passed=tostring(LogMessage.security_check_passed)
| where security_check_passed == "FAIL"
| extend application_name=tostring(LogMessage.application_name)
| extend end_user_id=tostring(LogMessage.end_user_id)
| extend source_ip=tostring(LogMessage.source_ip)
| summarize FailedAttempts = count(), FirstAttempt=min(TimeGenerated), LastAttempt=max(TimeGenerated) by application_name
| extend DurationHours = datetime_diff('hour', LastAttempt, FirstAttempt)
| where FailedAttempts >= 10
| project application_name, FirstAttempt, LastAttempt, DurationHours, FailedAttempts
Measuring Success: Security Operations Metrics
Key Performance Indicators
Mean Time to Detect (MTTD):
let AppLog = ContainerLogV2
| extend application_name=tostring(LogMessage.application_name)
| extend security_check_passed=tostring (LogMessage.security_check_passed)
| extend session_id=tostring(LogMessage.session_id)
| extend end_user_id=tostring(LogMessage.end_user_id)
| extend source_ip=tostring(LogMessage.source_ip)
| where security_check_passed=="FAIL"
| summarize FirstLogTime=min(TimeGenerated) by application_name, session_id, end_user_id, source_ip;
let Alert = AlertEvidence
| where DetectionSource == "Microsoft Defender for AI Services"
| extend end_user_id = tostring(AdditionalFields.AadUserId)
| extend source_ip=RemoteIP
| extend application_name=CloudResource
| summarize FirstAlertTime=min(TimeGenerated) by AlertId, Title, application_name, end_user_id, source_ip;
AppLog
| join kind=inner (Alert) on application_name, end_user_id, source_ip
| extend DetectionDelayMinutes=datetime_diff('minute', FirstAlertTime, FirstLogTime)
| summarize MTTD_Minutes=round(avg (DetectionDelayMinutes),2) by AlertId, Title
Target: <= 15 minutes from first malicious activity to alert
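MTTD is the average gap between the first failing log entry and the first alert for the same (application, user, IP) key, counting only keys that appear on both sides, as with the inner join. Here is a minimal Python sketch of that arithmetic. It uses fractional minutes, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_detect(first_log_times, first_alert_times):
    """Average minutes between first FAIL log and first alert for matching keys.

    Both arguments map an (application_name, end_user_id, source_ip) tuple to
    a datetime. Only keys present in both are used; returns None if none match.
    """
    shared = first_log_times.keys() & first_alert_times.keys()
    delays = [
        (first_alert_times[k] - first_log_times[k]).total_seconds() / 60
        for k in shared
    ]
    return round(sum(delays) / len(delays), 2) if delays else None


if __name__ == "__main__":
    t0 = datetime(2025, 10, 21, 14, 0)
    key = ("AOAI-Customer-Support-Bot", "user_550e8400", "203.0.113.42")
    print(mean_time_to_detect({key: t0}, {key: t0 + timedelta(minutes=12)}))
```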
Mean Time to Respond (MTTR):
SecurityIncident
| where CreatedTime >= ago(14d)
| summarize arg_max(TimeGenerated, *) by IncidentNumber // keep the latest record per incident
| where Status == "Closed"
| extend ResponseDelay = datetime_diff('minute', ClosedTime, CreatedTime)
| summarize MTTR_Minutes = round(avg(ResponseDelay), 2)
Target: < 4 hours from alert to remediation
Threat Detection Rate:
ContainerLogV2
| where TimeGenerated > ago (1d)
| extend security_check_passed = tostring (LogMessage.security_check_passed)
| summarize SuccessCount=countif(security_check_passed == "PASS"), FailedCount=countif(security_check_passed == "FAIL") by bin(TimeGenerated, 1h)
| extend TotalRequests = SuccessCount + FailedCount
| extend DetectionRate = round(todouble(FailedCount) / todouble(TotalRequests) * 100, 2)
| order by DetectionRate desc
Context: a detection (FAIL) rate of 1-3% is typical for production systems, since most traffic is legitimate.
What You've Built
By implementing the logging from Part 2 and the analytics rules in this post, your SOC now has:
✅ Real-time threat detection - Alerts fire within minutes of malicious activity
✅ User attribution - Every incident has identity, IP, and application context
✅ Pattern recognition - Detect both volume-based and behavior-based attacks
✅ Correlation across layers - Application logs + platform alerts + identity signals
✅ Proactive hunting - Dashboards for finding threats before they trigger rules
✅ Executive visibility - Metrics showing program effectiveness
Key Takeaways
- GenAI threats need GenAI-specific analytics - Generic rules miss context like prompt injection, content safety violations, and session-based attacks
- Correlation is critical - The most sophisticated attacks span multiple signals. Correlating app logs with identity and platform alerts catches what individual rules miss.
- User context from Part 2 pays off - end_user_id, source_ip, and session_id enable investigation and response at scale
- Prompt hashing enables pattern detection - Detect repeated attacks without storing sensitive prompt content
- Workbooks serve different audiences - Executives want metrics; analysts want investigation tools; hunters want anomaly detection
- Start with high-fidelity rules - Content Safety violations and rate limit abuse have very low false positive rates. Add behavioral rules after establishing baselines.
What's Next: Closing the Loop
You've now built detection and visibility. In Part 4, we'll close the security operations loop with:
Part 4: Platform Integration and Automated Response
- Building SOAR playbooks for automated incident response
- Implementing automated key rotation with Azure Key Vault
- Blocking identities in Entra
- Creating feedback loops from incidents to code improvements
The journey from blind spot to full security operations capability is almost complete.
Next:
Part 4: Platform Integration and Automated Response (Coming soon)