kql
438 TopicsAt-Scale Failure Reporting for Azure Update Manager
Introduction Azure Update Manager simplifies patching across Azure virtual machines and Azure Arc-enabled servers by providing a centralized platform for patch assessment and installation. However, as environments scale, a key challenge emerges—efficiently identifying and troubleshooting patch failures across large fleets of machines. While Azure Update Manager surfaces detailed error messages in the Azure portal, this information is typically available only at an individual machine level. In enterprise environments managing hundreds or thousands of systems, drilling into each VM to find error details quickly becomes impractical. In this article, we walk through a real-world use case and demonstrate how to leverage Azure Resource Graph (ARG) to extract failed machines along with their error details for a specific maintenance run—using a single query. The Challenge: Scaling Patch Failure Visibility In a large enterprise deployment, Azure Update Manager was configured to manage patching across: Windows and Linux virtual machines Azure cloud VMs and Arc-enabled on‑premises servers Multiple regions and subscriptions While patching operations were largely successful, a subset of machines experienced failures. The key challenges faced by the operations team were: Error messages were visible only by drilling into each failed VM in the portal No built‑in way to aggregate failures across all machines Lack of a simple mechanism to export: Failed VMs Error codes Error messages The team needed a scalable, query‑driven approach to analyze failures across an entire maintenance run. Key Insight: Where Azure Update Manager Stores Data Azure Update Manager does not rely on Log Analytics to store operational results. Instead: Patch assessment and installation results are stored in Azure Resource Graph Azure Resource Graph acts as a centralized, queryable store for update operations This design enables powerful querying without requiring additional ingestion, configuration, or cost overhead. Understanding Maintenance Runs and Correlation IDs Each Azure Update Manager maintenance run generates a unique identifier: properties.correlationId represents the maintenance (schedule) run ID All machines involved in the same patch cycle share this ID This allows all machines within a single patch execution to be correlated and queried collectively. The Solution: Query Failed VMs with Error Messages Azure Resource Graph allows querying failures at scale using the maintenanceresources dataset. Core Query (Kusto Query Language) 1 maintenanceresources 2 | where type =~ "microsoft.maintenance/applyupdates" 3 | where tostring(properties.correlationId) contains "<YourMaintenanceRunID>" 4 | where tostring(properties.status) =~ "Failed" 5 | project properties.resourceId, properties.errorCode, properties.errorMessage What This Query Delivers All machines that failed in a specific maintenance run Error codes for troubleshooting Full error messages that are otherwise visible only in the Azure portal Note: Property names for error information can vary by environment. Validate available fields using Azure Resource Graph Explorer and adjust the project clause if required. Sample Output (Conceptual) Resource ID Error Code Error Message vm-01 0x80244007 Windows Update API failed vm-02 0x80072f8f Connectivity issue vm-03 1C WSUS configuration issue Advanced Scenario: Automatically Detecting the Latest Failed Maintenance Run In real-world scenarios, you may not always know the maintenance run ID. The following query dynamically identifies the most recent maintenance run that had failures, and then retrieves all failed machines from that run. 1 // Step 1: Identify the latest maintenance run ID with failures 2 let lastFailedRun = toscalar( 3 maintenanceresources 4 | extend runId = extract(@"applyupdates/(\d+)$", 1, properties.correlationId) 5 | where type =~ "microsoft.maintenance/applyupdates" 6 | where tostring(properties.status) =~ "Failed" 7 | order by tostring(properties.startDateTime) desc 8 | take 1 9 | project runId 10 ); 11 // Step 2: Query all failed VMs from that run 12 maintenanceresources 13 | where type =~ "microsoft.maintenance/applyupdates" 14 | where tostring(properties.correlationId) contains lastFailedRun 15 | where tostring(properties.status) =~ "Failed" 16 | project properties.resourceId, properties.errorCode, properties.errorMessage This approach is ideal for automation, scheduled reporting, and dashboard scenarios. Why This Approach Matters Operational Efficiency Eliminates manual portal navigation Provides consolidated failure insights in seconds Scalability Works across large, distributed environments Supports both Azure and hybrid (Arc‑enabled) machines Automation Ready Can be integrated into scripts, dashboards, and reporting pipelines Enables proactive monitoring and alerting scenarios Best Practices for Enterprise Patch Reporting To maximize the value of this approach: Capture and track maintenance run IDs Use Azure Resource Graph as the primary reporting layer Build reusable queries for different patch scenarios Export reports for compliance and auditing Correlate failures with root‑cause trends over time Conclusion As organizations scale patching operations with Azure Update Manager, visibility, speed, and automation become essential. While the Azure portal is effective for per‑machine troubleshooting, it is not optimized for fleet‑level analysis. Azure Resource Graph fills this gap by enabling a shift from manual troubleshooting to automated, query‑driven failure analysis at scale. By adopting this approach, teams can significantly improve operational efficiency, reduce mean time to resolution, and build a more mature patch management strategy. Final takeaway: Don’t rely only on the portal Leverage Azure Resource Graph to operationalize patch insights at enterprise scale References Azure Update Manager – Query resources with Azure Resource Graph https://learn.microsoft.com/azure/update-manager/query-logs Azure Update Manager – Troubleshooting guide https://learn.microsoft.com/azure/update-manager/troubleshoot Sample Azure Resource Graph queries for Azure Update Manager https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/update-manager/sample-query-logs.mdDetecting AI agents and non-human identities in Microsoft Sentinel: the classic-agent blind spot
Build 2026 made the direction official. The industry is moving from the app era into the agent era, and Microsoft spent a real share of the keynote on securing agents across their lifecycle, from discovering what is exploitable to governing what is running in production. On the identity side the centerpiece is Microsoft Entra Agent ID, now generally available, which gives AI agents first-class identities and extends Conditional Access, Identity Protection, and full audit logging to them. That is good news for agents you build the new way. It is not the whole picture, and the gap is where most SOCs will get hurt first. Modern agents are covered. Classic agents are not. Entra Agent ID draws a hard line between two kinds of agent. Modern agents are created through the Agent ID platform, each backed by an agent identity blueprint. They carry a proper Agent ID, a full audit trail, and the complete set of governance capabilities, including Identity Protection for Agents, which establishes a baseline for an agent's normal activity and flags anomalies automatically. Classic agents are everything that came before, or that gets built outside the platform: AI agents implemented as ordinary service principals or app registrations, for example Copilot Studio agents created before Agent ID was enabled, or any home-grown automation calling Graph with client credentials. In the Entra agent registry they appear with "Has Agent ID: No," and that flag matters, because the Agent ID protections apply to identities that actually hold an Agent ID. Classic agents sit outside Identity Protection for Agents and Conditional Access for Agents. Here is the uncomfortable part. The non-human identities you already run, the service principals behind your pipelines, your integrations, your scripts, your pre-platform Copilot Studio bots, are almost all classic agents. They tend to outnumber your human accounts, they have no MFA in any meaningful sense, and a credential added to one does not show up in the Azure portal. The new platform protections do not reach them. Until you migrate them, the only place you get detection coverage on that population is your SIEM. So this is the job Sentinel does that Agent ID does not: detect risky behavior on the classic, service-principal-backed agents that the platform cannot yet protect. The telemetry you have, and the one switch people forget Three tables carry most of the signal. AADServicePrincipalSignInLogs records service principal authentications, the client-credentials sign-ins your agents and automation use. No user, no MFA, just an app proving it holds a secret or certificate. AADManagedIdentitySignInLogs does the same for managed identities. AuditLogs records directory changes, including the one that matters most for persistence: a new credential added to an application or service principal. One practical warning before any of this works. Service principal and managed identity sign-in logs are not streamed by default. You have to enable those categories explicitly in the Entra diagnostic settings feeding your workspace. Plenty of teams write the detection, never check, and never notice the table is empty. Verify that first. Detection 1: a new credential on a service principal or app Adding a secret or certificate to an existing service principal is one of the cleanest persistence techniques in a Microsoft cloud. The attacker compromises a privileged user or app, drops a fresh credential on a service principal that already holds useful Graph permissions, and now has access that survives password resets and session revocation. It maps to MITRE T1098.001, Account Manipulation: Additional Cloud Credentials. For a classic agent it is especially nasty, because there is no Identity Protection baseline watching it. // Detection 1: new secret or certificate added to an application or service principal // MITRE T1098.001 - Account Manipulation: Additional Cloud Credentials AuditLogs | where OperationName has_any ("Add service principal", "Certificates and secrets management") | where Result =~ "success" | extend Initiator = coalesce( tostring(InitiatedBy.user.userPrincipalName), tostring(InitiatedBy.app.displayName)) | extend InitiatorIp = tostring(InitiatedBy.user.ipAddress) | mv-apply Target = TargetResources on ( where Target.type =~ "Application" | extend TargetName = tostring(Target.displayName), TargetId = tostring(Target.id), KeyChanges = Target.modifiedProperties ) | mv-apply Prop = KeyChanges on ( where tostring(Prop.displayName) =~ "KeyDescription" | extend NewKeys = parse_json(tostring(Prop.newValue)), OldKeys = parse_json(tostring(Prop.oldValue)) ) | extend AddedKeys = set_difference(NewKeys, OldKeys) | where array_length(AddedKeys) > 0 | project TimeGenerated, Initiator, InitiatorIp, TargetName, TargetId, AddedKeys | order by TimeGenerated desc The operation filter catches the three shapes this event takes in the log: "Add service principal," "Add service principal credentials," and "Update application - Certificates and secrets management." The modifiedProperties parsing isolates the KeyDescription change, and set_difference confirms a key was actually added rather than removed, so rotating out an old credential does not, on its own, fire the rule. False positives come from legitimate rotation and from automation that provisions app credentials (CI/CD, infrastructure as code). The initiator is the discriminant. A credential added by your deployment pipeline's service account at the usual time is routine. The same change initiated by an interactive admin out of hours, or by an account that never normally touches app credentials, is what you want to surface. Allow-list the expected initiators, not the targets. Detection 2: a classic agent signing in from a first-seen IP A service principal that has only ever authenticated from your Azure regions and suddenly signs in from somewhere new is a strong signal that its credential has been lifted and is being used elsewhere. Service principals have stable, boring network behavior, which makes a first-seen IP a far cleaner indicator for them than it is for roaming human users. This is the behavioral baseline Identity Protection gives you for free on modern agents, rebuilt in KQL for the classic ones it ignores. MITRE T1078.004, Valid Accounts: Cloud Accounts. // Detection 2: classic-agent service principal signing in from a previously unseen IP // MITRE T1078.004 - Valid Accounts: Cloud Accounts let baseline = 14d; let detection = 1d; let KnownIPs = AADServicePrincipalSignInLogs | where TimeGenerated between (ago(baseline + detection) .. ago(detection)) | where tostring(ResultType) == "0" | summarize KnownIPSet = make_set(IPAddress) by AppId; AADServicePrincipalSignInLogs | where TimeGenerated > ago(detection) | where tostring(ResultType) == "0" | lookup kind=leftouter KnownIPs on AppId | where set_has_element(KnownIPSet, IPAddress) == false | summarize FirstSeen = min(TimeGenerated), Resources = make_set(ResourceDisplayName, 10) by ServicePrincipalName, AppId, IPAddress | order by FirstSeen desc The query builds a per-application baseline of source IPs over the previous two weeks, then flags any successful sign-in today from an address outside that set. Two tuning notes. Brand-new service principals have no baseline, so they surface on first use. That is usually worth seeing once, but you can exclude AppIds younger than the baseline window if it gets noisy. And if your agents egress through shifting cloud IP ranges, widen the comparison from an exact IP to the autonomous system number or a known-range allow-list, otherwise you will chase your own infrastructure. This complements Agent ID, it does not replace it! The endgame is not to run these rules forever. It is to shrink the population they apply to. Inventory your tenant for agents marked "Has Agent ID: No," prioritize the ones holding sensitive Graph permissions, and migrate them onto the Agent ID platform, where Identity Protection and Conditional Access take over the baselining you are doing here by hand. Microsoft has signaled a migration path from classic to modern agents. Treat these two detections as the coverage you need in the meantime, and as a permanent safety net for anything that never makes the move. If you do one thing this week: enable the service principal sign-in log category, deploy detection 1, and pull a list of every service principal that had a credential added in the last 90 days. That list alone tends to be more interesting than people expect. Cheers, Marcel295Views0likes0CommentsXdrLogRaider Defender XDR portal telemetry
A Microsoft Sentinel custom data connector that ingests Microsoft Defender XDR portal-only telemetry — configuration, compliance, drift, exposure, governance — that public Microsoft APIs (Graph Security, Microsoft 365 Defender, MDE) don't expose. https://github.com/akefallonitis/xdrlograider— Defender XDR portal telemetry Happy Hunting 🥳 🎉104Views0likes2CommentsSentinel Foundry - MCP Server (Preview) (Github Community Release)
I’ve been cooking something that a lot of people in SOC have been struggling with — especially on the engineering side of Microsoft Sentinel. Thanks to the Microsoft Security team for shaping the capabilities of Sentinel even better with Sentinel Data Lake & Modern SecOps. Today’s the day I can finally share it. Note: This is not an official Microsoft product, but it is designed to make the Sentinel Build even better (complement) with much more intelligence. 🚀 Sentinel Foundry is now in public preview with 43 tools. (Sentinel Foundry - MCP Server) It’s an MCP server built to act like the brain of a strong Sentinel engineer — helping make building, improving, and operating Sentinel far more practical, faster, and honestly more enjoyable. For a lot of teams, the challenge is not understanding what Sentinel can do. The hard part is the engineering work around it: -> Deciding what data should actually be ingested -> Building a clean, scalable Sentinel foundation -> Writing useful detections instead of noisy ones -> Balancing security value with cost -> Turning ideas into deployable engineering outputs That is exactly why I built Sentinel Foundry to help communities grow stronger. It helps with the real engineering tasks behind Sentinel — from architecture thinking to detection design, deployment planning, ingestion strategy, automation ideas, and many of the workflows outlined in the GitHub project. How does it work? Here’s one of the flagship prompts I ran with it: “Give me a complete security posture report for our workspace. Score each pillar and tell me what to prioritise.” And within seconds, it produced a structured engineering blueprint that would normally take a lot longer to pull together manually. You can see the example prompts here in what it can do: https://github.com/prabhukiranveesam/Sentinel-Foundry#what-can-it-do I want building Sentinel to feel less like repetitive engineering overhead — and more like real security engineering that is fast, creative, and enjoyable. If you work with Sentinel as a SOC L2 analyst, engineer, detection engineer, consultant, or architect, I’d genuinely love for you to try it and tell me what you think. 🔗 Public Preview: https://github.com/prabhukiranveesam/Sentinel-Foundry This is just the start of an AI era — and I’m excited to keep shaping it with more powerful features over the coming days. This is very easy to set up and will be available to all of you at no cost during this month as part of the public preview, and your feedback is extremely valuable to shape this as a powerful solution.532Views0likes1CommentStuck looking up a watchlist value
Hiya, I get stuck working with watchlists sometimes. In this example, I'm wanting to focus on account activity from a list of UPNs. If I split the elements up, I get the individual results, but can't seem to pull it all together. ===================================================== In its entirety, the query returns zero results: let ServiceAccounts=(_GetWatchlist('ServiceAccounts_Monitoring'))| project SearchKey; let OpName = dynamic(['Reset password (self-service)','Reset User Password','Change user password','User reset password','User started password reset','Enable Account','Change password (self-service)','Update PasswordProfile','Self-service password reset flow activity progress']); AuditLogs | where OperationName has_any (OpName) | extend upn = TargetResources.[0].userPrincipalName | where upn in (ServiceAccounts) //<=This is where I think I'm wrong | project upn ===================================================== This line on its own, returns the user on the list: let ServiceAccounts=(_GetWatchlist('ServiceAccounts_Monitoring'))| project SearchKey; ===================================================== This section on its own, returns all the activity let OpName = dynamic(['Reset password (self-service)','Reset User Password','Change user password','User reset password','User started password reset','Enable Account','Change password (self-service)','Update PasswordProfile','Self-service password reset flow activity progress']); AuditLogs | where OperationName has_any (OpName) | extend upn = TargetResources.[0].userPrincipalName | where upn contains "username" //This is the name on the watchlistlist - so I know the activity exists) ==================================================== I'm doing something wrong when I'm trying to use the watchlist cache (I think) Any help\guidance or wisdom would be greatly appreciated! Many thanksSolved93Views0likes2CommentsSecurity Copilot Integration with Microsoft Sentinel - Why Automation matters now
Security Operations Centers face a relentless challenge - the volume of security alerts far exceeds the capacity of human analysts. On average, a mid-sized SOC receives thousands of alerts per day, and analysts spend up to 80% of their time on initial triage. That means determining whether an alert is a true positive, understanding its scope, and deciding on next steps. With Microsoft Security Copilot now deeply integrated into Microsoft Sentinel, there is finally a practical path to automating the most time-consuming parts of this workflow. So I decided to walk you through how to combine Security Copilot with Sentinel to build an automated incident triage pipeline - complete with KQL queries, automation rule patterns, and practical scenarios drawn from common enterprise deployments. Traditional triage workflows rely on analysts manually reviewing each incident - reading alert details, correlating entities across data sources, checking threat intelligence, and making a severity assessment. This is slow, inconsistent, and does not scale. Security Copilot changes this equation by providing: Natural language incident summarization - turning complex, multi-alert incidents into analyst-readable narratives Automated entity enrichment - pulling threat intelligence, user risk scores, and device compliance state without manual lookups Guided response recommendations - suggesting containment and remediation steps based on the incident type and organizational context The key insight is that Copilot does not replace analysts - it handles the repetitive first-pass triage so analysts can focus on decision-making and complex investigations. Architecture - How the Pieces Fit Together The automated triage pipeline consists of four layers: Detection Layer - Sentinel analytics rules generate incidents from log data Enrichment Layer - Automation rules trigger Logic Apps that call Security Copilot Triage Layer - Copilot analyzes the incident, enriches entities, and produces a triage summary Routing Layer - Based on Copilot's assessment, incidents are routed, re-prioritized, or auto-closed (Forgive my AI-painted illustration here, but I find it a nice way to display dependencies.) +-----------------------------------------------------------+ | Microsoft Sentinel | | | | Analytics Rules --> Incidents --> Automation Rules | | | | | v | | Logic App / Playbook | | | | | v | | Security Copilot API | | +-----------------+ | | | Summarize | | | | Enrich Entities | | | | Assess Risk | | | | Recommend Action| | | +--------+--------+ | | | | | v | | +-----------------------------+ | | | Update Incident | | | | - Add triage summary tag | | | | - Adjust severity | | | | - Assign to analyst/team | | | | - Auto-close false positive| | | +-----------------------------+ | +-----------------------------------------------------------+ Step 1 - Identify High-Volume Triage Candidates Not every incident type benefits equally from automated triage. Start with alert types that are high in volume but often turn out to be false positives or low severity. Use this KQL query to identify your top candidates: SecurityIncident | where TimeGenerated > ago(30d) | summarize TotalIncidents = count(), AutoClosed = countif(Classification == "FalsePositive" or Classification == "BenignPositive"), AvgTimeToTriageMinutes = avg(datetime_diff('minute', FirstActivityTime, CreatedTime)) by Title | extend FalsePositiveRate = round(AutoClosed * 100.0 / TotalIncidents, 1) | where TotalIncidents > 10 | order by TotalIncidents desc | take 20 This query surfaces the incident types where automation will deliver the highest ROI. Based on publicly available data and community reports, the following categories consistently appear at the top: Impossible travel alerts (high volume, around 60% false positive rate) Suspicious sign-in activity from unfamiliar locations Mass file download and share events Mailbox forwarding rule creation Step 2 - Build the Copilot-Powered Triage Playbook Create a Logic App playbook that triggers on incident creation and leverages the Security Copilot connector. The core flow looks like this: Trigger: Microsoft Sentinel Incident - When an incident is created Action 1 - Get incident entities: let incidentEntities = SecurityIncident | where IncidentNumber == <IncidentNumber> | mv-expand AlertIds | join kind=inner (SecurityAlert | extend AlertId = SystemAlertId) on $left.AlertIds == $right.AlertId | mv-expand Entities | extend EntityData = parse_json(Entities) | project EntityType = tostring(EntityData.Type), EntityValue = coalesce( tostring(EntityData.HostName), tostring(EntityData.Address), tostring(EntityData.Name), tostring(EntityData.DnsDomain) ); incidentEntities Note: The <IncidentNumber> placeholder above is a Logic App dynamic content variable. When building your playbook, select the incident number from the trigger output rather than hardcoding a value. Action 2 - Copilot prompt session: Send a structured prompt to Security Copilot that requests: Analyze this Microsoft Sentinel incident and provide a triage assessment: Incident Title: {IncidentTitle} Severity: {Severity} Description: {Description} Entities involved: {EntityList} Alert count: {AlertCount} Please provide: 1. A concise summary of what happened (2-3 sentences) 2. Entity risk assessment for each IP, user, and host 3. Whether this appears to be a true positive, benign positive, or false positive 4. Recommended next steps 5. Suggested severity adjustment (if any) Action 3 - Parse and route: Use the Copilot response to update the incident. The Logic App parses the structured output and: Adds the triage summary as an incident comment Tags the incident with copilot-triaged Adjusts severity if Copilot recommends it Routes to the appropriate analyst tier based on the assessment Step 3 - Enrich with Contextual KQL Lookups Security Copilot's assessment improves dramatically when you feed it contextual data. Before sending the prompt, enrich the incident with organization-specific signals: // Check if the user has a history of similar alerts (repeat offender vs. first time) let userAlertHistory = SecurityAlert | where TimeGenerated > ago(90d) | mv-expand Entities | extend EntityData = parse_json(Entities) | where EntityData.Type == "account" | where tostring(EntityData.Name) == "<UserPrincipalName>" | summarize PriorAlertCount = count(), DistinctAlertTypes = dcount(AlertName), LastAlertTime = max(TimeGenerated) | extend IsRepeatOffender = PriorAlertCount > 5; userAlertHistory // Check user risk level from Entra ID Protection AADUserRiskEvents | where TimeGenerated > ago(7d) | where UserPrincipalName == "<UserPrincipalName>" | summarize arg_max(TimeGenerated, RiskLevel), RecentRiskEvents = count() | project RiskLevel, RecentRiskEvents Including this context in the Copilot prompt transforms generic assessments into organization-aware triage decisions. A "suspicious sign-in" for a user who travels internationally every week is very different from the same alert for a user who has never left their home country. Step 4 - Implement Feedback Loops Automated triage is only as good as its accuracy over time. Build a feedback mechanism by tracking Copilot's assessments against analyst final classifications: SecurityIncident | where Tags has "copilot-triaged" | where TimeGenerated > ago(30d) | where Classification != "" | mv-expand Comments | extend CopilotAssessment = extract("Assessment: (True Positive|False Positive|Benign Positive)", 1, tostring(Comments)) | where isnotempty(CopilotAssessment) | summarize Total = dcount(IncidentNumber), Correct = dcountif(IncidentNumber, (CopilotAssessment == "False Positive" and Classification == "FalsePositive") or (CopilotAssessment == "True Positive" and Classification == "TruePositive") or (CopilotAssessment == "Benign Positive" and Classification == "BenignPositive") ) by bin(TimeGenerated, 7d) | extend AccuracyPercent = round(Correct * 100.0 / Total, 1) | order by TimeGenerated asc For this query to work reliably, the automation playbook must write the assessment in a consistent format within the incident comments. Use a structured prefix such as Assessment: True Positive so the regex extraction remains stable. According to Microsoft's published benchmarks and community feedback, Copilot-assisted triage typically achieves 85-92% agreement with senior analyst classifications after prompt tuning - significantly reducing the manual triage burden. A Note on Licensing and Compute Units Security Copilot is licensed through Security Compute Units (SCUs), which are provisioned in Azure. Each prompt session consumes SCUs based on the complexity of the request. For automated triage at scale, plan your SCU capacity carefully - high-volume playbooks can accumulate significant usage. Start with a conservative allocation, monitor consumption through the Security Copilot usage dashboard, and scale up as you validate ROI. Microsoft provides detailed guidance on SCU sizing in the official Security Copilot documentation. Example Scenario - Impossible Travel at Scale Consider a typical enterprise that generates over 200 impossible travel alerts per week. The SOC team spends roughly 15 hours weekly just triaging these. Here is how automated triage addresses this: Detection - Sentinel's built-in impossible travel analytics rule flags the incidents Enrichment - The playbook pulls each user's typical travel patterns from sign-in logs over the past 90 days, VPN usage, and whether the "impossible" location matches any known corporate office or VPN egress point Copilot Analysis - Security Copilot receives the enriched context and classifies each incident Expected Result - Based on common deployment patterns, around 70-75% of impossible travel incidents are auto-closed as benign (VPN, known travel patterns), roughly 20% are downgraded to informational with a triage note, and only about 5% are escalated to analysts as genuine suspicious activity This type of automation can reclaim over 10 hours per week - time that analysts can redirect to proactive threat hunting. Getting Started - Practical Recommendations For teams ready to implement automated triage with Security Copilot and Sentinel, here is a recommended approach: Start small. Pick one high-volume, high-false-positive incident type. Do not try to automate everything at once. Run in shadow mode first. Have the playbook add triage comments but do not auto-close or re-route. Let analysts compare Copilot's assessment with their own for two to four weeks. Tune your prompts. Generic prompts produce generic results. Include organization-specific context - naming conventions, known infrastructure, typical user behavior patterns. Monitor accuracy continuously. Use the feedback loop KQL above. If accuracy drops below 80%, pause automation and investigate. Maintain human oversight. Even at 90%+ accuracy, keep a human review step for high-severity incidents. Automation handles volume - analysts handle judgment. The combination of Security Copilot and Microsoft Sentinel represents a genuine step forward for SOC efficiency. By automating the initial triage pass - summarizing incidents, enriching entities, and providing classification recommendations - analysts are freed to focus on what humans do best: making nuanced security decisions under uncertainty. Feel free to like or/and connect :)308Views0likes0CommentsRSAC 2026: What the Sentinel Playbook Generator actually means for SOC automation
RSAC 2026 brought a wave of Sentinel announcements, but the one I keep coming back to is the playbook generator. Not because it's the flashiest, but because it touches something that's been a real operational pain point for years: the gap between what SOC teams need to automate and what they can realistically build and maintain. I want to unpack what this actually changes from an operational perspective, because I think the implications go further than "you can now vibe-code a playbook." The problem it solves If you've built and maintained Logic Apps playbooks in Sentinel at any scale, you know the friction. You need a connector for every integration. If there isn't one, you're writing custom HTTP actions with authentication handling, pagination, error handling - all inside a visual designer that wasn't built for complex branching logic. Debugging is painful. Version control is an afterthought. And when something breaks at 2am, the person on call needs to understand both the Logic Apps runtime AND the security workflow to fix it. The result in most environments I've seen: teams build a handful of playbooks for the obvious use cases (isolate host, disable account, post to Teams) and then stop. The long tail of automation - the enrichment workflows, the cross-tool correlation, the conditional response chains - stays manual because building it is too expensive relative to the time saved. What's actually different now The playbook generator produces Python. Not Logic Apps JSON, not ARM templates - actual Python code with documentation and a visual flowchart. You describe the workflow in natural language, the system proposes a plan, asks clarifying questions, and then generates the code once you approve. The Integration Profile concept is where this gets interesting. Instead of relying on predefined connectors, you define a base URL, auth method, and credentials for any service - and the generator creates dynamic API calls against it. This means you can automate against ServiceNow, Jira, Slack, your internal CMDB, or any REST API without waiting for Microsoft or a partner to ship a connector. The embedded VS Code experience with plan mode and act mode is a deliberate design choice. Plan mode lets you iterate on the workflow before any code is generated. Act mode produces the implementation. You can then validate against real alerts and refine through conversation or direct code edits. This is a meaningful improvement over the "deploy and pray" cycle most of us have with Logic Apps. Where I see the real impact For environments running Sentinel at scale, the playbook generator could unlock the automation long tail I mentioned above. The workflows that were never worth the Logic Apps development effort might now be worth a 15-minute conversation with the generator. Think: enrichment chains that pull context from three different tools before deciding on a response path, or conditional escalation workflows that factor in asset criticality, time of day, and analyst availability. There's also an interesting angle for teams that operate across Microsoft and non-Microsoft tooling. If your SOC uses Sentinel for SIEM but has Palo Alto, CrowdStrike, or other vendors in the stack, the Integration Profile approach means you can build cross-vendor response playbooks without middleware. The questions I'd genuinely like to hear about A few things that aren't clear from the documentation and that I think matter for production use: Security Copilot dependency: The prerequisites require a Security Copilot workspace with EU or US capacity. Someone in the blog comments already flagged this as a potential blocker for organizations that have Sentinel but not Security Copilot. Is this a hard requirement going forward, or will there be a path for Sentinel-only customers? Code lifecycle management: The generated Python runs... where exactly? What's the execution runtime? How do you version control, test, and promote these playbooks across dev/staging/prod? Logic Apps had ARM templates and CI/CD patterns. What's the equivalent here? Integration Profile security: You're storing credentials for potentially every tool in your security stack inside these profiles. What's the credential storage model? Is this backed by Key Vault? How do you rotate credentials without breaking running playbooks? Debugging in production: When a generated playbook fails at 2am, what does the troubleshooting experience look like? Do you get structured logs, execution traces, retry telemetry? Or are you reading Python stack traces? Coexistence with Logic Apps: Most environments won't rip and replace overnight. What's the intended coexistence model between generated Python playbooks and existing Logic Apps automation rules? I'm genuinely optimistic about this direction. Moving from a low-code visual designer to an AI-assisted coding model with transparent, editable output feels like the right architectural bet for where SOC automation needs to go. But the operational details around lifecycle, security, and debugging will determine whether this becomes a production staple or stays a demo-only feature. Would be interested to hear from anyone who's been in the preview - what's the reality like compared to the pitch?Solved213Views0likes1CommentIngest Microsoft XDR Advanced Hunting Data into Microsoft Sentinel
I had difficulty finding a guide that can query Microsoft Defender vulnerability management Advanced Hunting tables in Microsoft Sentinel for alerting and automation. As a result, I put together this guide to demonstrate how to ingest Microsoft XDR Advanced Hunting query results into Microsoft Sentinel using Azure Logic Apps and System‑Assigned Managed Identity. The solution allows you to: Run Advanced Hunting queries on a schedule Collect high‑risk vulnerability data (or other hunting results) Send the results to a Sentinel workspace as custom logs Create alerts and automation rules based on this data This approach avoids credential storage and follows least privilege and managed identity best practices. Prerequisites Before you begin, ensure you have: Microsoft Defender XDR access Microsoft Sentinel deployed Azure Logic Apps permission Application Administrator or higher in Microsoft Entra ID PowerShell with Az modules installed Contributor access to the Sentinel workspace Architecture at a Glance Logic App (Managed Identity) ↓ Microsoft XDR Advanced Hunting API ↓ Logic App ↓ Log Analytics Data Collector API ↓ Microsoft Sentinel (Custom Log) Step 1: Create a Logic App In the Azure Portal, go to Logic Apps Create a new Consumption Logic App Choose the appropriate: Subscription Resource Group Region Step 2: Enable System‑Assigned Managed Identity Open the Logic App Navigate to Settings → Identity Enable System‑assigned managed identity Click Save Note the Object ID This identity will later be granted permission to run Advanced Hunting queries. Step 3: Locate the Logic App in Entra ID Go to Microsoft Entra ID → Enterprise Applications Change filter to All Applications Search for your Logic App name Select the app to confirm it exists Step 4: Grant Advanced Hunting Permissions (PowerShell) Advanced Hunting permissions cannot be assigned via the portal and must be done using PowerShell. Required Permission AdvancedQuery.Read.All PowerShell Script # Your tenant ID (in the Azure portal, under Azure Active Directory > Overview). $TenantID=”Your TenantID” Connect-AzAccount -TenantId $TenantID # Get the ID of the managed identity for the app. $spID = “Your Managed Identity” # Get the service principal for Microsoft Graph by providing the AppID of WindowsDefender ATP $GraphServicePrincipal = Get-AzADServicePrincipal -Filter "AppId eq 'fc780465-2017-40d4-a0c5-307022471b92'" | Select-Object Id # Extract the Advanced query ID. $AppRole = $GraphServicePrincipal.AppRole | ` Where-Object {$_.Value -contains "AdvancedQuery.Read.All"} # If AppRoleID comes up with blank value, it can be replaced with 93489bf5-0fbc-4f2d-b901-33f2fe08ff05 # Now add the permission to the app to read the advanced queries New-AzADServicePrincipalAppRoleAssignment -ServicePrincipalId $spID -ResourceId $GraphServicePrincipal.Id -AppRoleId $AppRole.Id # Or New-AzADServicePrincipalAppRoleAssignment -ServicePrincipalId $spID -ResourceId $GraphServicePrincipal.Id -AppRoleId 93489bf5-0fbc-4f2d-b901-33f2fe08ff05 After successful execution, verify the permission under Enterprise Applications → Permissions. Step 5: Build the Logic App Workflow Open Logic App Designer and create the following flow: Trigger Recurrence (e.g., every 24 hours Run Advanced Hunting Query Connector: Microsoft Defender ATP Authentication: System‑Assigned Managed Identity Action: Run Advanced Hunting Query Sample KQL Query (High‑Risk Vulnerabilities) Send Data to Log Analytics (Sentinel) On Send Data, create a new connection and provide the workspace information where the Sentinel log exists. Obtaining the Workspace Key is not straightforward, we need to retrieve using the PowerShell command. Get-AzOperationalInsightsWorkspaceSharedKey ` -ResourceGroupName "<ResourceGroupName>" ` -Name "<WorkspaceName>" Configuration Details Workspace ID Primary key Log Type (example): XDRVulnerability_CL Request body: Results array from Advanced Hunting Step 6: Run the Logic app to return results In the logic app designer select run, If the run is successful data will be sent to sentinel workspace. Step 7: Validate Data in Microsoft Sentinel In Sentinel, run the query: XDRVulnerability_CL | where TimeGenerated > ago(24h) If data appears, ingestion is successful. Step 8: Create Alerts & Automation Rules Use Sentinel to: Create analytics rules for: CVSS > 9 Exploit available New vulnerabilities in last 24 hours Trigger: Email notifications Incident creation SOAR playbooks Conclusion By combining Logic Apps, Managed Identities, Microsoft XDR, and Microsoft Sentinel, you can create a powerful, secure, and scalable pipeline for ingesting hunting intelligence and triggering proactive detections.248Views1like1CommentClarification on UEBA Behaviors Layer Support for Zscaler and Fortinet Logs
I would like to confirm whether the new UEBA Behaviors Layer in Microsoft Sentinel currently supports generating behavior insights for Zscaler and Fortinet log sources. Based on the documentation, the preview version of the Behaviors Layer only supports specific vendors under CommonSecurityLog (CyberArk Vault and Palo Alto Threats), AWS CloudTrail services, and GCP Audit Logs. Since Zscaler and Fortinet are not listed among the supported vendors, I want to verify: Does the UEBA Behaviors Layer generate behavior records for Zscaler and Fortinet logs, or are these vendors currently unsupported for behavior generation? As logs from Zscaler and Fortinet will also be get ingested in CommonSecurityLog table only.Solved210Views0likes1CommentHow Should a Fresher Learn Microsoft Sentinel Properly?
Hello everyone, I am a fresher interested in learning Microsoft Sentinel and preparing for SOC roles. Since Sentinel is a cloud-native enterprise tool and usually used inside organizations, I am unsure how individuals without company access are expected to gain real hands-on experience. I would like to hear from professionals who actively use Sentinel: - How do freshers typically learn and practice Sentinel? - What learning resources or environments are commonly used by beginners? - What level of hands-on experience is realistically expected at entry level? I am looking for guidance based on real industry practice. Thank you for your time.296Views0likes2Comments