kql
316 TopicsAt-Scale Failure Reporting for Azure Update Manager
Introduction Azure Update Manager simplifies patching across Azure virtual machines and Azure Arc-enabled servers by providing a centralized platform for patch assessment and installation. However, as environments scale, a key challenge emerges—efficiently identifying and troubleshooting patch failures across large fleets of machines. While Azure Update Manager surfaces detailed error messages in the Azure portal, this information is typically available only at an individual machine level. In enterprise environments managing hundreds or thousands of systems, drilling into each VM to find error details quickly becomes impractical. In this article, we walk through a real-world use case and demonstrate how to leverage Azure Resource Graph (ARG) to extract failed machines along with their error details for a specific maintenance run—using a single query. The Challenge: Scaling Patch Failure Visibility In a large enterprise deployment, Azure Update Manager was configured to manage patching across: Windows and Linux virtual machines Azure cloud VMs and Arc-enabled on‑premises servers Multiple regions and subscriptions While patching operations were largely successful, a subset of machines experienced failures. The key challenges faced by the operations team were: Error messages were visible only by drilling into each failed VM in the portal No built‑in way to aggregate failures across all machines Lack of a simple mechanism to export: Failed VMs Error codes Error messages The team needed a scalable, query‑driven approach to analyze failures across an entire maintenance run. Key Insight: Where Azure Update Manager Stores Data Azure Update Manager does not rely on Log Analytics to store operational results. Instead: Patch assessment and installation results are stored in Azure Resource Graph Azure Resource Graph acts as a centralized, queryable store for update operations This design enables powerful querying without requiring additional ingestion, configuration, or cost overhead. Understanding Maintenance Runs and Correlation IDs Each Azure Update Manager maintenance run generates a unique identifier: properties.correlationId represents the maintenance (schedule) run ID All machines involved in the same patch cycle share this ID This allows all machines within a single patch execution to be correlated and queried collectively. The Solution: Query Failed VMs with Error Messages Azure Resource Graph allows querying failures at scale using the maintenanceresources dataset. Core Query (Kusto Query Language) 1 maintenanceresources 2 | where type =~ "microsoft.maintenance/applyupdates" 3 | where tostring(properties.correlationId) contains "<YourMaintenanceRunID>" 4 | where tostring(properties.status) =~ "Failed" 5 | project properties.resourceId, properties.errorCode, properties.errorMessage What This Query Delivers All machines that failed in a specific maintenance run Error codes for troubleshooting Full error messages that are otherwise visible only in the Azure portal Note: Property names for error information can vary by environment. Validate available fields using Azure Resource Graph Explorer and adjust the project clause if required. Sample Output (Conceptual) Resource ID Error Code Error Message vm-01 0x80244007 Windows Update API failed vm-02 0x80072f8f Connectivity issue vm-03 1C WSUS configuration issue Advanced Scenario: Automatically Detecting the Latest Failed Maintenance Run In real-world scenarios, you may not always know the maintenance run ID. The following query dynamically identifies the most recent maintenance run that had failures, and then retrieves all failed machines from that run. 1 // Step 1: Identify the latest maintenance run ID with failures 2 let lastFailedRun = toscalar( 3 maintenanceresources 4 | extend runId = extract(@"applyupdates/(\d+)$", 1, properties.correlationId) 5 | where type =~ "microsoft.maintenance/applyupdates" 6 | where tostring(properties.status) =~ "Failed" 7 | order by tostring(properties.startDateTime) desc 8 | take 1 9 | project runId 10 ); 11 // Step 2: Query all failed VMs from that run 12 maintenanceresources 13 | where type =~ "microsoft.maintenance/applyupdates" 14 | where tostring(properties.correlationId) contains lastFailedRun 15 | where tostring(properties.status) =~ "Failed" 16 | project properties.resourceId, properties.errorCode, properties.errorMessage This approach is ideal for automation, scheduled reporting, and dashboard scenarios. Why This Approach Matters Operational Efficiency Eliminates manual portal navigation Provides consolidated failure insights in seconds Scalability Works across large, distributed environments Supports both Azure and hybrid (Arc‑enabled) machines Automation Ready Can be integrated into scripts, dashboards, and reporting pipelines Enables proactive monitoring and alerting scenarios Best Practices for Enterprise Patch Reporting To maximize the value of this approach: Capture and track maintenance run IDs Use Azure Resource Graph as the primary reporting layer Build reusable queries for different patch scenarios Export reports for compliance and auditing Correlate failures with root‑cause trends over time Conclusion As organizations scale patching operations with Azure Update Manager, visibility, speed, and automation become essential. While the Azure portal is effective for per‑machine troubleshooting, it is not optimized for fleet‑level analysis. Azure Resource Graph fills this gap by enabling a shift from manual troubleshooting to automated, query‑driven failure analysis at scale. By adopting this approach, teams can significantly improve operational efficiency, reduce mean time to resolution, and build a more mature patch management strategy. Final takeaway: Don’t rely only on the portal Leverage Azure Resource Graph to operationalize patch insights at enterprise scale References Azure Update Manager – Query resources with Azure Resource Graph https://learn.microsoft.com/azure/update-manager/query-logs Azure Update Manager – Troubleshooting guide https://learn.microsoft.com/azure/update-manager/troubleshoot Sample Azure Resource Graph queries for Azure Update Manager https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/update-manager/sample-query-logs.mdSentinel Foundry - MCP Server (Preview) (Github Community Release)
I’ve been cooking something that a lot of people in SOC have been struggling with — especially on the engineering side of Microsoft Sentinel. Thanks to the Microsoft Security team for shaping the capabilities of Sentinel even better with Sentinel Data Lake & Modern SecOps. Today’s the day I can finally share it. Note: This is not an official Microsoft product, but it is designed to make the Sentinel Build even better (complement) with much more intelligence. 🚀 Sentinel Foundry is now in public preview with 43 tools. (Sentinel Foundry - MCP Server) It’s an MCP server built to act like the brain of a strong Sentinel engineer — helping make building, improving, and operating Sentinel far more practical, faster, and honestly more enjoyable. For a lot of teams, the challenge is not understanding what Sentinel can do. The hard part is the engineering work around it: -> Deciding what data should actually be ingested -> Building a clean, scalable Sentinel foundation -> Writing useful detections instead of noisy ones -> Balancing security value with cost -> Turning ideas into deployable engineering outputs That is exactly why I built Sentinel Foundry to help communities grow stronger. It helps with the real engineering tasks behind Sentinel — from architecture thinking to detection design, deployment planning, ingestion strategy, automation ideas, and many of the workflows outlined in the GitHub project. How does it work? Here’s one of the flagship prompts I ran with it: “Give me a complete security posture report for our workspace. Score each pillar and tell me what to prioritise.” And within seconds, it produced a structured engineering blueprint that would normally take a lot longer to pull together manually. You can see the example prompts here in what it can do: https://github.com/prabhukiranveesam/Sentinel-Foundry#what-can-it-do I want building Sentinel to feel less like repetitive engineering overhead — and more like real security engineering that is fast, creative, and enjoyable. If you work with Sentinel as a SOC L2 analyst, engineer, detection engineer, consultant, or architect, I’d genuinely love for you to try it and tell me what you think. 🔗 Public Preview: https://github.com/prabhukiranveesam/Sentinel-Foundry This is just the start of an AI era — and I’m excited to keep shaping it with more powerful features over the coming days. This is very easy to set up and will be available to all of you at no cost during this month as part of the public preview, and your feedback is extremely valuable to shape this as a powerful solution.484Views0likes1CommentDetecting AI agents and non-human identities in Microsoft Sentinel: the classic-agent blind spot
Build 2026 made the direction official. The industry is moving from the app era into the agent era, and Microsoft spent a real share of the keynote on securing agents across their lifecycle, from discovering what is exploitable to governing what is running in production. On the identity side the centerpiece is Microsoft Entra Agent ID, now generally available, which gives AI agents first-class identities and extends Conditional Access, Identity Protection, and full audit logging to them. That is good news for agents you build the new way. It is not the whole picture, and the gap is where most SOCs will get hurt first. Modern agents are covered. Classic agents are not. Entra Agent ID draws a hard line between two kinds of agent. Modern agents are created through the Agent ID platform, each backed by an agent identity blueprint. They carry a proper Agent ID, a full audit trail, and the complete set of governance capabilities, including Identity Protection for Agents, which establishes a baseline for an agent's normal activity and flags anomalies automatically. Classic agents are everything that came before, or that gets built outside the platform: AI agents implemented as ordinary service principals or app registrations, for example Copilot Studio agents created before Agent ID was enabled, or any home-grown automation calling Graph with client credentials. In the Entra agent registry they appear with "Has Agent ID: No," and that flag matters, because the Agent ID protections apply to identities that actually hold an Agent ID. Classic agents sit outside Identity Protection for Agents and Conditional Access for Agents. Here is the uncomfortable part. The non-human identities you already run, the service principals behind your pipelines, your integrations, your scripts, your pre-platform Copilot Studio bots, are almost all classic agents. They tend to outnumber your human accounts, they have no MFA in any meaningful sense, and a credential added to one does not show up in the Azure portal. The new platform protections do not reach them. Until you migrate them, the only place you get detection coverage on that population is your SIEM. So this is the job Sentinel does that Agent ID does not: detect risky behavior on the classic, service-principal-backed agents that the platform cannot yet protect. The telemetry you have, and the one switch people forget Three tables carry most of the signal. AADServicePrincipalSignInLogs records service principal authentications, the client-credentials sign-ins your agents and automation use. No user, no MFA, just an app proving it holds a secret or certificate. AADManagedIdentitySignInLogs does the same for managed identities. AuditLogs records directory changes, including the one that matters most for persistence: a new credential added to an application or service principal. One practical warning before any of this works. Service principal and managed identity sign-in logs are not streamed by default. You have to enable those categories explicitly in the Entra diagnostic settings feeding your workspace. Plenty of teams write the detection, never check, and never notice the table is empty. Verify that first. Detection 1: a new credential on a service principal or app Adding a secret or certificate to an existing service principal is one of the cleanest persistence techniques in a Microsoft cloud. The attacker compromises a privileged user or app, drops a fresh credential on a service principal that already holds useful Graph permissions, and now has access that survives password resets and session revocation. It maps to MITRE T1098.001, Account Manipulation: Additional Cloud Credentials. For a classic agent it is especially nasty, because there is no Identity Protection baseline watching it. // Detection 1: new secret or certificate added to an application or service principal // MITRE T1098.001 - Account Manipulation: Additional Cloud Credentials AuditLogs | where OperationName has_any ("Add service principal", "Certificates and secrets management") | where Result =~ "success" | extend Initiator = coalesce( tostring(InitiatedBy.user.userPrincipalName), tostring(InitiatedBy.app.displayName)) | extend InitiatorIp = tostring(InitiatedBy.user.ipAddress) | mv-apply Target = TargetResources on ( where Target.type =~ "Application" | extend TargetName = tostring(Target.displayName), TargetId = tostring(Target.id), KeyChanges = Target.modifiedProperties ) | mv-apply Prop = KeyChanges on ( where tostring(Prop.displayName) =~ "KeyDescription" | extend NewKeys = parse_json(tostring(Prop.newValue)), OldKeys = parse_json(tostring(Prop.oldValue)) ) | extend AddedKeys = set_difference(NewKeys, OldKeys) | where array_length(AddedKeys) > 0 | project TimeGenerated, Initiator, InitiatorIp, TargetName, TargetId, AddedKeys | order by TimeGenerated desc The operation filter catches the three shapes this event takes in the log: "Add service principal," "Add service principal credentials," and "Update application - Certificates and secrets management." The modifiedProperties parsing isolates the KeyDescription change, and set_difference confirms a key was actually added rather than removed, so rotating out an old credential does not, on its own, fire the rule. False positives come from legitimate rotation and from automation that provisions app credentials (CI/CD, infrastructure as code). The initiator is the discriminant. A credential added by your deployment pipeline's service account at the usual time is routine. The same change initiated by an interactive admin out of hours, or by an account that never normally touches app credentials, is what you want to surface. Allow-list the expected initiators, not the targets. Detection 2: a classic agent signing in from a first-seen IP A service principal that has only ever authenticated from your Azure regions and suddenly signs in from somewhere new is a strong signal that its credential has been lifted and is being used elsewhere. Service principals have stable, boring network behavior, which makes a first-seen IP a far cleaner indicator for them than it is for roaming human users. This is the behavioral baseline Identity Protection gives you for free on modern agents, rebuilt in KQL for the classic ones it ignores. MITRE T1078.004, Valid Accounts: Cloud Accounts. // Detection 2: classic-agent service principal signing in from a previously unseen IP // MITRE T1078.004 - Valid Accounts: Cloud Accounts let baseline = 14d; let detection = 1d; let KnownIPs = AADServicePrincipalSignInLogs | where TimeGenerated between (ago(baseline + detection) .. ago(detection)) | where tostring(ResultType) == "0" | summarize KnownIPSet = make_set(IPAddress) by AppId; AADServicePrincipalSignInLogs | where TimeGenerated > ago(detection) | where tostring(ResultType) == "0" | lookup kind=leftouter KnownIPs on AppId | where set_has_element(KnownIPSet, IPAddress) == false | summarize FirstSeen = min(TimeGenerated), Resources = make_set(ResourceDisplayName, 10) by ServicePrincipalName, AppId, IPAddress | order by FirstSeen desc The query builds a per-application baseline of source IPs over the previous two weeks, then flags any successful sign-in today from an address outside that set. Two tuning notes. Brand-new service principals have no baseline, so they surface on first use. That is usually worth seeing once, but you can exclude AppIds younger than the baseline window if it gets noisy. And if your agents egress through shifting cloud IP ranges, widen the comparison from an exact IP to the autonomous system number or a known-range allow-list, otherwise you will chase your own infrastructure. This complements Agent ID, it does not replace it! The endgame is not to run these rules forever. It is to shrink the population they apply to. Inventory your tenant for agents marked "Has Agent ID: No," prioritize the ones holding sensitive Graph permissions, and migrate them onto the Agent ID platform, where Identity Protection and Conditional Access take over the baselining you are doing here by hand. Microsoft has signaled a migration path from classic to modern agents. Treat these two detections as the coverage you need in the meantime, and as a permanent safety net for anything that never makes the move. If you do one thing this week: enable the service principal sign-in log category, deploy detection 1, and pull a list of every service principal that had a credential added in the last 90 days. That list alone tends to be more interesting than people expect. Cheers, Marcel238Views0likes0CommentsXdrLogRaider Defender XDR portal telemetry
A Microsoft Sentinel custom data connector that ingests Microsoft Defender XDR portal-only telemetry — configuration, compliance, drift, exposure, governance — that public Microsoft APIs (Graph Security, Microsoft 365 Defender, MDE) don't expose. https://github.com/akefallonitis/xdrlograider— Defender XDR portal telemetry Happy Hunting 🥳 🎉99Views0likes2CommentsUnderstand New Sentinel Pricing Model with Sentinel Data Lake Tier
Introduction on Sentinel and its New Pricing Model Microsoft Sentinel is a cloud-native Security Information and Event Management (SIEM) and Security Orchestration, Automation, and Response (SOAR) platform that collects, analyzes, and correlates security data from across your environment to detect threats and automate response. Traditionally, Sentinel stored all ingested data in the Analytics tier (Log Analytics workspace), which is powerful but expensive for high-volume logs. To reduce cost and enable customers to retain all security data without compromise, Microsoft introduced a new dual-tier pricing model consisting of the Analytics tier and the Data Lake tier. The Analytics tier continues to support fast, real-time querying and analytics for core security scenarios, while the new Data Lake tier provides very low-cost storage for long-term retention and high-volume datasets. Customers can now choose where each data type lands—analytics for high-value detections and investigations, and data lake for large or archival types—allowing organizations to significantly lower cost while still retaining all their security data for analytics, compliance, and hunting. Please flow diagram depicts new sentinel pricing model: Now let's understand this new pricing model with below scenarios: Scenario 1A (PAY GO) Scenario 1B (Usage Commitment) Scenario 2 (Data Lake Tier Only) Scenario 1A (PAY GO) Requirement Suppose you need to ingest 10 GB of data per day, and you must retain that data for 2 years. However, you will only frequently use, query, and analyze the data for the first 6 months. Solution To optimize cost, you can ingest the data into the Analytics tier and retain it there for the first 6 months, where active querying and investigation happen. After that period, the remaining 18 months of retention can be shifted to the Data Lake tier, which provides low-cost storage for compliance and auditing needs. But you will be charged separately for data lake tier querying and analytics which depicted as Compute (D) in pricing flow diagram. Pricing Flow / Notes The first 10 GB/day ingested into the Analytics tier is free for 31 days under the Analytics logs plan. All data ingested into the Analytics tier is automatically mirrored to the Data Lake tier at no additional ingestion or retention cost. For the first 6 months, you pay only for Analytics tier ingestion and retention, excluding any free capacity. For the next 18 months, you pay only for Data Lake tier retention, which is significantly cheaper. Azure Pricing Calculator Equivalent Assuming no data is queried or analyzed during the 18-month Data Lake tier retention period: Although the Analytics tier retention is set to 6 months, the first 3 months of retention fall under the free retention limit, so retention charges apply only for the remaining 3 months of the analytics retention window. Azure pricing calculator will adjust accordingly. Scenario 1B (Usage Commitment) Now, suppose you are ingesting 100 GB per day. If you follow the same pay-as-you-go pricing model described above, your estimated cost would be approximately $15,204 per month. However, you can reduce this cost by choosing a Commitment Tier, where Analytics tier ingestion is billed at a discounted rate. Note that the discount applies only to Analytics tier ingestion—it does not apply to Analytics tier retention costs or to any Data Lake tier–related charges. Please refer to the pricing flow and the equivalent pricing calculator results shown below. Monthly cost savings: $15,204 – $11,184 = $4,020 per month Now the question is: What happens if your usage reaches 150 GB per day? Will the additional 50 GB be billed at the Pay-As-You-Go rate? No. The entire 150 GB/day will still be billed at the discounted rate associated with the 100 GB/day commitment tier bucket. Azure Pricing Calculator Equivalent (100 GB/ Day) Azure Pricing Calculator Equivalent (150 GB/ Day) Scenario 2 (Data Lake Tier Only) Requirement Suppose you need to store certain audit or compliance logs amounting to 10 GB per day. These logs are not used for querying, analytics, or investigations on a regular basis, but must be retained for 2 years as per your organization’s compliance or forensic policies. Solution Since these logs are not actively analyzed, you should avoid ingesting them into the Analytics tier, which is more expensive and optimized for active querying. Instead, send them directly to the Data Lake tier, where they can be retained cost-effectively for future audit, compliance, or forensic needs. Pricing Flow Because the data is ingested directly into the Data Lake tier, you pay both ingestion and retention costs there for the entire 2-year period. If, at any point in the future, you need to perform advanced analytics, querying, or search, you will incur additional compute charges, based on actual usage. Even with occasional compute charges, the cost remains significantly lower than storing the same data in the Analytics tier. Realized Savings Scenario Cost per Month Scenario 1: 10 GB/day in Analytics tier $1,520.40 Scenario 2: 10 GB/day directly into Data Lake tier $202.20 (without compute) $257.20 (with sample compute price) Savings with no compute activity: $1,520.40 – $202.20 = $1,318.20 per month Savings with some compute activity (sample value): $1,520.40 – $257.20 = $1,263.20 per month Azure calculator equivalent without compute Azure calculator equivalent with Sample Compute Conclusion The combination of the Analytics tier and the Data Lake tier in Microsoft Sentinel enables organizations to optimize cost based on how their security data is used. High-value logs that require frequent querying, real-time analytics, and investigation can be stored in the Analytics tier, which provides powerful search performance and built-in detection capabilities. At the same time, large-volume or infrequently accessed logs—such as audit, compliance, or long-term retention data—can be directed to the Data Lake tier, which offers dramatically lower storage and ingestion costs. Because all Analytics tier data is automatically mirrored to the Data Lake tier at no extra cost, customers can use the Analytics tier only for the period they actively query data, and rely on the Data Lake tier for the remaining retention. This tiered model allows different scenarios—active investigation, archival storage, compliance retention, or large-scale telemetry ingestion—to be handled at the most cost-effective layer, ultimately delivering substantial savings without sacrificing visibility, retention, or future analytical capabilities.Solved2.8KViews2likes6CommentsHow Should a Fresher Learn Microsoft Sentinel Properly?
Hello everyone, I am a fresher interested in learning Microsoft Sentinel and preparing for SOC roles. Since Sentinel is a cloud-native enterprise tool and usually used inside organizations, I am unsure how individuals without company access are expected to gain real hands-on experience. I would like to hear from professionals who actively use Sentinel: - How do freshers typically learn and practice Sentinel? - What learning resources or environments are commonly used by beginners? - What level of hands-on experience is realistically expected at entry level? I am looking for guidance based on real industry practice. Thank you for your time.289Views0likes2CommentsMissing details in Azure Activity Logs – MICROSOFT.SECURITYINSIGHTS/ENTITIES/ACTION
The Azure Activity Logs are crucial for tracking access and actions within Sentinel. However, I’m encountering a significant lack of documentation and clarity regarding some specific operation types. Resources consulted: https://learn.microsoft.com/en-us/azure/sentinel/audit-sentinel-data https://learn.microsoft.com/en-us/rest/api/securityinsights/entities?view=rest-securityinsights-2024-01-01-preview https://learn.microsoft.com/en-us/rest/api/securityinsights/operations/list?view=rest-securityinsights-2024-09-01&tabs=HTTP My issue: I observed unauthorized activity on our Sentinel workspace. The Azure Activity Logs clearly indicate the user involved, the resource, and the operation type: "MICROSOFT.SECURITYINSIGHTS/ENTITIES/ACTION" But that’s it. No detail about what the action was, what entity it targeted, or how it was triggered. This makes auditing extremely difficult. It's clear the person was in Sentinel and perform an activity through it, from search, KQL, logs to find an entity from a KQL query. But, that's all... Strangely, this operation is not even listed in the official Sentinel Operations documentation linked above. My question: Has anyone encountered this and found a way to interpret this operation type properly? Any insight into how to retrieve more meaningful details (action context, target entity, etc.) from these events would be greatly appreciated.286Views0likes3CommentsStuck looking up a watchlist value
Hiya, I get stuck working with watchlists sometimes. In this example, I'm wanting to focus on account activity from a list of UPNs. If I split the elements up, I get the individual results, but can't seem to pull it all together. ===================================================== In its entirety, the query returns zero results: let ServiceAccounts=(_GetWatchlist('ServiceAccounts_Monitoring'))| project SearchKey; let OpName = dynamic(['Reset password (self-service)','Reset User Password','Change user password','User reset password','User started password reset','Enable Account','Change password (self-service)','Update PasswordProfile','Self-service password reset flow activity progress']); AuditLogs | where OperationName has_any (OpName) | extend upn = TargetResources.[0].userPrincipalName | where upn in (ServiceAccounts) //<=This is where I think I'm wrong | project upn ===================================================== This line on its own, returns the user on the list: let ServiceAccounts=(_GetWatchlist('ServiceAccounts_Monitoring'))| project SearchKey; ===================================================== This section on its own, returns all the activity let OpName = dynamic(['Reset password (self-service)','Reset User Password','Change user password','User reset password','User started password reset','Enable Account','Change password (self-service)','Update PasswordProfile','Self-service password reset flow activity progress']); AuditLogs | where OperationName has_any (OpName) | extend upn = TargetResources.[0].userPrincipalName | where upn contains "username" //This is the name on the watchlistlist - so I know the activity exists) ==================================================== I'm doing something wrong when I'm trying to use the watchlist cache (I think) Any help\guidance or wisdom would be greatly appreciated! Many thanksSolved92Views0likes2CommentsI'm stuck!
Logically, I'm not sure how\if I can do this. I want to monitor for EntraID Group additions - I can get this to work for a single entry using this: AuditLogs | where TimeGenerated > ago(7d) | where OperationName == "Add member to group" | where TargetResources[0].type == "User" | extend GroupName = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue))) | where GroupName == "NameOfGroup" <-- This returns the single entry | extend User = tostring(TargetResources[0].userPrincipalName) | summarize ['Count of Users Added']=dcount(User), ['List of Users Added']=make_set(User) by GroupName | sort by GroupName asc However, I have a list of 20 Priv groups that I need to monitor. I can do this using: let PrivGroups = dynamic[('name1','name2','name3'}); and then call that like this: blahblah | where TargetResources[0].type == "User" | extend GroupName = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue))) | where GroupName has_any (PrivGroup) But that's a bit dirty to update - I wanted to call a watchlist. I've tried defining with: let PrivGroup = (_GetWatchlist('TestList')); and tried calling like: blahblah | where TargetResources[0].type == "User" | extend GroupName = tostring(parse_json(tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue))) | where GroupName has_any ('PrivGroup') I've tried dropping the let and attempted to lookup the watchlist directly: | where GroupName has_any (_GetWatchlist('TestList')) The query runs but doesn't return any results (Obvs I know the result exists) - How do I lookup that extracted value on a Watchlist. Any ideas or pointers why I'm wrong would be appreciated! Many thanksSolved243Views0likes3CommentsRSAC 2026: What the Sentinel Playbook Generator actually means for SOC automation
RSAC 2026 brought a wave of Sentinel announcements, but the one I keep coming back to is the playbook generator. Not because it's the flashiest, but because it touches something that's been a real operational pain point for years: the gap between what SOC teams need to automate and what they can realistically build and maintain. I want to unpack what this actually changes from an operational perspective, because I think the implications go further than "you can now vibe-code a playbook." The problem it solves If you've built and maintained Logic Apps playbooks in Sentinel at any scale, you know the friction. You need a connector for every integration. If there isn't one, you're writing custom HTTP actions with authentication handling, pagination, error handling - all inside a visual designer that wasn't built for complex branching logic. Debugging is painful. Version control is an afterthought. And when something breaks at 2am, the person on call needs to understand both the Logic Apps runtime AND the security workflow to fix it. The result in most environments I've seen: teams build a handful of playbooks for the obvious use cases (isolate host, disable account, post to Teams) and then stop. The long tail of automation - the enrichment workflows, the cross-tool correlation, the conditional response chains - stays manual because building it is too expensive relative to the time saved. What's actually different now The playbook generator produces Python. Not Logic Apps JSON, not ARM templates - actual Python code with documentation and a visual flowchart. You describe the workflow in natural language, the system proposes a plan, asks clarifying questions, and then generates the code once you approve. The Integration Profile concept is where this gets interesting. Instead of relying on predefined connectors, you define a base URL, auth method, and credentials for any service - and the generator creates dynamic API calls against it. This means you can automate against ServiceNow, Jira, Slack, your internal CMDB, or any REST API without waiting for Microsoft or a partner to ship a connector. The embedded VS Code experience with plan mode and act mode is a deliberate design choice. Plan mode lets you iterate on the workflow before any code is generated. Act mode produces the implementation. You can then validate against real alerts and refine through conversation or direct code edits. This is a meaningful improvement over the "deploy and pray" cycle most of us have with Logic Apps. Where I see the real impact For environments running Sentinel at scale, the playbook generator could unlock the automation long tail I mentioned above. The workflows that were never worth the Logic Apps development effort might now be worth a 15-minute conversation with the generator. Think: enrichment chains that pull context from three different tools before deciding on a response path, or conditional escalation workflows that factor in asset criticality, time of day, and analyst availability. There's also an interesting angle for teams that operate across Microsoft and non-Microsoft tooling. If your SOC uses Sentinel for SIEM but has Palo Alto, CrowdStrike, or other vendors in the stack, the Integration Profile approach means you can build cross-vendor response playbooks without middleware. The questions I'd genuinely like to hear about A few things that aren't clear from the documentation and that I think matter for production use: Security Copilot dependency: The prerequisites require a Security Copilot workspace with EU or US capacity. Someone in the blog comments already flagged this as a potential blocker for organizations that have Sentinel but not Security Copilot. Is this a hard requirement going forward, or will there be a path for Sentinel-only customers? Code lifecycle management: The generated Python runs... where exactly? What's the execution runtime? How do you version control, test, and promote these playbooks across dev/staging/prod? Logic Apps had ARM templates and CI/CD patterns. What's the equivalent here? Integration Profile security: You're storing credentials for potentially every tool in your security stack inside these profiles. What's the credential storage model? Is this backed by Key Vault? How do you rotate credentials without breaking running playbooks? Debugging in production: When a generated playbook fails at 2am, what does the troubleshooting experience look like? Do you get structured logs, execution traces, retry telemetry? Or are you reading Python stack traces? Coexistence with Logic Apps: Most environments won't rip and replace overnight. What's the intended coexistence model between generated Python playbooks and existing Logic Apps automation rules? I'm genuinely optimistic about this direction. Moving from a low-code visual designer to an AI-assisted coding model with transparent, editable output feels like the right architectural bet for where SOC automation needs to go. But the operational details around lifecycle, security, and debugging will determine whether this becomes a production staple or stays a demo-only feature. Would be interested to hear from anyone who's been in the preview - what's the reality like compared to the pitch?Solved206Views0likes1Comment