abhisheksharan
Build a Local Microsoft Sentinel Triage Agent in VS Code (Copilot + MCP)
Modern SOC work is not limited by data; it's limited by the friction of collecting it. This post shows a local-first workflow that lets you investigate Microsoft Sentinel incidents from inside VS Code, using GitHub Copilot Chat for reasoning and a small, deterministic MCP toolset for evidence retrieval and (optionally) approval-gated writeback.
What you'll take away:
How to structure a Copilot + MCP triage loop that stays grounded in Azure evidence
A reliability pattern: fall back to KQL when Sentinel subresource APIs are flaky
A safety pattern: draft-first, explicit-approval writeback for incident comments

Why This Exists
Sentinel triage is powerful but fragmented: you jump between the portal, KQL, entity pivots, and case notes just to answer "what happened?" The goal here is to collapse that into a single, repeatable loop inside the editor:
Resolve the incident and pull the underlying alerts/entities
Pivot into AzureActivity (and other logs) to identify the actor and outcome
Use threat intelligence (TI) for context, not as the decision
Generate an evidence-backed narrative and draft comment; write back only on explicit approval

Design Principles
Evidence first: every claim must be traceable to Sentinel APIs or Log Analytics results
Small tool surface: fewer tools, clearer prompting, easier hardening
Reliability by design: if one API path fails, pivot to KQL and continue
Safety boundary: investigation and writeback are separate, and writeback is approval-gated

Architecture & Data Flow
A local TypeScript MCP server exposes a handful of triage tools to Copilot Chat in VS Code. Reads come from Sentinel + Log Analytics; writes (incident comments) are optional and require explicit approval.
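The "small tool surface" and "safety boundary" principles are easiest to trust when the server enforces them rather than the prompt. The sketch below is minimal and assumes a TypeScript MCP server whose request handler calls a guard like this before running any tool. The tool names come from this post. `decide`, `Decision` and the allowlist shape are hypothetical and not part of any MCP SDK.

```typescript
// Hypothetical dispatcher guard for the local MCP server. Tool names match the
// post; the Decision type and decide() are illustrative only.
type ToolKind = "read" | "write";

// Explicit allowlist: anything not listed here is never executed.
const TOOL_ALLOWLIST: Record<string, ToolKind> = {
  list_incidents: "read",
  get_incident: "read",
  get_incident_alerts: "read",
  get_incident_entities: "read",
  run_incident_kql: "read",
  add_incident_comment: "write",
};

// Phrase taken from the triage prompt's rules.
const APPROVAL_PHRASE = "APPROVE COMMENT";

type Decision =
  | { action: "execute"; tool: string }
  | { action: "draft_only"; tool: string }
  | { action: "reject"; tool: string; reason: string };

function decide(tool: string, lastAnalystMessage: string): Decision {
  // Own-property check, so inherited names such as "toString" are never treated as tools.
  if (!Object.prototype.hasOwnProperty.call(TOOL_ALLOWLIST, tool)) {
    return { action: "reject", tool, reason: "tool is not on the allowlist" };
  }
  if (TOOL_ALLOWLIST[tool] === "read") {
    // Reads (incident lookup, alerts, entities, KQL) run without approval.
    return { action: "execute", tool };
  }
  // Writes run only if the analyst's latest message is the approval phrase
  // (surrounding whitespace ignored); otherwise the caller returns a draft.
  return lastAnalystMessage.trim() === APPROVAL_PHRASE
    ? { action: "execute", tool }
    : { action: "draft_only", tool };
}
```

Keeping the gate in server code means the writeback boundary still holds even if the model ignores the prompt's rules.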
Copilot Chat (VS Code): decides the next step and summarizes outputs
MCP server: executes allowed tools (incident lookup, alert/entity retrieval, KQL queries, optional comment writeback)
Evidence sources: Sentinel Incident APIs + Log Analytics tables (SecurityIncident, SecurityAlert, AzureActivity, TI tables)
Safety gate: writeback happens only after explicit approval; otherwise you get a draft

Tool Surface
MCP is useful here because it separates reasoning from execution: Copilot can decide what to do, but only the MCP server can do it, and only through tools you explicitly define and can audit.
list_incidents / get_incident (ground the case)
get_incident_alerts / get_incident_entities (fast path)
run_incident_kql (reliable fallback + pivots)
add_incident_comment (draft-first; writes only with approval)

The Investigation Loop (3 Steps)
Prompt used (sentinel-triage-local):
Investigate Sentinel incident 1478 end to end in workspace Subscription ID/Resource Group/Workspace Name. Resolve the incident ID first, collect underlying alerts and entities, enrich with AzureActivity and TI, determine whether the activity is malicious or benign, and return:
1. Investigation summary
2. Key evidence
3. Entity analysis
4. TI enrichment result
5. Risk assessment
6. Recommended disposition
7. Final incident comment draft
Rules:
- Use tool output only, no guessing.
- If alert/entity subresource APIs fail, pivot to KQL and continue.
- Do not submit the comment unless I explicitly say: APPROVE COMMENT.

1) Ground the incident
Resolve the human-friendly incident number to the Sentinel incident resource ID, then capture the metadata you need to drive every later pivot. Incident numbers are convenient for analysts, but the actual investigation flow depends on the underlying incident resource ID.
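Step 1 can be sketched as two small helpers. The first is a KQL query that maps the incident number to the latest SecurityIncident record, along with the metadata later pivots need. The second turns the incident name into the ARM resource ID that later calls hang off. This is an illustration under assumptions:

- the helper names are hypothetical;
- incident names are assumed to be GUIDs;
- the comment URL's `api-version` is a placeholder to check against the Microsoft.SecurityInsights REST reference.

```typescript
// Hypothetical helpers for step 1 (ground the incident). Names are illustrative.
interface WorkspaceRef {
  subscriptionId: string;
  resourceGroup: string;
  workspaceName: string;
}

// Assumes incident names are GUIDs (typical for Sentinel-created incidents).
// Validating them keeps the strings we build from being turned into something else.
const GUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// KQL that maps the analyst-facing incident number to the incident record.
// SecurityIncident keeps one row per update, so arg_max keeps the latest one.
function resolveIncidentKql(incidentNumber: number): string {
  if (!isFinite(incidentNumber) || incidentNumber <= 0 || Math.floor(incidentNumber) !== incidentNumber) {
    throw new Error(`invalid incident number: ${incidentNumber}`);
  }
  return [
    "SecurityIncident",
    `| where IncidentNumber == ${incidentNumber}`,
    "| summarize arg_max(TimeGenerated, *) by IncidentName",
    "| extend AlertCount = array_length(AlertIds)",
    "| project IncidentName, Title, Severity, Owner, Status, AlertCount, AlertIds, RelatedAnalyticRuleIds, IncidentUrl",
  ].join("\n");
}

// ARM resource ID of the incident; alert, entity and comment calls hang off it.
function incidentResourceId(ws: WorkspaceRef, incidentName: string): string {
  if (!GUID_PATTERN.test(incidentName)) {
    throw new Error(`unexpected incident name: ${incidentName}`);
  }
  return (
    `/subscriptions/${ws.subscriptionId}/resourceGroups/${ws.resourceGroup}` +
    `/providers/Microsoft.OperationalInsights/workspaces/${ws.workspaceName}` +
    `/providers/Microsoft.SecurityInsights/incidents/${incidentName}`
  );
}

// Target for the approval-gated comment writeback. The api-version is an
// assumption; check the SecurityInsights REST reference for your environment.
function incidentCommentUrl(resourceId: string, commentId: string): string {
  return `https://management.azure.com${resourceId}/comments/${commentId}?api-version=2023-02-01`;
}
```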
Resolving that first gives the workflow a concrete anchor for:
Title
Severity
Owner
Status
Alert count
Analytic rule IDs
Incident URL
This gives you the stable identifiers (and the URL) needed to retrieve alerts, entities, and supporting logs.

2) Collect alerts and entities (fast path)
Pull the alerts behind the incident and the entities they reference. When the incident subresource APIs behave, this is the fastest way to assemble the working set. In the ideal path, the agent calls the incident alert and entity subresources directly, which gives fast access to:
Alert IDs
Alert names
Timestamps
Severities
Entities
Provider metadata

3) Stay reliable: pivot to KQL when APIs fail
In real environments, the incident subresource APIs for alerts/entities are not always dependable. When they fail, the workflow switches to Log Analytics and reconstructs the same evidence via KQL, so the investigation continues:
SecurityIncident to recover the incident record and alert IDs
SecurityAlert to retrieve alert details and entities
AzureActivity to determine who or what performed the operation
ThreatIntelligenceIndicator and ThreatIntelIndicators for enrichment

The High-Signal Pivot: AzureActivity
In the incidents I tested, AzureActivity was the fastest way to classify “suspicious deployment” alerts: it tells you who performed the action, which operation ran, and whether it succeeded. The evidence showed:
The caller was a single Microsoft Entra ID object ID
Claims_d.idtyp = "app"
Authorization_d.evidence.principalType = "ServicePrincipal"
The activity was tied to a policy assignment
The operation was MICROSOFT.RESOURCES/DEPLOYMENTS/WRITE
The result was BadRequest with InvalidTemplate
That pattern typically points to automation (a service principal running a policy-driven deployment) failing on a bad template, not an interactive attacker.
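The KQL fallback and the AzureActivity read-out can be sketched under stated assumptions:

- `alertsFallbackKql` rebuilds the alert working set from SecurityAlert, using the alert IDs recovered from SecurityIncident. It validates the IDs before interpolating them into KQL.
- `classifyActivity` encodes the automation pattern above as a conservative heuristic. It returns "likely automation failure" only when every signal matches and leaves everything else to the analyst.

Both function names are hypothetical, as is the GUID assumption for alert IDs. So is the flattened `ActivityEvidence` shape, which you would populate from Caller, Claims_d, Authorization_d and the related AzureActivity columns.

```typescript
// Hypothetical sketch of the KQL fallback (step 3) and the AzureActivity read-out.

// Assumes SecurityIncident.AlertIds holds GUID-style SystemAlertId values.
const ALERT_ID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Rebuilds the alert working set from SecurityAlert when the incident
// alert/entity subresource APIs fail.
function alertsFallbackKql(alertIds: string[]): string {
  if (alertIds.length === 0) {
    throw new Error("no alert IDs to look up");
  }
  const invalid = alertIds.filter((id) => !ALERT_ID_PATTERN.test(id));
  if (invalid.length > 0) {
    throw new Error(`refusing to build query, invalid alert IDs: ${invalid.join(", ")}`);
  }
  return [
    "SecurityAlert",
    `| where SystemAlertId in (dynamic(${JSON.stringify(alertIds)}))`,
    "| summarize arg_max(TimeGenerated, *) by SystemAlertId",
    "| project TimeGenerated, SystemAlertId, AlertName, AlertSeverity, ProviderName, Entities",
  ].join("\n");
}

// One AzureActivity row, flattened to the fields the post relies on.
interface ActivityEvidence {
  caller: string; // Microsoft Entra ID object ID
  idtyp?: string; // from Claims_d.idtyp
  principalType?: string; // from Authorization_d.evidence.principalType
  policyAssignmentId?: string; // set when the operation came from a policy assignment
  operation: string; // e.g. MICROSOFT.RESOURCES/DEPLOYMENTS/WRITE
  result: string; // e.g. BadRequest
  errorCode?: string; // e.g. InvalidTemplate
}

type Verdict = "likely automation failure" | "needs analyst review";

// One service-principal caller, policy-driven deployment writes, all failing
// with InvalidTemplate. Anything that deviates goes to a human.
function classifyActivity(events: ActivityEvidence[]): Verdict {
  if (events.length === 0) {
    return "needs analyst review";
  }
  const firstCaller = events[0].caller;
  const matchesPattern = events.every(
    (e) =>
      e.caller === firstCaller &&
      e.idtyp === "app" &&
      e.principalType === "ServicePrincipal" &&
      !!e.policyAssignmentId &&
      e.operation.toUpperCase() === "MICROSOFT.RESOURCES/DEPLOYMENTS/WRITE" &&
      e.result === "BadRequest" &&
      e.errorCode === "InvalidTemplate",
  );
  return matchesPattern ? "likely automation failure" : "needs analyst review";
}
```

The verdict is meant to feed the narrative, not replace the disposition decision, in line with the "evidence first" principle.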
Threat Intelligence: Use It as Context
Enrich observables against TI, but treat it as corroboration: a hit is not proof, and a miss is not a clean bill of health. In my test runs, TI mainly helped refine confidence after AzureActivity and alert evidence had established the likely story.

Output: An Evidence-Backed Narrative (and a Draft Comment)
Once the tools return results, Copilot's job is synthesis: turning structured evidence into a short narrative an analyst can paste into the case:
What happened, who or what triggered it, and whether it succeeded
Key supporting evidence (alerts, entities, AzureActivity pivots, TI context)
A recommended disposition and a draft incident comment
Screenshot: incident comment written back to the Sentinel incident after approval.

Safety + Reliability: Approval-Gated Writeback
The agent can draft a comment automatically, but it cannot change incident state unless the analyst explicitly approves. That boundary is what makes the workflow usable in real operations. After approval, the tool submits the drafted comment directly to the Sentinel incident, so the portal reflects the same evidence-backed narrative.
Default: return the draft comment only
On approval: acquire an ARM token via Azure CLI and submit via curl.exe (hardened with validation + retries)

Why This Is Worth Building
Less context switching: investigation happens where you already work
More consistency: the same loop runs every time, with deterministic tools
Better classification: AzureActivity pivots reduce false “user did X” assumptions
Safer automation: drafts are automatic; writes are explicit and auditable

Conclusion
AI is most useful in a SOC when it is constrained: deterministic tools fetch the evidence, the model synthesizes it, and humans keep control of state changes. A local Copilot + MCP workflow hits that sweet spot, delivering faster triage for SOC analysts.

Ingesting custom application logs Text/JSON file to Microsoft Sentinel
This blog continues my previous post, Demystifying Log Ingestion API, where I discussed ingesting custom log files into Microsoft Sentinel via the Log Ingestion API. In this post I will delve into ingesting custom application logs in Text/JSON format into Microsoft Sentinel.
Note: For demo purposes, I will use logs in JSON format.

First, let's start with WHY this is important. Many applications and services log information to JSON or text files instead of standard logging services such as Windows Event Log or Syslog. In several use cases these custom application logs must be monitored, which makes this integration a crucial part of SOC monitoring.

How to implement this integration?
Custom application logs in Text/JSON format can be collected with the Azure Monitor Agent (AMA) and stored in a Log Analytics workspace alongside data collected from other sources. There are two ways to do it:
Create a DCR-based custom table and link it with a Data Collection Rule (DCR) and Data Collection Endpoint (DCE).
Leverage the Custom Logs via AMA content hub solution.
I will discuss both approaches in this blog. Let's see it in action now.

Leveraging DCR-based custom table to ingest custom application logs
Using this approach, we will first create a DCR-based custom table and link it with a DCR and DCE.
Prerequisites for this approach:
A Log Analytics workspace where you have at least contributor rights.
A data collection endpoint (DCE) in the same region as the Log Analytics workspace. See How to set up data collection endpoints based on your deployment for details.
Either a new or existing DCR, as described in Collect data with Azure Monitor Agent.

Basic Operations:
The following diagram shows the basic operation of collecting log data from a JSON file. The agent watches for any log files that match a specified name pattern on the local disk. Each entry in the log is collected and sent to Azure Monitor.
The incoming stream defined by the user is used to parse the log data into columns. A default transformation is used if the schema of the incoming stream matches the schema of the target table.
Detailed steps as follows:
Browse to Log Analytics Workspace > Settings > Tables > New custom log (DCR-based).
Enter the table name; note that the suffix _CL is added automatically.
Use an existing DCR or create a new one, and link a DCE.
Upload a sample log file in JSON format to create the table schema. In my use case, I've created a few columns, such as TimeGenerated, FilePath and Computer, using the following transformation query:
source
| extend TimeGenerated = todatetime(Time), FilePath = tostring('C:\\Custom Application\\v.1.*.json'), Computer = tostring('DC-WinSrv22')
Review and create the table.
Go to the Data Collection Rule > Resources, add the Application Server, and link it with the DCE.
If all configurations are correct, the data should populate in the custom table within a few minutes, as shown below:
Note: Ensure that the Application Server is reporting to the correct Log Analytics workspace and that the DCR and DCE are linked to the server. Details of the DCRs associated with a VM can be fetched with the following PowerShell cmdlet:
Get-AzDataCollectionRuleAssociation -TargetResourceId {ResourceID}
Please note that the 'Custom JSON log' data source configuration is currently unavailable through the portal; you can use Azure CLI or an ARM template for the configuration. However, the 'Custom Text Logs' data source can be configured from the Azure portal (DCR > Data Sources).

Leveraging Custom logs via AMA Data Connector
We've recently released a content hub solution for ingesting custom logs via AMA. This approach is straightforward because the required columns, such as TimeGenerated and RawData, are created automatically.
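The 'Custom JSON log' data source has to be defined outside the portal, and the connector generates a DCR for text logs. It helps to see what such a DCR body contains. The sketch below assumes the DCR schema used in Microsoft's AMA text/JSON log collection docs. It returns the `properties` section for either format:

- a `Custom-<table>` stream declaration;
- the file patterns;
- a Log Analytics destination;
- a data flow that carries the transformation.

The function name, destination name and data source name are hypothetical. The text-only `recordStartTimestampFormat` setting reflects the documented examples as I understand them, so verify it before deploying.

```typescript
// Hypothetical builder for the "properties" section of a DCR that collects a
// custom log file. Property names follow the DCR schema used for AMA text/JSON
// log collection; verify against the current reference before deploying.
type ColumnType = "string" | "datetime" | "int" | "long" | "real" | "boolean" | "dynamic";

interface Column {
  name: string;
  type: ColumnType;
}

interface LogFileDcrOptions {
  format: "json" | "text";
  tableName: string; // custom table, e.g. "CustomApp_CL"
  filePatterns: string[]; // wildcards allowed, e.g. "C:\\Custom Application\\v.1.*.json"
  columns: Column[]; // incoming stream schema
  transformKql: string; // "source" when no transformation is needed
  workspaceResourceId: string;
  dataCollectionEndpointId: string;
}

function buildLogFileDcr(opts: LogFileDcrOptions) {
  if (!/_CL$/.test(opts.tableName)) {
    throw new Error("DCR-based custom table names end in _CL");
  }
  const stream = `Custom-${opts.tableName}`;
  const destinationName = "sentinelWorkspace"; // hypothetical label linking flow to destination
  return {
    dataCollectionEndpointId: opts.dataCollectionEndpointId,
    streamDeclarations: { [stream]: { columns: opts.columns } },
    dataSources: {
      logFiles: [
        {
          streams: [stream],
          filePatterns: opts.filePatterns,
          format: opts.format,
          name: `${opts.tableName}-files`,
          // Text-log examples carry this setting; JSON logs are built without it.
          ...(opts.format === "text"
            ? { settings: { text: { recordStartTimestampFormat: "ISO 8601" } } }
            : {}),
        },
      ],
    },
    destinations: {
      logAnalytics: [{ workspaceResourceId: opts.workspaceResourceId, name: destinationName }],
    },
    dataFlows: [
      {
        streams: [stream],
        destinations: [destinationName],
        transformKql: opts.transformKql,
        outputStream: stream,
      },
    ],
  };
}
```

`JSON.stringify(buildLogFileDcr(...), null, 2)` produces a block you could place under `properties` of a Microsoft.Insights/dataCollectionRules resource in an ARM template. For the JSON use case in this post, `columns` would mirror the fields in your log file, and `transformKql` would carry the transformation query used earlier. All resource IDs and names here are placeholders.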
Detailed steps as follows:
Browse to Microsoft Sentinel > Content Hub > Custom Logs AMA and install the solution.
Go to Manage > Open the connector page > Create Data Collection Rule.
Enter the rule name and target VM, and specify whether you wish to create a new table. If so, provide a table name. You'll also need to provide the file pattern (wildcards are supported) along with the transformation logic (if applicable). In my use case, I am not using any transformation.
Once the DCR is created, wait for some time and validate whether logs are streaming. If all the configurations are correct, you'll see the logs in the table as shown below:
Please note that since we have used DCR-based custom tables, we can switch the table plan to Basic if needed. Additionally, DCR-based custom tables support transformations, so irrelevant data can be dropped or the incoming data can be split into multiple tables.

References:
Collect logs from a JSON file with Azure Monitor Agent - Azure Monitor | Microsoft Learn
Collect logs from text files with the Azure Monitor Agent and ingest to Microsoft Sentinel - AMA | Microsoft Learn
Demystifying Log Ingestion API | Microsoft Community Hub
Save ingestion costs by splitting logs into multiple tables and opting for the basic tier! | Microsoft Community Hub
Workspace & DCR Transformation Simplified | Microsoft Community Hub

Break the 30,000 Rows Limit with Advanced Hunting API!
In this blog post, I will explain how to use the Advanced Hunting API to bypass the 30,000-row limit of Defender XDR's advanced hunting feature. Before we delve into the topic, let's understand what Advanced Hunting in Defender XDR is and what problem we are trying to solve.
Advanced Hunting in Defender XDR (Extended Detection and Response) is a powerful feature in Microsoft Defender that allows security professionals to query and analyse large volumes of raw data to uncover potential threats across an organization's environment. It provides a flexible query interface where users can write custom queries using Kusto Query Language (KQL) to search through data collected from various sources, such as endpoints, emails, cloud apps, and more.
Key features of Advanced Hunting in Defender XDR include:
Custom Queries: You can create complex queries to search for specific activities, patterns, or anomalies across different security data sources.
Deep Data Analysis: It allows for deep analysis of raw data, going beyond the pre-defined alerts and detections to identify potential threats, vulnerabilities, or suspicious behaviours that might not be immediately visible.
Cross-Platform Search: Advanced Hunting enables users to query across a wide range of data sources, including Microsoft Defender for Endpoint, Defender for Identity, Defender for Office 365, and Defender for Cloud Apps.
Automated Response: It supports creating automated response actions based on the findings of advanced hunts, helping to quickly mitigate threats.
Integration with Threat Intelligence: You can enrich your hunting queries with external threat intelligence to correlate indicators of compromise (IOCs) and identify malicious activities.
Visualizations and Insights: Results from hunting queries can be visualized to help spot trends and patterns, making it easier to investigate and understand the data.
Advanced Hunting is a valuable tool for proactive threat detection, investigation, and response within Defender XDR, giving security teams more flexibility and control over the security posture of their organization.

Advanced Hunting quotas and service limits
To keep the service performant and responsive, advanced hunting sets various quotas and usage parameters (also known as "service limits"). By design, each Advanced Hunting query in the portal can fetch up to 30,000 rows. Refer to our public documentation for more information about the service limits in Advanced Hunting. In this blog, we will focus on leveraging the Advanced Hunting API to bypass this 30,000-row limit.
Usually, when a query result exceeds 30,000 rows, it's recommended to:
Refine or optimize the query by introducing filters that separate it into distinct segments, and then merge the results into a comprehensive report.
Leverage the Advanced Hunting API, which can fetch up to 100,000 rows: Advanced Hunting API - Microsoft Defender for Endpoint | Microsoft Learn
We're going to focus on the second approach here. Let's dive deeper into the process of fetching up to 100,000 records using the Advanced Hunting API.
Log in to Microsoft Defender XDR (https://security.microsoft.com/).
Browse to Endpoints > Partners and APIs > API Explorer.
Submit a POST request with a JSON body containing the Advanced Hunting query:
POST https://api.securitycenter.microsoft.com/api/advancedqueries/run
Let's take an example of an AH query that fetches details about devices with open CVEs.
Sample Advanced Hunting query:
DeviceTvmSoftwareVulnerabilities
| join kind=inner (
    DeviceTvmSoftwareVulnerabilitiesKB
    | extend CveId = tostring(CveId) // Cast CveId to string in the second leg of the join
    | project CveId, VulnerabilitySeverityLevel, CvssScore, PublishedDate, VulnerabilityDescription
) on CveId
| project DeviceName, OSPlatform, OSVersion, CveId, VulnerabilitySeverityLevel, CvssScore, PublishedDate, VulnerabilityDescription, RecommendedSecurityUpdate
Note: The advanced hunting query in the JSON template should be written on a single line, without // comments, since a comment would swallow the rest of a one-line query.

Let's see it in action now. My JSON template is as follows:
{
  "Query": "DeviceTvmSoftwareVulnerabilities | join kind=inner (DeviceTvmSoftwareVulnerabilitiesKB | extend CveId = tostring(CveId) | project CveId, VulnerabilitySeverityLevel, CvssScore, PublishedDate, VulnerabilityDescription) on CveId | project DeviceName, OSPlatform, OSVersion, CveId, VulnerabilitySeverityLevel, CvssScore, PublishedDate, VulnerabilityDescription, RecommendedSecurityUpdate"
}
Execute the query; it returns a response (as shown below).
Copy the response and save it locally as a JSON file.
Use PowerShell to convert the JSON to CSV format. For example, the following PowerShell command converts the JSON file to a CSV report:
Get-Content "<Location of JSON file>" -Raw | ConvertFrom-Json | Select-Object -ExpandProperty Results | ConvertTo-Csv -NoTypeInformation | Out-File "<Location to save CSV file>" -Encoding ASCII
The CSV report should contain up to 100,000 records. I would also recommend going through the limitations of the Advanced Hunting API: Advanced Hunting API - Microsoft Defender for Endpoint | Microsoft Learn

References:
Advanced Hunting APIs: Advanced Hunting API - Microsoft Defender for Endpoint | Microsoft Learn
Advanced Hunting Overview: Overview - Advanced hunting - Microsoft Defender XDR | Microsoft Learn

Detecting and Alerting on MDE Sensor Health Transitions Using KQL and Logic Apps
Introduction
Maintaining the health of Microsoft Defender for Endpoint (MDE) sensors is essential for ensuring continuous security visibility across your virtual machine (VM) infrastructure. When a sensor transitions from an "Active" to an "Inactive" state, it indicates a loss of telemetry from that device, potentially creating blind spots in your security posture. To proactively address this risk, it's important to detect these transitions promptly and alert your security team for timely remediation.
This guide walks you through a practical approach to automate this process, using a Kusto Query Language (KQL) query to identify sensor health state changes and an Azure Logic App to trigger email alerts. By the end, you'll have a fully functional, automated monitoring solution that enhances your security operations with minimal manual effort.

Why Monitoring MDE Sensor Health Transitions is Important
Ensures Continuous Security Visibility: MDE sensors provide critical telemetry data from endpoints. If a sensor becomes inactive, that device stops reporting, creating a blind spot in your security monitoring.
Prevents Delayed Threat Detection: Inactive sensors can delay the identification of malicious activity, giving attackers more time to operate undetected within your environment.
Supports Effective Incident Response: Without telemetry, incident investigations become harder and slower, reducing your ability to respond quickly and accurately to threats.
Identifies Root Causes Early: Monitoring transitions helps uncover underlying issues such as service disruptions, misconfigurations, or agent failures that may otherwise go unnoticed.
Closes Security Gaps Proactively: Early detection of inactive sensors allows teams to take corrective action before adversaries exploit the lapse in coverage.
Enables Automation and Scalability: Using KQL and Logic Apps automates the detection and alerting process, reducing manual effort and ensuring consistent monitoring across large environments.
Improves Operational Efficiency: Automated alerts reduce the need for manual checks, freeing up security teams to focus on higher-priority tasks.
Strengthens Overall Security Posture: Proactive monitoring and fast remediation contribute to a more resilient and secure infrastructure.

Prerequisites
MDE Enabled: Defender for Endpoint must be active and reporting on all relevant devices.
DeviceInfo table streamed to Microsoft Sentinel's workspace (via the Defender XDR connector): Required to run KQL queries and manage alerts.
Log Analytics Workspace: To run the KQL query.
Azure Subscription: Needed to create and manage Logic Apps.
Permissions: Sufficient RBAC access to Logic Apps, Log Analytics, and email connectors.
Email Connector Setup: Outlook, SendGrid, or similar must be configured in Logic Apps.
Basic Knowledge: Familiarity with KQL and Logic App workflows is helpful.

High-level summary of the Logic Apps flow for monitoring MDE sensor health transitions:
Trigger: Recurrence. The Logic App starts on a schedule (e.g., hourly, daily, or weekly) using a recurrence trigger.
Action: Run KQL Query. Executes a Kusto query against the Log Analytics workspace to detect devices where the MDE sensor transitioned from Active to Inactive in the last 7 days.
Condition (Optional): Check for Results. Optionally checks whether the query returned any results, to avoid sending empty alerts.
Action: Send Email Notification. If results are found, an email is sent to the security team with details of the affected devices, using dynamic content from the query output.
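You build the Condition and Send Email steps in the Logic Apps designer, but their logic is simple enough to sketch. The minimal TypeScript sketch below assumes the query action returns rows with the columns that this post's KQL query projects. `buildSensorHealthEmail` and the row interface are hypothetical. The function returns null for an empty result, which mirrors the optional condition. Otherwise it returns an HTML table, with device names escaped because they come from telemetry.

```typescript
// Hypothetical sketch of the Logic App's "Condition" and "Send Email" steps.
// Row fields assume the columns projected by the sensor-transition KQL query.
interface SensorTransitionRow {
  TimeGenerated: string;
  DeviceId: string;
  DeviceName: string;
  PrevState?: string | null; // empty for devices never Active in the lookback window
  SensorHealthState: string;
  DaysInactive: string;
}

// Device names come from telemetry, so escape them before putting them in HTML.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Returns null when the query found nothing, mirroring the optional condition
// that avoids sending empty alerts.
function buildSensorHealthEmail(
  rows: SensorTransitionRow[],
): { subject: string; htmlBody: string } | null {
  if (rows.length === 0) {
    return null;
  }
  const header = ["Device", "Device ID", "Previous state", "Current state", "Days inactive", "TimeGenerated"];
  const headerRow = "<tr>" + header.map((h) => `<th>${h}</th>`).join("") + "</tr>";
  const dataRows = rows.map(
    (r) =>
      "<tr>" +
      [r.DeviceName, r.DeviceId, r.PrevState || "n/a", r.SensorHealthState, r.DaysInactive, r.TimeGenerated]
        .map((cell) => `<td>${escapeHtml(cell)}</td>`)
        .join("") +
      "</tr>",
  );
  return {
    subject: `MDE sensor health: ${rows.length} device(s) no longer Active`,
    htmlBody: `<table>${headerRow}${dataRows.join("")}</table>`,
  };
}
```

In the designer, you would usually get the same effect with a Condition on the length of the query result followed by the built-in Create HTML table action. With an hourly or daily recurrence and a 7-day lookback, a device that stays inactive is reported on every run. If that is too noisy, shorten the lookback or track devices you have already notified about.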
Logic Apps Flow

KQL Query to Detect Sensor Transitions
Use the following KQL query in Microsoft Defender XDR or Microsoft Sentinel to identify VMs where the sensor health state changed from Active to Inactive in the last 7 days:
let LookbackPeriod = 7d;
let NeverActiveDevice = DeviceInfo
    | where TimeGenerated > ago(LookbackPeriod)
    | where OnboardingStatus == "Onboarded"
    | project DeviceName, DeviceId, TimeGenerated, SensorHealthState
    | summarize make_set(SensorHealthState) by DeviceId
    | where not(set_has_element(set_SensorHealthState, "Active"))
    | lookup kind=inner (DeviceInfo | project DeviceName, DeviceId, TimeGenerated, SensorHealthState) on DeviceId
    | summarize arg_max(TimeGenerated, *) by DeviceId;
let PreviousActiveDevices = DeviceInfo
    | project DeviceName, DeviceId, TimeGenerated, SensorHealthState, OnboardingStatus
    | where TimeGenerated > ago(LookbackPeriod)
    | summarize arg_max(TimeGenerated, *) by DeviceId, SensorHealthState
    | sort by DeviceId asc, TimeGenerated asc
    | serialize
    | extend PrevState = prev(SensorHealthState)
    | extend PrevState_deviceId = prev(DeviceId)
    | where DeviceId == PrevState_deviceId
    | where PrevState == "Active" and SensorHealthState != "Active"
    | extend DaysInactive = datetime_diff('day', now(), TimeGenerated);
union PreviousActiveDevices, NeverActiveDevice
| project-reorder TimeGenerated, DeviceId, DeviceName, PrevState, SensorHealthState, DaysInactive
| extend DaysInactive = iff(isnotempty(DaysInactive), tostring(DaysInactive), strcat("Not Active in ", toint(LookbackPeriod/1d), " days"))
This KQL query does the following:
Detects devices whose sensors have stopped functioning (changed from Active to a non-Active state such as Inactive) in the past 7 days. It also lists onboarded devices that never reported Active in that window.
Returns, for each affected device, the most recent record in its non-Active state (arg_max keeps the latest row per device and state).
Reports DaysInactive, the number of days since that record. For devices that were never Active in the window, it reports "Not Active in 7 days" instead.
Sample Email for reference

How This Helps the Security Team
Maintains Endpoint Visibility: Detects when devices stop reporting telemetry, helping prevent blind spots in threat detection.
Enables Proactive Threat Management: Identifies sensor health issues before they become security incidents, allowing early intervention.
Reduces Manual Monitoring Effort: Automates the detection and alerting process, freeing up analysts to focus on higher-priority tasks.
Improves Incident Response Readiness: Ensures all endpoints are actively monitored, which is critical for timely and accurate incident investigations.
Supports Compliance and Audit Readiness: Demonstrates continuous monitoring and control over endpoint health, which is often required for regulatory compliance.
Prioritizes Remediation Efforts: Provides a clear list of affected devices, helping teams focus on the most recent or longest inactive endpoints.
Integrates with Existing Workflows: Can be extended to trigger ticketing systems, remediation scripts, or SIEM alerts, enhancing operational efficiency.

Conclusion
By combining KQL analytics with Azure Logic Apps, you can automate the detection and notification of sensor health issues in your VM fleet, ensuring continuous security coverage and rapid response to potential risks.