Ingesting Google Cloud Logs into Microsoft Sentinel: Native vs. Custom Architectures
Overview of GCP Log Types and SOC Value

Modern Security Operations Centers (SOCs) require visibility into key Google Cloud Platform logs to detect threats and suspicious activities. The main log types include:

GCP Audit Logs – These encompass Admin Activity, Data Access, and Access Transparency logs for GCP services. They record every administrative action (resource creation, modification, IAM changes, etc.) and access to sensitive data, providing a trail of who did what and when in the cloud. In a SOC context, audit logs help identify unauthorized changes or anomalous admin behavior (e.g. an attacker creating a new service account or disabling logging). They are essential for compliance and forensics, as they detail changes to configurations and access patterns across GCP resources.

VPC Flow Logs – These logs capture network traffic flow information at the Virtual Private Cloud (VPC) level. Each entry typically includes source/destination IPs, ports, protocol, bytes, and an allow/deny action. In a SOC, VPC flow logs are invaluable for network threat detection: they let analysts monitor access patterns, detect port scanning, identify unusual internal traffic, and profile ingress/egress traffic for anomalies. For example, a surge in outbound traffic to an unknown IP or lateral movement between VMs can be spotted via flow logs. They also aid in investigating data exfiltration and verifying network policy enforcement.

Cloud DNS Logs – Google Cloud DNS query logs record DNS requests/responses from resources, and DNS audit logs record changes to DNS configurations. DNS query logs are extremely useful in threat hunting because they can reveal systems resolving malicious domain names (C2 servers, phishing sites, DGA domains) or performing unusual lookups. Many malware campaigns rely on DNS; having these logs in Sentinel enables detection of known bad domains and anomalous DNS traffic patterns.
DNS audit logs, on the other hand, track modifications to DNS records (e.g. newly added subdomains or changes in IP mappings), which can indicate misconfigurations or potential domain-hijacking attempts. Together, these GCP logs provide comprehensive coverage: audit logs tell you what actions were taken in the cloud, while VPC and DNS logs tell you what network activity is happening. Ingesting all three into Sentinel gives a cloud security architect the visibility to detect unauthorized access, network intrusions, and malware communication in a GCP environment.

Native Microsoft Sentinel GCP Connector: Architecture & Setup

Microsoft Sentinel offers native data connectors to ingest Google Cloud logs, leveraging Google's Pub/Sub messaging for scalable, secure integration. The native solution is built on Sentinel's Codeless Connector Framework (CCF) and uses a pull-based architecture: GCP exports logs to Pub/Sub, and Sentinel's connector pulls from Pub/Sub into Azure. This approach avoids custom code and uses cloud-native services on both sides.

Supported GCP Log Connectors: Out of the box, Sentinel provides connectors for:

GCP Audit Logs – Ingests the Cloud Audit Logs (admin activity, data access, transparency).
GCP Security Command Center (SCC) – Ingests security findings from Google SCC for threat and vulnerability management.
GCP VPC Flow Logs – (Recently added) Ingests VPC network flow logs.
GCP Cloud DNS Logs – (Recently added) Ingests Cloud DNS query logs and DNS audit logs.
Others – Additional connectors exist for specific GCP services (Cloud Load Balancer logs, Cloud CDN, Cloud IDS, GKE, IAM activity, etc.) via CCF, expanding the coverage of GCP telemetry in Sentinel.

Each connector typically writes to its own Log Analytics table (e.g. GCPAuditLogs, GCPVPCFlow, GCPDNS) and comes with built-in KQL parsers.

Architecture & Authentication: The native connector uses Google Pub/Sub as the pipeline for log delivery.
On the Google side, you set up a Pub/Sub topic that receives the logs (via Cloud Logging exports), and Sentinel subscribes to that topic. Authentication is handled through Workload Identity Federation (WIF) using OpenID Connect: instead of managing static credentials, you establish a trust between Azure AD and GCP so that Sentinel can impersonate a GCP service account. The high-level architecture is: GCP Cloud Logging (logs from services) → Log Router (export sink) → Pub/Sub Topic → (secure pull over OIDC) → Azure Sentinel Data Connector → Log Analytics Workspace. This ensures a secure, keyless integration. The Azure side (Sentinel) authenticates as a Google service account via OIDC tokens issued by Azure AD, which GCP trusts through the Workload Identity Provider. Below are the detailed setup steps.

GCP Setup (Publishing Logs to Pub/Sub)

Enable Required APIs: Ensure the GCP project hosting the logs has the IAM API and Cloud Resource Manager API enabled (needed for creating identity pools and roles). You'll also need owner/editor access on the project to create the resources below.

Create a Workload Identity Pool & Provider: In Google Cloud IAM, create a new Workload Identity Pool (e.g. named "Azure-Sentinel-Pool") and then a Workload Identity Provider within that pool that trusts your Azure AD tenant. Google provides a template for Azure AD OIDC trust – you'll supply your Azure tenant ID and the audience and issuer URIs that Azure uses. For Azure Commercial, the issuer is typically https://sts.windows.net/<TenantID>/ and the audience api://<some-guid>, as documented. (Microsoft's documentation and Terraform scripts provide these values for the Sentinel connector.)

Create a Service Account for Sentinel: Still in GCP, create a dedicated service account for the connector. This account is what Sentinel "impersonates" via the WIF trust.
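When you later grant the pool access to impersonate this service account, the federated identity is addressed with a fixed member-string format. A small illustrative helper (not part of any official tooling; the project number and pool ID are placeholders) shows the shape:

```python
def wif_principal_set(project_number: str, pool_id: str) -> str:
    """Member string covering all identities in a workload identity pool.
    This is the principal you grant roles/iam.workloadIdentityUser on the
    service account so Azure AD tokens can impersonate it."""
    return (
        "principalSet://iam.googleapis.com/projects/"
        f"{project_number}/locations/global/workloadIdentityPools/{pool_id}/*"
    )
```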
Grant this service account two key roles:

Pub/Sub Subscriber on the Pub/Sub subscription that will be created (allows pulling messages). You can grant roles/pubsub.subscriber at the project level or on the specific subscription.

Workload Identity User on the Workload Identity Pool. In the pool's permissions, add a principal of the form principalSet://iam.googleapis.com/projects/<WIF_project_number>/locations/global/workloadIdentityPools/<Pool_ID>/* and grant it the role roles/iam.workloadIdentityUser on your service account. This allows the Azure AD identity to impersonate the GCP service account.

Note: GCP best practice is often to keep the identity pool in a centralized project and service accounts in separate projects, but as of late 2023 the Sentinel connector UI expected them in one project (a limitation under review). It's simplest to create the WIF pool/provider and the service account within the same GCP project to avoid connectivity issues (unless documentation confirms support for cross-project setups).

Create a Pub/Sub Topic and Subscription: Open the GCP Pub/Sub service and create a topic (for example, projects/yourproject/topics/sentinel-logs). It's convenient to dedicate one topic per log type or use case (e.g. one for audit logs). As you create the topic, add a subscription to it in Pull mode (since Sentinel will pull messages). Use a default subscription with an appropriate name (e.g. sentinel-audit-sub). You can leave the default settings (ack deadline, retention) as is, or extend retention if you want messages to persist longer in case of downtime (the default is 7 days).

Create a Logging Export (Sink): In GCP Cloud Logging, set up a log sink to route the desired logs into the Pub/Sub topic. Go to Logging > Logs Router and create a sink: give it a descriptive name (e.g. audit-logs-to-sentinel), choose Cloud Pub/Sub as the destination, and select the topic you created (or use the format pubsub.googleapis.com/projects/yourproject/topics/sentinel-logs).
Scope and filters: Decide which logs to include. For audit logs, you might include all audit logs in the project (the sink can include admin activity, data access, etc., by default for the whole project, or even the entire organization if created at org level). For other log types like VPC Flow Logs or DNS, set an inclusion filter for those specific log names (e.g. logName="projects/<project>/logs/compute.googleapis.com%2Fvpc_flows" to capture VPC Flow Logs – note the URL-encoded slash in the log ID). You can also create organization-level sinks to aggregate logs from all projects.

Permissions: When creating a sink, GCP will ask to grant the sink's service account publish rights to the Pub/Sub topic. Accept this so logs can flow.

Once the sink is created, verify logs are flowing: in Pub/Sub > Subscriptions, you can "Pull" messages manually to see if any logs appear. Generating a test event (e.g., creating a VM to produce an audit log, or making a DNS query) can help confirm.

At this point, GCP is set up to export logs, and all requisite GCP resources (IAM federation, service account, topic/subscription, sink) are ready. Google provides Terraform scripts (and Microsoft supplies Terraform templates in its GitHub repository) to automate these steps; using Terraform, you can stand up the IAM and Pub/Sub configuration quickly if you're comfortable with code.

Azure Sentinel Setup (Connecting the GCP Connector)

With GCP publishing logs to Pub/Sub, configure Sentinel to start pulling them:

Install the GCP Solution: In the Azure portal, navigate to your Sentinel workspace. Under Content Hub (or Data Connectors), find Google Cloud Platform Audit Logs (for example) and click Install. This deploys any needed artifacts (the connector definition, parser, etc.). Repeat for other GCP solutions (like GCP VPC Flow Logs or GCP DNS) as needed.

Open Data Connector Configuration: After installation, go to Data Connectors in Sentinel, search for "GCP Pub/Sub Audit Logs" (or the relevant connector), and select it. Click Open connector page.
On the connector page, click + Add new (or Add new collector) to configure a new connection instance.

Enter GCP Parameters: A pane will prompt for connection details: the Project ID (of the GCP project where the Pub/Sub lives), the Project Number, the Topic and Subscription names, and the Service Account Email you created. You'll also enter the Workload Identity Provider ID (the identifier of the WIF provider, usually in the format projects/<proj>/locations/global/workloadIdentityPools/<pool>/providers/<provider>). All these values correspond to the GCP resources set up earlier – the UI screenshots in the docs show sample placeholders. Make sure there are no typos (a common error is mixing up the project ID with the project number, or using the wrong tenant ID).

Data Collection Rule (DCR): The connector may also ask for Data Collection Rule (DCR) and Data Collection Endpoint (DCE) names. Newer connectors based on CCF use the Log Ingestion API, so behind the scenes a DCR is used. If required, provide a name (the docs often suggest prefixing with "Microsoft-Sentinel-", e.g. Microsoft-Sentinel-GCPAuditLogs-DCR). The system will create the DCR and a DCE for you if they don't already exist. (If you installed via Content Hub, this is often automated – just ensure the names follow any expected pattern.)

Connect: Click Connect. Sentinel will use the provided information to establish connectivity, performing checks such as verifying that the subscription exists and that the service account can authenticate. If everything is set up properly, the connector will connect and start streaming data. In case of an error, you'll receive a detailed message: for example, an error about the WIF Pool ID or the subscription not being found indicates an issue in the provided IDs or permissions. Double-check those values if so.

Validation: After ~15-30 minutes, verify that logs are arriving.
You can run a Log Analytics query on the new table, for example GCPAuditLogs | take 5 (for audit logs) or GCPVPCFlow | take 5 (for flow logs). You should see records if ingestion succeeded. Sentinel also provides a "Data connector health" feature – enable it to get alerts and status on data latency and volume for this connector.

Data Flow and Ingestion: Once connected, the system works continuously and serverlessly. GCP's Log Router pushes new log entries to Pub/Sub as they occur. The Sentinel connector (running in Azure's cloud) polls the Pub/Sub subscription: it uses the service account credentials (via OIDC token) to call the Pub/Sub API and pull messages in batches, at a frequent interval (typically every few seconds). Each message (which contains a log entry in JSON) is then ingested into Log Analytics. The CCF connector uses the Log Ingestion API on the backend, mapping the JSON fields to the appropriate table schema. The logs appear in the respective table (with columns for each JSON field or a dynamic JSON column, depending on the connector design). Sentinel's built-in parsers or normalized schemas (ASIM) can be used to query these logs in a friendly way; for instance, the Audit Logs solution includes KQL functions to parse out common fields like user, method, and status from the raw JSON. This native pipeline is fully managed – you don't have to run any servers or code – and the use of Pub/Sub and OIDC makes it scalable and secure by design.

Design Considerations & Best Practices for the Native Connector

Scalability & Performance: The native connector approach is designed for high scale. Pub/Sub itself can handle very large log volumes with low latency. The Sentinel CCF connectors use a SaaS, auto-scaling model – with no fixed infrastructure, they scale out as needed to ingest bursts of data. This is a significant advantage over custom scripts or function apps, which might need manual scaling.
In testing, the native connector can reliably ingest millions of log events per day. If you anticipate extremely high volumes (e.g. VPC flow logs from hundreds of VMs), monitor the connector's performance, but it should scale automatically.

Reliability: By leveraging Pub/Sub's at-least-once delivery, the integration is robust. Even if Azure or the connector has a transient outage, log messages buffer in Pub/Sub; once the connector resumes, it catches up on the backlog. Ensure the subscription's message retention is adequate (the default 7 days is usually fine). The connector acknowledges messages only after they're ingested into Log Analytics, which prevents data loss on failures. This reliability is achieved without custom code, reducing the chance of bugs. Still, it's good practice to use Sentinel's connector health metrics to catch any issues (e.g., the connector not having pulled data in X minutes, indicating a problem).

Security: Eliminating persistent credentials is a best practice. By using Workload Identity Federation, the Azure connector obtains short-lived tokens to act as the GCP service account; there is no need to store a GCP service account key file, which could be a leak risk. Ensure the service account has the minimal roles needed. Typically, it does not need broad viewer roles on all GCP resources – it just needs Pub/Sub subscription access (and logging viewer only if you choose to restrict log export by IAM, which is usually unnecessary since the logs are already exported via the sink). Keep the Azure AD application's access limited too: the Azure AD app that underpins the Sentinel connector only needs access to the Sentinel workspace and needs no rights in GCP – the trust is handled by the WIF provider.

Filtering and Log Volume Management: A common best practice is to filter GCP logs at the sink to avoid ingesting superfluous data.
For instance, if only certain audit log categories are of interest (e.g., Admin Activity and security-related Data Access), you could exclude noisy Data Access logs like storage object reads. For VPC Flow Logs, you might filter on specific subnetworks or even specific metadata (though typically you'd ingest all flows and filter in Sentinel). Google's sink filters let you use boolean expressions on log fields. The community recommendation for firewall or VPC logs, for example, is to set a filter so that only those logs go into that subscription. This reduces cost and noise in Sentinel. Plan your log sinks carefully: you may create multiple sinks to separate log types (one sink to Pub/Sub for audit logs, another sink with its own topic for flow logs, etc.). The Sentinel connectors are each tied to one subscription and one table, so separating by log type helps manage parsing and access.

Coverage Gaps: Check what the native connectors support as of the current date. Microsoft has been rapidly adding GCP connectors (VPC Flow Logs and DNS logs were in preview in mid-2025 and are likely GA by now). If a needed log type is not supported (for example, a custom application writing logs to Google Cloud Logging), consider the custom ingestion approach (see the next section). For most standard infrastructure logs, the native route is available and preferable.

Monitoring and Troubleshooting: Familiarize yourself with the connector's status in Azure. In the Sentinel UI, each configured GCP connector instance shows a status (Connected/Warning/Error) and possibly a last-received timestamp; if there's an issue, gather details from the error messages there. On GCP, watch the Pub/Sub subscription's backlog: the subscription's unacknowledged message count (the num_undelivered_messages metric in Cloud Monitoring) shows whether messages are accumulating. A healthy system should have a near-zero backlog with steady consumption.
If the backlog is growing, the connector isn't keeping up or isn't pulling – check the Azure side for throttling or errors.

Multi-project or Org-wide Ingestion: If your organization has many GCP projects, you have options. You can deploy a connector per project, or use an organization-level log sink in GCP to funnel logs from all projects into a single Pub/Sub topic. The Sentinel connector can pull organization-wide data if the service account has rights (the Terraform script allows an org sink by providing an organization ID). This centralizes management, but be mindful of very large volumes. Also, ensure the service account has visibility on those logs (usually not an issue once they're exported; the sink's own service account handles the export).

In summary, the native GCP connector provides a straightforward and robust way to get Google Cloud logs into Sentinel. It's the recommended approach for supported log types due to its minimal maintenance and tight integration.

Custom Ingestion Architecture (Pub/Sub to Azure Event Hub, etc.)

In cases where the built-in connector doesn't meet requirements – e.g., unsupported log types, custom formats, or a corporate policy to use an intermediary – you can design a custom ingestion pipeline. The goal of a custom architecture is the same (move logs from GCP to Sentinel), but you can incorporate additional processing or routing. One reference pattern is GCP Pub/Sub → Azure Event Hub → Sentinel, which we'll use as an example among other alternatives.

GCP Export (Source): This part remains the same as the native setup – you create log sinks in GCP to continuously export logs to Pub/Sub topics. You can reuse what you've set up or create new, dedicated Pub/Sub topics for the custom pipeline. For instance, you might have a sink for Cloud DNS query logs if the native connector wasn't used, sending those logs to a topic. Ensure you also create subscriptions on those topics for your custom pipeline to pull from.
If you plan to use a GCP-based function to forward data, a push subscription could be used instead (it can call an HTTP endpoint directly), but a pull model is more common for custom solutions.

Bridge / Transfer Component: This is the core of the custom pipeline – a piece of code that reads from Pub/Sub and sends data to Azure. There are several implementation options:

Google Cloud Function or Cloud Run (in GCP): Deploy a Cloud Function that triggers on new Pub/Sub messages (using Google's Eventarc or a Pub/Sub trigger). The function executes with the message as input: it parses the Pub/Sub message and forwards it to Azure. This keeps the "pull" logic on the GCP side – effectively, GCP pushes to Azure. For example, a Cloud Function (Python) could be subscribed to the sentinel-logs topic; each time a log message arrives, the function runs, authenticates to Azure, and calls the ingestion API. Cloud Functions scale out automatically based on message volume.

Custom Puller in Azure (Function App or Container): Instead of running the bridging code in GCP, run it in Azure. For instance, an Azure Function with a timer trigger (running every minute) or a long-running container in Azure Kubernetes Service could use Google's Pub/Sub client library to pull messages from GCP. You would provide it the service account credentials (likely a JSON key) to authenticate to the Pub/Sub pull API. After pulling a batch of messages, it sends them to the Log Analytics workspace. This centralizes everything in Azure but requires managing GCP credentials securely there.

Using Google Cloud Dataflow (Apache Beam): For a heavy-duty streaming solution, you could write an Apache Beam pipeline that reads from Pub/Sub and writes to an HTTP endpoint (Azure). Google Dataflow runs Beam pipelines in a fully managed way and can handle very large scale with exactly-once processing.
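The Cloud Function option can be sketched as follows – a minimal sketch in Python, where forward_to_azure is a hypothetical placeholder for the Azure ingestion call and the event shape follows the Pub/Sub trigger contract (the log entry arrives base64-encoded in event["data"]):

```python
import base64
import json

def decode_pubsub_event(event: dict) -> dict:
    """Pub/Sub delivers the Cloud Logging entry base64-encoded in event['data']."""
    return json.loads(base64.b64decode(event["data"]).decode("utf-8"))

def handle_log(event, context=None):
    """Entry point for a Pub/Sub-triggered Cloud Function (illustrative sketch)."""
    entry = decode_pubsub_event(event)
    # forward_to_azure([entry])  # hypothetical: POST the entry to Azure
    return entry
```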
Dataflow, however, is a complex approach unless you already use Beam – it's likely overkill for most cases and involves significant development. Whichever method you choose, the bridge component must handle reading, transforming, and forwarding logs efficiently.

Destination in Azure: There are two primary ways to ingest the data into Sentinel:

Azure Log Ingestion API (via DCR) – This is the modern method (introduced in 2022) for sending custom data to Log Analytics. You create a Data Collection Endpoint (DCE) in Azure and a Data Collection Rule (DCR) that defines how incoming data is routed to a Log Analytics table. For example, you might create a custom table GCP_Custom_Logs_CL for your logs. Your bridge component calls the Log Ingestion REST API endpoint (a URL associated with the DCE), authenticating with an Azure AD token. The payload is the log records (in JSON) plus the DCR rule ID to apply. The DCR can also perform transformations if needed (e.g., field mappings). The API call inserts the data into Log Analytics in near-real time. This approach is quite direct and is the recommended custom ingestion method (it replaces the older HTTP Data Collector API).

Azure Event Hub + Sentinel Connector – Instead of pushing directly into Log Analytics, you use an Event Hub as an intermediate buffer in Azure. Your GCP bridge acts as a producer, sending each log message to an Event Hub (over AMQP or using Azure's SDK). You then need something to get data from the Event Hub into Sentinel. There are a couple of options. Historically, Sentinel provided an Event Hub data connector (often used for Azure Activity logs or custom CEF logs) that pulls events from an Event Hub and writes them to Log Analytics. However, it typically expects events in a specific format (such as CEF, or JSON with a known structure); if your logs are raw JSON, you might need to wrap them or use a compatible format.
That method is somewhat less flexible unless you tailor your output to what Sentinel expects. Alternatively (and more flexibly), you can write a small Azure Function that triggers on the Event Hub (using the Event Hub trigger binding). When a message arrives, the function takes it and calls the Log Ingestion API (as in the first method above) to put it into Log Analytics. Essentially, this decouples pulling from GCP (done by the first function) from pushing to Sentinel (done by the second). This two-stage design can be useful if you need more complex buffering or retries, but it introduces more components.

Using an Event Hub in the pipeline is beneficial if you want a cloud-neutral queue between GCP and Azure (perhaps your organization already consolidates logs in an Event Hub or Kafka). It also lets you reuse any existing tools that read off Event Hubs (for example, feeding the same data to another system in parallel with Sentinel). This pattern – Cloud Logging → Pub/Sub → Event Hub → Log Analytics – has been observed in real-world multi-cloud deployments, essentially treating Pub/Sub + Event Hub as a bridging message bus between clouds.

Data Transformation: With a custom pipeline, you have full control of (and responsibility for) any data transformations. Key considerations:

Message Decoding: GCP Pub/Sub messages carry the log entry in the data field as a base64-encoded JSON object. Your code must decode it (a one-liner in most languages) to get the raw JSON log. After decoding, you'll have a JSON structure identical to what you'd see in Cloud Logging; for example, an audit log entry has fields like protoPayload, resourceName, etc.

Schema Mapping: Decide how to map the JSON to your Log Analytics table. You could ingest the entire JSON as a single column (and parse later in KQL), but it's often better to map important fields.
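Such a mapping can be sketched as below. The field paths follow GCP's VPC flow-log payload (jsonPayload.connection.*), while the column names are illustrative and would need to match whatever your custom table and DCR define:

```python
def map_flow_record(entry: dict) -> dict:
    """Map a decoded VPC Flow Log entry to custom-table columns (sketch).
    Column names (src_ip, ...) are illustrative; align them with your DCR."""
    payload = entry.get("jsonPayload", {})
    conn = payload.get("connection", {})
    return {
        "src_ip": conn.get("src_ip"),
        "dest_ip": conn.get("dest_ip"),
        "src_port": conn.get("src_port"),
        "dest_port": conn.get("dest_port"),
        "bytes_sent": payload.get("bytes_sent"),
    }
```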
For instance, for VPC Flow Logs you might extract src_ip, dest_ip, src_port, dest_port, bytes_sent, and action, and map each to a column in a custom table. This requires creating the custom table with those columns and configuring the DCR's transformation schema accordingly. With the Log Ingestion API, the DCR can transform the incoming JSON to the table schema; with the legacy Data Collector API (not recommended now), your code would need to format records as the exact JSON that Log Analytics expects.

Enrichment (optional): In a custom pipeline, you can enrich logs before sending them to Sentinel – for example, performing IP geolocation on VPC flow logs, or tagging DNS logs with threat intelligence (flagging domains known to be malicious) so the augmented information is stored in Sentinel. Be cautious: enrichment adds processing time and potential failure points. If it's light (like a dictionary lookup), it may be fine; if heavy, consider doing it after ingestion using Sentinel analytics instead.

Filtering: Another advantage of custom ingestion is that you can filter events at the bridge, dropping certain events entirely to save cost or reduce noise. For example, if DNS query logs are too verbose, you might forward only queries for certain domains or exclude known benign domains; for audit logs, you might exclude read-only operations. This gives flexibility beyond what the GCP sink filter can do, since you have the full event content to decide with. The trade-off is complexity – every filter you implement must be maintained and justified.

Batching: The Log Ingestion API allows sending multiple records in one call. It's more efficient to batch log events (say, 100 at a time) into one API request than to call per event. Your function can accumulate a short batch (with a timeout or max size) and send the records together. This improves throughput and lowers overhead.
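A batching helper along these lines could look as follows (a sketch; the count and byte caps are illustrative defaults to be tuned against the API limits):

```python
import json

def batch_records(records, max_count=100, max_bytes=1_000_000):
    """Yield lists of records whose serialized size stays under max_bytes
    and whose length stays under max_count (both limits are illustrative)."""
    batch, size = [], 2  # 2 accounts for the JSON array brackets
    for rec in records:
        rec_size = len(json.dumps(rec).encode("utf-8")) + 1  # +1 for the comma
        if batch and (len(batch) >= max_count or size + rec_size > max_bytes):
            yield batch
            batch, size = [], 2
        batch.append(rec)
        size += rec_size
    if batch:
        yield batch
```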
Ensure the payload stays within API limits (~1 MB per post, and 30,000 events per post, for the Log Ingestion API). Pub/Sub and Event Hub also have batch capabilities – you may receive multiple messages in one invocation or read them in a loop, so design your code to handle variable batch sizes.

Authentication & Permissions (Custom Pipeline): You effectively need to handle two authentications: GCP → bridge and bridge → Azure.

GCP to Bridge: If using a GCP Cloud Function triggered by Pub/Sub, GCP handles the auth for delivering the message (the function is simply invoked with the data). If pulling from Azure, you'll need GCP credentials. The most practical way is a service account key with minimal permissions (just Pub/Sub Subscriber on the subscription). Store this key securely (Azure Key Vault, or as an encrypted App Setting in the Azure Function). The code uses this key (a JSON file or key string) to initialize the Pub/Sub client; Google's libraries support reading the key from an environment variable. Alternatively, you could explore using the Workload Identity Federation concept in reverse (Azure to GCP), but that's non-trivial to set up manually for custom code. A service account key is straightforward, but rotate it periodically. On GCP's side, you might also restrict the service account so it cannot access anything except Pub/Sub.

Bridge to Azure: To call the Log Ingestion API, you need an Azure AD app registration (client ID/secret) with permissions on the DCR. The modern approach: create an AAD app and grant it the Monitoring Data Contributor role on your Sentinel workspace, or explicitly grant it permissions on the DCR (Log Ingestion Data Contributor). Then your code uses the app's client ID and secret to get a token and call the API. This is a secure, managed approach.
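The request target can be sketched as below. The endpoint shape follows the Logs Ingestion API, but the DCE hostname, DCR immutable ID, stream name, and API version shown are placeholders you'd take from your own deployment:

```python
def ingestion_url(dce_endpoint: str, dcr_immutable_id: str, stream: str) -> str:
    """Build the Logs Ingestion API URL (sketch).
    dce_endpoint: the Data Collection Endpoint's logs-ingestion URI.
    stream: for a custom table, typically 'Custom-<TableName>_CL'."""
    return (
        f"{dce_endpoint}/dataCollectionRules/{dcr_immutable_id}"
        f"/streams/{stream}?api-version=2023-01-01"
    )

# The bearer token comes from the AAD app's client-credential flow (scope
# https://monitor.azure.com/.default); the JSON batch goes in the POST body.
```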
The older Data Collector API used the workspace ID and a primary key for auth (an HMAC-SHA256 header) – some existing solutions still use it, but since that API is deprecated, it's better to use the new method. In summary: store the Azure credentials safely (Key Vault, or GCP Secret Manager if the function runs in GCP) and follow the principle of least privilege (allow ingestion only, with no read access to other data).

End-to-End Data Flow Example: To make this concrete, consider ingesting firewall/VPC logs with a custom pipeline (this mirrors a solution Microsoft published before these logs were natively supported):

1. A GCP log sink filters for VPC firewall logs (generated by GCP firewall rules as part of VPC flow logging) and exports them to a Pub/Sub topic.
2. An Azure Function (PowerShell in the published example, though any language works) runs on a timer. Every minute it pulls all messages from the Pub/Sub subscription via the Google APIs, authenticating with a stored service account key, and decodes each message's JSON.
3. The function constructs output in the format required by Sentinel's Log Ingestion API. In this case, a custom Log Analytics table (say, GCPFirewall_CL) was created with columns matching the log fields (source IP, destination IP, action, etc.), and the function maps each JSON field to a column – for instance, json.payload.sourceIp -> the src_ip column.
4. It calls the Log Ingestion REST API to send a batch of log records. The call is authorized with an Azure AD app's client ID/secret held in the function's configuration.
5. Upon successfully POSTing the data, the function acknowledges those messages back to Pub/Sub (or, if using the Pub/Sub client in pull mode, acks as it pulls).
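The ack-after-send discipline can be sketched with the transport details abstracted away – pull, send, and ack are injected callables (in production they would wrap the Pub/Sub and Log Ingestion clients; here they are placeholders so the loop itself is testable):

```python
def drain_subscription(pull, send, ack):
    """Forward Pub/Sub messages with at-least-once semantics (sketch).
    pull() -> list of (ack_id, record); send(records) raises on failure;
    ack(ack_ids) confirms delivery. Messages are acked only after a
    successful send, so a failed batch stays queued for redelivery."""
    forwarded = 0
    while True:
        messages = pull()
        if not messages:
            return forwarded
        try:
            send([rec for _, rec in messages])
        except Exception:
            continue  # leave unacked; Pub/Sub will redeliver later
        ack([ack_id for ack_id, _ in messages])
        forwarded += len(messages)
```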
This acknowledgment is important to ensure the messages don't get re-delivered. If the send to Azure fails, the function can choose not to ack, so the message remains in Pub/Sub and is retried on the next run, ensuring reliability. The logs then show up in Sentinel under the custom table and can be used in queries and analytics like any other log. End to end, latency from log generation in GCP to ingestion in Sentinel can be just a few seconds in this design – effectively near-real-time.

Tooling & Infrastructure: When implementing the above, some recommendations:

Use official SDKs where possible. Google Cloud provides Pub/Sub client libraries for Python, Node.js, C#, and more, which simplify pulling messages; Azure has an SDK for the Monitor Ingestion API (or you can call the REST endpoints directly with an HTTP client). This saves time versus hand-crafting HTTP calls and auth.

Leverage Terraform or other IaC for repeatability. You can automate creation of the Azure resources (Function App, Event Hub, etc.) and even the GCP setup. For instance, the community SCC-to-Sentinel example provides Terraform scripts. This makes it easier to deploy the pipeline consistently across dev/test and prod.

Logging and monitoring: Implement robust logging in your function code. In GCP Cloud Functions, use Cloud Logging to record errors so you can see when something fails; in Azure Functions, use Application Insights to track failures and performance metrics. Set up alerts if the function fails repeatedly or if the expected log volume drops (which could indicate a broken pipeline). Treat your custom pipeline as you would any production integration – monitor its health continuously.

Example Use Case – Ingesting All Custom GCP Logs: One key advantage of a custom approach is flexibility. Imagine a custom application writing logs to Google Cloud Logging (formerly Stackdriver) with no out-of-the-box Sentinel connector.
You can still get those logs into Sentinel. As one cloud architect noted, they built a fully custom pipeline – GCP Log Sink -> Pub/Sub -> Cloud Function -> Sentinel – specifically to ingest arbitrary GCP logs beyond the built-in connectors. This unlocked visibility into application-specific events that would otherwise be siloed. In doing so, they followed many of the steps above, demonstrating that any log that can enter Pub/Sub can ultimately land in Sentinel. This extensibility is a major benefit of a custom solution – you're not limited by what Microsoft or Google have pre-integrated.

In summary, the custom ingestion route requires more effort up front, but it grants complete control. You can tune what you collect, transform data to your needs, and integrate logs that might not be natively supported. Organizations often resort to this if they have very specific needs or if they started building ingestion pipelines before native connectors were available. Many will start with custom for something like DNS logs and later switch to a native connector once available. A hybrid approach is also possible (using the native connector for audit logs, but a custom pipeline for a niche log source).

Comparison of Native vs. Custom Ingestion Methods

Both native and custom approaches will get your GCP logs into Microsoft Sentinel, but they differ in complexity and capabilities. The comparison below summarizes the trade-offs between the native GCP connector (Sentinel's Pub/Sub integration) and a custom ingestion pipeline (DIY via Event Hubs or API), to help choose the right approach.

Ease of Setup
- Native connector: Low-code setup. Requires configuration in GCP and Azure, but no custom code. You use provided Terraform scripts and a UI wizard. In a few hours you can enable the connector if prerequisites (IAM, Pub/Sub) are met. Microsoft's documentation guides the process step by step.
- Custom pipeline: High-code setup. Requires designing and writing integration code (function or app) and configuring cloud services (Function Apps, Event Hub, etc.). More moving parts mean a longer setup time – possibly days or weeks to develop and thoroughly test. Suitable if your team has cloud developers or if requirements demand it.

Log Type Coverage
- Native connector: Supported logs only. Out-of-the-box support for standard GCP logs (audit, SCC findings, VPC flow, DNS, etc.). However, it's limited to those data types Microsoft has released connectors for. (As of 2025, many GCP services are covered, but not necessarily all Google products.) If a connector exists, it will reliably ingest that log type.
- Custom pipeline: Any log source. Virtually unlimited – you can ingest any log from GCP, including custom application logs or niche services, as long as you can export it to Pub/Sub. You define the pipeline for each new log source. This is ideal for custom logs beyond the built-ins. The trade-off is that you must build parsing/handling for each log format yourself.

Development & Maintenance
- Native connector: Minimal maintenance. After initial setup, the connector runs as a service. No custom code to maintain; Microsoft handles updates/improvements. You might need to update configuration if GCP projects or requirements change, but generally it's "configure and monitor." Support is available from Microsoft for connector issues.
- Custom pipeline: Ongoing maintenance. You own the code. Updates to log schemas, API changes, or cloud platform changes might require code modifications. You need to monitor the custom pipeline, handle exceptions, and possibly update credentials regularly. This approach is closer to software maintenance – expect to allocate effort for bug fixes or improvements over time.

Scalability
- Native connector: Cloud-scale (managed). The connector uses Azure's cloud infrastructure, which auto-scales. High volumes are handled by scaling out processing nodes behind the scenes.
GCP Pub/Sub will buffer and deliver messages at whatever rate the connector can pull, and the connector is optimized for throughput. There's effectively no hard limit exposed to you (aside from Log Analytics ingestion rate limits, which are very high).
- Custom pipeline: Custom scaling required. Scalability depends on your implementation. Cloud Functions and Event Hubs can scale, but you must configure them (e.g., set concurrency, ensure enough throughput units on Event Hub). If logs increase tenfold, you may need to tweak settings or upgrade plans. There's more possibility of bottlenecks (e.g., a single-threaded function might lag). Designing for scale (parallelism, batching, multi-partition processing) is your responsibility.

Reliability & Resilience
- Native connector: Reliable by design. Built on proven Google Pub/Sub durability and Azure's reliable ingestion pipeline. The connector handles retries and acknowledgements. If issues occur, Microsoft likely addresses them in updates. You also get built-in monitoring in Sentinel for connector health.
- Custom pipeline: Reliability varies. Requires implementing your own retry and error-handling logic. A well-built custom pipeline can be very reliable (e.g., using Pub/Sub's ack/retry and durable Event Hub storage), but mistakes in code could drop or duplicate logs. You need to test failure scenarios (network blips, API timeouts, etc.). Additionally, you must implement your own health checks/alerts to know if something breaks.

Flexibility & Transformation
- Native connector: Standardized ingestion. Limited flexibility – it ingests logs in their native structure into pre-defined tables. Little opportunity to transform data (beyond what the connector's mapping does). Essentially "what GCP sends is what you get," and you rely on Sentinel's parsing for analysis. All logs of a given type are ingested (you control scope via GCP sink filters, but not the content).
- Custom pipeline: Highly flexible. You can customize everything – which fields to ingest, how to format them, and even augment logs with external data.
For example, you could drop benign DNS queries or mask sensitive fields before sending. You can consolidate multiple GCP log types into one table or split one log type into multiple tables if desired. This freedom lets you tailor the data to your environment and use cases. The flip side is complexity: every transformation is custom logic to maintain.

Cost Considerations
- Native connector: Cost-efficient pipeline. There is no charge for the Sentinel connector itself (it's part of the service). On GCP, Pub/Sub charges are minimal (especially for pulling data), and the logging export has no extra cost aside from the egress of the data. On Azure, you pay for data ingestion and storage in Log Analytics as usual. There is no need to run VMs or functions continuously. Overall, the native route avoids infrastructure costs – you're mainly paying for the data volume ingested (which is unavoidable either way) and a tiny cost for Pub/Sub (pennies for millions of messages).
- Custom pipeline: Additional costs. On top of Log Analytics ingestion costs, you will incur charges for the components you use. An Azure Function or Cloud Function has execution costs (modest, but they add up at high volume). An Event Hub has hourly charges based on throughput units and retention. Data egress from GCP to Azure will be charged by Google (network egress fees) – this also applies to the native connector, though there the egress cost is typically quite small. If your pipeline runs around the clock, be sure to factor in those platform costs. Custom pipelines can also reduce Log Analytics costs by filtering out data (saving money by not ingesting noise), so there's a trade-off: spend on processing to save on storage, if needed.

Support & Troubleshooting
- Native connector: Vendor-supported. Microsoft supports the connector – if things go wrong, you can open a support case. Documentation covers common setup issues. The connector UI will show error messages (e.g., authentication failures) to guide troubleshooting.
Upgrades/improvements are handled by Microsoft (e.g., if a GCP API changes, Microsoft will update the connector).
- Custom pipeline: Self-support. You build it, you fix it. Debugging issues might involve checking logs across two clouds. Community forums and documentation can help (e.g., Google's docs for Pub/Sub, Azure docs for the Log Ingestion API). When something breaks, your team must identify whether it's on the GCP side (sink or Pub/Sub) or the Azure side (function error or DCR issue). This requires familiarity with both environments. There's no single vendor to take responsibility for the end-to-end pipeline, since it's custom.

In short, use the native connector whenever possible – it's easier and reliably maintained. Opt for a custom solution only if you truly need the flexibility or need to support logs that the native connectors can't handle. Some organizations start with custom ingestion out of necessity (before native support exists) and later migrate to native connectors once available, to reduce their maintenance burden.

Troubleshooting Common Issues

Finally, regardless of method, you may encounter some hurdles. Here are common issues and ways to address them in the context of GCP-to-Sentinel log integration:

No data appearing in Sentinel: If you've set up a connector and see no logs, first be patient – initial data can take ~10–30 minutes to show up. If nothing appears after that, verify the GCP side:
- Check the Log Router sink status in GCP (did you set the correct inclusion filters? Are logs actually being generated? You can view logs in Cloud Logging to confirm the events exist).
- Go to Pub/Sub and use the "Pull" option on the subscription to see if messages are piling up. If you can pull messages manually but Sentinel isn't getting them, the issue is likely on the Azure side.
- In Sentinel, ensure the connector shows as Connected. If it's in an error state, click on it to see details.
A common misconfiguration is an incorrect Project Number or Service Account in the connector settings – one typo there will prevent ingestion. Update the parameters if needed and reconnect.

Authentication or connectivity errors (native connector): These show up as errors like "Workload Identity Pool ID not found" or "Subscription does not exist" on the connector page. This usually means the values entered in the connector are mismatched:
- Double-check the Workload Identity Provider ID. It must exactly match the one in GCP (including the correct project number). If you created the WIF pool in a different project than the Pub/Sub, remember the connector (until recently) expected them in one project. Ensure you used the correct project ID/number for all fields.
- Verify the service account email is correct and that you granted the Workload Identity User role on it. If not, the Azure identity cannot assume the service account.
- Check that the subscription name is correct and that the service account has roles/pubsub.subscriber on it. If you forgot to add that role, Azure will be denied access to Pub/Sub.
- Ensure the Azure AD app (which is automatically used by Sentinel) wasn't deleted or disabled in your tenant. The Sentinel connector uses a multi-tenant app provided by Microsoft (identified by the audience GUID in the docs), which should be fine unless your org blocks third-party Azure apps. If you have restrictions, you might need to allow Microsoft's Sentinel multi-cloud connector app.
- Tip: If you configured things manually and it's failing, try running the Terraform scripts provided by Microsoft. The scripts can often pinpoint what's missing by setting everything up for you.

Partial data or specific logs missing: If some expected events are not showing up:
- Revisit your sink filter in GCP. Perhaps the filter is too narrow. For example, for DNS logs you might need to include both _Default logs and DNS-specific log IDs.
- For audit logs, remember that Data Access logs for certain services may be excluded by default (you have to enable Data Access audit logs in GCP for some services). If those aren't enabled in GCP, they won't be exported at all.
- If using the SCC connector, ensure you enabled continuous export of findings from SCC to Pub/Sub – those findings won't flow unless explicitly configured.
- Check Sentinel's tables for clues – sometimes logs arrive under a slightly different table or format. For example, if the connector was set up incorrectly initially, it might have sent data to a custom table with a suffix. Run a Log Analytics query across all tables (or search by a specific IP or timestamp) to confirm the data truly isn't there.

Duplicate logs or high event counts (custom ingestion): If your custom pipeline isn't carefully handling acknowledgments, you might ingest duplicates. For instance, if your function crashes after sending data to Sentinel but before acking Pub/Sub, the message will be retried later – resulting in the same log being ingested twice. Over time this could double-count events. Solution: ensure idempotency or proper ack logic. One way is to include a unique ID with each log (GCP audit logs have an insertId that is unique per log event; VPC flow logs have a unique flow ID for each flow chunk). You could use that as a de-duplication key on the Sentinel side (e.g., ingest it and deduplicate in queries), or design the pipeline to mark messages as processed in an external store. The simplest approach, however, is to acknowledge only after successful ingestion and let Pub/Sub handle retries. If you notice duplicates in Sentinel, double-check that your code isn't calling ack too early or multiple times.

Log Ingestion API errors (custom pipeline): When calling the Log Ingestion API, you might encounter HTTP errors:
- 400 Bad Request – often a schema mismatch. This means the JSON you sent doesn't match the DCR's expected format.
Check the error details; the API usually returns a message indicating which field is wrong. Common issues: sending a string value for a column defined as an integer, missing a required column, or including an extra column that's not in the table. Adjust your transformation or DCR accordingly.
- 403 Forbidden – authentication failure. Your Azure AD token might be expired, or your app doesn't have rights. Make sure the token is fresh (fetch a new one for each function run, or use a managed identity if supported and authorized). Also verify the app's role assignments.
- 429 Too Many Requests / throttling – you might be sending data too fast. The Log Ingestion API has throughput limits (per second, per workspace). If you hit these, implement a backoff/retry and consider batching more. This is rare unless you have a very high log rate.
- Azure Function timeouts – if using Functions, the default timeout (e.g., 5 minutes for an HTTP-triggered function) might be hit when processing a large batch. Consider increasing the timeout setting or splitting work into smaller chunks.

Connector health alerts: If you enabled the health feature for connectors, you might get alerted that "no logs received from GCP Audit Logs in last X minutes," etc. If this is a false alarm (e.g., there genuinely were no new logs in GCP during that period), you can adjust the alert logic or threshold. But if it's a real issue, treat it as an incident: check GCP's Cloud Logging to ensure new events exist (if not, maybe nothing happened – e.g., no admin activity in the last hour). If events do exist in GCP but none appear in Sentinel, you have a pipeline problem – refer to the earlier troubleshooting steps for auth/connectivity.

Updating or migrating pipelines: Over time, you might replace a custom pipeline with a native connector (or vice versa). Be cautious of duplicate ingestion if both are running simultaneously.
For example, if you enable the new GCP DNS connector while your custom DNS log pipeline is still running, you'll start ingesting DNS logs twice. Plan a cutover: disable one before enabling the other in production. Also note that after migrating, the data may land in a different table (the native connector might use a GCPDNS table whereas your custom pipeline wrote to GCP_DNS_Custom_CL), so you may need to adjust your queries and workbooks to unify this. It may also be worthwhile to backfill historical data for continuity if needed.

By following these practices and monitoring closely, you can ensure a successful integration of GCP logs into Microsoft Sentinel. The end result is a centralized view in Sentinel where your Azure, AWS, on-prem, and now GCP logs all reside – empowering your SOC to run advanced detections and investigations across your multi-cloud environment using a single pane of glass.

MFA alerts for when an alternative phone number is added
Hi, I need to find a way to send an email alert to a shared mailbox whenever someone adds an alternative phone number to MFA, but I haven't been able to find a way to get MFA alerts for alternative phone numbers. Can someone help please?

Updating SDK for Java used by Defender for Server/CSPM in AWS
Hi, I have a customer who is running Defender for Cloud/CSPM in AWS. Last week, the AWS Health Dashboard lit up with a recommendation about the use of AWS SDK for Java 1.x in their organization. This version will reach end of support on December 31, 2025, and the recommendation is to migrate to AWS SDK for Java 2.x. The issue is present in all of their AWS workload accounts. They found that a large share of these alerts is caused by the Defender CSPM service, running remotely and using AWS SDK for Java 1.x. The customer gathered a couple of sample events from the CloudTrail logs. Please note that in both cases:

assumed-role: DefenderForCloud-Ciem
sourceIP: 20.237.136.191 (MS Azure range)
userAgent: aws-sdk-java/1.12.742 Linux/6.6.112.1-2.azl3 OpenJDK_64-Bit_Server_VM/21.0.9+10-LTS java/21.0.9 kotlin/1.6.20 vendor/Microsoft cfg/retry-mode/legacy cfg/auth-source#unknown

Can someone provide guidance on this? How can we find out whether Defender for Cloud will move to AWS SDK for Java 2.x before December 31, 2025?

Thanks, Terru

Unifying AWS and Azure Security Operations with Microsoft Sentinel
The Multi-Cloud Reality

Most modern enterprises operate in multi-cloud environments, using Azure for core workloads and AWS for development, storage, or DevOps automation. While this approach increases agility, it also expands the attack surface. Each platform generates its own telemetry:

- Azure: Activity Logs, Defender for Cloud, Entra ID sign-ins, Sentinel analytics
- AWS: CloudTrail, GuardDuty, Config, and CloudWatch

Without a unified view, security teams struggle to detect cross-cloud threats promptly. That's where Microsoft Sentinel comes in, bridging Azure and AWS into a single, intelligent Security Operations Center (SOC).

Architecture Overview

Connect AWS Logs to Sentinel

AWS CloudTrail via S3 Connector
1. Enable the AWS CloudTrail connector in Sentinel.
2. Provide your S3 bucket and an IAM role ARN with read access.
3. Sentinel will automatically normalize logs into the AWSCloudTrail table.

AWS GuardDuty Connector
1. Use the AWS GuardDuty API integration for threat detection telemetry.
2. Detected threats, such as privilege escalation or reconnaissance, appear in Sentinel in the AWSGuardDuty table.

Normalize and Enrich Data

Once logs are flowing, enrich them to align with Azure activity data.
Example KQL for mapping CloudTrail to Sentinel entities:

AWSCloudTrail
| extend AccountId = tostring(parse_json(Resources)[0].accountId)
| extend User = tostring(parse_json(UserIdentity).userName)
| extend IPAddress = tostring(SourceIpAddress)
| project TimeGenerated, EventName, User, AccountId, IPAddress, AWSRegion

Then correlate AWS and Azure activities:

let AWS = AWSCloudTrail
| summarize AWSActivity = count() by User, bin(TimeGenerated, 1h);
let Azure = AzureActivity
| summarize AzureActivity = count() by Caller, bin(TimeGenerated, 1h);
AWS
| join kind=inner (Azure) on $left.User == $right.Caller
| where AWSActivity > 0 and AzureActivity > 0
| project TimeGenerated, User, AWSActivity, AzureActivity

Automate Cross-Cloud Response

Once incidents are correlated, Microsoft Sentinel Playbooks (Logic Apps) can automate your response. Example playbook: "CrossCloud-Containment.json"
1. Disable the user in Entra ID
2. Send a command to the AWS API via Lambda to deactivate the IAM key
3. Notify the SOC in Teams
4. Create a ServiceNow ticket

POST https://api.aws.amazon.com/iam/disable-access-key

PATCH https://graph.microsoft.com/v1.0/users/{user-id}
{ "accountEnabled": false }

Build a Multi-Cloud SOC Dashboard

Use Sentinel Workbooks to visualize unified operations.

Query 1 – CloudTrail Events by Region

AWSCloudTrail
| summarize Count = count() by AWSRegion
| render barchart

Query 2 – Unified Security Alerts

union SecurityAlert, AWSGuardDuty
| summarize TotalAlerts = count() by ProviderName, Severity
| render piechart

Scenario

Incident: A compromised developer account accesses EC2 instances on AWS and then logs into Azure via the same IP.
Detection Flow:
1. CloudTrail logs → Sentinel detects unusual API calls
2. Entra ID sign-ins → Sentinel correlates IP and user
3. Sentinel incident triggers a playbook → disables the user in Entra ID, suspends the AWS IAM key, notifies the SOC

Strengthen Governance with Defender for Cloud

Enable Microsoft Defender for Cloud to:
- Monitor both Azure and AWS accounts from a single portal
- Apply CIS benchmarks for AWS resources
- Surface findings in Sentinel's SecurityRecommendations table

Scam Defender pop Up… help please
Can someone please help? My dad has had what look like scam pop-ups come up on his MacBook. I have reported it to Microsoft but don't know how long it will be until they get back to me. I want to know if anyone can confirm they're scams and how I can get them off his screen – it won't let us click on the X. Thank you

Defender Entity Page w/ Sentinel Events Tab
One device is displaying the Sentinel Events tab, while the other is not. The only difference observed is that one device is Azure AD (AAD) joined and the other is domain joined. Could this difference account for the missing Sentinel events data? Any insight would be appreciated!

Security alerts Graph API and MFA
Hello, does anyone know if the security alerts API at https://learn.microsoft.com/en-us/graph/api/resources/partner-security-partnersecurityalert-api-overview?view=graph-rest-beta works with the app-only method (app + cert or app + secret)? Currently I successfully log in to Graph using both methods:

Welcome to Microsoft Graph! Connected via apponly access using "ID number"

Then I request the alerts and get an error:

WARNING: Error body: {"error":{"code":"UnknownError","message":"{\"Error\":{\"Code\":\"Unauthorized_MissingMFA\",\"Message\":\"MFA is required for this request. The provided authentication token does not have MFA

I found that the old API worked with app+user credentials only, but the new one's documentation doesn't mention this restriction.

Part 3: Unified Security Intelligence - Orchestrating GenAI Threat Detection with Microsoft Sentinel
Why Sentinel for GenAI Security Observability?

Before diving into detection rules, let's address why Microsoft Sentinel is uniquely positioned for GenAI security operations, especially compared to traditional or non-native SIEMs.

Native Azure Integration: Zero ETL Overhead

The problem with external SIEMs: To monitor your GenAI workloads with a third-party SIEM, you need to:
- Configure log forwarding from Log Analytics to external systems
- Set up data connectors or agents for Azure OpenAI audit logs
- Create custom parsers for Azure-specific log schemas
- Maintain authentication and network connectivity between Azure and your SIEM
- Pay data egress costs for logs leaving Azure

The Sentinel advantage: Your logs are already in Azure. Sentinel connects directly to:
- Log Analytics workspace – where your Container Insights logs already flow
- Azure OpenAI audit logs – native access without configuration
- Azure AD sign-in logs – instant correlation with identity events
- Defender for Cloud alerts – platform-level AI threat detection included
- Threat intelligence feeds – Microsoft's global threat data built in
- Microsoft Defender XDR – AI-driven cybersecurity that unifies threat detection and response across endpoints, email, identities, cloud apps, and Sentinel

There's no data movement, no ETL pipelines, and no latency from log shipping. Your GenAI security data is queryable in real time.

KQL: Built for Complex Correlation at Scale

Why this matters for GenAI: Detecting sophisticated AI attacks requires correlating:
- Application logs (your code from Part 2)
- Azure OpenAI service logs (API calls, token usage, throttling)
- Identity signals (who authenticated, from where)
- Threat intelligence (known malicious IPs)
- Defender for Cloud alerts (platform-level anomalies)

KQL's advantage: Kusto Query Language is designed for this.
You can:
- Join across multiple data sources in a single query
- Parse nested JSON (like your structured logs) natively
- Use time-series analysis functions for anomaly detection and behavior patterns
- Aggregate millions of events in seconds
- Extract entities (users, IPs, sessions) automatically for investigation graphs

Example: Correlating your app logs with Azure AD sign-ins and Defender alerts takes about 10 lines of KQL. In a traditional SIEM, this might require custom scripts, data normalization, and significantly slower performance.

User Security Context Flows Natively

Remember the user_security_context you pass in extra_body from Part 2? That context:
- Automatically appears in Azure OpenAI's audit logs
- Flows into Defender for Cloud AI alerts
- Is queryable in Sentinel without custom parsing
- Maps to the same identity schema as Azure AD logs

With external SIEMs, you'd need to:
- Extract user context from your application logs
- Separately ingest Azure OpenAI logs
- Write correlation logic to match them
- Maintain entity resolution across different data sources

With Sentinel, it just works. The end_user_id, source_ip, and application_name are already normalized across Azure services.

Built-In AI Threat Detection

Sentinel includes pre-built detections for cloud and AI workloads:
- Azure OpenAI anomalous access patterns (out of the box)
- Unusual token consumption (built-in analytics templates)
- Geographic anomalies (using Azure's global IP intelligence)
- Impossible travel detection (cross-referencing sign-ins with AI API calls)
- Microsoft Defender XDR (correlation with endpoint, email, and cloud app signals)

These aren't generic "high volume" alerts – they're tuned for Azure AI services by Microsoft's security research team. You can use them as-is or customize them with your application-specific context.
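As a reminder of what the user_security_context mentioned above looks like in code, here is a minimal sketch of the extra_body payload. The field names follow the UserSecurityContext schema used by Defender for Cloud's AI threat protection, but treat them as assumptions to verify against current documentation:

```python
def build_user_security_context(end_user_id: str, source_ip: str,
                                application_name: str) -> dict:
    """Construct the extra_body passed with each Azure OpenAI call so that
    user context flows into Azure OpenAI audit logs and Defender for Cloud
    AI alerts. Field names assumed from the UserSecurityContext schema."""
    return {
        "user_security_context": {
            "end_user_id": end_user_id,
            "source_ip": source_ip,
            "application_name": application_name,
        }
    }

# Usage with the openai client (sketch):
# client.chat.completions.create(
#     model="gpt-4-turbo",
#     messages=[...],
#     extra_body=build_user_security_context("user_550e8400", "203.0.113.42",
#                                            "AOAI-Customer-Support-Bot"),
# )
```

Because these three fields match what the application also writes to its own logs, the same identifiers line up across ContainerLogV2, Azure OpenAI audit logs, and Defender alerts.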
Entity Behavior Analytics (UEBA)

Sentinel's UEBA automatically builds baselines for:
- Normal request volumes per user
- Typical request patterns per application
- Expected geographic access locations
- Standard model usage patterns

Then it surfaces anomalies:
- "User_12345 normally makes 10 requests/day, suddenly made 500 in an hour"
- "Application_A typically uses GPT-3.5, suddenly switched to GPT-4 exclusively"
- "User authenticated from Seattle, made AI requests from Moscow 10 minutes later"

This behavior modeling happens automatically – no custom ML model training required. Traditional SIEMs would require you to build this logic yourself.

The Bottom Line

For GenAI security on Azure:
- Sentinel reduces time-to-detection because the data is already there
- Correlation is simpler because everything speaks the same language
- Investigation is faster because entities are automatically linked
- Cost is lower because you're not paying data egress fees
- Maintenance is minimal because the connectors are native

If your GenAI workloads are on Azure, using anything other than Sentinel means fighting against the platform instead of leveraging it.

From Logs to Intelligence: The Complete Picture

Your structured logs from Part 2 are flowing into Log Analytics. Here's what they look like:

{
  "timestamp": "2025-10-21T14:32:17.234Z",
  "level": "INFO",
  "message": "LLM Request Received",
  "request_id": "a7c3e9f1-4b2d-4a8e-9c1f-3e5d7a9b2c4f",
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "prompt_hash": "d3b07384d113edec49eaa6238ad5ff00",
  "security_check_passed": "PASS",
  "source_ip": "203.0.113.42",
  "end_user_id": "user_550e8400",
  "application_name": "AOAI-Customer-Support-Bot",
  "model_deployment": "gpt-4-turbo"
}

These logs land in the ContainerLogV2 table, since our application "AOAI-Customer-Support-Bot" runs on Azure Kubernetes Service (AKS).
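The prompt_hash field in the sample record can be produced by hashing the raw prompt before logging. A minimal sketch: the 32-character digest in the sample suggests MD5, which is assumed here purely for length parity; any stable digest works for grouping.

```python
import hashlib

def prompt_hash(prompt: str) -> str:
    """Stable, non-reversible fingerprint of the raw prompt for the
    prompt_hash log field. Identical prompts hash identically, which is
    what lets coordinated attempts be grouped without logging full text."""
    return hashlib.md5(prompt.encode("utf-8")).hexdigest()
```

MD5 here is a grouping key, not a security control; swap in hashlib.sha256 if policy forbids MD5 (the field then becomes 64 characters).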
Steps to Set Up AKS to Stream Logs to Sentinel/Log Analytics
1. From the Azure portal, navigate to your AKS cluster, then to Monitoring -> Insights.
2. Select Monitor Settings.
3. Under Container Logs, select the Sentinel-enabled Log Analytics workspace.
4. Select Logs and events.
5. Check the 'Enable ContainerLogV2' and 'Enable Syslog collection' options.

More details can be found at this link: Kubernetes monitoring in Azure Monitor - Azure Monitor | Microsoft Learn

Critical Analytics Rules: What to Detect and Why

Rule 1: Prompt Injection Attack Detection

Why it matters: Prompt injection is the GenAI equivalent of SQL injection. Attackers try to manipulate the model by overriding system instructions. Multiple attempts indicate intentional malicious behavior.

What to detect: 3+ prompt injection attempts from the same IP

let timeframe = 1d;
let threshold = 3;
AlertEvidence
| where TimeGenerated >= ago(timeframe) and EntityType == "Ip"
| where DetectionSource == "Microsoft Defender for AI Services"
| where Title contains "jailbreak" or Title contains "prompt injection"
| summarize count() by bin(TimeGenerated, 1d), RemoteIP
| where count_ >= threshold

What the SOC sees:
- User identity attempting injection
- Source IP and geographic location
- Sample prompts for investigation
- Frequency indicating automation vs. manual attempts

Severity: High (these are actual attempts to bypass security)

Rule 2: Content Safety Filter Violations

Why it matters: When Azure AI Content Safety blocks a request, it means harmful content (violence, hate speech, etc.) was detected. Multiple violations indicate intentional abuse or a compromised account.

What to detect: Users with 3+ content safety violations in a 1-hour block during a 24-hour time period.
let timeframe = 1d;
let threshold = 3;
ContainerLogV2
| where TimeGenerated >= ago(timeframe)
| where isnotempty(LogMessage.end_user_id)
| where LogMessage.security_check_passed == "FAIL"
| extend source_ip = tostring(LogMessage.source_ip)
| extend end_user_id = tostring(LogMessage.end_user_id)
| extend session_id = tostring(LogMessage.session_id)
| extend application_name = tostring(LogMessage.application_name)
| extend security_check_passed = tostring(LogMessage.security_check_passed)
| summarize count() by bin(TimeGenerated, 1h), source_ip, end_user_id, session_id, Computer, application_name, security_check_passed
| where count_ >= threshold

What the SOC sees:
- Severity based on violation count
- Time span showing whether it's persistent vs. isolated
- Prompt samples (first 80 chars) for context
- Session ID for conversation-history review

Severity: High (these are actual harmful content attempts)

Rule 3: Rate Limit Abuse

Why it matters: Persistent rate limit violations indicate automated attacks, credential stuffing, or attempts to overwhelm the system. Legitimate users who hit rate limits don't retry 10+ times in minutes.
What to detect: Users hitting the rate limiter 5+ times within an hour

let timeframe = 1h;
let threshold = 5;
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| where OperationName == "Completions" or OperationName contains "ChatCompletions"
| extend tokensUsed = todouble(parse_json(properties_s).usage.total_tokens)
| summarize totalTokens = sum(tokensUsed), requests = count(), rateLimitErrors = countif(httpstatuscode_s == "429") by bin(TimeGenerated, 1h)
| where rateLimitErrors >= threshold

What the SOC sees:
- Whether it's a bot (immediate retries) or a human (gradual retries)
- Duration of the attack
- Which application is targeted
- Correlation with other security events from the same user/IP

Severity: Medium (nuisance attack, possible reconnaissance)

Rule 4: Anomalous Source IP for User

Why it matters: A user suddenly accessing from a new country or VPN could indicate account compromise. This is especially critical for privileged accounts or after-hours access.

What to detect: A user accessing from an IP never seen in the last 7 days

let lookback = 7d;
let recent = 1h;
let baseline = IdentityLogonEvents
| where Timestamp between (ago(lookback + recent) ..
ago(recent)) | where isnotempty(IPAddress) | summarize knownIPs = make_set(IPAddress) by AccountUpn; ContainerLogV2 | where TimeGenerated >= ago(recent) | where isnotempty(LogMessage.source_ip) | extend source_ip = tostring(LogMessage.source_ip) | extend end_user_id = tostring(LogMessage.end_user_id) | extend session_id = tostring(LogMessage.session_id) | extend application_name = tostring(LogMessage.application_name) | extend security_check_passed = tostring(LogMessage.security_check_passed) | extend full_prompt_sample = tostring(LogMessage.full_prompt_sample) | lookup kind=leftouter baseline on $left.end_user_id == $right.AccountUpn | where isnull(knownIPs) or not(set_has_element(knownIPs, source_ip)) | project TimeGenerated, source_ip, end_user_id, session_id, Computer, application_name, security_check_passed, full_prompt_sample What the SOC sees: User identity and new IP address Geographic location change Whether suspicious prompts accompanied the new IP Timing (after-hours access is higher risk) Severity: Medium (possible account compromise, reconnaissance) Rule 5: Coordinated Attack - Same Prompt from Multiple Users Why it matters: When 5+ users send identical prompts, it indicates a bot network, credential stuffing, or an organized attack campaign. This is not normal user behavior.
What to detect: Same prompt hash from 5+ different users within 1 hour let timeframe = 1h; let threshold = 5; ContainerLogV2 | where TimeGenerated >= ago(timeframe) | where isnotempty(LogMessage.prompt_hash) | where isnotempty(LogMessage.end_user_id) | extend source_ip = tostring(LogMessage.source_ip) | extend end_user_id = tostring(LogMessage.end_user_id) | extend prompt_hash = tostring(LogMessage.prompt_hash) | extend application_name = tostring(LogMessage.application_name) | extend security_check_passed = tostring(LogMessage.security_check_passed) | project TimeGenerated, prompt_hash, source_ip, end_user_id, application_name, security_check_passed | summarize DistinctUsers = dcount(end_user_id), Attempts = count(), Users = make_set(end_user_id, 100), IpAddresses = make_set(source_ip, 100) by prompt_hash, bin(TimeGenerated, 1h) | where DistinctUsers >= threshold What the SOC sees: Attack pattern (single attacker with stolen accounts vs. botnet) List of compromised user accounts Source IPs for blocking Prompt sample to understand attack goal Severity: High (indicates organized attack) Rule 6: Malicious model detected Why it matters: Model serialization attacks can lead to serious compromise. When Defender for Cloud Model Scanning identifies issues with a custom or open-source model that is part of an Azure ML Workspace, Registry, or hosted in Foundry, that may or may not be a user oversight. What to detect: Model scan results from Defender for Cloud, and whether the flagged model is being actively used. What the SOC sees: Malicious model Applications leveraging the model Source IPs and users that accessed the model Severity: Medium (can be user oversight) Advanced Correlation: Connecting the Dots The power of Sentinel is correlating your application logs with other security signals.
Here are the most valuable correlations: Correlation 1: Failed GenAI Requests + Failed Sign-Ins = Compromised Account Why: An account showing both authentication failures and malicious AI prompts within a one-hour window is likely compromised let timeframe = 1h; ContainerLogV2 | where TimeGenerated >= ago(timeframe) | where isnotempty(LogMessage.source_ip) | extend source_ip = tostring(LogMessage.source_ip) | extend end_user_id = tostring(LogMessage.end_user_id) | extend session_id = tostring(LogMessage.session_id) | extend application_name = tostring(LogMessage.application_name) | extend security_check_passed = tostring(LogMessage.security_check_passed) | extend full_prompt_sample = tostring(LogMessage.full_prompt_sample) | extend prompt_hash = tostring(LogMessage.prompt_hash) | extend message = tostring(LogMessage.message) | where security_check_passed == "FAIL" or message contains "WARNING" | join kind=inner ( SigninLogs | where ResultType != 0 // 0 means success, non-zero indicates failure
| project TimeGenerated, UserPrincipalName, ResultType, ResultDescription, IPAddress, Location, AppDisplayName ) on $left.end_user_id == $right.UserPrincipalName | project TimeGenerated, source_ip, end_user_id, application_name, full_prompt_sample, prompt_hash, message, security_check_passed Severity: High (high probability of compromise) Correlation 2: Application Logs + Defender for Cloud AI Alerts Why: Defender for Cloud AI Threat Protection detects platform-level threats (unusual API patterns, data exfiltration attempts). When both your code and the platform flag the same user, confidence is very high.
let timeframe = 1h; ContainerLogV2 | where TimeGenerated >= ago(timeframe) | where isnotempty(LogMessage.source_ip) | extend source_ip = tostring(LogMessage.source_ip) | extend end_user_id = tostring(LogMessage.end_user_id) | extend session_id = tostring(LogMessage.session_id) | extend application_name = tostring(LogMessage.application_name) | extend security_check_passed = tostring(LogMessage.security_check_passed) | extend full_prompt_sample = tostring(LogMessage.full_prompt_sample) | extend message = tostring(LogMessage.message) | where security_check_passed == "FAIL" or message contains "WARNING" | join kind=inner ( AlertEvidence | where TimeGenerated >= ago(timeframe) and tostring(AdditionalFields.Asset) == "true" | where DetectionSource == "Microsoft Defender for AI Services" | project TimeGenerated, Title, CloudResource ) on $left.application_name == $right.CloudResource | project TimeGenerated, application_name, end_user_id, source_ip, Title Severity: Critical (multi-layer detection) Correlation 3: Source IP + Threat Intelligence Feeds Why: If requests come from known malicious IPs (C2 servers, VPN exit nodes used in attacks), treat them as high priority even if behavior seems normal.
//This rule correlates GenAI app activity with the Microsoft Threat Intelligence feed available in Sentinel and Microsoft XDR for malicious IP IOCs
let timeframe = 10m; ContainerLogV2 | where TimeGenerated >= ago(timeframe) | where isnotempty(LogMessage.source_ip) | extend source_ip = tostring(LogMessage.source_ip) | extend end_user_id = tostring(LogMessage.end_user_id) | extend session_id = tostring(LogMessage.session_id) | extend application_name = tostring(LogMessage.application_name) | extend security_check_passed = tostring(LogMessage.security_check_passed) | extend full_prompt_sample = tostring(LogMessage.full_prompt_sample) | join kind=inner ( ThreatIntelIndicators | where IsActive == true | where ObservableKey startswith "ipv4-addr" or ObservableKey startswith "network-traffic" | project IndicatorIP = ObservableValue ) on $left.source_ip == $right.IndicatorIP | project TimeGenerated, source_ip, end_user_id, application_name, full_prompt_sample, security_check_passed Severity: High (known bad actor) Workbooks: What Your SOC Needs to See Executive Dashboard: GenAI Security Health Purpose: Leadership wants to know: "Are we secure?" Answer with metrics. Key visualizations: Security Status Tiles (24 hours) Total Requests Success Rate Blocked Threats (Self detected + Content Safety + Threat Protection for AI) Rate Limit Violations Model Security Score (Red Team evaluation status of currently deployed model) ContainerLogV2 | where TimeGenerated > ago(1d) | extend security_check_passed = tostring(LogMessage.security_check_passed) | summarize SuccessCount = countif(security_check_passed == "PASS"), FailedCount = countif(security_check_passed == "FAIL") by bin(TimeGenerated, 1h) | extend TotalRequests = SuccessCount + FailedCount | extend SuccessRate = todouble(SuccessCount)/todouble(TotalRequests) * 100 | order by SuccessRate asc 1. Trend Chart: Pass vs.
Fail Over Time Shows if attack volume is increasing Identifies attack time windows Validates that defenses are working ContainerLogV2 | where TimeGenerated > ago(14d) | extend security_check_passed = tostring(LogMessage.security_check_passed) | summarize SuccessCount = countif(security_check_passed == "PASS"), FailedCount = countif(security_check_passed == "FAIL") by bin(TimeGenerated, 1d) | render timechart 2. Top 10 Users by Security Events Bar chart of users with most failures ContainerLogV2 | where TimeGenerated > ago(1d) | where isnotempty(LogMessage.end_user_id) | extend end_user_id = tostring(LogMessage.end_user_id) | extend security_check_passed = tostring(LogMessage.security_check_passed) | where security_check_passed == "FAIL" | summarize FailureCount = count() by end_user_id | top 10 by FailureCount | render barchart Applications with most failures ContainerLogV2 | where TimeGenerated > ago(1d) | where isnotempty(LogMessage.application_name) | extend application_name = tostring(LogMessage.application_name) | extend security_check_passed = tostring(LogMessage.security_check_passed) | where security_check_passed == "FAIL" | summarize FailureCount = count() by application_name | top 10 by FailureCount | render barchart 3. Geographic Threat Map Where are attacks originating? Useful for geo-blocking decisions ContainerLogV2 | where TimeGenerated > ago(1d) | where isnotempty(LogMessage.source_ip) | extend source_ip = tostring(LogMessage.source_ip) | extend security_check_passed = tostring(LogMessage.security_check_passed) | where security_check_passed == "FAIL" | extend GeoInfo = geo_info_from_ip_address(source_ip) | project source_ip, Country = tostring(GeoInfo.country), City = tostring(GeoInfo.city) Analyst Deep-Dive: User Behavior Analysis Purpose: SOC analyst investigating a specific user or session Key components: 1.
User Activity Timeline Every request from the user in time order ContainerLogV2 | where isnotempty(LogMessage.end_user_id) | project TimeGenerated, LogMessage.source_ip, LogMessage.end_user_id, LogMessage.session_id, Computer, LogMessage.application_name, LogMessage.request_id, LogMessage.message, LogMessage.full_prompt_sample | order by tostring(LogMessage_end_user_id), TimeGenerated Color-coded by security status AlertInfo | where DetectionSource == "Microsoft Defender for AI Services" | project TimeGenerated, AlertId, Title, Category, Severity, SeverityColor = case( Severity == "High", "🔴 High", Severity == "Medium", "🟠 Medium", Severity == "Low", "🟢 Low", "⚪ Unknown" ) 2. Session Analysis Table All sessions for the user ContainerLogV2 | where TimeGenerated > ago(1d) | where isnotempty(LogMessage.end_user_id) | extend end_user_id = tostring(LogMessage.end_user_id) | where end_user_id == "<username>" // Replace with actual username
| extend application_name = tostring(LogMessage.application_name) | extend source_ip = tostring(LogMessage.source_ip) | extend session_id = tostring(LogMessage.session_id) | extend security_check_passed = tostring(LogMessage.security_check_passed) | project TimeGenerated, session_id, end_user_id, application_name, security_check_passed Failed requests per session ContainerLogV2 | where TimeGenerated > ago(1d) | extend security_check_passed = tostring(LogMessage.security_check_passed) | where security_check_passed == "FAIL" | extend end_user_id = tostring(LogMessage.end_user_id) | extend session_id = tostring(LogMessage.session_id) | summarize Failed_Sessions = count() by end_user_id, session_id | order by Failed_Sessions Session duration ContainerLogV2 | where TimeGenerated > ago(1d) | where isnotempty(LogMessage.session_id) | extend security_check_passed = tostring(LogMessage.security_check_passed) | where security_check_passed == "PASS" | extend
end_user_id = tostring(LogMessage.end_user_id) | extend session_id = tostring(LogMessage.session_id) | extend application_name = tostring(LogMessage.application_name) | extend source_ip = tostring(LogMessage.source_ip) | summarize Start = min(TimeGenerated), End = max(TimeGenerated), count() by end_user_id, session_id, source_ip, application_name | extend DurationSeconds = datetime_diff("second", End, Start) 3. Prompt Pattern Detection Unique prompts by hash Frequency of each pattern Detect if user is fuzzing/testing boundaries Sample query for user investigation: ContainerLogV2 | where TimeGenerated > ago(14d) | where isnotempty(LogMessage.prompt_hash) | where isnotempty(LogMessage.full_prompt_sample) | extend prompt_hash = tostring(LogMessage.prompt_hash) | extend full_prompt_sample = tostring(LogMessage.full_prompt_sample) | extend application_name = tostring(LogMessage.application_name) | summarize count() by prompt_hash, full_prompt_sample | order by count_ Threat Hunting Dashboard: Proactive Detection Purpose: Find threats before they trigger alerts Key queries: 1. Suspicious Keywords in Prompts (e.g.
Ignore, Disregard, system prompt, instructions, DAN, jailbreak, pretend, roleplay) let suspicious_prompts = externaldata (content_policy:int, content_policy_name:string, q_id:int, question:string) [ @"https://raw.githubusercontent.com/verazuo/jailbreak_llms/refs/heads/main/data/forbidden_question/forbidden_question_set.csv"] with (format="csv", ignoreFirstRecord=true); ContainerLogV2 | where TimeGenerated > ago(14d) | where isnotempty(LogMessage.full_prompt_sample) | extend full_prompt_sample = tostring(LogMessage.full_prompt_sample) | where full_prompt_sample in ((suspicious_prompts | project question)) | extend end_user_id = tostring(LogMessage.end_user_id) | extend session_id = tostring(LogMessage.session_id) | extend application_name = tostring(LogMessage.application_name) | extend source_ip = tostring(LogMessage.source_ip) | project TimeGenerated, session_id, end_user_id, source_ip, application_name, full_prompt_sample 2. High-Volume Anomalies Users or IPs sending too many requests. This assumes that Foundry projects are configured to use Azure AD rather than API keys. //50+ requests in 1 hour
let timeframe = 1h; let threshold = 50; AzureDiagnostics | where TimeGenerated >= ago(timeframe) | where ResourceProvider == "MICROSOFT.COGNITIVESERVICES" | where OperationName == "Completions" or OperationName contains "ChatCompletions" | extend tokensUsed = todouble(parse_json(properties_s).usage.total_tokens) | summarize totalTokens = sum(tokensUsed), requests = count() by bin(TimeGenerated, 1h), CallerIPAddress | where requests >= threshold 3.
Rare Failures (Novel Attack Detection) Rare failures might indicate zero-day prompts or new attack techniques //10 or more failures in 24 hours
ContainerLogV2 | where TimeGenerated >= ago(24h) | where isnotempty(LogMessage.security_check_passed) | extend security_check_passed = tostring(LogMessage.security_check_passed) | where security_check_passed == "FAIL" | extend application_name = tostring(LogMessage.application_name) | extend end_user_id = tostring(LogMessage.end_user_id) | extend source_ip = tostring(LogMessage.source_ip) | summarize FailedAttempts = count(), FirstAttempt = min(TimeGenerated), LastAttempt = max(TimeGenerated) by application_name | extend DurationHours = datetime_diff('hour', LastAttempt, FirstAttempt) | where FailedAttempts >= 10 | project application_name, FirstAttempt, LastAttempt, DurationHours, FailedAttempts Measuring Success: Security Operations Metrics Key Performance Indicators Mean Time to Detect (MTTD): let AppLog = ContainerLogV2 | extend application_name = tostring(LogMessage.application_name) | extend security_check_passed = tostring(LogMessage.security_check_passed) | extend session_id = tostring(LogMessage.session_id) | extend end_user_id = tostring(LogMessage.end_user_id) | extend source_ip = tostring(LogMessage.source_ip) | where security_check_passed == "FAIL" | summarize FirstLogTime = min(TimeGenerated) by application_name, session_id, end_user_id, source_ip; let Alert = AlertEvidence | where DetectionSource == "Microsoft Defender for AI Services" | extend end_user_id = tostring(AdditionalFields.AadUserId) | extend source_ip = RemoteIP | extend application_name = CloudResource | summarize FirstAlertTime = min(TimeGenerated) by AlertId, Title, application_name, end_user_id, source_ip; AppLog | join kind=inner (Alert) on application_name, end_user_id, source_ip | extend DetectionDelayMinutes = datetime_diff('minute', FirstAlertTime, FirstLogTime) | summarize MTTD_Minutes = round(avg(DetectionDelayMinutes), 2) by AlertId, Title
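The MTTD arithmetic in the query above (per-pair delay via datetime_diff, then an average rounded to two decimals) can be sanity-checked outside Kusto. A minimal Python sketch with hypothetical timestamps — note that KQL's datetime_diff('minute', ...) truncates to whole minutes, while this version divides exactly:

```python
from datetime import datetime, timedelta

def mttd_minutes(events):
    """Average detection delay in minutes over (first_log_time, first_alert_time) pairs,
    mirroring avg(datetime_diff('minute', FirstAlertTime, FirstLogTime)) in the KQL above."""
    delays = [(alert - log) / timedelta(minutes=1) for log, alert in events]
    return round(sum(delays) / len(delays), 2)

# Hypothetical timestamps for illustration only.
events = [
    (datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 10, 12)),  # 12 min to alert
    (datetime(2024, 5, 1, 11, 0), datetime(2024, 5, 1, 11, 18)),  # 18 min to alert
]
print(mttd_minutes(events))  # prints 15.0 — right at the <= 15 minute target
```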
Target: <= 15 minutes from first malicious activity to alert Mean Time to Respond (MTTR): SecurityIncident | where Status in ("New", "Active") | where CreatedTime >= ago(14d) | extend ResponseDelay = datetime_diff('minute', LastActivityTime, FirstActivityTime) | summarize MTTR_Minutes = round(avg(ResponseDelay), 2) by CreatedTime, IncidentNumber | order by CreatedTime, IncidentNumber asc Target: < 4 hours from alert to remediation Threat Detection Rate: ContainerLogV2 | where TimeGenerated > ago(1d) | extend security_check_passed = tostring(LogMessage.security_check_passed) | summarize SuccessCount = countif(security_check_passed == "PASS"), FailedCount = countif(security_check_passed == "FAIL") by bin(TimeGenerated, 1h) | extend TotalRequests = SuccessCount + FailedCount | extend ThreatDetectionRate = todouble(FailedCount)/todouble(TotalRequests) * 100 | order by ThreatDetectionRate desc Context: a 1-3% detection rate is typical for production systems (most traffic is legitimate) What You've Built By implementing the logging from Part 2 and the analytics rules in this post, your SOC now has: ✅ Real-time threat detection - Alerts fire within minutes of malicious activity ✅ User attribution - Every incident has identity, IP, and application context ✅ Pattern recognition - Detect both volume-based and behavior-based attacks ✅ Correlation across layers - Application logs + platform alerts + identity signals ✅ Proactive hunting - Dashboards for finding threats before they trigger rules ✅ Executive visibility - Metrics showing program effectiveness Key Takeaways GenAI threats need GenAI-specific analytics - Generic rules miss context like prompt injection, content safety violations, and session-based attacks Correlation is critical - The most sophisticated attacks span multiple signals. Correlating app logs with identity and platform alerts catches what individual rules miss.
User context from Part 2 pays off - end_user_id, source_ip, and session_id enable investigation and response at scale Prompt hashing enables pattern detection - Detect repeated attacks without storing sensitive prompt content Workbooks serve different audiences - Executives want metrics; analysts want investigation tools; hunters want anomaly detection Start with high-fidelity rules - Content Safety violations and rate limit abuse have very low false positive rates. Add behavioral rules after establishing baselines. What's Next: Closing the Loop You've now built detection and visibility. In Part 4, we'll close the security operations loop with: Part 4: Platform Integration and Automated Response Building SOAR playbooks for automated incident response Implementing automated key rotation with Azure Key Vault Blocking identities in Entra Creating feedback loops from incidents to code improvements The journey from blind spot to full security operations capability is almost complete. Previous: Part 1: Securing GenAI Workloads in Azure: A Complete Guide to Monitoring and Threat Protection - AIO11Y | Microsoft Community Hub Part 2: Building Security Observability Into Your Code - Defensive Programming for Azure OpenAI | Microsoft Community Hub Next: Part 4: Platform Integration and Automated Response (Coming soon)
How to stop incidents merging under a new incident (MultiStage) in Defender
Dear All We are experiencing a challenge with the integration between Microsoft Sentinel and the Defender portal where multiple custom rule alerts and analytic rule incidents are being automatically merged into a single incident named "Multistage." This automatic incident merging affects the granularity and context of our investigations, especially for important custom use cases such as specific admin activities and differentiated analytic logic. Key concerns include: Custom rule alerts from Sentinel merging undesirably into a single "Multistage" incident in Defender, causing loss of incident-specific investigation value. Analytic rules arising from different data sources and detection logic are merged, although they represent distinct security events needing separate attention. Customers require and depend on distinct, non-merged incidents for custom use cases, and the current incident correlation and merging behavior undermines this requirement. We understand that Defender’s incident correlation engine merges incidents based on overlapping entities, timelines, and behaviors but would like guidance or configuration best practices to disable or minimize this automatic merging behavior for our custom and analytic rule incidents. Our goal is to maintain independent incidents corresponding exactly to our custom alerts so that hunting, triage, and response workflows remain precise and actionable. Any recommendations or advanced configuration options to achieve this separation would be greatly appreciated. Thank you for your assistance. Best regards
Demystifying AI Security Posture Management
Introduction In the ever-evolving paradigm shift that is Generative AI, adoption is accelerating at an unprecedented pace. Organizations find it increasingly challenging to keep up with the multiple security branches of defence and attack that accompany this adoption. With agentic and autonomous agents set to be the security frontier for the next decade, the need to understand, secure and govern the Generative AI applications running within an organisation becomes critical. Organizations with a strong “security first” principle have been able to integrate AI by following appropriate methodologies such as Microsoft’s Prepare, Discover, Protect and Govern approach, and are now accelerating adoption with strong posture management. Link: Build a strong security posture for AI | Microsoft Learn However, due to the speed of this adoption, many organizations have found themselves in a “chicken and egg” situation: they are racing to allow employees and developers to adopt and embrace both Low Code and Pro Code solutions such as Microsoft Copilot Studio and Microsoft Foundry, but because governance and control policies were not implemented in time, they now find themselves in a Shadow AI situation and require the ability to retroactively assess already-deployed solutions. Why AI Security Posture Management? Generative AI workloads, like any other, can only be secured and governed if the organization is aware of their existence and usage. With the advent of Generative AI we now have not only Shadow IT but also Shadow AI, so the ability to discover, assess, understand, and govern the Generative AI tooling used in an organisation is more important than ever. Consider the risks mentioned in the recent Microsoft Digital Defence Report and how they align to AI Usage, AI Applications and AI Platform Security.
As Generative AI becomes more ingrained in the day-to-day operations of organizations, so does the potential for increased attack vectors and misuse, and with it the need for appropriate security oversight and mitigation. Link: Microsoft Digital Defense Report 2025 – Safeguarding Trust in the AI Era A recent study by KPMG discussing Shadow AI listed the following statistics: 44% of employees have used AI in ways that contravene policies and guidelines, indicating a significant prevalence of shadow AI in organizations. 57% of employees have made mistakes due to AI, and 58% have relied on AI output without evaluating its accuracy. 41% of employees report that their organization has a policy guiding the use of GenAI, highlighting a huge gap in guardrails. A very informed comment by Sawmi Chandrasekaran, Principal, US and Global AI and Data Labs leader at KPMG, states: “Shadow AI isn’t a fringe issue—it’s a signal that employees are moving faster than the systems designed to support them. Without trusted oversight and a coordinated architectural strategy, even a single shortcut can expose the organization to serious risk. But with the right guardrails in place, shadow AI can become a powerful force for innovation, agility, and long-term competitive advantage. The time to act is now—with clarity, trust, and bold forward-looking leadership.” Link: Shadow AI is already here: Take control, reduce risk, and unleash innovation It’s abundantly clear that organizations require integrated solutions to deal with the escalating risks and potential flashpoints. The “Best of Breed” approach is no longer sustainable given the integration challenges in both cross-platform support and data ingestion charges; this is where the requirements for a modern CNAPP come to the forefront.
The Next Era of Cloud Security report created by IDC highlights Cloud Native Application Protection Platforms (CNAPPs) as a key investment area for organizations: “The IDC CNAPP Survey affirmed that 71% of respondents believe that over the next two years, it would be beneficial for their organization to invest in an integrated SecOps platform that includes technologies such as XDR/EDR, SIEM, CNAPP/cloud security, GenAI, and threat intelligence.” Link: The Next Era of Cloud Security: Cloud-Native Application Protection Platform and Beyond AI Security Posture Management vs Data Security Posture Management Data Security Posture Management (DSPM) is often discussed, having evolved prior to the conceptualization of Generative AI. However, DSPM is its own solution and is covered in the blog post Data Security Posture Management for AI. AI Security Posture Management (AI-SPM) focuses solely on the ability to monitor, assess and improve the security of AI systems, models, data and infrastructure in the environment. Microsoft’s Approach – Defender for Cloud Defender for Cloud is Microsoft’s modern Cloud Native Application Protection Platform (CNAPP), encompassing multiple cloud security services across both Proactive Security and Runtime Protection. For the purposes of this article, however, we will only be delving into AI Security Posture Management (AI-SPM), a sub-feature of Cloud Security Posture Management (CSPM); both sit under Proactive Security solutions. Link: Microsoft Defender for Cloud Overview - Microsoft Defender for Cloud | Microsoft Learn Understanding AI Security Posture Management The following sections cut to the chase on each of the four areas, giving an overview of the solution and its requirements.
For detailed information on feature enablement and usage, each section includes a link to the full documentation on Microsoft Learn for further reading. AI Security Posture Management AI Security Posture Management is a key component of the all-up Cloud Security Posture Management (CSPM) solution, and focuses on 4 key areas: o Generative AI Workload Discovery o Vulnerability Assessment o Attack Path Analysis o Security Recommendations Generative AI Workload Discovery Overview Arguably, the principal role of AI Security Posture Management is to discover and identify Generative AI workloads in the organization; understanding what AI resources exist in the environment is the key to defending them. Microsoft refers to this as the AI Bill-Of-Materials, or AI-BOM. Bill-Of-Materials is a manufacturing term used to describe the components that go together to create a product (think door, handle, latch, hinges and screws). In the AI world this becomes application components such as data and artifacts. AI-SPM can discover Generative AI applications across multiple supported services including: Azure OpenAI Service Microsoft Foundry Azure Machine Learning Amazon Bedrock Google Vertex AI (Preview) Why no Microsoft Copilot Studio integration? Microsoft Copilot Studio is not an external or custom AI agent service and is deeply integrated into Microsoft 365. Security posture for Microsoft Copilot Studio is handled by Microsoft Defender for Cloud Apps and Microsoft Purview, with applications being marked as Sanctioned or Unsanctioned via the Defender for Cloud portal. For more information on Microsoft Defender for Cloud Apps see the link below. Link: App governance in Microsoft Defender for Cloud Apps and Microsoft Defender XDR - Microsoft Defender for Cloud Apps | Microsoft Learn Requirements An active Azure subscription with Microsoft Defender for Cloud. Cloud Security Posture Management (CSPM) enabled. Have at least one environment with an AI-supported workload.
Link: Discover generative AI workloads - Microsoft Defender for Cloud | Microsoft Learn Vulnerability Assessment Once you have a clear overview of which AI resources exist in your environment, Vulnerability Assessment in AI-SPM covers two main areas. The first allows the organization to discover vulnerabilities within containers that are running generative AI images with known vulnerabilities. The second allows vulnerability discovery within Generative AI library dependencies such as TensorFlow, PyTorch, and LangChain. Both options align any vulnerabilities detected to known Common Vulnerabilities and Exposures (CVE) IDs via Microsoft Threat Detection. Requirements An active Azure subscription with Microsoft Defender for Cloud. Cloud Security Posture Management (CSPM) enabled. Have at least one Azure OpenAI resource, with at least one model deployment connected to it via the Azure AI Foundry portal. Link: Explore risks to pre-deployment generative AI artifacts - Microsoft Defender for Cloud | Microsoft Learn Attack Path Analysis AI-SPM hunts for potential attack paths in a multi-cloud environment by concentrating on real, externally driven, exploitable threats rather than generic scenarios. Using a proprietary algorithm, the attack path is mapped from outside the organization through to critical assets. Attack path analysis highlights immediately exploitable threats to the business that attackers could use to breach the environment, and recommendations are given to resolve the detected security issues. Discovered attack paths are organized by risk level, determined using a context-aware risk-prioritization engine that considers the risk factors of each resource. Requirements An active Azure subscription with Microsoft Defender for Cloud. Cloud Security Posture Management (CSPM) with agentless scanning enabled.
Required roles and permissions: Security Reader, Security Admin, Reader, Contributor, or Owner. To view attack paths that are related to containers, either: Enable the agentless container posture extension in Defender CSPM, or Enable Defender for Containers and install the relevant agents. The latter also gives you the ability to query container data plane workloads in the security explorer. Link: Identify and remediate attack paths - Microsoft Defender for Cloud | Microsoft Learn Security Recommendations Microsoft Defender for Cloud evaluates all discovered resources, including AI resources, and all workloads against both built-in and custom security standards, which are implemented across Azure subscriptions, Amazon Web Services (AWS) accounts, and Google Cloud Platform (GCP) projects. Following these assessments, security recommendations offer actionable guidance to address issues and enhance the overall security posture. Defender for Cloud utilizes an advanced dynamic engine to systematically assess risks within your environment by considering exploitation potential and possible business impact. This engine prioritizes security recommendations according to the risk factors associated with each resource, determined by the context of the environment, including resource configuration, network connections, and existing security measures. Requirements No specific requirements apply to Security Recommendations if you have Defender for Cloud enabled in the tenant, as the feature is included by default. However, you will not see Risk Prioritization unless you have the Defender CSPM plan enabled.
Link: Review Security Recommendations - Microsoft Defender for Cloud | Microsoft Learn CSPM Pricing CSPM has two billing models: Foundational CSPM (free) and Defender CSPM, which has its own additional billing model. AI-SPM is only included as part of the Defender CSPM plan. Foundational CSPM — AI Security Posture Management: not included; Price: Free. Defender CSPM — AI Security Posture Management: Azure, AWS, GCP (Preview); Price: $5.11/billable resource/month. Information regarding licensing in this article is provided for guidance purposes only and doesn’t provide any contractual commitment. This list and license requirements are subject to change without any prior notice. Full details can be found in the official Microsoft documentation here, Link: Pricing - Microsoft Defender for Cloud | Microsoft Azure. Final Thoughts AI Security Posture Management can no longer be considered an optional component of security, but rather a cornerstone of any organization’s operations. The integration of Microsoft Defender for Cloud across all areas of an organization shows the true potential of a modern CNAPP, where AI is no longer just a business objective, but a functional business component.