repositories
3 TopicsMicrosoft Sentinel data lake FAQ
Microsoft Sentinel data lake (generally available) is a purpose‑built, cloud‑native security data lake. It centralizes all security data in an open format, serving as the foundation for agentic defense, enhanced security insights, and graph-based enrichment. It offers cost‑effective ingestion, long‑term retention, and advanced analytics. In this blog we offer answers to many of the questions we’ve heard from our customers and partners. General questions What is the Microsoft Sentinel data lake? Microsoft has expanded its industry-leading SIEM solution, Microsoft Sentinel, to include a unified, security data lake, designed to help optimize costs, simplify data management, and accelerate the adoption of AI in security operations. This modern data lake serves as the foundation for the Microsoft Sentinel platform. It has a cloud-native architecture and is purpose-built for security—bringing together all security data for greater visibility, deeper security analysis, contextual awareness and agentic defense. It provides affordable, long-term retention, allowing organizations to maintain robust security while effectively managing budgetary requirements. What are the benefits of Sentinel data lake? Microsoft Sentinel data lake is purpose built for security offering flexible analytics, cost management, and deeper security insights. Sentinel data lake: Centralizes security data delta parquet and open format for easy access. This unified data foundation accelerates threat detection, investigation, and response across hybrid and multi-cloud environments. Enables data federation by allowing customers to access data in external sources like Microsoft Fabric, ADLS and Databricks from the data lake. Federated data appears alongside native Sentinel data, enabling correlated hunting, investigation, and custom graph analysis across a broader digital estate. Offers a disaggregated storage and compute pricing model, allowing customers to store massive volumes of security data at a fraction of the cost compared to traditional SIEM solutions. Allows multiple analytics engines like Kusto, Spark, and ML to run on a single data copy, simplifying management, reducing costs, and supporting deeper security analysis. Integrates with GitHub Copilot and VS Code empowering SOC teams to automate enrichment, anomaly detection, and forensic analysis. Supports AI agents via the MCP server, allowing tools like GitHub Copilot to query and automate security tasks. The MCP Server layer brings intelligence to the data, offering Semantic Search, Query Tools, and Custom Analysis capabilities that make it easier to extract insights and automate workflows. Provides streamlined onboarding, intuitive table management, and scalable multi-tenant support, making it ideal for MSSPs and large enterprises. The Sentinel data lake is designed for security workloads, ensuring that processes from ingestion to analytics meet evolving cybersecurity requirements. Is Microsoft Sentinel SIEM going away? No. Microsoft is expanding Sentinel into an AI powered end-to-end security platform that includes SIEM and new platform capabilities - Security data lake, graph-powered analytics and MCP Server. SIEM remains a core component and will be actively developed and supported. Getting started What are the prerequisites for Sentinel data lake? To get started: Connect your Sentinel workspace to Microsoft Defender prior to onboarding to Sentinel data lake. Once in the Defender experience see data lake onboarding documentation for next steps. Note: Sentinel is moving to the Microsoft Defender portal and the Sentinel Azure portal will be retired by March 31, 2027. I am a Sentinel-only customer, and not a Defender customer. Can I use the Sentinel data lake? Yes. You must connect Sentinel to the Defender experience before onboarding to the Sentinel data lake. Microsoft Sentinel is generally available in the Microsoft Defender portal, with or without Microsoft Defender XDR or an E5 license. If you have created a log analytics workspace, enabled it for Sentinel and have the right Microsoft Entra roles (e.g. Global Administrator + Subscription Owner, Security Administrator + Sentinel Contributor), you can enable Sentinel in the Defender portal. For more details on how to connect Sentinel to Defender review these sources: Microsoft Sentinel in the Microsoft Defender portal In what regions is Sentinel data lake available? For supported regions see: Geographical availability and data residency in Microsoft Sentinel | Azure Docs. Is there an expected release date for Microsoft Sentinel data lake in GCC, GCC-H, and DoD? While the exact date is not yet finalized, we plan to expand Sentinel data lake to the US Government environments. . How will URBAC and Entra RBAC work together to manage the data lake given there is no centralized model? Entra RBAC will provide broad access to the data lake (URBAC maps the right permissions to specific Entra role holders: GA/SA/SO/GR/SR). URBAC will become a centralized pane for configuring non-global delegated access to the data lake. For today, you will use this for the “default data lake” workspace. In the future, this will be enabled for non-default Sentinel workspaces as well – meaning all workspaces in the data lake can be managed here for data lake RBAC requirements. Azure RBAC on the Log Analytics (LA) workspace in the data lake is respected through URBAC as well today. If you already hold a built-in role like log analytics reader, you will be able to run interactive queries over the tables in that workspace. Or, if you hold log analytics contributor, you can read and manage table data. For more details see: Roles and permissions in the Microsoft Sentinel platform | Microsoft Learn Data ingestion and storage How do I ingest data into the Sentinel data lake? To ingest data into the Sentinel data lake, you can use existing Sentinel data connectors or custom connectors to bring data from Microsoft and third-party sources. Data can be ingested into the analytics tier or the data lake tier. Data ingested into the analytics tier is automatically mirrored to the lake (at no additional cost). Alternatively, data that is not needed in the analytics tier can be ingested directly into the data lake. Data retention is configured directly in table management, for both analytics retention and data lake storage. Note: Certain tables do not support data lake-only ingestion via either API or data connector UI. See here for more information: Custom log tables. What is Microsoft’s guidance on when to use analytics tier vs. the data lake tier? Sentinel data lake offers flexible, built-in data tiering (analytics and data lake tiers) to effectively meet diverse business use cases and achieve cost optimization goals. Analytics tier: Is ideal for high-performance, real-time, end-to-end detections, enrichments, investigation and interactive dashboards. Typically, high-fidelity data from EDRs, email gateways, identity, SaaS and cloud logs, threat intelligence (TI) should be ingested into the analytics tier. Data in the analytics tier is best monitored proactively with scheduled alerts and scheduled analytics to enable security detections Data in this tier is retained at no cost for up to 90 days by default, extendable to 2 years. A copy of the data in this tier is automatically available in the data lake tier at no extra cost, ensuring a unified copy of security data for both tiers. Data lake tier: Is designed for cost-effective, long-term storage. High-volume logs like NetFlow logs, TLS/SSL certificate logs, firewall logs and proxy logs are best suited for data lake tier. Customers can use these logs for historical analysis, compliance and auditing, incident response (IR), forensics over historical data, build tenant baselines, TI matching and then promote resulting insights into the analytics tier. Customers can run full Kusto queries, Spark Notebooks and scheduled jobs over a single copy of their data in the data lake. Customers can also search, enrich and promote data from the data lake tier to the analytics tier for full analytics. For more details see documentation. What does it mean that a copy of all new analytics tier data will be available in the data lake? When Sentinel data lake is enabled, a copy of all new data ingested into the analytics tier is automatically duplicated into the data lake tier. This means customers don’t need to manually configure or manage this process, every new log or telemetry added to the analytics tier becomes instantly available in the data lake. This allows security teams to run advanced analytics, historical investigations, and machine learning models on a single, unified copy of data in the lake, while still using the analytics tier for real-time SOC workflows. It’s a seamless way to support both operational and long-term use cases—without duplicating effort or cost. What is the guidance for customers using data federation capability in Sentinel data lake? Starting April 1, 2026, federate data from Microsoft Fabric, ADLS, and Azure Databricks into Sentinel data lake. Use data federation when data is exploratory, infrequently accessed, or must remain at source due to governance, compliance, sovereignty, or contractual requirements. Ingest data directly into Sentinel to unlock full SIEM capabilities, always-on detections, advanced automation, and AI‑driven defense at scale. This approach lets security teams start where their data already lives — preserving governance, then progressively ingest data into Sentinel for full security value. Is there any cost for retention in the analytics tier? Analytics ingestion includes 90 days of interactive retention, at no additional cost. Simply set analytics retention to 90 days or less. Analytics retention beyond 90 days will incur a retention cost. Data can be retained longer within the data lake by using the “total retention” setting. This allows you to extend retention within the data lake for up to 12 years. While data is retained within the analytics tier, there is no charge for the mirrored data within the lake. Retaining data in the lake beyond the analytics retention period incurs additional storage costs. See documentation for more details: Manage data tiers and retention in Microsoft Sentinel | Microsoft Learn What is the guidance for Microsoft Sentinel Basic and Auxiliary Logs customers? If you previously enabled Basic or Auxiliary Logs plan in Sentinel: You can view Basic Logs in the Defender portal but manage it from the Log Analytics workspace. To manage it in the Defender portal, you must change the plan from Basic to Analytics. Once the table is transitioned to the analytics tier, if desired, it can then be transitioned to the data lake. Existing Auxiliary Log tables will be available in the data lake tier for use once the Sentinel data lake is enabled. Billing for these tables will automatically switch to the Sentinel data lake meters. Microsoft Sentinel customers are recommended to start planning their data management strategy with the data lake. While Basic and Auxiliary Logs are still available, they are not being enhanced further. Sentinel data lake offers more capabilities at a lower price point. Please plan on onboarding your security data to the Sentinel data lake. Azure Monitor customers can continue to use Basic and Auxiliary Logs for observability scenarios. What happens to customers that already have Archive logs enabled? If a customer has already configured tables for Archive retention, existing retention settings will not change and will be automatically inherited by the Sentinel data lake. All data, including existing data in archive retention will be billed using the data lake storage meter, benefiting from 6x data compression. However, the data itself will not move. Existing data in archive will continue to be accessible through Sentinel search and restore experiences: o Data will not be backfilled into the data lake. o Data will be billed using the data lake storage meter. New data ingested after enabling the data lake: o Will be automatically mirrored to the data lake and accessible through data lake explorer. o Data will be billed using the data lake storage meter. Example: If a customer has 12 months of total retention enabled on a table, 2 months after enabling ingestion into the Sentinel data lake, the customer will still have access to 10 months of archived data (through Sentinel search and restore experiences), but access to only 2 months of data in the data lake (since the data lake was enabled). Key considerations for customers that currently have Archive logs enabled: The existing archive will remain, with new data ingested into the data lake going forward; previously stored archive data will not be backfilled into the lake. Archive logs will continue to be accessible via the Search and Restore tab under Sentinel. If analytics and data lake mode are enabled on table, which is the default setting for analytics tables when Sentinel data lake is enabled, all new data will be ingested into the Sentinel data lake. There will only be one storage meter (which is data lake storage) going forward. Archive will continue to be accessible via Search and Restore. If Sentinel data lake-only mode is enabled on table, new data will be ingested only into the data lake; any data that’s not already in the Sentinel data lake won’t be migrated/backfilled. Only data that was previously ingested under the archive plan will be accessible via Search and Restore. What is the guidance for customers using Azure Data Explorer (ADX) alongside Microsoft Sentinel? Some customers might have set up ADX cluster for their DIY lake setup. Customers can choose to continue using that setup and gradually migrate to Sentinel data lake for new data that they want to manage. The lake explorer will support federation with ADX to enable the customers to migrate gradually and simplify their deployment. What happens to the Defender XDR data after enabling Sentinel data lake? By default, Defender XDR tables are available for querying in advanced hunting, with 30 days of analytics tier retention included with the XDR license. To retain data beyond this period, an explicit change to the retention setting is required, either by extending the analytics tier retention or the total retention period. You can extend the retention period of supported Defender XDR tables beyond 30 days and ingest the data into the analytics tier. For more information see Manage XDR data in Microsoft Sentinel. You can also ingest XDR data directly into the data lake tier. See here for more information. A list of XDR advanced hunting tables supported by Sentinel are documented here: Connect Microsoft Defender XDR data to Microsoft Sentinel | Microsoft Learn. KQL queries and jobs Is KQL and Notebook supported over the Sentinel data lake? Yes, via the data lake KQL query experience along with a fully managed Notebook experience which enables spark-based big data analytics over a single copy of all your security data. Customers can run queries across any time range of data in their Sentinel data lake. In the future, this will be extended to enable SQL query over lake as well. Note: Triggering a KQL job directly via an API or Logic App is not yet supported but is on the roadmap. Why are there two different places to run KQL queries in Sentinel experience? Advanced hunting queries both XDR and analytics tables, with compute cost included. Data lake explorer only queries data in the lake and incurs a separate compute cost. Consolidating advanced hunting and KQL explorer user interfaces is on the roadmap. This will provide security analysts a unified query experience across both analytics and data lake tiers. Where is the output from KQL jobs stored? KQL jobs are written into existing or new custom tables in the analytics tier. Is it possible to run KQL queries on multiple data lake tables? Yes, you can run KQL interactive queries and jobs using operators like join or union. Can KQL queries (either interactive or via KQL jobs) join data across multiple workspaces? Security teams can run multi-workspace KQL queries for broader threat correlation Pricing and billing How does a customer pay for Sentinel data lake? Billing is automatically enabled at the time of onboarding based on Azure Subscription and Resource Group selections. Customers are then charged based on the volume of data ingested, retained, and analyzed (e.g. KQL Queries and Jobs). See Sentinel pricing page for more details. 2. What are the pricing components for Sentinel data lake? Sentinel data lake offers a flexible pricing model designed to optimize security coverage and costs. At a high level, pricing is based on the volume of data ingested/processed, the volume of data retained, and the volume of data processed. For specific meter definitions, see documentation. 3. How does the business model for Sentinel SIEM change with the introduction of the data lake? There is no change to existing Sentinel analytics tier ingestion business model. Sentinel data lake has separate meters for ingestion, storage and analytics. 4. What happens to the existing Sentinel SIEM and related Azure Monitor billing meters when a customer onboards to Sentinel data lake? When a customer onboards to the Sentinel data lake, nothing changes with analytic ingestion or retention. Customers using data archive and Auxiliary Logs will automatically transition to the new data lake meters. How does data lake storage affect cost efficiency for high volume data retention? Sentinel data lake offers cost-effective, long-term storage with uniform data compression of 6:1 across all data sources, applicable only to data lake storage. Example: For 600GB of data stored, you are only billed for 100GB compressed data. This approach allows organizations to retain greater volumes of security data over extended periods cost-effectively, thereby reducing security risks without compromising their overall security posture. here How “Data Processing” billed? To support the ingestion and standardization of diverse data sources, the Data Processing feature applies a $0.10 per GB (US East) charge for all data ingested into the data lake. This feature enables a broad array of transformations like redaction, splitting, filtering and normalization. The data processing charge is applied per GB of uncompressed data Note: For regional pricing, please refer to the “Data processing” meter within the Microsoft Sentinel Pricing official documentation. Does “Data processing” meter apply to analytics tier data mirrored in the data lake? No. Data processing charge will not be applied to mirrored data. Data mirrored from the analytic tier is not subject to either data ingestion or processing charges. How is retention billed for tables that use data lake-only ingestion & retention? Sentinel data lake decouples ingestion, storage, and analytics meters. Customers have the flexibility to pay based on how data is retained and used. For tables that use data lake‑only ingestion, there is no included free retention—unlike the analytics tier, which includes 90 days of analytics retention. Retention charges begin immediately once data is stored in the data lake. Data lake storage billing is based on compressed data size rather than raw ingested volume, which significantly reduces storage costs and delivers lower overall retention spend for customers. Does data federation incur charges? Data federation does not generate any ingestion or storage fees in Sentinel data lake. Customers are billed only when they run analytics or queries on federated data, with charges based on Sentinel data lake compute and analytics meters. This means customers pay solely for actual data usage, not mere connectivity. How do I understand Sentinel data lake costs? Sentinel data lake costs driven by three primary factors: how much data is ingested, how long that data is retained, and how the data is used. Customers can flexibly choose to ingest data into the analytics tier or data lake tier, and these architectural choices directly impact cost. For example, data can be ingested into the analytics tier—where commitment tiers help optimize costs for high data volumes—or ingested data directly into the Sentinel data lake for lower‑cost ingestion, storage, and on‑demand analysis. Customers are encouraged to work with their Microsoft account team to obtain an accurate cost estimate tailored to their environment. See Sentinel pricing page to understand Sentinel pricing. How do I manage Sentinel data lake costs? Built-in cost management experiences help customers with cost predictability, billing transparency, and operational efficiency. Reports provide customers with insights into usage trends over time, enabling them to identify cost drivers and optimize data retention and processing strategies. Set usage-based alerts on specific meters to monitor and control costs. For example, receive alerts when query or notebook usage passes set limits, helping avoid unexpected expenses and manage budgets. See our Sentinel cost management documentation to learn more. If I’m an Auxiliary Logs customer, how will onboarding to the Sentinel data lake affect my billing? Once a workspace is onboarded to Sentinel data lake, all Auxiliary Logs meters will be replaced by new data lake meters. Do we charge for data lake ingestion and storage for graph experiences? Microsoft Sentinel graph-based experiences are included as part of the existing Defender and Purview licenses. However, Sentinel graph requires Sentinel data lake and specific data sources to build the underlying graph. Enabling these data sources will incur ingestion and data lake storage costs. Note: For Sentinel SIEM customers, most required data sources are free for analytics ingestion. Non-entitled sources such as Microsoft Entra ID logs will incur ingestion and data lake storage costs. How is Entra asset data and ARG data billed? Data lake ingestion charges of $0.05 per GB (US EAST) will apply to Entra asset data and ARG data. Note: This was previously not billed during public preview and is billed since data lake GA. To learn more, see: https://learn.microsoft.com/azure/sentinel/datalake/enable-data-connectors When a customer activates Sentinel data lake, what happens to tables with archive logs enabled? To simplify billing, once the data lake is enabled, all archive data will be billed using the data lake storage meter. This provides consistent long-term retention billing and includes automatic 6x data compression. For most customers, this change results in lower long‑term retention costs. However, customers who previously had discounted archive retention pricing will not automatically receive the same discounts on the new data lake storage meters. In these cases, customers should engage their Microsoft account team to review pricing implications before enabling the Sentinel data lake. Thank you Thank you to our customers and partners for your continued trust and collaboration. Your feedback drives our innovation, and we’re excited to keep evolving Microsoft Sentinel to meet your security needs. If you have any questions, please don’t hesitate to reach out—we’re here to support you every step of the way. Learn more: Get started with Sentinel data lake today: https://aka.ms/Get_started/Sentinel_datalake Microsoft Sentinel AI-ready platform: https://aka.ms/Microsoft_Sentinel Sentinel data lake videos: https://aka.ms/Sentineldatalake_videos Latest innovations and updates on Sentinel: https://aka.ms/msftsentinelblog Sentinel pricing page: https://aka.ms/MicrosoftSentinel_Pricing4.9KViews1like8CommentsMicrosoft Sentinel – Continuous Threat Monitoring for GitHub New OOTB Content
On December 2021 Microsoft announced its new solution for continuous monitoring for GitHub using Microsoft Sentinel. GitHub allows you to host, manage, and control different versions of software development using Git. It is highly important to track the different activities in the company’s GitHub repository, to identify suspicious events, and to have the ability to investigate anomalies in the environment. Today we are thrilled to release a new OOTB content for the GitHub solution, additional analytic rules and a new workbook that was define from best practices. In addition to the last analytic rules, Microsoft Sentinel released additional 10 analytic rules: Repository was created - this alert is triggered each time a repository is created in the GitHub environment that is connected to the Microsoft Sentinel workspace. In addition to the repository name, we get the actor who created this repository, so there’s an option to track the repositories and who is creating them. Repository was destroyed - this alert is triggered each time a repository is destroyed in the GitHub environment. It’s critical to track the repositories being destroyed in order to verify that the users destroying repositories have the correct permissions, and these actions are not part of a malicious activity. User was added to the organization - Organization owners can make anyone a member of their organization using their GitHub Enterprise Server username or email address. In cases that we have multiple organization owners we would like to track all new members on our organization, making sure we are aware of who’s contribute to our GH repositories. User was invited to the repository – after the organization owner has added the user to the organization, there’s an option to invite the user to specific repositories. This action is important so we can track all users who are invited to the different repositories and who has access to which repository. User visibility was changed - when you create a repository, you can choose to make the repository public or private. Repositories in organizations that use GitHub Enterprise Cloud and are owned by an enterprise account can also be created with internal visibility. There are also multiple options to change the user’s visibility. This alert raises once a user visibility changes. User was blocked - when a user is blocked from your organization the user stops watching your organization's repositories, they can’t comment or do any activity on the organization. We would like to know all the different users who got blocked from our repositories and our organizations. Two factor Auth disable – this alert is highly recommended to use, security-wise the 2FA is important for validation that the users who login are the real users and not hackers or attackers. Disabling this feature should be investigated and highly not recommended. Pull request was created - this alert raises once a PR is created to the different repositories on the organization that is connected to the sentinel workspace. Pull request was merged - this alert raises once a PR is merged to the repositories on the organization that is connected to the sentinel workspace. Activities from a new country – this alert identifies different activity’s locations within the same user, so if one user operates from one location and right after from a different location, this alert will notify us that for the same users there are multiple events from multiple locations. The GitHub Audit Log workbook was modified as well and change according to best practices: First we’ve added three different tabs to the existing workbook – repositories, integration installation and more. The first tab, repositories, contains different statistics on the repositories on your GitHub workspace. We’ve added a time range filter which you can select the time frame you’d like to see the data on. for example how many repositories were created and how many repositories were renamed during the past 90 days: Another examples are how many repositories were destroyed and how many repository merge settings were changed during the past 90 days: The second tab is integration and installation, this tab contains different charts regarding the configuration of the organization on the GitHub enterprise license which is connected to the Microsoft sentinel workspace. The time range filter is also affects the charts in this tab: The third tab contains different analytics regarding the authentication and authorization of the different repositories – how many repositories dependency graphs were enabled and how many repository vulnerabilities alerts were enabled during the past 90 days (this number is configurable): For more information, see: Microsoft Sentinel for GitHub - blog Microsoft Sentinel - Continuous Threat Monitoring for GitHub Solution4.6KViews1like0CommentsAzure Devops API no pagination for repo endpoint?
Hi! I am trying to access and retrieve the list of repositories from an organisation, the total count being ~5000. Up until now I have used an API call with a limitation of 50 records per page (https://dev.azure.com/{org}/_apis/git/repositories?api-version=4.1&limit=50) and it was working fine, it was returning first 50 repos, checked if there was a next page and pushed the next job until there were no more left. However today my command failed because it ran too long. When I checked this call in Postman, the limitation no longer works, all 5000 repos are being returned. No matter what query I tried (limit, $top, page-size) it always retrieved all records. Was pagination removed from this API endpoint?2KViews1like0Comments