repositories

3 Topics

Microsoft Sentinel data lake FAQ
On September 30, 2025, Microsoft announced the general availability of the Microsoft Sentinel data lake, designed to centralize and retain massive volumes of security data in open formats like delta parquet. By decoupling storage from compute, the data lake supports flexible querying, while offering unified data management and cost-effective retention. The Sentinel data lake is a game changer for security teams, serving as the foundational layer for agentic defense, deeper security insights and graph-based enrichment. In this blog we offer answers to many of the questions we’ve heard from our customers and partners. General questions 1. What is the Microsoft Sentinel data lake? Microsoft has expanded its industry-leading SIEM solution, Microsoft Sentinel, to include a unified, security data lake, designed to help optimize costs, simplify data management, and accelerate the adoption of AI in security operations. This modern data lake serves as the foundation for the Microsoft Sentinel platform. It has a cloud-native architecture and is purpose-built for security—bringing together all security data for greater visibility, deeper security analysis and contextual awareness. It provides affordable, long-term retention, allowing organizations to maintain robust security while effectively managing budgetary requirements. 2. What are the benefits of Sentinel data lake? Microsoft Sentinel data lake is designed for flexible analytics, cost management, and deeper security insights. It centralizes security data in an open format like delta parquet for easy access. This unified view enhances threat detection, investigation, and response across hybrid and multi-cloud environments. It introduces a disaggregated storage and compute pricing model, allowing customers to store massive volumes of security data at a fraction of the cost compared to traditional SIEM solutions. It allows multiple analytics engines like Kusto, Spark, and ML to run on a single data copy, simplifying management, reducing costs, and supporting deeper security analysis. It integrates with GitHub Copilot and VS Code empowering SOC teams to automate enrichment, anomaly detection, and forensic analysis. It supports AI agents via the MCP server, allowing tools like GitHub Copilot to query and automate security tasks. The MCP Server layer brings intelligence to the data, offering Semantic Search, Query Tools, and Custom Analysis capabilities that make it easier to extract insights and automate workflows. Customers also benefit from streamlined onboarding, intuitive table management, and scalable multi-tenant support, making it ideal for MSSPs and large enterprises. The Sentinel data lake is purpose built for security workloads, ensuring that processes from ingestion to analytics meet cybersecurity requirements. 3. Is the Sentinel data lake generally available? Yes. The Sentinel data lake is generally available (GA) starting September 30, 2025. To learn more, see GA announcement blog. 4. What happens to Microsoft Sentinel SIEM? Microsoft is expanding Sentinel into an AI powered end-to-end security platform that includes SIEM and new platform capabilities - Security data lake, graph-powered analytics and MCP Server. SIEM remains a core component and will be actively developed and supported. Getting started 1. What are the prerequisites for Sentinel data lake? To get started: Connect your Sentinel workspace to Microsoft Defender prior to onboarding to Sentinel data lake. Once in the Defender experience see data lake onboarding documentation for next steps. Note: Sentinel is moving to the Microsoft Defender portal and the Sentinel Azure portal will be retired by July 2026. 2. I am a Sentinel-only customer, and not a Defender customer, can I use the Sentinel data lake? Yes. You must connect Sentinel to the Defender experience before onboarding to the Sentinel data lake. Microsoft Sentinel is generally available in the Microsoft Defender portal, with or without Microsoft Defender XDR or an E5 license. If you have created a log analytics workspace, enabled it for Sentinel and have the right Microsoft Entra roles (e.g. Global Administrator + Subscription Owner, Security Administrator + Sentinel Contributor), you can enable Sentinel in the Defender portal. For more details on how to connect Sentinel to Defender review these sources: Microsoft Sentinel in the Microsoft Defender portal 3. In what regions is Sentinel data lake available? For supported regions see: Geographical availability and data residency in Microsoft Sentinel | Microsoft Learn 4. Is there an expected release date for Microsoft Sentinel data lake in Government clouds? While the exact date is not yet finalized, we anticipate support for these clouds soon. 5. How will URBAC and Entra RBAC work together to manage the data lake given there is no centralized model? Entra RBAC will provide broad access to the data lake (URBAC maps the right permissions to specific Entra role holders: GA/SA/SO/GR/SR). URBAC will become a centralized pane for configuring non-global delegated access to the data lake. For today, you will use this for the “default data lake” workspace. In the future, this will be enabled for non-default Sentinel workspaces as well – meaning all workspaces in the data lake can be managed here for data lake RBAC requirements. Azure RBAC on the Log Analytics (LA) workspace in the data lake is respected through URBAC as well today. If you already hold a built-in role like log analytics reader, you will be able to run interactive queries over the tables in that workspace. Or, if you hold log analytics contributor, you can read and manage table data. For more details see: Roles and permissions in the Microsoft Sentinel platform | Microsoft Learn Data ingestion and storage 1. How do I ingest data into the Sentinel data lake? To ingest data into the Sentinel data lake, you can use existing Sentinel data connectors or custom connectors to bring data from Microsoft and third-party sources. Data can be ingested into the analytic tier and/or data lake tier. Data ingested into the analytics tier is automatically mirrored to the lake, while lake-only ingestion is available for select tables. Data retention is configured in table management. Note: Certain tables do not support data lake-only ingestion via either API or data connector UI. See here for more information: Custom log tables. 2. What is Microsoft’s guidance on when to use analytics tier vs. the data lake tier? Sentinel data lake offers flexible, built-in data tiering (analytics and data lake tiers) to effectively meet diverse business use cases and achieve cost optimization goals. Analytics tier: Is ideal for high-performance, real-time, end-to-end detections, enrichments, investigation and interactive dashboards. Typically, high-fidelity data from EDRs, email gateways, identity, SaaS and cloud logs, threat intelligence (TI) should be ingested into the analytics tier. Data in the analytics tier is best monitored proactively with scheduled alerts and scheduled analytics to enable security detections Data in this tier is retained at no cost for up to 90 days by default, extendable to 2 years. A copy of the data in this tier is automatically available in the data lake tier at no extra cost, ensuring a unified copy of security data for both tiers. Data lake tier: Is designed for cost-effective, long-term storage. High-volume logs like NetFlow logs, TLS/SSL certificate logs, firewall logs and proxy logs are best suited for data lake tier. Customers can use these logs for historical analysis, compliance and auditing, incident response (IR), forensics over historical data, build tenant baselines, TI matching and then promote resulting insights into the analytics tier. Customers can run full Kusto queries, Spark Notebooks and scheduled jobs over a single copy of their data in the data lake. Customers can also search, enrich and restore data from the data lake tier to the analytics tier for full analytics. For more details see documentation. 3. What does it mean that a copy of all new analytics tier data will be available in the data lake? When Sentinel data lake is enabled, a copy of all new data ingested into the analytics tier is automatically duplicated into the data lake tier. This means customers don’t need to manually configure or manage this process—every new log or telemetry added to the analytics tier becomes instantly available in the data lake. This allows security teams to run advanced analytics, historical investigations, and machine learning models on a single, unified copy of data in the lake, while still using the analytics tier for real-time SOC workflows. It’s a seamless way to support both operational and long-term use cases—without duplicating effort or cost. 4. Is there any cost for retention in the analytics tier? You will get 90 days of analytics retention free. Simply set analytics retention to 90 days or less. Total retention setting – only the mirrored portion that overlaps with the free analytics retention is free in the data lake. Retaining data in the lake beyond the analytics retention period incurs additional storage costs. See documentation for more details: Manage data tiers and retention in Microsoft Sentinel | Microsoft Learn 5. What is the guidance for Microsoft Sentinel Basic and Auxiliary Logs customers? If you previously enabled Basic or Auxiliary Logs plan in Sentinel: You can view Basic Logs in the Defender portal but manage it from the Log Analytics workspace. To manage it in the Defender portal, you must change the plan from Basic to Analytics. Existing Auxiliary Log tables will be available in the data lake tier for use once the Sentinel data lake is enabled. Prior to the availability of Sentinel data lake, Auxiliary Logs provided a long-term retention solution for Sentinel SIEM. Now once the data lake is enabled, Auxiliary Log tables will be available in the Sentinel data lake for use with the data lake experiences. Billing for Auxiliary Logs will switch to Sentinel data lake meters. Microsoft Sentinel customers are recommended to start planning their data management strategy with the data lake. While Basic and auxiliary Logs are still available, they are not being enhanced further. Please plan on onboarding your security data to the Sentinel data lake. Azure Monitor customers can continue to use Basic and Auxiliary Logs for observability scenarios. 6. What happens to customers that already have Archive logs enabled? If a customer has already configured tables for Archive retention, those settings will be inherited by the Sentinel data lake and will not change. Data in the Archive logs will continue to be accessible through Sentinel search and restore experiences. Mirrored data (in the data lake) will be accessible via lake explorer and notebook jobs. Example: If a customer has 12 months of total retention enabled on a table, 2 months after enabling ingestion into the Sentinel data lake, the customer will still have access to 12 months of archived data (through Sentinel search and restore experiences), but access to only 2 months of data in the data lake (since the data lake was enabled). Key considerations for customers that currently have Archive logs enabled: The existing archive will remain, with new data ingested into the data lake going forward; previously stored archive data will not be backfilled into the lake. Archive logs will continue to be accessible via the Search and Restore tab under Sentinel. If analytics and data lake mode are enabled on table, which is the default setting for analytics tables when Sentinel data lake is enabled, data will continue to be ingested into the Sentinel data lake and archive going forward. There will only be one retention billing meter going forward. Archive will continue to be accessible via Search and Restore. If Sentinel data lake-only mode is enabled on table, new data will be ingested only into the data lake; any data that’s not already in the Sentinel data lake won’t be migrated/backfilled. Data that was previously ingested under the archive plan will be accessible via Search and Restore. 7. What is the guidance for customers using Azure Data Explorer (ADX) alongside Microsoft Sentinel? Some customers might have set up ADX cluster to augment their Sentinel deployment. Customers can choose to continue using that setup and gradually migrate to Sentinel data lake for new data to receive the benefits of a fully managed data lake. For all new implementations it is recommended to use the Sentinel data lake. 8. What happens to the Defender XDR data after enabling Sentinel data lake? By default, Defender XDR retains threat hunting data in the XDR default tier, which includes 30 days of analytics retention, which is included in the XDR license. You can extend the table retention period for supported Defender XDR tables beyond 30 days. For more information see Manage XDR data in Microsoft Sentinel. Note: Today you can't ingest XDR tables directly to the data lake tier without ingesting into the analytics tier first. 9. Are there any special considerations for XDR tables? Yes, XDR tables are unique in that they are available for querying in advanced hunting by default for 30 days. To retain data beyond this period, an explicit change to the retention setting is required, either by extending the analytics tier retention or the total retention period. A list of XDR advanced hunting tables supported by Sentinel are documented here: Connect Microsoft Defender XDR data to Microsoft Sentinel | Microsoft Learn. KQL queries and jobs 1. Is KQL and Notebook supported over the Sentinel data lake? Yes, via the data lake KQL query experience along with a fully managed Notebook experience which enables spark-based big data analytics over a single copy of all your security data. Customers can run queries across any time range of data in their Sentinel data lake. In the future, this will be extended to enable SQL query over lake as well. 2. Why are there two different places to run KQL queries in Sentinel experience? Consolidating advanced hunting and KQL Explorer user interfaces is on the roadmap. Security analysts will benefit from unified query experience across both analytics and data lake tiers. 3. Where is the output from KQL jobs stored? KQL jobs are written into existing or new analytics tier table. 4. Is it possible to run KQL queries on multiple data lake tables? Yes, you can run KQL interactive queries and jobs using operators like join or union. 5. Can KQL queries (either interactive or via KQL jobs) join data across multiple workspaces? Yes, security teams can run multi-workspace KQL queries for broader threat correlation. Pricing and billing 1. How does a customer pay for Sentinel data lake? Sentinel data lake is a consumption-based service with disaggregated storage and compute business model. Customers continue to pay for ingestion. Customers set up billing as a part of their onboarding for storage and analytics over data in the data lake (e.g. Queries, KQL or Notebook Jobs). See Sentinel pricing page for more details. 2. What are the pricing components for Sentinel data lake? Sentinel data lake offers a flexible pricing model designed to optimize security coverage and costs. For specific meter definitions, see documentation. 3. What are the billing updates at GA? We are enabling data compression billed with a simple and uniform data compression rate of 6:1 across all data sources, applicable only to data lake storage. Starting October 1, 2025, the data storage billing begins on the first day data is stored. To support ingestion and standardization of diverse data sources, we are introducing a new Data Processing feature that applies a $0.10 per GB charge for all uncompressed data ingested into the data lake for tables configured for data lake only retention. (does not apply to tables configured for both analytic and data lake tier retention). 4. How is retention billed for tables that use data lake-only ingestion & retention? During the public preview, data lake-only tables included the first 30 days of retention at no cost. At GA, storage costs will be billed. In addition, when retention billing switches to using compressed data size (instead of ingested size), this will change, and charges will apply for the entire retention period. Because billing will be based on compressed data size, customers can expect significant savings on storage costs. 5. Does “Data processing” meter apply to analytics tier data duplicated in the data lake? No. 6. What happens to billing for customers that activate Sentinel data lake on a table with archive logs enabled? Customers will automatically be billed using the data lake storage meter. Note: This means that customers will be charged using the 6X compression rate for data lake retention. 7. How do I control my Sentinel data lake costs? Sentinel is billed based on consumption and prices vary based on usage. An important tool in managing the majority of the cost is usage of analytics “Commitment Tiers”. The data lake complements this strategy for higher-volume data like network and firewall data to reduce analytics tier costs. Use the Azure pricing calculator and the Sentinel pricing page to estimate costs and understand pricing. 8. How do I manage Sentinel data lake costs? We are introducing a new cost management experience (public preview) to help customers with cost predictability, billing transparency, and operational efficiency. These in-product reports provide customers with insights into usage trends over time, enabling them to identify cost drivers and optimize data retention and processing strategies. Customers will also be able to set usage-based alerts on specific meters to monitor and control costs. For example, you can receive alerts when query or notebook usage passes set limits, helping avoid unexpected expenses and manage budgets. See documentation to learn more. 9. If I’m an Auxiliary Logs customer, how will onboarding to the Sentinel data lake affect my billing? Once a workspace is onboarded to Sentinel data lake, all Auxiliary Logs meters will be replaced by new data lake meters. Thank you Thank you to our customers and partners for your continued trust and collaboration. Your feedback drives our innovation, and we’re excited to keep evolving Microsoft Sentinel to meet your security needs. If you have any questions, please don’t hesitate to reach out—we’re here to support you every step of the way.
chaitra_satish
Oct 13, 2025 Place Microsoft Sentinel Blog
4KViews
1like
8Comments
Azure Devops API no pagination for repo endpoint?
Hi! I am trying to access and retrieve the list of repositories from an organisation, the total count being ~5000. Up until now I have used an API call with a limitation of 50 records per page (https://dev.azure.com/{org}/_apis/git/repositories?api-version=4.1&limit=50) and it was working fine, it was returning first 50 repos, checked if there was a next page and pushed the next job until there were no more left. However today my command failed because it ran too long. When I checked this call in Postman, the limitation no longer works, all 5000 repos are being returned. No matter what query I tried (limit, $top, page-size) it always retrieved all records. Was pagination removed from this API endpoint?
Vincent2017
Mar 05, 2024 Place Azure
2KViews
1like
0Comments
Microsoft Sentinel – Continuous Threat Monitoring for GitHub New OOTB Content
On December 2021 Microsoft announced its new solution for continuous monitoring for GitHub using Microsoft Sentinel. GitHub allows you to host, manage, and control different versions of software development using Git. It is highly important to track the different activities in the company’s GitHub repository, to identify suspicious events, and to have the ability to investigate anomalies in the environment. Today we are thrilled to release a new OOTB content for the GitHub solution, additional analytic rules and a new workbook that was define from best practices. In addition to the last analytic rules, Microsoft Sentinel released additional 10 analytic rules: Repository was created - this alert is triggered each time a repository is created in the GitHub environment that is connected to the Microsoft Sentinel workspace. In addition to the repository name, we get the actor who created this repository, so there’s an option to track the repositories and who is creating them. Repository was destroyed - this alert is triggered each time a repository is destroyed in the GitHub environment. It’s critical to track the repositories being destroyed in order to verify that the users destroying repositories have the correct permissions, and these actions are not part of a malicious activity. User was added to the organization - Organization owners can make anyone a member of their organization using their GitHub Enterprise Server username or email address. In cases that we have multiple organization owners we would like to track all new members on our organization, making sure we are aware of who’s contribute to our GH repositories. User was invited to the repository – after the organization owner has added the user to the organization, there’s an option to invite the user to specific repositories. This action is important so we can track all users who are invited to the different repositories and who has access to which repository. User visibility was changed - when you create a repository, you can choose to make the repository public or private. Repositories in organizations that use GitHub Enterprise Cloud and are owned by an enterprise account can also be created with internal visibility. There are also multiple options to change the user’s visibility. This alert raises once a user visibility changes. User was blocked - when a user is blocked from your organization the user stops watching your organization's repositories, they can’t comment or do any activity on the organization. We would like to know all the different users who got blocked from our repositories and our organizations. Two factor Auth disable – this alert is highly recommended to use, security-wise the 2FA is important for validation that the users who login are the real users and not hackers or attackers. Disabling this feature should be investigated and highly not recommended. Pull request was created - this alert raises once a PR is created to the different repositories on the organization that is connected to the sentinel workspace. Pull request was merged - this alert raises once a PR is merged to the repositories on the organization that is connected to the sentinel workspace. Activities from a new country – this alert identifies different activity’s locations within the same user, so if one user operates from one location and right after from a different location, this alert will notify us that for the same users there are multiple events from multiple locations. The GitHub Audit Log workbook was modified as well and change according to best practices: First we’ve added three different tabs to the existing workbook – repositories, integration installation and more. The first tab, repositories, contains different statistics on the repositories on your GitHub workspace. We’ve added a time range filter which you can select the time frame you’d like to see the data on. for example how many repositories were created and how many repositories were renamed during the past 90 days: Another examples are how many repositories were destroyed and how many repository merge settings were changed during the past 90 days: The second tab is integration and installation, this tab contains different charts regarding the configuration of the organization on the GitHub enterprise license which is connected to the Microsoft sentinel workspace. The time range filter is also affects the charts in this tab: The third tab contains different analytics regarding the authentication and authorization of the different repositories – how many repositories dependency graphs were enabled and how many repository vulnerabilities alerts were enabled during the past 90 days (this number is configurable): For more information, see: Microsoft Sentinel for GitHub - blog Microsoft Sentinel - Continuous Threat Monitoring for GitHub Solution
KobyMymon
May 10, 2022 Place Microsoft Sentinel Blog
4.6KViews
1like
0Comments