In Microsoft’s Detection and Response Team, we often find ourselves using the rich data available in Office 365 to help us with our investigations. During this process there are a couple of questions we consistently stumble across:
Where can I go to find ‘x’ data? (Location)
How far back does our data go? (Availability)
Just like traditional endpoint-based data, log data in cloud services is available based on factors largely outside of the investigator’s control. As investigators, it is our job to work with what’s available, and sometimes work a little bit of magic to make the unavailable available! To begin, there are some differences worth highlighting in data availability in cloud vs. endpoint:
Endpoint: Can be tampered with. Cloud: Cannot be tampered with.
Endpoint: Primarily configuration based. Cloud: Primarily license based.
Endpoint: Different per log source but often based on volume of data. Cloud: Based on data location and licenses available.
In addition, there is often more than one place to view a particular set of data, and each location may have different retention periods. This level of complexity is two-sided. On the plus side, it provides investigators with multiple options for data extraction, enrichment, and analysis. A drawback, however, is that it raises the barrier to entry for extracting data of the highest possible quality and fidelity.
In this article, we aim to provide explanations and tips that help investigators understand, in any situation, what data is available and in which portal.
The big picture
Figure 1. Flow of data for each of the three main log sources in Office 365
The above image shows the flow of data for the three main log sources in Office 365 through to an end web portal, demonstrating some of the complexity of how data moves through these services. The solid lines represent the ‘default’ configuration for any tenant, while the dotted lines represent those data flows which require additional configuration or additional licensing.
Let's start unpacking this diagram by looking at the different types of source data which are flowing through Office 365.
Azure AD sign-ins
By default, this data flows to Azure AD, Microsoft 365 Defender, and, for interactive sign-ins only, the Office 365 Unified Audit Log. Additionally, data flows can be configured for the following scenarios:
Integration with Defender for Cloud Apps via the ‘Office 365 Activity’ connected app
Integration with Log Analytics, providing data visualization, querying via Kusto Query Language, and support for longer-term data retention without requiring a full SIEM/SOAR solution
Full SIEM/SOAR integration into Sentinel via the Azure Sentinel connector
Integration capabilities with any other SIEM/SOAR solution via Event Hub
Identity is the new security perimeter, and its associated data is the key to a successful investigation. By finding and following malicious sign-in activity we can understand which of the productivity services need further investigation. There are four categories of sign-ins that are important for us to consider:
Interactive user sign-ins: Sign-ins where a user provides an authentication factor, such as a password, a response through an MFA app, a biometric factor, or some other method.
Non-interactive user sign-ins: Sign-ins performed by a client on behalf of a user. These sign-ins don't require any interaction or authentication factor from the user.
Service principal sign-ins: Sign-ins by apps and service principals that do not involve any user. In these sign-ins, the app or service provides a credential on its own behalf to authenticate or access resources.
Managed identity sign-ins: Sign-ins by Azure resources that have secrets managed by Azure.
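As a rough illustration of the default data flows described above, the sketch below maps each sign-in category to the places its data reaches without extra configuration. The enum and function names are hypothetical and exist only for this example. The destinations come from the description in this article, not from any API.

```python
from enum import Enum
from typing import List


class SignInCategory(Enum):
    """The four sign-in categories discussed in this article."""
    INTERACTIVE = "interactive user"
    NON_INTERACTIVE = "non-interactive user"
    SERVICE_PRINCIPAL = "service principal"
    MANAGED_IDENTITY = "managed identity"


def default_destinations(category: SignInCategory) -> List[str]:
    """Return where sign-in data flows by default (no extra configuration)."""
    # All categories reach Azure AD and Microsoft 365 Defender by default.
    destinations = ["Azure AD", "Microsoft 365 Defender"]
    # Only interactive sign-ins also reach the Unified Audit Log.
    if category is SignInCategory.INTERACTIVE:
        destinations.append("Office 365 Unified Audit Log")
    return destinations
```

In practice this means that a hunt for service principal or managed identity sign-ins should start somewhere other than the Unified Audit Log.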
Azure AD admin activity
Azure Active Directory administrative activity is stored in the Azure Active Directory Audit Log and, by default, flows through to the Azure AD portal (30 days) and to the Office 365 Unified Audit Log (90 days [E3], 1 year [E5]). This log source is critical to understanding the scope of any administrative compromise to a tenant and includes the following information:
User / Group / Device changes
Authentication method changes
Administrative role changes
Hybrid authentication changes
Additional data flows follow the same configuration and logic as Azure AD sign-ins.
Office 365 activity
One of the most critical data sources for any Office 365 investigation, this data is stored in the Unified Audit Log (UAL). The UAL contains all Office 365 data, including interactive sign-ins and Azure AD admin activity. Keep in mind that non-interactive, service principal, and managed identity sign-ins do not appear in the UAL. Some key facts about this treasure trove of information:
UAL data is retained for 90 days by default, and 365 days for E5/F5/A5/G5 licensed customers or customers with the correct add-on package
Access option 1 - GUI access using the Audit Search in M365 Defender
Provides a classic audit search and a new audit search tool (launched in preview in April 2022)
Filters available are: object ID, User Principal Name (UPN), and date/time
Access option 2 - PowerShell access using the Search-UnifiedAuditLog cmdlet
Can be used for programmatic extraction of data via the Exchange Online PowerShell module
Can be filtered by object ID, free text, IP address, User Principal Name (UPN), and date/time, giving more flexibility than the GUI-based options
Only 5000 results can be returned from each search – careful use of pagination can ensure that all results are returned, and an example script to assist with this can be found here
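One generic way to stay under a per-search result cap is to split the time window until every slice comes back below the limit. The Python sketch below illustrates that idea only and is not the script referenced above. `fetch` is a hypothetical callable standing in for whatever wraps the audit search (for example, a call out to Search-UnifiedAuditLog), and it is assumed to return at most `limit` records for a half-open window.

```python
from datetime import datetime, timedelta
from typing import Callable, List

RESULT_LIMIT = 5000  # per-search cap described in this article


def collect_all(
    fetch: Callable[[datetime, datetime], List[dict]],
    start: datetime,
    end: datetime,
    limit: int = RESULT_LIMIT,
    min_window: timedelta = timedelta(seconds=1),
) -> List[dict]:
    """Collect records in [start, end) by halving any window that hits the cap."""
    records = fetch(start, end)
    # Fewer records than the cap means this window was returned in full.
    # A window at the minimum size is returned as-is, even if it may be truncated.
    if len(records) < limit or (end - start) <= min_window:
        return records
    midpoint = start + (end - start) / 2
    # Half-open windows ensure no record is fetched twice.
    return (collect_all(fetch, start, midpoint, limit, min_window)
            + collect_all(fetch, midpoint, end, limit, min_window))
```

A window that returns exactly the cap is treated as truncated and split again, which costs extra searches but avoids silently dropping records.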
This section describes the locations and web portals that this data flows to.
Azure AD Portal
Located at https://aad.portal.azure.com and contains sign-ins, risk events and Azure AD admin activity. Data is displayed in a custom interface and can be filtered and exported as needed. The default time zone for viewing data is local time, but all exported data is shown in UTC time.
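Because the portal shows local time while exports are in UTC, it can help to convert exported timestamps before comparing them with what was seen on screen. The sketch below is a minimal example under two assumptions: the exported timestamp is an ISO 8601 string with whole seconds and an optional trailing "Z" (the real export format may differ), and the investigator's offset from UTC is known and fixed. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def utc_export_to_local(timestamp: str, utc_offset_hours: float) -> datetime:
    """Convert an exported UTC timestamp to a fixed local UTC offset."""
    # Assumed format: ISO 8601, e.g. "2022-04-01T13:45:00Z" (an assumption).
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    # Treat timestamps without an explicit offset as UTC, as exports are in UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return parsed.astimezone(local_zone)
```

A fixed offset ignores daylight saving transitions, so a named time zone would be the safer choice for investigations that span one.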
Microsoft 365 Defender Portal
Located at https://security.microsoft.com, this portal surfaces two primary interfaces for viewing log data: Advanced Hunting, and the Audit Search, which provides access to the Unified Audit Log.
Defender for Cloud Apps
Located at https://portal.cloudappsecurity.com, this portal does not include any Office 365 data unless explicitly configured. When configured, data is stored in the Activity log and multiple alert templates exist to help detect and respond to security events in the tenant (several are enabled out of the box).
The last word
With all this log data being generated, moved, and stored in so many different locations, it is easy to become overwhelmed by both the quantity of data and the unique differences in each portal and log source. Through our work investigating dozens of cloud environments every month, DART hunters usually follow a particular priority list for accessing data for analysis.
One key consideration when deciding where to hunt is the option to create and run custom queries. To this end, support for KQL (Kusto Query Language) or integration with a SIEM/SOAR solution is incredibly useful when looking to optimize hunting.
Defender for Cloud Apps – 180 days of data is available here and this length of retention can be critical in an investigation. Data enrichment for IP addresses and other data points is also incredibly useful and the portal makes it very easy to pivot from one data point to another.
Advanced Hunting in the M365 Defender portal – We have a shorter retention period of 30 days here, but DART loves Kusto Query Language, and being able to create and reuse our own complex queries for specific hunting scenarios is invaluable to us. If we know that our threat actor has been active recently, Advanced Hunting is our go-to for cloud data.
Azure AD Portal – Only 30 days of data is available here, but this data source is very useful when Log Analytics integration is enabled. This gives us access to workbooks where we can hunt wide and get a broad look at the environment. As we are often working with customers for 1-3 weeks at a time, it is critical for us to get up to speed quickly and build a picture of how authentication and authorization policies are configured.
Unified Audit Log – This has the longest retention of 90 or 365 days (depending on license), but due to the 5000-item result limit it has often been difficult to extract large quantities of data from. Our go-to approach here is to use PowerShell to extract the data we need; however, we are very excited about the new Audit Search as this, from our testing, makes large-scale data collection much simpler and more reliable.
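The retention figures in this list can be turned into a quick check of which sources are still worth querying for an event of a given age. The sketch below uses only the retention periods quoted in this article. The function and constant names are hypothetical, and actual retention in a tenant depends on licensing and configuration.

```python
from datetime import date
from typing import List

# Retention in days, in the priority order used in this article.
RETENTION_DAYS = {
    "Defender for Cloud Apps": 180,
    "Advanced Hunting": 30,
    "Azure AD Portal": 30,
}


def sources_covering(
    event_date: date,
    today: date,
    ual_retention_days: int = 90,  # 90 by default, 365 with qualifying licenses
) -> List[str]:
    """List the sources whose retention window still includes event_date."""
    age = (today - event_date).days
    retention = {**RETENTION_DAYS, "Unified Audit Log": ual_retention_days}
    return [name for name, days in retention.items() if 0 <= age <= days]
```

For example, an event from roughly two months ago falls outside Advanced Hunting and the Azure AD Portal, leaving Defender for Cloud Apps and the Unified Audit Log as the places to look.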
Whichever method you use for accessing log data, our hope is that after reading this blog entry, you have a better idea of where data is generated in a tenant, where it flows, and how to access it. We have a veritable treasure trove of data available to us – let’s use that power for good!