Detecting and Remediating Impossible Travel

Microsoft

May 12, 2022

Overview

“Impossible travel” is one of the most basic anomaly detections used to indicate that a user may be compromised. The logic behind impossible travel is simple. If the same user connects from two different countries and the time between those connections is too short for the trip to be made through conventional air travel, it’s an impossible travel event.
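
The basic rule can be sketched in a few lines. This is a minimal illustration, not the production detection: the `Connection` type, the great-circle distance calculation, and the 900 km/h speed ceiling are all assumptions chosen for the example.

```python
import math
from dataclasses import dataclass
from datetime import datetime

EARTH_RADIUS_KM = 6371.0
MAX_AIR_SPEED_KMH = 900.0  # assumed ceiling for conventional air travel


@dataclass(frozen=True)
class Connection:
    """Hypothetical record of one connection and its geolocated origin."""
    country: str
    latitude: float
    longitude: float
    timestamp: datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_impossible_travel(first, second, max_speed_kmh=MAX_AIR_SPEED_KMH):
    """True when two connections from different countries imply a speed
    above the assumed air-travel ceiling."""
    if first.country == second.country:
        return False
    distance = haversine_km(first.latitude, first.longitude,
                            second.latitude, second.longitude)
    hours = abs((second.timestamp - first.timestamp).total_seconds()) / 3600.0
    if hours == 0:
        return True  # two countries at the same instant
    return distance / hours > max_speed_kmh
```

With these assumptions, a connection from London followed one hour later by one from New York is flagged, while the same pair ten hours apart is not.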

Although the principle of detecting impossible travel is straightforward, how, when, and where we work have made it more challenging to correctly identify an impossible travel event. Employees can connect to your corporate network and resources practically at the same time from multiple devices, using multiple applications, from multiple IP addresses. To address these problems, Microsoft implemented a comprehensive mechanism that analyzes and records user behavior in order to develop suppressions that ignore cases of “legitimate” impossible travel.

For example, an employee works from a laptop at home, and the laptop has a VPN connection to the corporate network with split tunneling: traffic to Microsoft Office 365 goes out directly from the home IP address, while GitHub activity goes out through the VPN in a different country. Simultaneously, they are connected to the Microsoft Teams app from their smartphone, and its IP address can switch in a matter of seconds between the home Wi-Fi and the cellular network’s ISP.

Another example is when an employee is traveling and working from a remote location. Their smartphone can jump between their home-country cellular provider, connected through roaming, and a Wi-Fi network located on a different continent.

Technical details

To handle such cases, we built a stream processing job that consumes, enriches, and aggregates several billion activities per day to detect suspicious actions. For each country the user was active in during the last couple of days, the visit details are stored as part of an aggregation of visit activities such as the user agents and ISPs that appeared during that visit. When a new event is triggered, we correlate the event with the stored visits to see if the activity potentially causes an impossible travel incident. This reduces the number of false positives and investigations.
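
The correlation step can be pictured with a small in-memory store. This is only a sketch under stated assumptions: the real system is a stream processing job, and the `VisitStore` class, its `record` method, and the two-day retention window (one reading of "the last couple of days") are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

RETENTION = timedelta(days=2)  # assumed reading of "the last couple of days"


class VisitStore:
    """Hypothetical per-user store of the last activity time in each country."""

    def __init__(self, retention=RETENTION):
        self.retention = retention
        # user_id -> {country -> time of last activity in that country}
        self._last_seen = defaultdict(dict)

    def record(self, user_id, country, timestamp):
        """Store the new event and return the user's retained visits in
        other countries, which are the candidates for correlation."""
        visits = self._last_seen[user_id]
        # Drop visits that have aged out of the retention window.
        expired = [c for c, seen in visits.items()
                   if timestamp - seen > self.retention]
        for c in expired:
            del visits[c]
        candidates = {c: seen for c, seen in visits.items() if c != country}
        visits[country] = timestamp
        return candidates
```

Each returned candidate would then be checked against the new event to decide whether the pair could form an impossible travel incident.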

Each impossible travel incident is based on two visits. Each visit represents an aggregation of user log activities during a session in a single country. The visit contains information, such as user agent and IP address, that can be analyzed to determine if the visit was legitimate. Additionally, all the aggregated properties are saved, enabling us to compare them to those of the other visit. If they are similar enough, we avoid raising an unnecessary alert. An example of an aggregated property is the set of user agents that were used during the visit.
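
One way to picture a visit and the comparison between two visits is shown below. The `Visit` fields mirror the properties named above, but the Jaccard overlap measure and the 0.5 threshold are assumptions made for illustration; the post does not say how similarity is actually computed.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Visit:
    """Hypothetical aggregation of a user's activities in a single country."""
    country: str
    start: datetime
    end: datetime
    user_agents: set = field(default_factory=set)
    isps: set = field(default_factory=set)
    ip_addresses: set = field(default_factory=set)

    def add_activity(self, timestamp, user_agent, isp, ip_address):
        """Fold one activity into the aggregated properties."""
        self.start = min(self.start, timestamp)
        self.end = max(self.end, timestamp)
        self.user_agents.add(user_agent)
        self.isps.add(isp)
        self.ip_addresses.add(ip_address)


def jaccard(a, b):
    """Share of elements common to both sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def visits_look_alike(first, second, threshold=0.5):
    """Assumed similarity test: enough overlap in the user agents seen."""
    return jaccard(first.user_agents, second.user_agents) >= threshold
```

Two visits that share the same set of user agents would be treated as similar here, while a visit made entirely with an unfamiliar client would not.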

A flowchart graphic that goes from left to right. It shows several events happening almost simultaneously with attempts to connect to resources in the cloud. The events are analyzed by reviewing a user’s historical behavior and comparing it with a stream processing job that consumes, enriches, and aggregates several billion activities per day to detect suspicious actions. If the events appear normal based on information on the user’s historical activity, no action is taken. If the events appear suspicious, an alert for an impossible travel incident is triggered and the information is logged.

We add a visit to the visits store only when the user's location is a valid one that represents a physical location. If we determine, according to the tenant's IP range configuration or our geo-IP data, that the IP doesn't represent a physical location, such as an IP used by a VPN service or a cloud provider, it will not participate in an impossible travel incident. Additionally, we avoid raising alerts on neighboring countries, using a finer geographic resolution when required.
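
A rough sketch of these two filters follows. Everything in it is illustrative: the tenant ranges, the geo-IP category table, and the neighbor list are made-up sample data (the addresses come from the reserved documentation ranges), and the function names are hypothetical.

```python
import ipaddress

# Hypothetical tenant configuration: ranges tagged as corporate VPN egress.
TENANT_NON_PHYSICAL_RANGES = [ipaddress.ip_network("203.0.113.0/24")]

# Hypothetical geo-IP enrichment: category per IP address.
GEO_IP_CATEGORIES = {
    "198.51.100.7": "cloud_provider",
    "198.51.100.8": "vpn_service",
    "192.0.2.10": "residential",
}
NON_PHYSICAL_CATEGORIES = {"cloud_provider", "vpn_service"}

# Hypothetical list of country pairs treated as neighbors.
NEIGHBORING_COUNTRIES = {frozenset({"FR", "BE"}), frozenset({"US", "CA"})}


def represents_physical_location(ip):
    """False when tenant config or geo-IP data marks the IP as non-physical."""
    address = ipaddress.ip_address(ip)
    if any(address in network for network in TENANT_NON_PHYSICAL_RANGES):
        return False
    return GEO_IP_CATEGORIES.get(ip) not in NON_PHYSICAL_CATEGORIES


def countries_can_pair(country_a, country_b):
    """Two visits can form an incident only across non-neighboring countries."""
    if country_a == country_b:
        return False
    return frozenset({country_a, country_b}) not in NEIGHBORING_COUNTRIES
```

In this sketch an IP with no geo-IP entry is treated as physical, which is a design choice for the example rather than documented behavior.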

After an impossible travel incident is created, we classify each visit as suspicious or normal. If both visits are labeled as normal, we suppress the incident to avoid triggering an alert on false positive scenarios. If either visit is labeled suspicious, an alert is raised.
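
The suppression rule itself is small enough to write down directly. The `VisitLabel` enum and the function name are hypothetical; the logic follows the rule stated above.

```python
from enum import Enum


class VisitLabel(Enum):
    NORMAL = "normal"
    SUSPICIOUS = "suspicious"


def should_raise_alert(first, second):
    """Suppress the incident only when both visits are labeled normal."""
    return first is VisitLabel.SUSPICIOUS or second is VisitLabel.SUSPICIOUS
```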

To classify a visit as normal, we use different methods. The algorithm is an ensemble of two logics: one is based on the user’s learned baseline, while the other is based on security attributes that we consume from the identity provider service, Azure Active Directory (AAD).

For the first part, we have a profile service which learns the common properties of each user, such as location, what devices are normally active, and common activities that are performed. The user profile is updated after each new activity. If the visit shares properties that are typical for that particular user, we will likely label it as a normal visit. For roaming scenarios, we combine profile-based data with device information, avoiding the creation of alerts for these cases.
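
As a rough mental model of such a baseline, consider a profile that counts how often each property value appears and treats frequently seen values as typical. The `UserProfile` class, the property names, and the 20% frequency threshold are assumptions for illustration only; the actual profile service is not described at this level of detail.

```python
from collections import Counter, defaultdict


class UserProfile:
    """Hypothetical per-user baseline, updated after every activity."""

    def __init__(self):
        self.activity_count = 0
        # property name -> Counter of the values observed for it
        self.seen = defaultdict(Counter)

    def update(self, activity):
        """Fold one activity (a dict of property name -> value) into the profile."""
        self.activity_count += 1
        for name, value in activity.items():
            self.seen[name][value] += 1

    def is_typical(self, name, value, min_share=0.2):
        """True when the value appears in at least min_share of activities."""
        counts = self.seen.get(name)
        if not counts or self.activity_count == 0:
            return False
        return counts[value] / self.activity_count >= min_share
```

A visit whose country, device, and activity types are all typical for the user would then lean toward a normal label.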

The other part of the algorithm takes signals from AAD that indicate whether the activities were really made by the user, such as multifactor authentication (MFA) and registered device information. Those signals are highly accurate indicators, and we found that integrating them into our process decreased the number of false positives dramatically.

Using these signals, we can correlate the login made by the user with the other activities during the visit. This sheds light on the visit itself and enables us to clear activities that otherwise would have appeared suspicious.
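
Putting the two parts together, a visit classifier might look like the sketch below. How the signals are really weighted is not stated in this post, so the rule used here (either the profile or an identity signal is enough to vouch for the visit) is an assumption, as are the `VisitEvidence` type and its field names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisitEvidence:
    """Hypothetical summary of what is known about one visit."""
    matches_profile: bool    # visit shares properties typical for the user
    mfa_satisfied: bool      # the sign-in behind the visit satisfied MFA
    registered_device: bool  # the sign-in came from a registered device


def classify_visit(evidence):
    """Assumed ensemble rule: the learned baseline or the identity
    provider signals can each vouch for the visit."""
    identity_vouches = evidence.mfa_satisfied or evidence.registered_device
    if evidence.matches_profile or identity_vouches:
        return "normal"
    return "suspicious"
```

Under this rule, a visit from an unfamiliar location is still cleared when the sign-in behind it satisfied MFA, which matches the intent of using the identity signals to clear activities that would otherwise look suspicious.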

When we surface the alert to the Microsoft Defender for Cloud Apps portal, and soon to the Microsoft 365 Defender portal, we also look for additional insights we can share about other activities that happened around the time of the event that triggered the alert. We access metadata and statistical enrichments, such as those in the example below.

A graphic with three bullets that shows an example of the metadata that is provided in an alert in the Microsoft Defender for Cloud Apps, and soon, the Microsoft 365 Defender portals. Example, important information: This user is an administrator in Office 365 (Default). Microsoft Azure (Default) was accessed from IP address 73.42.222.55 for the first time in 180 days. Microsoft Azure (Default) was accessed from the ISP Comcast for the first time in 44 days.

Conclusion

When we look at the detections as a whole set, Microsoft has the power to move a non-focused and noisy scenario from a single detection to a significantly more accurate signal in the context of a different detection. For example, building a detection that profiles IP addresses involved in password spray attempts enabled us to stop considering failed logins as part of “impossible travel.” Through removing failed logins from the detection’s logic, adding VPN suppression, and making other changes, Microsoft has decreased false positives by almost 75%.
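
The idea of routing noisy events to the detection that suits them can be sketched as a simple split. The `Activity` type, the `split_activities` function, and the pre-computed set of password spray IPs are hypothetical; the sketch only illustrates that failed logins stop feeding impossible travel and instead serve the password spray detection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """Hypothetical activity record."""
    user_id: str
    ip_address: str
    succeeded: bool


def split_activities(activities, spray_ips):
    """Return (travel_candidates, spray_evidence).

    Only successful activities remain candidates for impossible travel.
    Failed logins from IPs profiled as password spray sources are kept
    as evidence for that other detection instead.
    """
    travel_candidates = [a for a in activities if a.succeeded]
    spray_evidence = [a for a in activities
                      if not a.succeeded and a.ip_address in spray_ips]
    return travel_candidates, spray_evidence
```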

An impossible travel event is triggered only when a successful activity is performed from an IP address in a scenario that should catch the SOC’s attention.

Microsoft is committed to continuing to work with our engineering, data science, and security research teams to build the next generation of alerting experiences: alerts that are more precise and tuned to identify real attack scenarios. By leveraging the Microsoft 365 Defender incident concept and correlation capabilities, we will alert on correlated actions when they have meaning as part of a complete security scenario.

For more details about incidents, see the Microsoft Security Blog post “Inside Microsoft 365 Defender: Correlating and consolidating attacks into incidents.”

Safe travels!