MCAS Regex Engine

Maybe someone has a quick answer. We are currently evaluating the DLP capabilities of MCAS. While implementing use cases, we discovered that Microsoft's regex engine is somewhat special. My colleagues and I understand that it is built to scan massive amounts of data and therefore has limitations on quantifiers. The docs are clear as far as they go, but they cover very little.
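Since the exact quantifier limits are not spelled out here, one cheap precaution is to lint patterns for unbounded (`*`, `+`, `{n,}`) or very large repeats before deploying them. The following is a minimal Python sketch under that assumption. `quantifier_warnings` and the `MAX_REPEAT` threshold are hypothetical and are not MCAS values. The scanner is deliberately simple: it skips escapes and character classes but does not handle every edge case (for example a literal `]` at the start of a class).

```python
"""Flag unbounded or very large quantifiers in a regex before deploying it.

MAX_REPEAT is a hypothetical placeholder, not a documented MCAS limit.
"""
import re

MAX_REPEAT = 2000  # hypothetical; replace with the limit your engine documents

# A brace quantifier such as {3}, {2,5} or {4,}
BRACE = re.compile(r"\{(\d+)(,(\d*))?\}")


def quantifier_warnings(pattern: str) -> list[str]:
    """Return warnings for *, +, {n,} and oversized {n,m} outside character classes."""
    warnings = []
    i, in_class = 0, False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":  # skip an escape such as \* or \d
            i += 2
            continue
        if in_class:  # quantifier characters inside [...] are literals
            if ch == "]":
                in_class = False
            i += 1
            continue
        if ch == "[":
            in_class = True
        elif ch in "*+":
            warnings.append(f"unbounded '{ch}' at position {i}")
        elif ch == "{":
            m = BRACE.match(pattern, i)
            if m:
                upper = m.group(3)
                if m.group(2) and not upper:
                    warnings.append(f"unbounded '{m.group(0)}' at position {i}")
                elif int(upper or m.group(1)) > MAX_REPEAT:
                    warnings.append(f"large repeat '{m.group(0)}' at position {i}")
                i = m.end()
                continue
        i += 1
    return warnings
```

For example, `quantifier_warnings(r"Set-Cookie:\s*\S+")` flags both the `*` and the `+`, while the bounded form `Set-Cookie:\s{0,5}\S{1,128}` produces no warnings.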
 
How does the regex engine actually work, and what are its limitations?
We can investigate every single regex match, but how do we validate false positives across a large number of matches? (A probability score, or reducing the maximum number of matches per day?)
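One generic way to estimate false positives without reviewing every match is statistical sampling: review a random sample of a rule's matches and put a confidence interval on its precision. This is a standard statistical technique, not an MCAS feature. The sketch assumes the matches have been exported into a Python list, and the function names are hypothetical.

```python
"""Estimate a rule's precision from a manually reviewed random sample.

Generic statistics, not an MCAS feature: review k randomly chosen matches
and compute a Wilson score interval for the true-positive rate.
"""
import math
import random


def sample_matches(matches: list, k: int, seed: int = 0) -> list:
    """Pick up to k matches uniformly at random (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(matches, min(k, len(matches)))


def precision_interval(true_pos: int, reviewed: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for precision = true_pos / reviewed (z=1.96 ~ 95%)."""
    if reviewed <= 0:
        raise ValueError("need at least one reviewed match")
    p = true_pos / reviewed
    denom = 1 + z * z / reviewed
    centre = (p + z * z / (2 * reviewed)) / denom
    half = z * math.sqrt(p * (1 - p) / reviewed + z * z / (4 * reviewed**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)
```

For example, if reviewers confirm 45 of 50 sampled matches, `precision_interval(45, 50)` gives roughly 0.79 to 0.96 at about 95% confidence. That range can help decide whether to tighten the pattern or lower the daily match threshold.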
 

Some example use cases from the customer:

- Using regex to look for HTTP headers

- Looking for cookies (e.g. the "Set-Cookie" header)

- Using regex to hunt for Base64-encoded JWT ID tokens, access tokens, or other custom tokens in various file types

- PCI data (can be covered by MCAS)

- AWS session tokens (SessionToken AND Base64-encoded data in the vicinity)

- MIP-labeled documents (can be covered by MCAS)
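As a starting point for the HTTP header, cookie, token and AWS use cases, here is a hedged Python sketch of candidate patterns. They are written for Python's `re` module and have not been validated against the MCAS engine, which may differ in syntax and limits. The names, length bounds and the 300-character proximity window are assumptions to tune. The JWT pattern relies on the header being Base64url-encoded JSON that starts with `{"`, which typically encodes to a string beginning with `eyJ`.

```python
"""Candidate patterns for some of the use cases above (illustrative only).

Written for Python's `re` module; the MCAS engine may differ, so every
pattern still needs to be tested in a real policy.
"""
import re

PATTERNS = {
    # A header-shaped line such as "Host: example.com" (broad on purpose)
    "http_header": re.compile(r"^[A-Za-z0-9-]{1,64}:[ \t]?[^\r\n]{0,512}", re.MULTILINE),
    # A Set-Cookie header with a cookie name, case-insensitive
    "set_cookie": re.compile(r"\bSet-Cookie:[ \t]?[^=\s;]{1,128}=", re.IGNORECASE),
    # A JWT: three Base64url segments separated by dots; the JSON header
    # usually encodes to a string starting with "eyJ"
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]{10,1000}\.[A-Za-z0-9_-]{10,2000}\.[A-Za-z0-9_-]{0,1000}"),
}

# AWS session token: the keyword plus a long Base64 run shortly after it.
KEYWORD = re.compile(r"SessionToken", re.IGNORECASE)
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{100,2000}={0,2}")
WINDOW = 300  # hypothetical proximity window, in characters


def aws_session_token_hit(text: str) -> bool:
    """True if a long Base64 run appears within WINDOW chars after the keyword."""
    for kw in KEYWORD.finditer(text):
        nearby = text[kw.end(): kw.end() + WINDOW]
        if BASE64_RUN.search(nearby):
            return True
    return False
```

The header pattern is deliberately broad and will also match lines such as `https://example.com`, so expect false positives there. PCI data and MIP-labeled documents are left to the built-in MCAS capabilities mentioned in the list.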

 
Hope someone can help.
Reply
These regex features are not well documented. I just found out that one of my team members is spending hours testing rules. It is difficult to automate multiple tests, and the round trips are slow while waiting for policies to be deployed and tested.
Why isn't there an automated script to run tests on these rules?
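Without an official test runner, a team can at least shorten the loop by keeping positive and negative samples next to each rule and checking them locally (for example in CI) before deploying a policy. The Python sketch below assumes that approach. `RuleCase`, `run_cases` and the sample rule are hypothetical, and because it uses Python's `re` module it catches obvious misses and false positives but cannot prove how the MCAS engine will behave.

```python
"""Run regex rules against labelled samples locally before deploying them.

A local pre-check with Python's `re`; it shortens the feedback loop but does
not replace a test in MCAS, whose engine may behave differently.
"""
import re
from dataclasses import dataclass, field


@dataclass
class RuleCase:
    name: str
    pattern: str
    should_match: list[str] = field(default_factory=list)
    should_not_match: list[str] = field(default_factory=list)


def run_cases(cases: list[RuleCase]) -> list[str]:
    """Return failure messages; an empty list means every case passed."""
    failures = []
    for case in cases:
        try:
            rx = re.compile(case.pattern)
        except re.error as exc:
            failures.append(f"{case.name}: invalid pattern ({exc})")
            continue
        for sample in case.should_match:
            if not rx.search(sample):
                failures.append(f"{case.name}: missed {sample!r}")
        for sample in case.should_not_match:
            if rx.search(sample):
                failures.append(f"{case.name}: false positive on {sample!r}")
    return failures


if __name__ == "__main__":
    cases = [
        RuleCase(
            name="set_cookie",  # hypothetical rule
            pattern=r"(?i)\bSet-Cookie:[ \t]?[^=\s;]{1,128}=",
            should_match=["Set-Cookie: sid=abc123"],
            should_not_match=["Cookie: sid=abc123"],
        ),
    ]
    problems = run_cases(cases)
    print("\n".join(problems) or "all cases passed")
```

Each false positive found later in MCAS can be added to `should_not_match`, so it stays covered as the pattern evolves.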

Some links:
https://github.com/MicrosoftDocs/microsoft-365-docs/issues/9347 (Limitations on ...)
https://learn.microsoft.com/en-us/defender-cloud-apps/working-with-the-regex-engine?source=recommend...
https://microsoft.github.io/ComplianceCxE/resources/files/Deck%20-%20Tips%20and%20tricks%20for%20red...
https://learn.microsoft.com/en-us/microsoft-365/compliance/sit-get-started-exact-data-match-test?vie...