MCAS Regex Engine

Maybe you have a quick answer. We are currently evaluating the DLP capabilities of MCAS. While implementing our use cases, we discovered that Microsoft's regex engine behaves somewhat differently from what we expected. My colleagues and I understand that it is built for high-volume scanning and therefore has limitations regarding quantifiers. The docs are reasonably clear on this, but very sparse.
 
How does the regex engine actually work, and what are its limitations?
We can investigate every single regex match, but how do we validate false positives across a large number of matches? (For example, with a probability score, or by reducing the maximum number of matches per day.)
 

Some example use cases from the customer:

- Using regex to look for HTTP headers

- Looking for cookies (e.g. searching for "Set-Cookie")

- Regex hunting for base64-encoded JWT ID or access tokens, or other custom tokens, across various file types

- PCI data (can be covered by MCAS)

- AWS session tokens ("SessionToken" AND base64-encoded data in the vicinity)

- MIP-labeled documents (can be covered by MCAS)
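
To make the custom use cases above more concrete, here is a rough sketch of the kind of patterns we have in mind, written for Python's `re` module so they can be tested offline against sample files. This is not MCAS syntax: the patterns, the length thresholds, and the names `scan` and `looks_like_jwt` are our own assumptions, and the quantifiers would probably need adjusting to whatever the MCAS engine supports. The JWT check also decodes the header segment, which is one way we could imagine filtering out false positives outside of MCAS.

```python
import base64
import json
import re

# Assumption: illustrative patterns for Python's `re`, NOT the MCAS engine.
# Length thresholds below are guesses and should be tuned on real samples.
SET_COOKIE = re.compile(r"(?im)^set-cookie:\s*[^\r\n]+")
JWT = re.compile(r"\beyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}")
# "SessionToken" followed closely by a long run of base64 characters.
AWS_SESSION = re.compile(r"SessionToken\W{1,20}[A-Za-z0-9+/=]{100,}")


def looks_like_jwt(candidate: str) -> bool:
    """Reduce false positives: the first segment must decode to a JSON
    object that contains an "alg" field."""
    header = candidate.split(".")[0]
    padded = header + "=" * (-len(header) % 4)  # restore stripped padding
    try:
        decoded = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:  # covers base64, UTF-8 and JSON decoding errors
        return False
    return isinstance(decoded, dict) and "alg" in decoded


def scan(text: str) -> dict:
    """Return the matches per use case for one piece of extracted text."""
    return {
        "set_cookie": SET_COOKIE.findall(text),
        "jwt": [m for m in JWT.findall(text) if looks_like_jwt(m)],
        "aws_session_token": AWS_SESSION.findall(text),
    }
```

Running `scan` over a labeled set of sample files would give us a per-pattern match count to compare against what MCAS reports for the same files.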

 
Hope someone can help