Microsoft Tech Community
Implementing Microsoft Exact Data Match (EDM) Part 1
Published Apr 28 2020 01:48 PM 30.2K Views
Microsoft

Microsoft launched the Exact Data Match (EDM) feature in August 2019. This capability enhances an organization’s ability to identify and accurately target specific data. EDM goes beyond checking for data that matches patterns: it creates a datastore, or dictionary, of actual corporate data such as employee or customer-specific information, so that DLP can ensure this data is not sent via email or shared with external users.

EDM can help reduce what is probably one of the biggest issues with Data Loss Prevention (DLP): false positives. A false positive in DLP occurs when data is flagged as sensitive to the company when it really is not. Microsoft has over 99 built-in sensitive information types, but most of them rely on pattern matching with regular expressions (regex), sequences of characters that define a search pattern. Even with regex, a reliable pattern is hard to define. Let’s look at the Social Security Number (SSN).

An SSN is a 9-digit number assigned to each worker in the United States. It is used to identify and track a person’s wages or self-employment earnings, and later to monitor that person’s Social Security benefits when they begin. With everyone having an SSN, it would seem very easy to define what it is: a 9-digit number. However, an SSN is pretty hard to identify. People write SSNs in many ways, the most common being 123456789, 123-45-6789, and 123 45 6789. Prior to 2011 there was strong formatting, under which certain parts of the number had to fall within specific ranges. SSNs issued after 2011 do not have this strong formatting. A common way to identify an SSN is to look for the three formats above together with keywords such as SSN, Social Security, Soc Sec, SSN#, etc.
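To see why a format-only pattern over-matches, here is a minimal PowerShell sketch. The regex is a simplified illustration of the three formats mentioned above, not the definition used by the built-in SSN sensitive information type, and the sample strings are made up.

```powershell
# Simplified, illustrative pattern: 9 digits, optionally split 3-2-4 by a hyphen or a space.
# This is NOT the built-in SSN sensitive information type definition.
$ssnPattern = '\b\d{3}[- ]?\d{2}[- ]?\d{4}\b'

# Hypothetical sample text; only the first line is meant to be an SSN.
$samples = @(
    'Employee SSN: 123-45-6789',
    'Tracking number 987654321 shipped today',
    'Part 123 45 6789 is back in stock'
)

# Record, for each line, whether the pattern finds something that looks like an SSN.
$results = foreach ($line in $samples) {
    [pscustomobject]@{
        Text    = $line
        Matched = $line -match $ssnPattern
    }
}

$results | Format-Table -AutoSize
```

All three lines match the pattern even though only one of them contains an SSN, which is why the built-in type also looks for supporting keywords.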

 

 

SSN SIT.png

 

Above you can see the built-in SSN Sensitive Information Type. Microsoft has published “What the sensitive information types look for” and here is the specific link for the SSN type.

Note: You can use PowerShell to review the Rule Pack for the built-in Sensitive Info Types and then use it to customize a built-in sensitive information type using these instructions.

 

With EDM, a healthcare company can now securely upload a datastore containing all of its patients’ names, addresses, MRNs (Medical Record Numbers), SSNs, etc. When an internal user shares a file located in their OneDrive for Business (OD4B) or emails a document containing patient information, the Microsoft DLP service scans the document and can prevent it from being shared or emailed outside the organization. EDM makes this possible by enabling the DLP service to look for the specific SSNs of actual customers or patients instead of any number that merely looks like an SSN.
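The difference between "looks like an SSN" and "is one of our SSNs" can be sketched in a few lines of PowerShell. This is a conceptual illustration only and not how the DLP service is implemented; the pattern, the sample text, and the known values in `$knownSsns` are all made up for the example.

```powershell
# Conceptual illustration only; not how the DLP service works internally.
# Hypothetical datastore of known values (made-up numbers, digits only).
$knownSsns = [System.Collections.Generic.HashSet[string]]::new()
'123456789', '234567890' | ForEach-Object { [void]$knownSsns.Add($_) }

# Simplified pattern that finds anything shaped like an SSN.
$pattern = '\b\d{3}[- ]?\d{2}[- ]?\d{4}\b'
$text    = 'Patient SSN 123-45-6789; invoice 987654321; callback ref 234 56 7890.'

# Step 1: the pattern finds candidates. Step 2: only candidates present in the datastore count.
$hits = foreach ($m in [regex]::Matches($text, $pattern)) {
    $normalized = $m.Value -replace '[- ]', ''   # strip separators before the lookup
    [pscustomobject]@{
        Found      = $m.Value
        ExactMatch = $knownSsns.Contains($normalized)
    }
}

$hits | Format-Table -AutoSize
```

The invoice number has the right shape but is not in the datastore, so an exact-match approach would leave it alone while still catching the two known values.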

Let’s get going with implementing EDM. For this I decided to use superheroes and their hidden identities. We all work at the Superhero Identity Protection Agency (SIPA) and at SIPA, our number one goal is the protection of the secret identity of the world’s superheroes. We have a database that contains everything you could want to know about a superhero. To create our EDM Datastore we’ll export data from the database.

Here is the CSV file that we’ll use as the basis for our EDM Datastore.

 

| SRN | Firstname | Lastname | Nickname | Home |
| --- | --- | --- | --- | --- |
| 95101 | Clark | Kent | Superman | Krypton |
| 95102 | Diana | Prince | Wonder Woman | Paradise Island |
| 95103 | Bruce | Banner | Hulk | Ohio |
| 95104 | Tony | Stark | Iron Man | Los Angeles |
| 95105 | Peter | Parker | Spiderman | New York |
| 95106 | Thor | Odinson | Thor | Asgard |
| 95107 | Natasha | Romanoff | Black Widow | Moscow |
| 95108 | Steve | Rogers | Captain America | New York |
| 95109 | Bruce | Wayne | Batman | Gotham |
| 95110 | Wade | Wilson | Deadpool | New York |
| 95111 | Arthur | Curry | Aquaman | Atlantis |
| 95112 | Barry | Allen | Flash | Central City |
| 95113 | Hal | Jordan | Green Lantern | Coast City |
| 95114 | Carol | Danvers | Captain Marvel | Los Angeles |
| 95115 | Clint | Barton | Hawkeye | Classified |
| 95116 | Bobby | Drake | Iceman | New York |
| 95117 | Scott | Summers | Cyclops | Alaska |
| 95118 | Ororo | Munroe | Storm | Kenya |
| 95119 | T'Challa | | Black Panther | Wakanda |
| 95120 | James | Howlett | Wolverine | Canada |
| 95121 | Charles | Xavier | Professor X | New York |

 

In the table above, you can see how the data looks. Notice that we have a header row. The Superhero Registration Number (SRN) is used to identify each superhero. We also exported their first and last names, their superhero name (Nickname), and their place of origin (Home).

The documentation to create Custom Sensitive Information Types with EDM is located here. I highly recommend you reference this document as it is very informative and will be kept up to date. The first step we need to do is define the Schema for our EDM Datastore. To do this we utilize XML and the CSV file we exported from our SIPA database.

A sample Schema is in the documentation. For our Schema, we first need to determine what fields we want to be searchable. The searchable fields are the key fields that we want to utilize that are critical for identification. How you configure your Schema is up to you, but for SIPA we have determined that the SRN and Nickname fields are the fields we want to be searchable.

Note: Searchable fields should be unique to the datastore, or as unique as possible. We know SRN is never duplicated in the Superhero Database, which is why it was chosen. We also know there is only one Superman, one Black Widow, one Wolverine, etc., so Nickname is our second searchable field. It does not make sense, at least in this instance, to use something like Firstname as a searchable field. The sample CSV already contains two Bruces (Bruce Banner and Bruce Wayne), and when you begin to think about documents and artifacts being used within SIPA, someone could mention Steve and be addressing Steve Jones in Database Management and not Steve Rogers, Captain America.
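A quick way to test whether a column is unique enough to be searchable is to group the export by that column and look for repeats. The sketch below inlines a few rows from the sample data so it is self-contained; with a real export you would load the file with Import-Csv instead (the file name in the comment is hypothetical).

```powershell
# A few rows from the sample export, inlined so the sketch is self-contained.
# With a real export: $rows = Import-Csv .\superheroes.csv   (hypothetical file name)
$rows = @'
SRN,Firstname,Lastname,Nickname,Home
95101,Clark,Kent,Superman,Krypton
95103,Bruce,Banner,Hulk,Ohio
95109,Bruce,Wayne,Batman,Gotham
95105,Peter,Parker,Spiderman,New York
'@ | ConvertFrom-Csv

# For each candidate column, report any value that occurs more than once.
$report = foreach ($field in 'SRN', 'Nickname', 'Firstname') {
    $dupes = @($rows | Group-Object -Property $field | Where-Object { $_.Count -gt 1 })
    if ($dupes.Count -eq 0) {
        "${field}: all values unique"
    } else {
        "${field}: duplicated values -> $($dupes.Name -join ', ')"
    }
}

$report
```

On these rows, SRN and Nickname come back unique while Firstname reports Bruce as a duplicate, which matches the reasoning in the note above.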

Now that we’ve identified the searchable fields, all we need to do is create the XML file.

RP1.png

Let’s go over the XML file. I highlighted the second row above because it is important: the ‘DataStore name="SIPAIdentities"’ entry sets the name of the datastore that the Schema applies to. The field names were all taken from the header row of the CSV file. You can also see that I set the “SRN” and “Nickname” fields as searchable. I have named the Schema file edm.xml.
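Since the screenshot of the Schema file is not reproduced here, the following PowerShell sketch builds an equivalent file from the field names described above. The element names, attribute names, and namespace follow the sample Schema in the EDM documentation as best I recall it, so treat them as assumptions and compare against the current documentation; the description text is made up.

```powershell
# Field names come from the CSV header row; SRN and Nickname are the searchable fields.
$fields     = 'SRN', 'Firstname', 'Lastname', 'Nickname', 'Home'
$searchable = 'SRN', 'Nickname'

# One <Field> element per column. Element and attribute names are assumptions
# based on the documented sample Schema.
$fieldLines = foreach ($name in $fields) {
    $flag = ($searchable -contains $name).ToString().ToLower()
    "    <Field name=`"$name`" searchable=`"$flag`" />"
}

# The DataStore name on the second line must match the datastore the Schema applies to.
# The description value is a made-up placeholder.
$schema = @(
    '<EdmSchema xmlns="http://schemas.microsoft.com/office/2018/edm">'
    '  <DataStore name="SIPAIdentities" description="SIPA superhero identities" version="1">'
    $fieldLines
    '  </DataStore>'
    '</EdmSchema>'
) -join [Environment]::NewLine

# Casting to [xml] throws if the text is not well-formed, which catches typos early.
$parsed = [xml]$schema

# To save the file: $schema | Set-Content -Path .\edm.xml -Encoding UTF8
$schema
```

Generating the file from the header row keeps the field names in the Schema identical to the column names in the CSV, which is where hand-typed files tend to drift.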

Now that we have the Schema file ready, we need to upload it into the service. At the time of writing this is done via PowerShell; Microsoft will be adding a graphical interface in the near future. Here are instructions for connecting with PowerShell. If you have Multi-factor Authentication (MFA) enabled (you should ALL HAVE MULTI-FACTOR ENABLED for Admin accounts), here are the PowerShell connection instructions with MFA enabled. By the way, I’m using a demo tenant for this setup, and I highly recommend you first test things out in a demo tenant before enabling them in your production tenant. I’m also using a Global Admin account for this; you can check out the permission structure for the Security and Compliance centers here.

Connecting and uploading Schema file:

 

1. Launch the Microsoft Exchange Online PowerShell Module that you downloaded using the instructions above.

2. Type Connect-IPPSSession, enter your username on the account sign-in screen, and click Next.

3. Enter your password and click Sign in.

4. Enter your MFA code (this will depend on how you have, or have not, configured MFA) and click Verify.

5. You are now connected to the Security and Compliance Center Remote PowerShell.

6. Change the directory to the location where you saved your edm.xml file.

7. Enter the following commands to upload the Schema file, and confirm the action when prompted:

         # Read the Schema file as raw bytes (Windows PowerShell syntax; PowerShell 7 uses -AsByteStream instead of -Encoding Byte)
         $edmSchemaXml = Get-Content .\edm.xml -Encoding Byte -ReadCount 0

         # Upload the Schema; -Confirm:$true prompts before the change is made
         New-DlpEdmSchema -FileData $edmSchemaXml -Confirm:$true

 

8. You now have a datastore Schema uploaded and ready.
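To double-check the upload, you can list the EDM Schemas in the tenant from the same session. This is a minimal sketch that assumes the Get-DlpEdmSchema cmdlet is available in your Security and Compliance session; the guard simply avoids an error if you run it in a window that is not connected.

```powershell
# Get-DlpEdmSchema is only available after Connect-IPPSSession has loaded the
# Security and Compliance cmdlets into the current session.
$connected = [bool](Get-Command Get-DlpEdmSchema -ErrorAction SilentlyContinue)

if ($connected) {
    # List the EDM Schemas in the tenant; the new datastore should appear here.
    Get-DlpEdmSchema | Format-List
} else {
    Write-Warning 'Not connected. Run Connect-IPPSSession first, then retry.'
}
```

If the SIPAIdentities datastore does not appear in the output, re-run step 7 and check for an error message before moving on.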

 

This wraps up Part 1. We now understand more about EDM and why it’s helpful. We have begun the journey to getting EDM set up and protecting those who protect us, the Superheroes! Please check in for Part 2 of this journey, where we will continue to learn more about EDM, DLP, and the superheroes, as well as get the EDM configuration wrapped up!

 

 

7 Comments
Microsoft

Great write up! 

Gold Contributor

What licensing does this require?

Microsoft
Copper Contributor

Great guide (y) 

Copper Contributor

Can anyone help with the URLs needed for whitelisting for EDM to work properly.

Brass Contributor

Hi @Sean McNeill was there a Part 2 to this please?

Version history
Last update:
‎May 11 2021 03:14 PM
Updated by: