Microsoft Tech Community
Enhancements to Microsoft Exact Data Match
Published Jan 13 2021 06:31 AM

Just before most of us took off for the holiday break, Microsoft posted a blog about new Information Protection capabilities. In this post I am going to cover the Exact Data Match (EDM) capabilities from that announcement in more depth.

The first EDM announcement, and the biggest for most EDM admins, is the general availability of a user interface for managing EDM in the Compliance center. This is a big step forward that makes creating and managing EDM much easier. The PowerShell option is still available for those who love the command line. I cover the new interface in more detail later in this post.

The second announcement concerns the scale of the EDM service. Microsoft now supports sensitive data files of up to 100 million rows, up from 10 million rows at launch, and reports that the time needed to upload and index the data has been cut by 50%. On the security side, salting is being added to the hashing process; this protects the data both while it is transmitted to the service and while it is stored within the service.
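
To make the salting point concrete, here is a minimal Python sketch of salted hashing, assuming SHA-256 and a random 16-byte salt. Both choices are my assumptions for illustration; this is not the EDM Upload Agent's actual algorithm. The idea: a nine-digit value such as an SSN has only a billion possibilities, so an unsalted hash could be reversed by hashing every candidate, while a secret salt makes such precomputed tables useless.

```python
import hashlib
import secrets


def salted_hash(value: str, salt: bytes) -> str:
    """Hex SHA-256 digest of the salt followed by the UTF-8 encoded value."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


# Assumption: one random 16-byte salt per data set.
salt = secrets.token_bytes(16)
other_salt = secrets.token_bytes(16)

h1 = salted_hash("123456789", salt)
h2 = salted_hash("123456789", salt)
h3 = salted_hash("123456789", other_salt)

# Same value + same salt gives the same digest, so matching still works.
print(h1 == h2)  # True
# A different salt yields an unrelated digest, so a table of precomputed
# hashes for all nine-digit numbers no longer helps an attacker.
print(h1 != h3)
```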

The last EDM announcement is the general availability of configurable match (normalization). You can now enable case insensitivity, so upper- and lower-case letters are treated as the same. You can also configure whether punctuation should be ignored; for example, if the dashes in a Social Security number are ignored, “123-45-6789” is treated the same as “123456789”.
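
Here is a minimal sketch of what configurable match does conceptually, assuming dashes are the only ignored delimiter. The `normalize` function is hypothetical; as I understand it, the point is that values are normalized the same way before they are compared.

```python
def normalize(value: str, ignored: str = "-", case_insensitive: bool = True) -> str:
    """Strip ignored characters and optionally fold case before comparison."""
    stripped = "".join(ch for ch in value if ch not in ignored)
    return stripped.casefold() if case_insensitive else stripped


print(normalize("123-45-6789") == normalize("123456789"))  # True
print(normalize("AbC-123") == normalize("abc123"))  # True
print(normalize("AB-12", case_insensitive=False))  # AB12
```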

Let us dive deeper into the new user interface! If you read my previous blog series about Implementing Microsoft EDM, you will remember that virtually all the steps for setting up the EDM schema and datastore were done from the command line (mostly PowerShell). Now, with the new user interface (UI), you can set up the schema and Sensitive Information Types (SITs) graphically. I am going to compare the steps I completed in the blog series with how they look in the UI today.

First off, where is the new UI? It is in the Microsoft 365 Compliance Portal: under Data classification, you will see the new Exact Data Matches tab.




Above is the view from my demo tenant, which I rebuilt after the blog series but configured for EDM by following that series. I thought an effective way to show the new UI would be to set up the same EDM configuration from the blog series again, so here it goes.

In my new demo tenant, you can see I have nothing in the Exact data match area.




To begin setting up EDM, I need to create a schema. Below is the XML (eXtensible Markup Language) file I used to set up the schema previously.




Let us create this schema in the new tenant and take advantage of the new UI and features. Here is the official Docs article on using the wizards to create the schema and sensitive info types.

I selected Create EDM Schema on the page and got the screen below. I named the schema and gave it a description.




Here is one of the new features: the ability to ignore delimiters and punctuation in the schema fields. I chose to enable it and added several characters to ignore. Please note that the delimiters and punctuation ignored for indexed fields must match the normalizations defined for the out-of-box (OOB) or custom SIT that the EDM SIT will reference (more about this later in this post). For example, the US SSN OOB SIT is configured to detect unformatted nine-digit numbers (e.g., 515121111), SSNs with dashes (e.g., 515-12-1111), and SSNs delimited by spaces (e.g., 515 12 1111). Any other delimiters and punctuation you configure will effectively be ignored by the EDM service, because the underlying pattern won't detect values that contain them.
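
To illustrate why extra delimiters have no effect, here is a small Python sketch. The regular expression is my own simplified stand-in for the three SSN formats listed above, not Microsoft's actual US SSN definition.

```python
import re

# Simplified stand-in for the three SSN formats above (NOT Microsoft's
# actual US SSN SIT): nine digits, dashes, or single spaces.
SSN_PATTERN = re.compile(r"\b(?:\d{9}|\d{3}-\d{2}-\d{4}|\d{3} \d{2} \d{4})\b")


def detected(text: str) -> list:
    """Return every substring the pattern would hand to EDM as a candidate."""
    return [m.group(0) for m in SSN_PATTERN.finditer(text)]


text = "ids: 515121111, 515-12-1111, 515 12 1111, 515.12.1111"
print(detected(text))  # ['515121111', '515-12-1111', '515 12 1111']

# Ignoring "." in the EDM schema would change nothing: no candidate ever
# contains a dot, because the pattern never matched the dotted form.
print(any("." in c for c in detected(text)))  # False
```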




Next are the schema fields to set up. One nice touch is that the delimiter and punctuation setting above applies to all fields by default. Notice that you can also turn on ignoring per field, but you would need to turn off the schema-wide setting above first. I am keeping mine schema-wide rather than per field. Besides the checkbox for making a field searchable, you will see another new feature, case insensitivity, which I am turning on for all fields!




To add more data fields, just click the +. I added the remaining fields to duplicate what I set up in the blog series. After entering all five fields, I saved the schema. To view the schema, select the radio button next to its name to open a review pane.
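
For comparison with the XML file from the blog series, here is a Python sketch that generates a schema with the settings I chose in the UI: case insensitivity and the same ignored delimiters on every field. The element and attribute names (`EdmSchema`, `DataStore`, `Field`, `searchable`, `caseInsensitive`, `ignoredDelimiters`) and the namespace follow the EDM schema format as I understand it from Microsoft's documentation, so verify them against the current docs. SRN and Nickname come from this post; FirstName, LastName, and City are hypothetical placeholders for the other three fields, and which fields are searchable is also an assumption.

```python
import xml.etree.ElementTree as ET

# Namespace used by EDM schema files (as I understand the docs; verify).
EDM_NS = "http://schemas.microsoft.com/office/2018/edm"


def build_schema(name, description, fields, searchable, ignored_delimiters):
    """Return EDM schema XML with schema-wide normalization on every field."""
    root = ET.Element("EdmSchema", {"xmlns": EDM_NS})
    store = ET.SubElement(
        root, "DataStore",
        {"name": name, "description": description, "version": "1"},
    )
    for field in fields:
        ET.SubElement(store, "Field", {
            "name": field,
            # Assumption: fields used as primary elements must be searchable.
            "searchable": "true" if field in searchable else "false",
            "caseInsensitive": "true",  # turned on for all fields, as in the UI
            "ignoredDelimiters": ",".join(ignored_delimiters),
        })
    return ET.tostring(root, encoding="unicode")


schema_xml = build_schema(
    "sipaidentities",
    "Superhero identities for EDM",  # hypothetical description
    ["SRN", "Nickname", "FirstName", "LastName", "City"],  # last three hypothetical
    searchable={"SRN", "Nickname"},  # assumption: the two primary elements
    ignored_delimiters=["-", "/", "#"],  # hypothetical selection
)
print(schema_xml)
```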




Now I switch to the EDM sensitive info types section to create those.




Selecting Create EDM sensitive info type (SIT) opens the UI wizard. The first thing I need to do is choose the data store schema that I want to use for this EDM sensitive info type.




I selected Choose an existing EDM Schema and then selected the sipaidentities schema that we created previously.




Clicking Next brings me to defining the patterns for the SIT. Here are the patterns I used to set up the EDM SITs. One change you will notice is that Microsoft has switched from a percentage-based confidence level to a three-tier rating (low, medium, high). Since I already had three levels, I will simply map them onto the three tiers.
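
Since the transition is simply by rank, it fits in a few lines of Python. This is a hypothetical planning helper, not part of the UI: the lowest legacy level becomes Low, the middle one Medium, and the highest High. Only the 75% level comes from my XML sample; the other two numbers are placeholders.

```python
def to_tiers(levels):
    """Map three legacy percentage levels onto Low/Medium/High by rank."""
    tiers = ("Low", "Medium", "High")
    distinct = sorted(set(levels))
    if len(distinct) != len(tiers):
        raise ValueError("expected exactly three distinct confidence levels")
    return dict(zip(distinct, tiers))


# 75 comes from the XML sample; 85 and 95 are hypothetical placeholders.
print(to_tiers([85, 75, 95]))  # {75: 'Low', 85: 'Medium', 95: 'High'}
```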




Before creating the EDM SITs, I need to create the regular SITs that the EDM SITs will reference. This is covered in Blog Post 2 of the series: the SRN SIT and the Superhero-Nickname SIT. I created both, just as I did in that post.




Now I will define the patterns for the EDM SITs. To start, I click + Create pattern. For the first pattern, I set the confidence level to Low and select SRN as the primary element (the field we created in the schema).




Next, I choose the primary SIT to associate with this EDM SIT; in this case, the Superhero-Registration-Number (SRN) SIT that I created.
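
Conceptually, the primary SIT finds candidate strings, and EDM then checks whether each candidate, normalized and hashed, exists in the uploaded table. Here is a minimal Python sketch of that two-stage lookup as I understand it, assuming a made-up SRN format (`SRN` plus an optional dash and six digits), dash and case normalization, SHA-256, and a fixed placeholder salt. None of these names or formats come from the real service or the blog series.

```python
import hashlib
import re

# Hypothetical SRN format for illustration only; the real
# Superhero-Registration-Number SIT may use a different pattern.
SRN_SIT = re.compile(r"\bSRN-?\d{6}\b", re.IGNORECASE)
SALT = b"demo-salt"  # placeholder; a real salt would be random and secret


def norm(value):
    # Assumed normalization: drop "-" and fold case.
    return value.replace("-", "").casefold()


def digest(value):
    return hashlib.sha256(SALT + norm(value).encode("utf-8")).hexdigest()


# Stand-in for the uploaded, hashed SRN column of the sensitive data table.
hashed_srns = {digest(v) for v in ["SRN-000123", "SRN-004567"]}


def edm_primary_matches(text):
    """Stage 1: the primary SIT finds candidates.
    Stage 2: keep only candidates whose hash is in the EDM table."""
    return [m.group(0) for m in SRN_SIT.finditer(text)
            if digest(m.group(0)) in hashed_srns]


print(edm_primary_matches("Filed srn000123 and SRN-999999 today"))  # ['srn000123']
```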




Next is the Supporting elements area, where I select the other four fields as supporting elements for this SRN SIT.




Then I need to set the matching options (conditions). Since this is the low confidence pattern, I set both the minimum and the maximum to 2, just as the XML sample was configured for the 75% level.
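
The minimum and maximum settings can be modeled as a count check on the supporting elements found alongside the primary match. In this hypothetical Python sketch, only the Low pattern (minimum and maximum of 2) mirrors my configuration; the Medium and High counts are placeholders, and the evaluation logic is my simplification, not Microsoft's matching engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    confidence: str
    min_supporting: int
    max_supporting: int


# Low mirrors the min = max = 2 setting above; Medium and High are placeholders.
PATTERNS = [Pattern("Low", 2, 2), Pattern("Medium", 3, 3), Pattern("High", 4, 4)]
RANK = {"Low": 1, "Medium": 2, "High": 3}


def evaluate(primary_matched, supporting_found):
    """Return the highest confidence whose supporting-element range fits, else None."""
    if not primary_matched:
        return None  # no EDM match without the primary element
    count = len(supporting_found)
    fits = [p for p in PATTERNS if p.min_supporting <= count <= p.max_supporting]
    if not fits:
        return None
    return max(fits, key=lambda p: RANK[p.confidence]).confidence


# Hypothetical supporting fields found alongside a matching SRN.
print(evaluate(True, {"Nickname", "FirstName"}))  # Low
print(evaluate(True, {"Nickname", "FirstName", "LastName", "City"}))  # High
print(evaluate(True, {"Nickname"}))  # None
```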




Clicking Done returns me to the previous screen, which now lists the new pattern.




I then created the medium and high confidence patterns for SRN.




Next, I set the recommended confidence level and the character proximity. Both settings are in the XML sample above and in the blog series. I set the recommended confidence level to Low, as that is the tier equivalent to the percentage level used in the blog series patterns.
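
Character proximity limits how far from the primary element the supporting evidence may appear. Here is a simplified Python sketch of that window check, assuming a window of 300 characters on either side of the primary match; 300 is the value I usually see, but check the setting on your own SIT. The helper name and the sample text are hypothetical.

```python
import re


def supported_within(text, start, end, keyword, proximity=300):
    """True if keyword lies entirely within `proximity` characters on either
    side of the primary match span [start, end) (case-insensitive)."""
    window = text[max(0, start - proximity):end + proximity]
    return keyword.casefold() in window.casefold()


# Hypothetical text: the nickname sits about 400 characters after the SRN.
text = "SRN-000123 " + "x" * 400 + " Nickname: Nova"
m = re.search(r"SRN-\d{6}", text)
print(supported_within(text, m.start(), m.end(), "Nova"))  # False
print(supported_within(text, m.start(), m.end(), "Nova", proximity=500))  # True
```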




The next step is to name the EDM SIT; in the blog series, this was done in a separate section of the rulepack.XML file.




I entered in the name and description.




I reviewed the information and selected Submit.




Success, the EDM SIT was created. I am not going to hash, salt, and upload the data yet, because I need to create the Nickname-Nickname-EDM SIT first. I do want to point out that hashing, salting, and uploading the data are still done from the command line, just as I described in the blog series.
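
For reference, here is a Python sketch that builds (but does not run) EDM Upload Agent command lines for that flow. The flag names are as I recall them from Microsoft's EDM Upload Agent documentation and all paths are hypothetical, so confirm both against the current docs before running anything. On a Windows machine with the agent installed, each list could be passed to `subprocess.run(cmd, check=True)`.

```python
def edm_upload_commands(data_store, data_file, schema_dir, schema_file,
                        hash_dir, hash_file):
    """Build EdmUploadAgent.exe argument lists for the
    authorize -> save schema -> hash -> upload -> check status flow."""
    agent = "EdmUploadAgent.exe"
    return [
        [agent, "/Authorize"],
        [agent, "/SaveSchema", "/DataStoreName", data_store, "/OutputDir", schema_dir],
        [agent, "/CreateHash", "/DataFile", data_file,
         "/HashLocation", hash_dir, "/Schema", schema_file],
        [agent, "/UploadHash", "/DataStoreName", data_store, "/HashFile", hash_file],
        [agent, "/GetSession", "/DataStoreName", data_store],
    ]


# Hypothetical paths for the sipaidentities data store.
commands = edm_upload_commands(
    "sipaidentities",
    r"C:\Edm\Data\sipaidentities.csv",
    r"C:\Edm\Schema",
    r"C:\Edm\Schema\sipaidentities.xml",
    r"C:\Edm\Hash",
    r"C:\Edm\Hash\sipaidentities.EdmHash",
)
for cmd in commands:
    print(" ".join(cmd))
```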




That wraps up this post. I hope you enjoy using the new UI and the other improvements and new features of Microsoft EDM!


Last updated: May 11 2021 01:58 PM