Microsoft Security Community Blog

Unified labeling AIP scanner preview brings scaling out and more!

Iron Contributor

Sep 19, 2019

Since its release, the Azure Information Protection scanner has been adopted by many different types of customers. Some small businesses have deployed a single scanner to address all their data at rest. Others have deployed a few machines in different locations, or a few machines for redundancy. Companies that need to deal with petabytes of data may have deployed dozens of scanner instances; internally at Microsoft, for example, we deployed more than 40 scanners. Large enterprise customers faced increasing TCO, mainly driven by administration overhead and attempts to distribute the load between scanners.
Consistent feedback also came from customers adopting our unified labeling platform and moving to the Azure Information Protection unified labeling client. The unified labeling client allowed customers to use more flexible automatic rules on their endpoints, but they could not leverage this flexibility on the scanner, whose core functionality is discovery and labeling based on automatic rules. Customers also needed to maintain their labels and conditions in both the Office 365 Security and Compliance Center and the Azure portal (in order to manage the conditions used by the Azure Information Protection scanner).

Unified labeling scanner is here to address scale out needs!
Finally, the AIP scanner for unified labeling is here! Now you can move your label and policy management entirely to the Office 365 Security and Compliance Center and complete the migration to the unified labeling platform. This allows you to use custom info types and dictionaries on the AIP scanner, tweak built-in info types, define confidence levels, and so on.
The Azure Information Protection scanner architecture was redesigned. In addition to adopting the MIP SDK, which improves the performance of single nodes, you can now group your scanners in clusters that service the same scanner profile. You no longer need to distribute repositories between different scanner nodes to try to get every node scanning an equal volume. Now you can set up one profile, put all the repositories in it (we still recommend separate profiles per geo location / data center), and add all the nodes to this profile. The SQL database, which now plays a core role as the orchestrator of the cluster, takes care of distributing the load equally, detects deactivated nodes (taken offline, for example, for maintenance or patching), and reallocates incomplete jobs to active scanner nodes. Nodes added to the profile join the current scan effort and get instructions to scan the next batch of files. This provides simplified management and elastic growth, and lets you adjust the number of nodes to the volume that needs to be scanned. For example, you can start with 50 scanners to complete the initial scan of petabytes of data and then reduce the cluster to 5 nodes to scan newly created files in the repository.

Figure 1: Distributed scanner architecture
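
To make the cluster behavior more concrete, here is a minimal Python sketch of the idea: nodes claim batches from a shared queue, and a batch held by a deactivated node goes back to the queue. This is only an illustration under stated assumptions, not the scanner's actual implementation. The Orchestrator class, the batch size, and the node names are all hypothetical, and the in-memory queue merely stands in for the role the SQL database plays.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Orchestrator:
    """Hypothetical stand-in for the shared database that coordinates nodes."""
    pending: deque = field(default_factory=deque)    # batches not yet claimed
    in_progress: dict = field(default_factory=dict)  # node name -> claimed batch
    done: list = field(default_factory=list)         # completed batches

    def claim(self, node):
        """Hand the next pending batch to a node, or None if nothing is left."""
        if not self.pending:
            return None
        batch = self.pending.popleft()
        self.in_progress[node] = batch
        return batch

    def complete(self, node):
        """Record that a node finished the batch it had claimed."""
        self.done.append(self.in_progress.pop(node))

    def deactivate(self, node):
        """Return an offline node's unfinished batch to the front of the queue."""
        batch = self.in_progress.pop(node, None)
        if batch is not None:
            self.pending.appendleft(batch)


def make_batches(files, size):
    """Split a list of files into consecutive batches of at most `size` items."""
    return [files[i:i + size] for i in range(0, len(files), size)]


files = [f"file{i}.docx" for i in range(10)]
orch = Orchestrator(pending=deque(make_batches(files, 2)))

orch.claim("node-a")       # node-a takes the first batch
orch.claim("node-b")       # node-b takes the second batch
orch.deactivate("node-b")  # node-b goes down for patching; its batch is requeued
orch.complete("node-a")

# A newly added node joins the running scan and drains the remaining work.
while orch.claim("node-c") is not None:
    orch.complete("node-c")

scanned = sorted(f for batch in orch.done for f in batch)
```

In this sketch every file ends up scanned exactly once even though one node dropped out mid-scan, which is the property that lets you grow or shrink the cluster without redistributing repositories by hand.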

We also incorporated a few more new features and fixes into the new scanner to improve overall management and administration. You can now decide that all files in a specific repository, both new unlabeled files and already labeled ones, are labeled with a specific label. For example, you can decide that all files in a repository are labeled as “Confidential”, and the scanner will apply this label to all files that have no label or have a lower label. You can also allow the scanner to downgrade a label if you want.

Figure 2: Enforce Confidential\Project Samos on all files in the repository
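
The enforcement rule described above can be sketched as a small decision function. This is a hedged illustration, not the scanner's code: the label names and their ordering in LABEL_ORDER, the enforce_label function, and the allow_downgrade flag are hypothetical names chosen for the example. It also assumes that enforcing the default label "None" removes a label regardless of the downgrade setting, which the product may handle differently.

```python
# Hypothetical label ordering, lowest to highest sensitivity.
LABEL_ORDER = ["Public", "General", "Confidential", "Highly Confidential"]


def enforce_label(current, default, allow_downgrade=False):
    """Return the label a file should carry after the repository default is enforced.

    current is the file's existing label, or None if the file is unlabeled.
    default is the repository's enforced label; the string "None" removes labels
    (an assumption of this sketch).
    """
    if default == "None":
        return None                  # enforcing "None" strips any label
    if current is None:
        return default               # unlabeled files always get the default
    if LABEL_ORDER.index(current) < LABEL_ORDER.index(default):
        return default               # lower labels are raised to the default
    if allow_downgrade:
        return default               # higher labels drop only when allowed
    return current                   # otherwise the existing label is kept


unlabeled = enforce_label(None, "Confidential")
raised = enforce_label("General", "Confidential")
kept = enforce_label("Highly Confidential", "Confidential")
lowered = enforce_label("Highly Confidential", "Confidential", allow_downgrade=True)
removed = enforce_label("Confidential", "None")
```

With these inputs, unlabeled and lower-labeled files end up as "Confidential", a "Highly Confidential" file keeps its label unless downgrade is allowed, and enforcing "None" leaves the file without a label.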

We have added an option to use the scanner to remove labels from files in a specific repository. To do so, set the scanner to enforce the default label “None” on the repository.
Additionally, the Azure Information Protection scanner can now identify when the current protection state of a file does not reflect the current protection policy for the label on the file, and adjust the protection state. For example, if you started with a classification-only approach, labeled all your files as Confidential using the scanner, and later enabled protection on that label, the scanner will now identify this change and apply the protection to the already labeled files.
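
As a rough illustration of that reconciliation step, the following Python sketch compares each file's protection state with what the current policy says for its label and fixes any mismatch. It is a simplified model under stated assumptions: FileState, reconcile_protection, and the label_protects mapping are hypothetical names, and the real scanner works against the actual label policy rather than a dictionary.

```python
from dataclasses import dataclass


@dataclass
class FileState:
    """Hypothetical record of what the scanner knows about one file."""
    name: str
    label: str
    protected: bool


def reconcile_protection(files, label_protects):
    """Align each file's protection state with the policy for its label.

    label_protects maps a label name to True if the current policy for that
    label applies protection. Files are updated in place, and the names of
    the files that changed are returned.
    """
    changed = []
    for f in files:
        # A label missing from the mapping leaves the file untouched.
        wanted = label_protects.get(f.label, f.protected)
        if f.protected != wanted:
            f.protected = wanted
            changed.append(f.name)
    return changed


# Files were labeled while "Confidential" was classification-only.
files = [
    FileState("plan.docx", "Confidential", protected=False),
    FileState("memo.docx", "General", protected=False),
]

# Protection was later enabled on the "Confidential" label.
policy = {"Confidential": True, "General": False}

changed = reconcile_protection(files, policy)
second_pass = reconcile_protection(files, policy)
```

Only plan.docx is touched on the first pass, and a second pass changes nothing, since every file already matches the policy.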

We have also improved the installation procedure. For the unified labeling scanner, you only need to create one Azure AD registered app and grant admin consent. You no longer need to log in with the scanner account to complete the deployment. You can use the “-onbehalf” switch of the Set-AIPAuthentication cmdlet, which allows you to use service accounts that no longer need the “log on locally” right at any step of the deployment.

I encourage you to download the new preview version of the scanner, review it, and share your feedback. You can find detailed instructions for deploying this new scanner version, or upgrading from a previous version, in the updated Azure Information Protection unified labeling client administrator guide. See the new section, Installing the Azure Information Protection scanner.

Note that there are a few constraints in this version: there is no support for HYOK, no support for offline policy, and if you upgrade from your existing scanner, the new scanner will initiate a full scan of all repositories.

Updated May 11, 2021

Version 4.0

information protection and governance

microsoft information protection

Denis Mizetski

Iron Contributor

Joined December 04, 2017

15 Comments

Denis Mizetski
Iron Contributor
Mar 03, 2021
Yes, one of the main use cases for the scanner is to label files automatically per MIP policy.
Samicool
Copper Contributor
Mar 02, 2021
Hello Denis, so is it possible for the scanner to automatically label the files?
Denis Mizetski
Iron Contributor
Oct 25, 2020
Hi MJL76

I would recommend working with support, as what you describe is not the expected behavior. Also try to test this with the regular client / native labeling and see if the same "incorrect" match is seen there or only on the scanner side. Share your findings with support. They will help you fix your settings, or, if this is a bug in the scanner / classification engine, they will open a bug with the relevant team.
MJL76
Copper Contributor
Oct 22, 2020
I was able to scan all of my on-prem content with the AIP Scanner after adding that service account to a label policy, thanks. However, I’ve encountered a bug and I’m not sure where to report it. So I thought I’d throw it out here, in case anyone has any thoughts on it before I open a support ticket with Microsoft.

The AIP Scanner did find files using the sensitive information I defined in my auto-labeling rules. However, it’s not honoring the rule conditions.

For example, I have a rule in a sensitivity label that requires both a US Social Security Number AND a value from a keywords list (e.g. SSN, Social Security, SS#, etc.) to be considered a match. The AIP Scanner, however, is matching on the first condition alone, on the second condition alone, and on both conditions together. This is not what I want because I consider those first two matches to be false positives. In other words, if the AIP Scanner finds a SSN, don’t label and encrypt it unless there’s a keyword in the file, as well.

As is, if I apply labels using the AIP Scanner, it will label and encrypt 98,000 files that shouldn’t be. Below is a screenshot of my sensitivity label with the rules and their conditions. The second screenshot is a pivot table I made from combing the AIP Scanner results. As you can see, it’s not honoring the settings defined where BOTH conditions need to be met for the two rules:

Screenshots: AIP Sensitivity Label; AIP Scanner Report

Any thoughts on how I can get the AIP Scanner to process the auto-labeling rules correctly, or have it apply labels only if certain combinations of sensitive info types are discovered within a file?
Denis Mizetski
Iron Contributor
Oct 18, 2020
In order to use the UL scanner, you must publish at least one policy to the account that was used as the delegated user for getting the policy on the scanner. It's not supported to run the AIP scanner with no published policy, even if you use the setting to detect any info type rather than the "policy only" setting.
Chris_Clark_Netrix
Iron Contributor
Oct 15, 2020
MJL76 No problem. If you want to enforce a specific label on a repository, you would have to do the same where you add the scanner account to a policy that includes the label you want to label all contents of the repository with.

I hear you on the MCAS part. I need to figure out a good way to do that as well. ***Microsoft if you are listening*** 🙂
MJL76
Copper Contributor
Oct 15, 2020
Thanks Chris, that makes sense. I added the service account to my pilot label policy and the content scan job is running great!

Now if I could only find a way to quickly specify a test site/doclib in SharePoint Online to apply a Microsoft Cloud App Security file policy to, without reviewing each folder, I'd be ecstatic. 🙂 Hopefully one day they'll include a search function so I can specify a document library name instead of manually reviewing everyone in the org. But that's not an AIP issue...

Thanks again!
Chris_Clark_Netrix
Iron Contributor
Oct 15, 2020
MJL76 You need to target the scanner service account in a label policy for the auto-label to work.
MJL76
Copper Contributor
Oct 15, 2020
Hi Denis,

I am also getting an error on my nodes stating "Error: Policy does not include any automatic labeling condition" in AIP. While I set the content scan job to only discover info types defined in a policy, I do have a label in the Office 365 Security and Compliance Center that automatically applies protection. That label is also published in a label policy. So not sure what's going on. I will note that the AIP Scanner service account is not part of that label policy published in the S&C Center. Could that be my issue?

I did stop the AIP service, delete the 'mip' folder under "C:\Users\AIP.Scanner\AppData\Local\Microsoft\MSIP\mip\MSIP.Scanner.exe" and verified it was recreated when the AIP service restarted. So it seems to be picking-up the policy. Otherwise, the only difference between my dev tenant, where targeted AIP scanning works, and the prod tenant is the difference with the label policy members. In dev I have the label policy applied to all users, while prod only has pilot users defined.

I'll also note that while my AIP scans were successful when searching for all the sensitive info types, I recently received a different error about an invalid database schema. Upgrading the client from 2.6.11 to 2.8.85 and running Update-AIPScanner all seemed to go fine, but maybe something didn't work right there. I don't need to obtain an Azure AD token for the AIP scanner service again after a UL client upgrade, do I?

And thanks for the above info. With the recent changes to AIP with the UL client, finding current and relevant info on AIP is like finding a needle in a stack of slightly older needles. 🙂 Plenty of info out there, but mostly outdated content as it references the AIP classic client and the like. And almost none of it is from people who've deployed and managed this in a production environment. So your troubleshooting steps are a huge help!
Denis Mizetski
Iron Contributor
Sep 30, 2020
Theoretically you can use SQL Express with multiple nodes, but in real production deployments it just will not scale, and SQL will become your bottleneck. SQL Express is also limited in database size, so you will only be able to scan a limited number of repositories while maintaining the cache of what was already scanned to avoid full rescans all the time.