Unified labeling AIP scanner preview brings scaling out and more!
Published Sep 18 2019 11:54 PM 11.6K Views
Microsoft

Since its release, the Azure Information Protection scanner has been adopted by many different types of customers. Some small businesses deployed a single scanner to cover all their data at rest; others deployed a few machines in different locations or for redundancy; and companies dealing with petabytes of data deployed dozens of scanner instances. Internally at Microsoft, for example, we deployed more than 40 scanners. Large enterprise customers faced increasing TCO, driven mainly by administration overhead and by attempts to distribute the load between scanners.
Consistent feedback also came from customers adopting our unified labeling platform and moving to the Azure Information Protection unified labeling client. The unified labeling client allowed customers to use more flexible automatic rules on their endpoints, but they could not use this flexibility on the scanner, whose core functionality is discovery and labeling based on automatic rules. Customers also needed to maintain their labels and conditions in both the Office 365 Security and Compliance Center and the Azure portal (in order to manage the conditions used by the Azure Information Protection scanner).


Unified labeling scanner is here to address scale out needs!
Finally, the AIP scanner for unified labeling is here! You can now move your label and policy management entirely to the Office 365 Security and Compliance Center and complete the migration to the unified labeling platform. This allows you to use custom info types and dictionaries on the AIP scanner, tweak built-in info types, define confidence levels, and so on.
The Azure Information Protection scanner architecture was redesigned. In addition to adopting the MIP SDK, which improves the performance of single nodes, you can now group your scanners in clusters that service the same scanner profile. You no longer need to distribute repositories between different scanner nodes to achieve equal volumes scanned by every node. Instead, you can set up one profile, put all the repositories in it (we still recommend separate profiles per geo location / data center), and add all the nodes to this profile. The SQL database, which now plays a core role as the orchestrator of the cluster, takes care of distributing the load equally, detects nodes that have been deactivated (for example, taken offline for maintenance or patching), and reallocates their incomplete jobs to active scanner nodes. Nodes added to the profile join the current scan effort and get instructions to scan the next batch of files. This simplifies management, allows elastic growth, and lets you reduce the number of nodes to match the volume that needs to be scanned. For example, you can start with 50 scanners to complete the initial scan of petabytes of data and then reduce the cluster to 5 nodes to scan files subsequently created in the repository.
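
As a rough illustration of the orchestration described above, here is a minimal Python sketch. It is a toy model, not the scanner's actual implementation: the class `ScanCluster`, its methods, and the node and batch names are all hypothetical, and in the real product the coordination happens through the SQL database.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ScanCluster:
    """Toy model of an orchestrator handing file batches to scanner nodes (hypothetical)."""
    pending: deque = field(default_factory=deque)    # batches not yet assigned
    in_progress: dict = field(default_factory=dict)  # node name -> batch being scanned
    completed: list = field(default_factory=list)    # finished batches
    active_nodes: set = field(default_factory=set)

    def add_node(self, node: str) -> None:
        """A node added to the profile joins the current scan effort."""
        self.active_nodes.add(node)

    def next_batch(self, node: str):
        """An active node asks for its next batch; returns None when nothing is left."""
        if node not in self.active_nodes or not self.pending:
            return None
        batch = self.pending.popleft()
        self.in_progress[node] = batch
        return batch

    def complete(self, node: str) -> None:
        """The node reports that its current batch is finished."""
        self.completed.append(self.in_progress.pop(node))

    def deactivate_node(self, node: str) -> None:
        """The node went offline: return its unfinished batch to the front of the queue."""
        self.active_nodes.discard(node)
        batch = self.in_progress.pop(node, None)
        if batch is not None:
            self.pending.appendleft(batch)


cluster = ScanCluster(pending=deque(["batch-1", "batch-2", "batch-3"]))
cluster.add_node("node-a")
cluster.add_node("node-b")
cluster.next_batch("node-a")                # node-a takes batch-1
cluster.next_batch("node-b")                # node-b takes batch-2
cluster.deactivate_node("node-a")           # batch-1 goes back to the queue
cluster.complete("node-b")                  # node-b finishes batch-2
reassigned = cluster.next_batch("node-b")   # node-b picks up the reallocated batch
```

In this model, shrinking the cluster from many nodes to a few is just a series of `deactivate_node` calls; no repository needs to be reassigned by hand.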


Figure 1: Distributed scanner architecture


We also incorporated a few more features and fixes into the new scanner to improve overall management and administration. You can now decide that all files in a specific repository, both unlabeled and already labeled, are labeled with a specific label. For example, you can decide that all files in a repository are labeled “Confidential”, and the scanner will apply this label to all files that have no label or have a lower label. You can also allow the scanner to downgrade a label if you want.

 


 


Figure 2: Enforce Confidential\Project Samos on all files in the repository
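
The enforcement rule described above (label unlabeled files, raise lower labels, and optionally downgrade higher ones) can be sketched as a small decision function. This is an illustration only: the label names and their ordering in `LABEL_ORDER` and the function `label_to_apply` are hypothetical and do not come from the scanner.

```python
from typing import Optional

# Hypothetical label ordering, from lowest to highest sensitivity.
LABEL_ORDER = ["Public", "General", "Confidential", "Highly Confidential"]


def label_to_apply(current: Optional[str], default: str,
                   allow_downgrade: bool = False) -> Optional[str]:
    """Return the label to write to the file, or None to leave the file unchanged."""
    if current is None:
        return default                       # unlabeled file: always apply the default
    cur, new = LABEL_ORDER.index(current), LABEL_ORDER.index(default)
    if cur < new:
        return default                       # lower label: raise it to the default
    if cur > new and allow_downgrade:
        return default                       # higher label: replace only if downgrade is allowed
    return None                              # same label, or higher label without downgrade
```

For example, with a default of "Confidential", a file labeled "General" is relabeled, while a file labeled "Highly Confidential" is left alone unless `allow_downgrade` is set.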


We have added an option to use the scanner to remove labels from files in a specific repository. To do so, set the scanner to enforce the default label “None” on the repository.
Additionally, the Azure Information Protection scanner can now identify when the current protection state of a file does not reflect the current protection policy for the label on the file, and adjust the protection state. For example, if you started with a classification-only approach, labeled all your files as Confidential using the scanner, and later enabled protection on the label, the scanner will now identify this change and apply the protection to the already labeled files.


We have also improved the installation procedure. For the unified labeling scanner, you only need to create one Azure AD registered app and grant admin consent. You no longer need to log in with the scanner account to complete the deployment. You can use the “-OnBehalfOf” parameter of the Set-AIPAuthentication cmdlet, which allows you to use service accounts that do not need “logon locally” rights at any step of the deployment.


I encourage you to download the new preview version of the scanner, review it and share your feedback. You can find detailed instructions to deploy this new scanner version or upgrade from previous version in the updated Azure Information Protection unified labeling client administrator guide. See the new section, Installing the Azure Information Protection scanner.


Note that there are a few constraints in this version: there is no support for HYOK, no support for offline policy, and if you upgrade from your existing scanner, the new scanner will initiate a full scan of all repositories.

15 Comments
Steel Contributor
"You can find detailed instructions to deploy this new scanner version or upgrade from previous version in the updated Azure Information Protection unified labeling client administrator guide. See the new section, Installing the Azure Information Protection scanner." As of 10/14/2019, the client admin guide does *not* include the new section "Installing the Azure Information Protection scanner" https://docs.microsoft.com/en-us/azure/information-protection/rms-client/clientv2-admin-guide-instal... Whereas I did find updated AIP Scanner documentation in this page instead: https://docs.microsoft.com/en-us/azure/information-protection/deploy-aip-scanner
Copper Contributor

Hello,

I'm trying to set up this scanner and the installation went smoothly.

The node is automatically added on Azure Portal, but I get this error:

Error : Policy does not include any labeling condition

 

I can click the "Scan now" button but it doesn't do anything.

On the server I can see this in the logs: MSIP.Scanner (6736) Failed to validate policy and configuration

 

I do have an AD Account synced on Azure AD, and the server has the UL Client and has internet.

 

Any ideas?

Thank you,

Clement

Microsoft

The error "Policy does not include any labeling condition" can indicate one of two things:

1. You did not create any automatic labeling condition in M365 S&C and did not set the scanner to match all known info types in the content scan job configuration.

2. You failed to acquire the policy. You can check whether mip.policies.sqlite3 exists under %localappdata%\Microsoft\MSIP\mip\MSIP.Scanner.exe\mip for the scanner account.
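
The check in item 2 can be scripted. Below is a minimal Python sketch that uses only the file name and folder given above; the function names are hypothetical, and treating an empty file as "not acquired" is an assumption, not documented scanner behavior. Run it as the scanner account, or pass that account's local app data folder explicitly.

```python
import os
from pathlib import Path
from typing import Optional

# Location of the policy file relative to %localappdata%, as given in the reply above.
POLICY_RELATIVE_PATH = Path("Microsoft", "MSIP", "mip", "MSIP.Scanner.exe",
                            "mip", "mip.policies.sqlite3")


def policy_file_path(local_appdata: Optional[str] = None) -> Path:
    """Build the full path to the policy file for the given local app data folder."""
    base = local_appdata if local_appdata is not None else os.environ.get("LOCALAPPDATA", "")
    return Path(base) / POLICY_RELATIVE_PATH


def policy_was_acquired(local_appdata: Optional[str] = None) -> bool:
    """Return True if the policy file exists and is not empty (emptiness check is an assumption)."""
    path = policy_file_path(local_appdata)
    return path.is_file() and path.stat().st_size > 0
```

If `policy_was_acquired()` returns False for the scanner account, the second cause is the more likely one.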

 

Please verify you completed the Set-AIPAuthentication command using the app token as explained at: https://docs.microsoft.com/en-us/azure/information-protection/rms-client/clientv2-admin-guide-powers...

Microsoft

Thank you Denis! This feedback helped me find resolution to an issue I was having.

Iron Contributor

Can we use the same cluster feature with the built in SQL Express Data Base architecture, or does this require a centralized SQL Instance per cluster?  

Microsoft

Theoretically you can use SQL Express with multiple nodes, but in real production deployments it will not scale, and SQL will become your bottleneck. SQL Express is also limited in database size, so you will only be able to scan a limited number of repositories and still maintain the cache of what was already scanned, which is what avoids full rescans all the time.

Copper Contributor

Hi Denis, 

 

I am also getting an error on my nodes stating "Error: Policy does not include any automatic labeling condition" in AIP.  While I set the content scan job to only discover info types defined in a policy, I do have a label in the Office 365 Security and Compliance Center that automatically applies protection.  That label is also published in a label policy.  So not sure what's going on.  I will note that the AIP Scanner service account is not part of that label policy published in the S&C Center.  Could that be my issue?  

 

I did stop the AIP service, delete the 'mip' folder under "C:\Users\AIP.Scanner\AppData\Local\Microsoft\MSIP\mip\MSIP.Scanner.exe" and verified it was recreated when the AIP service restarted.  So it seems to be picking-up the policy.  Otherwise, the only difference between my dev tenant, where targeted AIP scanning works, and the prod tenant is the difference with the label policy members.  In dev I have the label policy applied to all users, while prod only has pilot users defined.  

 

I'll also note that while my AIP scans were successful when searching for all the sensitive info types, I recently received a different error about an invalid database schema.  Upgrading the client from 2.6.11 to 2.8.85 and running Update-AIPScanner all seemed to go fine, but maybe something didn't work right there.  I don't need to obtain an Azure AD token for the AIP scanner service again after a UL client upgrade, do I?  

 

And thanks for the above info.  With the recent changes to AIP with the UL client, finding current and relevant info on AIP is like finding a needle in a stack of slightly older needles.  :)  Plenty of info out there, but mostly outdated content as it references the AIP classic client and the like.  And almost none of it is from people who've deployed and managed this in a production environment.  So your troubleshooting steps are a huge help!  

Iron Contributor

@MJL76 You need to target the scanner service account in a label policy for the auto-label to work.

Copper Contributor

Thanks Chris, that makes sense.  I added the service account to my pilot label policy and the content scan job is running great!  

 

Now if I could only find a way to quickly specify a test site/doclib in SharePoint Online to apply a Microsoft Cloud App Security file policy to, without reviewing each folder, I'd be ecstatic.  :)  Hopefully one day they'll include a search function so I can specify a document library name instead of manually reviewing everyone in the org.  But that's not an AIP issue...

 

Thanks again! 

Iron Contributor

@MJL76 No problem.  If you want to enforce a specific label on a repository, you would have to do the same where you add the scanner account to a policy that includes the label you want to label all contents of the repository with.  

 

I hear you on the MCAS part.  I need to figure out a good way to do that as well.  ***Microsoft if you are listening***  :)

Microsoft

In order to use the UL scanner, you must publish at least one policy to the account that was used as the delegated user for getting the policy on the scanner. It's not supported to run the AIP scanner with no published policy, even if you use the setting to detect any info type rather than the "policy only" setting.

Copper Contributor

I was able to scan all of my on-prem content with the AIP Scanner after adding that service account to a label policy, thanks.  However, I’ve encountered a bug and I’m not sure where to report it.  So I thought I’d throw it out here, in case anyone has any thoughts on it before I open a support ticket with Microsoft. 

 

The AIP Scanner did find files using the sensitive information I defined in my auto-labeling rules.  However, it’s not honoring the rule conditions. 

 

For example, I have a rule in a sensitivity label that requires both a US Social Security Number AND a value from a keywords list (e.g. SSN, Social Security, SS#, etc.) to be considered a match.  The AIP Scanner, however, is matching on the first condition alone, on the second condition alone, and on both conditions together.  This is not what I want because I consider those first two matches to be false positives.  In other words, if the AIP Scanner finds a SSN, don’t label and encrypt it unless there’s a keyword in the file, as well. 
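
For clarity, the intended AND behavior can be sketched in a few lines of Python. This illustrates the expected logic only and is not how the classification engine works: the regular expression is a simplified stand-in for the US Social Security Number info type, and the names `SSN_PATTERN`, `KEYWORDS`, and `should_label` are hypothetical.

```python
import re

# Simplified stand-ins: the real sensitive info type uses more than a plain regex.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
KEYWORDS = ("ssn", "social security", "ss#")


def should_label(text: str) -> bool:
    """Match only when BOTH an SSN-like value and a keyword are present."""
    has_ssn = SSN_PATTERN.search(text) is not None
    lowered = text.lower()
    has_keyword = any(keyword in lowered for keyword in KEYWORDS)
    return has_ssn and has_keyword
```

Under this logic, a file containing only "123-45-6789" or only the phrase "Social Security" would not be labeled; a file containing both would.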

 

As is, if I apply labels using the AIP Scanner, it will label and encrypt 98,000 files that shouldn’t be.  Below is a screenshot of my sensitivity label with the rules and their conditions.  The second screenshot is a pivot table I made from combing the AIP Scanner results.  As you can see, it’s not honoring the settings defined where BOTH conditions need to be met for the two rules: 

 

Screenshot: AIP Sensitivity Label
Screenshot: AIP Scanner Report

 

Any thoughts on how I can get the AIP Scanner to process the auto-labeling rules correctly, or have it apply labels only if certain combinations of sensitive info types are discovered within a file?  

Microsoft

Hi @MJL76 

 

I would recommend working with support, as what you describe is not the expected behavior. Also try testing this with the regular client / native labeling to see whether the same "incorrect" match appears there or only on the scanner side. Share your findings with support. They will help you fix your settings or, if this is a bug in the scanner / classification engine, open a bug with the relevant team.

Copper Contributor

Hello Denis, so is it possible for the scanner to automatically label the files?

Microsoft

Yes, one of the main use cases for the scanner is to label files automatically per MIP policy.

Version history
Last update: May 11 2021 01:59 PM