Since its release, the Azure Information Protection scanner has been adopted by many different types of customers. Some small businesses deployed a single scanner to cover all their data at rest. Others deployed a few machines in different locations, or a few machines for redundancy. Companies that needed to deal with petabytes of data deployed dozens of scanner instances; internally at Microsoft, for example, we deployed more than 40 scanners. Large enterprise customers faced an increasing total cost of ownership (TCO), driven mainly by administration overhead and by attempts to distribute the load between scanners.
Consistent feedback also came from customers adopting our unified labeling platform and moving to the Azure Information Protection Unified Labeling client. The Unified Labeling client allowed customers to use more flexible automatic rules on their endpoints, but they could not take advantage of this flexibility on the scanner, whose core functionality is discovery and labeling based on automatic rules. Customers also needed to maintain their labels and conditions in both the Office 365 Security and Compliance Center and the Azure portal (in order to manage the conditions used by the Azure Information Protection scanner).
The unified labeling scanner is here to address scale-out needs!
Finally, the AIP scanner for unified labeling is here! You can now move your label and policy management entirely to the Office 365 Security and Compliance Center and complete the migration to the unified labeling platform. This allows you to use custom info types and dictionaries on the AIP scanner, tweak built-in info types, define confidence levels, and so on.
The Azure Information Protection scanner architecture was redesigned. In addition to adopting the MIP SDK, which improves the performance of single nodes, the scanner now lets you group nodes in clusters that service the same scanner profile. You no longer need to distribute repositories between different scanner nodes to get equal volumes scanned by every node. Instead, you can set up one profile, put all the repositories in it (we still recommend separate profiles per geo location / data center), and add all the nodes to this profile. The SQL database, which now plays a core role as the orchestrator of the cluster, takes care of distributing the load equally, detects deactivated nodes (taken offline, for example, for maintenance or patching), and reallocates incomplete jobs to active scanner nodes. Nodes added to the profile join the current scan effort and get instructions to scan the next batch of files. This provides simplified management and elastic growth, and lets you reduce the number of nodes to match the volume that needs to be scanned. For example, you can start with 50 scanners to complete the initial scan of petabytes of data and then reduce the cluster to 5 nodes to scan files newly created in the repository.
Figure 1: Distributed scanner architecture
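To make the orchestration idea concrete, here is a minimal conceptual sketch in Python. It is not the scanner's actual implementation or SQL schema; the ScanCluster class, its method names, and the batch and node names are all hypothetical. It only illustrates the behavior described above: active nodes claim the next batch of files from a shared queue, and a batch held by a deactivated node goes back to the queue for another node to pick up.

```python
from collections import deque


class ScanCluster:
    """Toy model of a shared work queue; not the scanner's real SQL schema."""

    def __init__(self, batches):
        self.pending = deque(batches)   # batches nobody has claimed yet
        self.in_progress = {}           # node name -> batch it is scanning
        self.completed = []             # (node, batch) pairs, in finish order

    def claim(self, node):
        """An active node asks for its next batch; returns None if none is given."""
        if node in self.in_progress or not self.pending:
            return None
        batch = self.pending.popleft()
        self.in_progress[node] = batch
        return batch

    def complete(self, node):
        """The node reports that its current batch is finished."""
        self.completed.append((node, self.in_progress.pop(node)))

    def deactivate(self, node):
        """A node went offline; its unfinished batch returns to the queue."""
        batch = self.in_progress.pop(node, None)
        if batch is not None:
            self.pending.appendleft(batch)


cluster = ScanCluster(["batch-1", "batch-2", "batch-3"])
cluster.claim("node-a")        # node-a takes batch-1
cluster.claim("node-b")        # node-b takes batch-2
cluster.deactivate("node-a")   # node-a is taken down for patching mid-scan
cluster.complete("node-b")     # node-b finishes batch-2
cluster.claim("node-c")        # a newly added node picks up batch-1
```

In this sketch no node is assigned repositories up front. Work is pulled from one shared queue, which is why adding or removing nodes does not require rebalancing by hand.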
We also incorporated a few more new features and fixes into the new scanner to improve overall management and administration. You can now decide that all files in a specific repository, both unlabeled and already labeled, are labeled with a specific label. For example, you can decide that all files in a repository be labeled as “Confidential”, and the scanner will apply this label to all files that have no label or have a lower label. You can also allow the scanner to downgrade a label if you want.
Figure 2: Enforce Confidential\Project Samos on all files in the repository
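The enforcement rule can be summarized in a few lines of Python. This is only an illustration of the decision described above, not scanner code: the LABEL_ORDER list and the label_to_apply function are hypothetical, and the real label ordering comes from your own label policy.

```python
# Hypothetical label ordering, from lowest to highest sensitivity.
LABEL_ORDER = ["Public", "General", "Confidential", "Highly Confidential"]


def label_to_apply(current, enforced, allow_downgrade=False):
    """Return the label a file should carry after the repository is scanned.

    current  -- the file's existing label, or None if the file is unlabeled
    enforced -- the label enforced on the repository
    """
    if current is None:
        return enforced                      # unlabeled files get the label
    if LABEL_ORDER.index(current) < LABEL_ORDER.index(enforced):
        return enforced                      # lower labels are raised
    if allow_downgrade:
        return enforced                      # higher labels lowered only if allowed
    return current                           # otherwise the higher label is kept
```

For example, with “Confidential” enforced, a file labeled “General” is raised to “Confidential”, while a file labeled “Highly Confidential” keeps its label unless downgrade is allowed.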
We have added an option to use the scanner to remove labels from files in a specific repository. To do so, set the scanner to enforce the default label “None” on the repository.
Additionally, the Azure Information Protection scanner can now identify when the current protection state of a file does not reflect the current protection policy for the label on the file, and adjust the protection state. For example, if you started with a classification-only approach, labeled all your files as Confidential using the scanner, and later enabled protection on that label, the scanner will now identify this change and apply the protection to the already labeled files.
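As a rough illustration of this check, the Python sketch below compares each file's protection state with what its label's policy currently requires. The ScannedFile type, the files_out_of_sync function, and the sample paths are hypothetical and are not part of the scanner or the MIP SDK.

```python
from dataclasses import dataclass


@dataclass
class ScannedFile:
    path: str
    label: str
    protected: bool   # whether the file is currently protected


def files_out_of_sync(files, label_protects):
    """Return files whose protection state differs from their label's policy.

    label_protects maps a label name to True if that label currently
    applies protection, and False if it is classification only.
    """
    return [
        f for f in files
        if f.label in label_protects and f.protected != label_protects[f.label]
    ]


# Files were labeled while "Confidential" was classification only...
files = [
    ScannedFile("plans.docx", "Confidential", protected=False),
    ScannedFile("memo.docx", "General", protected=False),
]
# ...and protection was enabled on the label afterwards.
policy = {"Confidential": True, "General": False}
stale = files_out_of_sync(files, policy)   # only plans.docx needs protection
```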
We have also improved the installation procedure. For the unified labeling scanner, you only need to create one Azure AD registered app and grant admin consent. You no longer need to log in with the scanner account to complete the deployment. You can use the “-onbehalf” switch of the Set-AIPAuthentication cmdlet, which allows you to use service accounts that no longer need “logon locally” rights in any step of the deployment.
I encourage you to download the new preview version of the scanner, review it, and share your feedback. You can find detailed instructions for deploying this new scanner version or upgrading from a previous version in the updated Azure Information Protection unified labeling client administrator guide. See the new section, Installing the Azure Information Protection scanner.
Note that there are a few constraints in this version: there is no support for HYOK, there is no support for offline policy, and if you upgrade from your existing scanner, the new scanner will initiate a full scan of all repositories.