The Azure Information Protection Scanner is a program designed to detect, classify, and optionally protect documents stored on file shares and on-premises SharePoint servers. The overview below is from the official documentation at https://docs.microsoft.com/en-us/information-protection/deploy-use/deploy-aip-scanner. This blog post is meant to assist customers and partners with deployment of the AIP Scanner. If there is ever a conflict, the official documentation is authoritative.
The AIP Scanner runs as a service on Windows Server and lets you discover, classify, and protect files on the following data stores:
The scanner can inspect any files that Windows can index, by using iFilters that are installed on the computer. Then, to determine if the files need labeling, the scanner uses the Office 365 built-in data loss prevention (DLP) sensitivity information types and pattern detection, or Office 365 regex patterns. Because the scanner uses the Azure Information Protection client, it can classify and protect the same file types.
You can run the scanner in discovery mode only, where you use the reports to check what would happen if the files were labeled. Or, you can run the scanner to automatically apply the labels.
NOTE: The scanner does not discover and label in real time. It systematically crawls through files on data stores that you specify, and you can configure this cycle to run once, or repeatedly.
This blog post was written based on the 1.29.5 version of the AIP Scanner. Every effort will be made to update it when things change, but if you run into difficulty running any of the commands on a newer version, please use the official documentation to identify any changes.
To install the AIP Scanner in a production environment, the following items are needed:
NOTE: We have scripted the scanner installation process and it is now available at https://techcommunity.microsoft.com/t5/Azure-Information-Protection/Azure-Information-Protection-Sca.... Although these steps are still valid, the scripted method is far less prone to mistakes and much faster for deployment.
A basic installation of the AIP Scanner service is simple and straightforward.
After installing the AIP Scanner binaries, you must authenticate with the AIP Scanner Service Account to get a token that the scanner uses for automated discovery, classification, and protection.
Authentication Token:
Now that the scanner has an authentication token, we should discuss what you want to do with the AIP Scanner. You want to use it to scan file shares and SharePoint sites, but first we need to cover how the scanner locates data and what it does once it finds that data.
AIP Policies contain Labels and Sub-labels that allow you to classify and optionally protect data. You can assign conditions to these labels using standard Office 365 DLP templates, and those conditions can be either recommended or automatic. For the AIP Scanner to classify documents, you must set these conditions to Automatic, which allows the scanner to label and protect content without user input. This is a content-based approach: labels are assigned to content based on the conditions defined in each label. If you want all of the documents in your repositories to be classified, you can use the default label setting in the portal, and the AIP Scanner will assign that label to any content that does not meet any automatic condition. This setting is in the Global policy blade, under the Configure settings to display and apply on Information Protection end users section.
NOTE: Use caution when using a default label as this will label any file that is not caught by properly defined conditions. This could potentially result in improper classification of many documents if not tested appropriately.
For more in-depth information about configuring policies, you can see the official documentation at https://docs.microsoft.com/en-us/information-protection/deploy-use/configure-policy-classification
Repositories can be on-premises SharePoint 2013 or 2016 document libraries or lists, and any accessible CIFS-based share.
NOTE: In order to do discovery, classification, and protection, the scanner service pulls the documents to the server, so having the scanner server located in the same LAN as your repositories is recommended. You can deploy as many servers as you like in your domain, so putting one at each major site is probably a good idea (Microsoft currently uses around 40 Scanner instances worldwide for internal repositories and will be expanding that to 240).
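As a minimal sketch of registering repositories, the commands below assume the Add-AIPScannerRepository and Get-AIPScannerRepository cmdlets from the AIP client's PowerShell module. The share path and SharePoint URL are hypothetical placeholders, so please check the official documentation for the exact syntax on your scanner version.

```powershell
# Hypothetical repository locations; replace these with your own.
Add-AIPScannerRepository -Path '\\fileserver\documents'
Add-AIPScannerRepository -Path 'http://sharepoint.contoso.com/documents'

# List the repositories the scanner is currently configured to crawl.
Get-AIPScannerRepository
```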
One of the most useful features of the AIP Scanner is the discovery of sensitive data across all of your configured repositories. You can do this by using Set-AIPScannerConfiguration with a switch called -DiscoverInformationTypes. When this switch is set to All, the scanner will discover files that contain any data in the list of all Office 365 DLP sensitive data types, and any custom string or regex values that you have specified as automatic conditions for labels in the Azure Information Protection policy. When you use this option, labels do not need to be configured to use any conditions for the Office 365 sensitive data types, but you will need automatic conditions configured for custom string or regex values.
NOTE: If you do not want the custom conditions to trigger on your global labels, you can add the labels for the custom values to a policy that is scoped only to the AIP Scanner service account.
The PowerShell command below will allow you to scan your repositories against all information types.
Set-AIPScannerConfiguration -Enforce Off -Schedule OneTime -Type Full -DiscoverInformationTypes All
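Before starting the service, it can help to confirm that the settings were stored as intended. The sketch below assumes the Get-AIPScannerConfiguration cmdlet is available in your version of the AIP client module.

```powershell
# Show the scanner's current configuration so you can check that
# Enforce, Schedule, Type, and DiscoverInformationTypes match what you just set.
Get-AIPScannerConfiguration
```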
To start the discovery, use the PowerShell command below:
Start-Service AIPScanner
After running the scan, you can review the logs by opening the Azure Information Protection event log or you can view the detailed logs at C:\users\<Scanner Service Account Profile>\appdata\local\Microsoft\MSIP\Scanner\Reports. There you will find the summary txt and detailed csv files.
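The review step above can be sketched in PowerShell using only built-in cmdlets. The variable names are illustrative, the profile folder placeholder must be replaced with the real folder name, and the event log name is assumed to match the name shown in Event Viewer.

```powershell
# Assumption: replace the placeholder with the scanner service account's profile folder name.
$profileName = '<Scanner Service Account Profile>'
$reportDir   = "C:\Users\$profileName\AppData\Local\Microsoft\MSIP\Scanner\Reports"

# List the summary txt and detailed csv files, newest first.
Get-ChildItem -Path $reportDir |
    Sort-Object LastWriteTime -Descending |
    Select-Object Name, LastWriteTime, Length

# Load the newest detailed CSV report, if one exists, and count its rows.
$latestCsv = Get-ChildItem -Path $reportDir -Filter *.csv |
    Sort-Object LastWriteTime -Descending |
    Select-Object -First 1
if ($latestCsv) {
    $rows = @(Import-Csv -Path $latestCsv.FullName)
    '{0}: {1} rows' -f $latestCsv.Name, $rows.Count
}

# Assumption: the event log is named 'Azure Information Protection'.
Get-WinEvent -LogName 'Azure Information Protection' -MaxEvents 20 |
    Select-Object TimeCreated, Id, LevelDisplayName, Message
```

Run this from an account that can read the service account's profile folder.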
Running this command on your defined repositories will show you all of the sensitive data types you currently have in those repositories. You can then use this information to define conditions on labels so you can properly classify and protect your content.
Once you have your conditions defined, type the PowerShell command below to enforce protection and have the scanner run once.
Set-AIPScannerConfiguration -Enforce On -Schedule OneTime -Type Full
NOTE: After testing, you would use the same command with the -Schedule Continuous parameter to have the AIP Scanner run continuously.
NOTE: The -Type Full switch forces the scanner to review every document.
To start the initial enforcement scan, use the PowerShell command below:
Start-Service AIPScanner
You should now be able to review the event log and AIP Scanner log files to see what files have been classified and protected.
The last step is to set the scanner to continuously monitor the repositories you have defined for new content. This can be done using the PowerShell commands below.
Set-AIPScannerConfiguration -Enforce On -Schedule Continuous
Start-Service AIPScanner
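To confirm that the service actually started, a quick check with the built-in Get-Service cmdlet is usually enough. The service name AIPScanner is taken from the commands in this post.

```powershell
# Show the current state of the scanner service (for example, Running or Stopped).
Get-Service -Name AIPScanner | Select-Object Name, Status
```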
You should now have a fully functional AIP Scanner instance. You can repeat this process on multiple servers as necessary and use the same Set-AIPAuthentication command for each of them. This is a simple setup for a basic scanner server that can be used to protect a large amount of data easily. I highly recommend reading the official documentation on deploying the scanner as there are some less common caveats that I have left out and they cover performance tips and other additional information.
Thanks,
The Information Protection Customer Experience Team