Microsoft Security Blog

Securing Sensitive Data with the AIP Unified Labeling Scanner

Microsoft

Feb 04, 2020

NOTE: This is the most updated version of the blog posted in February 2020. This blog is based on the Unified Labeling version of the AIP scanner. 

Most modern organizations have terabytes (or petabytes) of unstructured data sitting in their on-premises data repositories and SharePoint libraries. Managing this data, the way you manage other corporate resources, is a daunting but achievable task using tools that you likely already own. In this article, we will walk you through the discovery of sensitive data and show you options to classify and protect that data.

The AIP scanner allows you to scan your on-premises data repositories against the standard Office 365 sensitive information types and custom types you build with keywords or regular expressions. Once the data is discovered, the AIP scanner(s) can aggregate the findings and display them in Analytics reports so you can begin visualizing your data risk and see recommendations for setting up protection rules based on the content.

To configure the AIP unified labeling scanner, there are a few steps you need to follow:

Configure on-premises prerequisites
- Server
- SQL
- Installer account permissions
- Local Service Account
- Open required network locations
- AIP scanner binaries
Configure Azure prerequisites
- Global admin credentials
- Cloud service account (creation or sync)
- Create Azure AD application for service authentication
- Grant authorization for Azure AD application
- Configuring AIP Azure Log Analytics (Optional)
AIP scanner node, cluster, content scan jobs and repo configuration
AIP scanner installation

Now, this may seem like a lot of things, but don't worry. We will walk you through the whole process so that it is as painless as possible. This article assumes a standard implementation. Before production deployment we recommend that you read through the official documentation at https://docs.microsoft.com/en-us/azure/information-protection/deploy-aip-scanner to ensure that you will not run into any issues and to help through any custom scenarios.

On-Premises Prerequisites

At least one Server (Physical or Virtual) capable of running the AIP unified labeling scanner
- The official specifications are listed in the documentation, but at least 4 cores and 8 GB of RAM are required (more is highly recommended), along with at least 10 GB of free storage space for temporary files (again, more is better)
A SQL Server Instance to store configuration and scanned file list. SQL Server Developer Edition (free version) installed locally on the AIP scanner server is supported but for load balancing capabilities of the AIP unified labeling scanner, a full version of SQL is required. You could also use SQL Server Express Edition.
An installer account with sysadmin rights to the SQL instance and local admin rights on the Server (these requirements are often missed)
An on-premises user account to run the AIP scanner service (e.g. Contoso\AIPScanner)
- No special rights are needed for configuration, but this account will need read rights to all configured repositories to do discovery and read/write for labeling and protection
Internet connectivity that allows the following URLs over HTTPS (port 443):
- *.aadrm.com
- *.azurerms.com
- *.informationprotection.azure.com
- informationprotection.hosting.portal.azure.net
- *.aria.microsoft.com
- *.protection.outlook.com
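
The hardware and connectivity prerequisites above can be spot-checked from the prospective scanner server before installing anything. The following is a minimal sketch, not an official validation tool. The variable names are illustrative, the free-space check assumes temporary files will live on the C: drive, and only the one fully qualified host from the list is probed, because the wildcard entries (for example *.aadrm.com) do not name a concrete host to test. Note that Test-NetConnection makes a direct TCP connection, so it will not reflect access that is only available through a web proxy.

```powershell
# Minimum values taken from the prerequisites list above.
$minCores  = 4
$minRamGB  = 8
$minFreeGB = 10

# Gather hardware facts with standard CIM classes.
$cores  = (Get-CimInstance -ClassName Win32_Processor |
           Measure-Object -Property NumberOfCores -Sum).Sum
$ramGB  = [math]::Round((Get-CimInstance -ClassName Win32_ComputerSystem).TotalPhysicalMemory / 1GB, 1)
$freeGB = [math]::Round((Get-PSDrive -Name C).Free / 1GB, 1)   # assumption: temp files on C:

[pscustomobject]@{
    Cores   = $cores
    CoresOk = $cores -ge $minCores
    RamGB   = $ramGB
    RamOk   = $ramGB -ge $minRamGB
    FreeGB  = $freeGB
    FreeOk  = $freeGB -ge $minFreeGB
}

# Only the non-wildcard host from the list is tested here.
# Add concrete hostnames for your own environment if you know them.
$hostsToTest = @('informationprotection.hosting.portal.azure.net')

foreach ($target in $hostsToTest) {
    $result = Test-NetConnection -ComputerName $target -Port 443 -WarningAction SilentlyContinue
    [pscustomobject]@{
        Host        = $target
        Port443Open = $result.TcpTestSucceeded
    }
}
```

A False value in any of the Ok columns, or in Port443Open, points to a prerequisite worth revisiting before you continue.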

Installing Scanner Binaries

Installing the AIP scanner binaries is a very straightforward process as they are included with the AIP unified labeling client. Navigate to https://aka.ms/AIPClient and click the Download button. When presented with the download options, check the box next to AzInfoProtection_ul.exe and click the Next button. The download should start automatically. Once complete, double-click on the file and run through the quick setup on the prepared AIP scanner server. Please be sure to select the most up-to-date GA UL Client.

Azure Prerequisites

Global Admin permissions for the tenant
Synchronized or created cloud service account
- This is typically done using Azure AD Sync after the on-premises service account is created. If you are not using Azure AD Sync to synchronize your on-premises service account, you will need to create an account in the cloud. This can be accomplished manually by logging into the Azure Portal, or via PowerShell script.
Service account must be added to an AIP Policy
Configure Azure Application necessary for AIP scanner authentication.

Creating the Azure AD Application Registration

We must create an Azure AD Application for AIP authentication to allow the scanner to protect files non-interactively. You only need to do this the first time you set up the AIP scanner; the same Set-AIPAuthentication command created at the end can be used to authenticate multiple AIP scanner servers. The official documentation for creating these applications is found at https://docs.microsoft.com/en-us/azure/information-protection/rms-client/client-admin-guide-powershe....

For convenience, we have created a video that will walk you through how to create the AAD Application.

Next we will be obtaining admin consent necessary to run the AIP client unattended. This will be done by obtaining an Azure AD Token.

Navigate to the Azure Portal and proceed to the Azure Active Directory Blade.
In the Azure Active Directory side pane, click App Registrations.
At the top, go ahead and click + New registration.
In the Name section type in AIPScanner.
Leave Supported account types as default.
For the Redirect URI, leave the type as Web but type in http://localhost for the entry portion and click Register.
On the Overview page of this application, note down in your text editor of choice the following IDs: Application (client) ID and Directory (tenant) ID. You will need this later when setting up the Set-AIPAuthentication command.
On the side pane, navigate to Certificates and Secrets
Click on + New client secret
In the dialog box that shows up, enter a description for your secret and set it to Expire In 1 year and then Add the secret.
You should now see an entry with the Secret Value under the client secrets section. Copy this value and store it in the file where you saved the Client ID and Tenant ID. This is the only time you will be able to see the secret value; it cannot be recovered if you don't copy it now.
On the side pane, navigate to API Permissions
Go ahead and select Add a permission.
When the screen shows, select Azure Rights Management Services. Then select Application Permissions.
Click the drop down for Content and put checkmarks down for Content.DelegatedReader and Content.DelegatedWriter. Then at the bottom of the screen, click Add Permissions.
Navigate back to the API Permissions section and add another permission.
This time, for the Select an API section, click on APIs my organization uses. In the search bar, type in Microsoft Information Protection Sync Service and select it.
Select Application Permissions and then in the Unified Policy drop down, checkmark the permission UnifiedPolicy.Tenant.Read. Then at the bottom of the screen, click Add Permissions.
Back on the API Permissions screen, click Grant Admin Consent and look for the operation being successful (signified by a green checkmark).

Configuring AIP Azure Log Analytics (Optional)

Although this step is technically optional, we recommend configuring analytics before running your first scan so you can begin to visualize your data risk. In the AIP blade of the Azure Portal, you will see Configure analytics (preview) under the Manage section. Click on this to open the workspace selection page.

If you already have a configured ALA Workspace for this purpose, check the box next to it and press OK. Otherwise, click the + Create new workspace link.

Fill in the following items:

Log Analytics Workspace (must be unique across Azure)
Azure Subscription (If this is not populated, you will need to get access or have someone with access to the subscription create the workspace)
A new or existing Resource group
The Location closest to your users (usually this will be in the same geography as your tenant)
A Pricing tier (usually Per GB or Standalone. Free tier only stores logs for 7 days)
Press OK.

Finally, back in the Configure analytics (preview) blade, check the box next to the workspace and click OK.

NOTE: Checking the box next to Enable deeper analytics allows the actual matched content to be stored in the Log Analytics workspace. This could include many types of sensitive information such as PII, Credit Card Numbers, and Banking Information. This option is typically used during testing of automatic conditions and not widely used in production settings due to the sensitive nature of the collected data. If this is used in a production setting, extreme caution should be taken with securing access to this workspace.

AIP Scanner Configuration

Configuration of the AIP scanner is currently done via the Central Management User Interface in the Azure Portal. We will quickly walk through the minimum configuration elements to install a functional scanner in discover mode.

Navigate to the Azure Portal and type in Azure Information Protection in the search bar to open up the respective tab.
On the side pane, under Scanner, click Clusters
In the Clusters tab, click the + Add button.
In the Add a new cluster pane, enter East US for the Cluster name and click Save.
In the side pane under the Scanner section, click on Content scan jobs and click the + Add button.
Provide a name for the Content scan job and then configure using the following settings:
The default Schedule is set to Manual, and Info types to be discovered is set to All.
Under Policy Enforcement, set the Enforce switch to Off
Click Save to complete initial configuration
Once the save is complete, click on Configure repositories
In the Repositories blade, click the + Add button
In the Repository blade, under Path, type \\AdminPC\Documents
In the Repository blade, click Save
NOTE: Keep the Azure Portal window available for future hands on sections.

Installing the AIP Scanner

We should now have all prerequisites in place to install the AIP scanner.

On the desktop, restore AdminPC
Open an Administrative PowerShell Window and type the PowerShell commands below.
Install-AIPScanner -SqlServerInstance <name> -Profile <cluster name>
For <name>, enter the name of the machine that the SQL Server instance is running on, which in this case is "AdminPC". For -Profile, enter the cluster name, also in quotation marks.
You will be prompted to enter the local AIP scanner service account credentials in Domain\AccountName format and to provide the SQL Server instance name (This will be ServerName or ServerName\SQLExpress depending on the version you installed).
Verify that the service is now installed by using Administrative Tools > Services. The installed service is named Azure Information Protection Scanner and is configured to run by using the scanner service account that you created.

If you encounter any errors, please validate that the installer account has the permissions mentioned in the On-Premises Prerequisites and you do not have any firewall issues reaching the SQL server or Azure.
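
Using the lab values from this walkthrough, the install and verification steps might look like the sketch below. AdminPC and East US are the example names used earlier in this article, so substitute your own. The -Profile parameter is the one shown in this article; other client versions may name it differently, so check Get-Help Install-AIPScanner on your server if the parameter is rejected. The service display name is the one quoted above.

```powershell
# Run in an elevated PowerShell session on the scanner server.
# 'AdminPC' and 'East US' are the example values from this article.
Install-AIPScanner -SqlServerInstance 'AdminPC' -Profile 'East US'

# Confirm that the service exists and check its current state.
Get-Service -DisplayName 'Azure Information Protection Scanner' |
    Select-Object Name, DisplayName, Status

# Show which account the service is configured to run as.
Get-CimInstance -ClassName Win32_Service -Filter "DisplayName = 'Azure Information Protection Scanner'" |
    Select-Object Name, StartName, State
```

The StartName column should show the scanner service account you supplied during installation.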

Now that you have the AIP scanner service installed, you can run the Set-AIPAuthentication command to get the non-interactive authentication token, as demonstrated in the video, using the following command:

$pscreds = Get-Credential Contoso\AIPScanner
Set-AIPAuthentication -AppId "<CLIENTID>" -AppSecret "<SECRET>" -DelegatedUser aipscanner@contoso.com -TenantId "<TENANTID>" -OnBehalfOf $pscreds

For your $pscreds variable, use your AD domain name followed by a backslash and the on-premises account that runs the AIP scanner service (Contoso\AIPScanner in this example). You will be prompted for that account's password, so fill that in and hit Enter.

For your -AppID parameter input the Application (Client) ID you saved in a file earlier. Be sure to include the quotation marks.

For your -AppSecret parameter input the Secret Value that you saved in a file earlier. Be sure to include the quotation marks.

For your -DelegatedUser parameter input the AAD synced or cloud-based service account you are using to manage AIP. You do not need quotation marks here.

For your -TenantID parameter input the Directory (Tenant) ID that you saved in a file earlier. Be sure to include the quotation marks.

Make sure to use $pscreds as the parameter for -OnBehalfOf.

Run the command and if successful, you will receive the following message "Acquired application access token on behalf of Contoso\AIPScanner."
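
One way to keep the values you noted earlier readable is to collect them in a hashtable and splat them onto Set-AIPAuthentication. This is only a sketch of the same command shown above, not a different method: the parameter names are those used in this article, the $authParams variable name is illustrative, and the placeholder values must be replaced with your own.

```powershell
# Prompt for the password of the account passed to -OnBehalfOf,
# exactly as in the command shown above.
$pscreds = Get-Credential 'Contoso\AIPScanner'

# Placeholders: replace with the IDs and secret saved during app registration.
$authParams = @{
    AppId         = '<CLIENTID>'
    AppSecret     = '<SECRET>'
    TenantId      = '<TENANTID>'
    DelegatedUser = 'aipscanner@contoso.com'
    OnBehalfOf    = $pscreds
}

# Splatting (@authParams) passes each hashtable key as a named parameter.
Set-AIPAuthentication @authParams
```

Because the filled-in hashtable contains the client secret in plain text, it is better to type or paste it into the session than to save it in a script file.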

You are now ready to run the scanner!

Finally, in the Admin PowerShell window, type Start-AIPScan
To check for the scanning status, type in the Admin PowerShell window Get-AIPScannerStatus
You can also check the scanner status in the AIP Blade in the Azure Portal by navigating to Nodes on the side pane.
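
The start-and-check steps above can be combined into a small polling loop. This is a rough sketch: the one-minute interval and ten iterations are arbitrary choices, and the output of Get-AIPScannerStatus is printed as-is rather than parsed, since its exact shape may differ between client versions. The reports folder is the path given in the author's reply in the comments on this post, with <scannerserviceuser> left as a placeholder for your scanner service account name.

```powershell
# Kick off a scan in the elevated session used for installation.
Start-AIPScan

# Print the scanner status once a minute for ten minutes (Ctrl+C to stop early).
1..10 | ForEach-Object {
    Get-Date -Format 'HH:mm:ss'
    Get-AIPScannerStatus
    Start-Sleep -Seconds 60
}

# List local reports, newest first. Replace <scannerserviceuser> before running.
$reportPath = 'C:\Users\<scannerserviceuser>\AppData\Local\Microsoft\MSIP\Scanner\Reports'
Get-ChildItem -Path $reportPath |
    Sort-Object -Property LastWriteTime -Descending |
    Select-Object Name, LastWriteTime, Length
```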

After a few minutes you will begin seeing data start to flow into your Data discovery (Preview) dashboard in the Azure portal. Since you are only doing discovery, you will not see any labeled or protected files (unless you have been using AIP before running the scanner), but you will see the identified files and the sensitive data types found in the configured repositories.

There is also a blade under Analytics named Recommendations (Preview) that will be populated by this data. Any sensitive information types discovered that do not have associated automatic classification conditions will display in this blade.

You may then click on the sensitive information type and a fly-out panel will allow you to assign the information type to a classification label. This allows you to quickly map your sensitive information to classification labels.

NOTE: The AIP scanner will only trigger on conditions which are set to Automatic.

Once you have configured these conditions, you can return to the content scan job in the Azure portal and change the settings to the ones below.

Schedule: Always
Info types to be discovered: Policy only
Enforce: On
Save

Because we set the schedule to Always, the scanner will begin monitoring the files automatically within 5 minutes. If you want to start the scan yourself, follow the instructions below.

In the AIP blade, under Scanner, click on Nodes.
Select your AIP Scanner server and click Start in the toolbar.

The result will show labeled and protected files, along with the distribution graph, in the Data discovery (Preview) dashboard.

Please let us know in the comments if you have any questions on this approach. For more information please be sure to check our team's GitHub at aka.ms/mipfiles. If you are interested in how Microsoft uses the AIP scanner, please see the MSIT showcase at https://aka.ms/ScannerShowcase.

Thanks,

The Information Protection Customer Experience Engineering Team

Updated May 11, 2021

Version 16.0

information protection and governance

microsoft information protection

Kevin McKinnerney

Microsoft


Daniel Harrison
Copper Contributor
Sep 01, 2023
This post from 2020 is somewhat out of date and misleading (as of August 2023) - management of the AIP Scanner was migrated to the Purview portal in 2022 (MC447310) and the analytics features described here have been deprecated.
As a general observation, there is a LOT of outdated information about the AIP Scanner floating about on various Microsoft pages, full of instructions and PS snippets that don't work with the latest version... it would be great if some of it could be updated, removed or even just marked with something to indicate that it's no longer valid.
If it helps anyone else, the information here, including the deployment steps, seems to be current enough to get things working (though some "Next Steps" links still point at 2020 content which is now invalid, so stay alert)
ByDesign1977
Brass Contributor
May 14, 2021
Hi

Can the detailed report created by the AIP scanner be configured to also contain the date a file was last accessed and the file size?

Many thanks
rizwansherif
Copper Contributor
Apr 17, 2021
vlastimils, hi, did you find any troubleshooting steps for your problem?
vlastimils
Copper Contributor
Nov 13, 2020
I've been troubleshooting my AIP scanner for weeks now and can't figure out why it is not detecting any info types when I set the 'Content scan job > Info types to be discovered > All' with 'Policy Enforcement > Enforce > Off'.
When I inspect the detailed csv report, it says: Error: 'Repository configuration is incorrect. No action to apply' on all files and then it skips all files on the next re-scan if set to automatic scan.

It does however discover info types fine when I set the 'Content scan job > Info types to be discovered > Policy only', where I have the info types configured on the label and the policy is published to myself and the scanner account.

The scanner is on the latest version 2.8.85.0 and Start-AIPScannerDiagnostics comes back all fine.

Has anyone experienced this at all? I do have a support ticket with MS, but no progress so far.

Thanks
amcgregor
Copper Contributor
Jul 04, 2020
How do we monitor the status of the AIP scanners? The UL client doesn't log any events
https://docs.microsoft.com/en-us/azure/information-protection/deploy-aip-scanner#event-log-ids-and-descriptions-for-the-scanner

Where is this recorded and can we generate an alert based on it?
Graham Hosking
Microsoft
Jun 13, 2020
This article is very clear in the steps required to install. Although it looks like the process is very involved, it can take very little time to initiate the first scan: from VM setup through installing SQL and configuring policy, it is around 20 minutes end to end.

This is clearer than on docs.microsoft where it's trying to do both legacy and unified in one go. Well done Kevin.
quinla01
Copper Contributor
Jun 12, 2020
Looks like there have been GUI changes and several tabs renamed that don't align with the article any more. I can't find any documentation which is up to date as of today. Crucially, the concept of "profiles" no longer exists. I think they are now called "Content Scan Jobs" although some of what's described above does not exist in the "Content Scan Job" section. I have a couple of questions.

1) Why do you need to disable policy enforcement during scanner installation and then enable it post scanner install? Likewise can "schedule" and "Info types to be discovered" not be configured beforehand? If you need to install the scanner on another server using the same profile (or whatever it's now called) do you have to disable the settings again?
2) When you update a policy or profile, do you need to update the scanner (update-aipscanner)?

Thanks
Scott Barr
Brass Contributor
Feb 11, 2020
I thought I had read that the UL version of AIP Scanner only reported to the portal. We were not seeing new reports on the server itself in the /reports folder but can confirm they are there now. The local report does not indicate anything was discovered but I would have expected the repository we're testing with to be reported with no results in the portal. Unfortunately, nothing is reported in the portal. I'll go with inconsistent results for now.

The workspace is in NA - not an issue there.

Since this was an upgrade from classic I am inclined to remove the labels that were copied from the Azure portal and start fresh. The discovery was working as expected with classic but the scanner service consistently stopped after a scan, hence the upgrade.

Thanks,

-Scott
Kevin McKinnerney
Microsoft
Feb 09, 2020
Scott Barr,

The delay can vary depending on the location of your log analytics workspace versus your tenant. I know that the creation of a workspace defaults to Australia, so if you accidentally set it to that and your tenant is located in North America it could account for the delay. Keep in mind though that AIP analytics are still in preview so there may be inconsistent results. Per the local reports, you should still be able to find the reports at C:\Users\<scannerserviceuser>\appdata\local\Microsoft\MSIP\Scanner\Reports just as you did in the classic scanner version.

Thanks,

Kevin
Scott Barr
Brass Contributor
Feb 06, 2020
I've noticed a significant delay, as in hours, in the amount of time it takes for discovery data to appear in the portal. Is this normal? Also, is there no longer a local report that can be viewed like we had with the classic client? This makes testing much more difficult when trying to validate new settings, custom sensitive types, etc. Thanks!