Security, Compliance, and Identity Blog

Announcing GA of machine learning based trainable classifiers for your compliance needs

sanjay_kidambi
Jan 12, 2021

Remote work has accelerated the growth in the volume and variety of data in organizations. Microsoft provides intelligent solutions to help you meet the resulting challenges in governing this data.

Foundational to these solutions is the ability to detect and classify data accurately. Only after you accurately classify your data can you start to govern it: deciding what to protect, what to retain, and what to delete. At Microsoft, we know that the future of accurate data classification at scale requires machine learning. Through our trainable classifiers, you can leverage the power of machine learning to identify more categories of data with increased accuracy. These classifiers use natural language processing and statistical algorithms to identify critical information.

You can deploy machine learning models with ease through our built-in classifiers, which have been trained at Microsoft and are ready to use in the Microsoft 365 compliance center. Built-in classifiers are readily available to detect and classify common data categories, such as resumes and source code. Your organization will also have unique data, which you can classify by creating a custom trainable classifier. Customers across a range of industries are taking advantage of this opportunity to build and deploy trainable classifiers without needing any expertise in machine learning. For instance, a custom classifier can be built to classify loan contracts, invoices, or project documents. Together, built-in and build-your-own trainable classifiers provide classification support for a broad range of categories important to your enterprise.

Today we are excited to announce the general availability of machine learning based trainable classifiers. This GA includes two new features to improve the accuracy of trainable classifiers: feedback on matched documents that retrains the classifier, and analytics on the resulting accuracy improvement. Built-in classifiers are available now in English, with support for Spanish, Japanese, French, German, Portuguese, Italian, and Chinese (Simplified) coming in the second half of 2021.

Figure: List of ‘built-in’ and ‘build-your-own’ trainable classifiers in the Microsoft 365 compliance center

Figure: Example of a custom classifier built for detecting contract documents

Use trainable classifiers to automatically apply retention policies

Many organizations rely on employee judgment and manual classification to manage records and retention schedules. This approach is error-prone. Additionally, most organizations have unmanaged data repositories that need governance but lack a way to classify that data at scale.

With trainable classifiers, you can apply retention schedules and records policies at scale for business-critical information. For example, a compliance administrator and a records manager can work together to train a new classifier to recognize procurement documents and auto-apply a retention policy.

Microsoft’s legal team is one of many customers who use trainable classifiers to manage records in place and at scale.

“My hands-on experience in creating a trainable classifier demonstrated how automatic detection and classification of critical records help in accurately executing in-place records management across a large enterprise.”

-Jorge Garcia, Business Operations Associate, Corporate, External, and Legal Affairs at Microsoft

In Content Explorer, your primary tool for discovering classified and labeled data, you will see the documents and emails that match trainable classifiers. We are now offering two new features to improve the accuracy of both built-in and build-your-own trainable classifiers. You can now evaluate the matched documents and provide feedback that retrains the classifier and improves its accuracy. You can also view analytics on the degree of accuracy improvement to decide when to republish your classifier.

Figure: Viewing matched documents for the built-in ‘resume’ classifier and providing feedback to improve its accuracy

Figure: Accuracy-improvement analytics for the ‘resume’ classifier after feedback

Getting started

Machine learning based trainable classifiers are a powerful capability that enables you to detect and classify data unique to your organization at enterprise scale. We will continue to innovate and bring you new value here. The ability to use trainable classifiers to automatically apply data protection policies in Microsoft 365 applications such as Word, Excel, and PowerPoint will be generally available in the first half of 2021.

Take advantage of our machine learning platform to start building your own trainable classifier. Learn more about how to create trainable classifiers, how to improve their accuracy, and how to use them to automatically apply retention schedules and records policies. To use trainable classifiers, you need one of the following SKUs: Microsoft 365 E5, E5 Compliance, or E5 Information Protection and Governance.

We are excited to roll out these capabilities and help you in your compliance journey!

-Cathy Lin, Program Manager

Updated May 11, 2021
  • This is powerful. Congratulations, team! I wonder whether this could be extended to govern user access across the enterprise.

  • 40345839668 (Brass Contributor)

    Getting started

    Machine learning based trainable classifiers are a powerful capability that enables you to detect and classify data unique to your organization at your organization's scale. We will continue to innovate and bring new value here. Using trainable classifiers to automatically apply data protection policies in Microsoft 365 applications such as Word, Excel, PowerPoint

  • nipW (Brass Contributor)

    sanjay_kidambi I'm already using this on my tenancy and have demonstrated it to a couple of clients, and they loved it. Do you know when we can expect trainable classifiers to support images, PDFs, etc.?

    Also, I'm using SharePoint Syntex, which lets you apply retention labels to PDFs and images as well; the only limitation is that you still cannot apply sensitivity labels.

    I'd really like to see both features merged so that retention, DLP, and AIP can be applied to non-Microsoft formats such as PDFs and images.

    Thanks.