trainable classifier
5 Topics
Issue Using Built in Trainable Classifiers in Auto Labelling Policies - Purview
Over the last few days, I have run into an issue while configuring auto-labelling policies in Purview, specifically when using built-in trainable classifiers, e.g. Budget and Agreements. These classifiers are part of the ready-to-use set and had been working well for us until recently. Now, saving an auto-labelling rule that includes any trainable classifier fails with a client-side error: 'Could not find rule pack associated with sensitive information type'. This is unexpected because the same classifiers, e.g. Budget, worked perfectly just a few weeks ago, and no changes have been made to roles or permissions on our side. I am still not sure why the issue is showing up now. Could you please help me find the root cause? If anyone has faced the same issue when using trainable classifiers in auto-labelling policies, please feel free to post in the comments. Thanks in advance.
Regards, BanuMurali
79 Views, 1 like, 2 Comments
Trainable Classifiers - Tips
Hello All, Just sharing some tips to assist with the process of data collection and the creation of trainable classifiers for the purpose of labelling/Data Loss Prevention.
-To train machine learning to recognize a certain document type, the document type must have one or more recognizable aspects. Possible recognizable aspects of the data/document type:
Keyword or metadata values (keyword query language)
Previously identified patterns of sensitive information like social security, credit card, or bank account numbers (https://learn.microsoft.com/en-us/purview/sit-sensitive-information-type-entity-definitions)
Document fingerprinting (https://learn.microsoft.com/en-us/purview/sit-document-fingerprinting): recognizing an item because it's a variation on a template
The presence of exact strings (https://learn.microsoft.com/en-us/purview/sit-learn-about-exact-data-match-based-sits#learn-about-exact-data-match-based-sensitive-information-types)
-In the examples here, we focus on document fingerprinting and previously identified sensitive information types. For example, regarding positive samples: the sample files display a pattern. They contain credit card (CC) info (dummy data), include keywords referring to CC info such as CVV2/AMEX etc., and also contain SSN information.
-This can be regarded as a pattern for positive detection. These data samples (about 150 samples of a similar pattern) are stored in a folder in a dedicated SharePoint site (in the screenshot, the same items are used as false samples for another classifier).
-Regarding negative samples, it is the same concept: they can also be stored in a folder in a dedicated SharePoint site and should have a unique pattern or fingerprint.
-For example, the negative samples here represent credential information (dummy data), and there need to be about 150 samples or so. The samples should strongly represent a uniform document/data type that is different from the positive samples.
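As a rough illustration of the positive-sample pattern described above, here is a minimal Python sketch for screening candidate sample texts locally before placing them in the SharePoint folder. It is not part of Purview and does not reflect how the classifier works internally; the function names, the keyword list (taken from the CVV2/AMEX terms mentioned above), the regular expressions, and the default target of 150 are all assumptions made for illustration.

```python
import re

# Hypothetical keyword list, based only on the terms mentioned in the post.
CARD_KEYWORDS = ("cvv2", "amex")
# Loose candidate patterns (assumptions): 13-19 digits with optional
# space/hyphen separators, and a US SSN written as NNN-NN-NNNN.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def matches_positive_pattern(text: str) -> bool:
    """True if the text has a card keyword, a Luhn-valid number and an SSN."""
    lowered = text.lower()
    has_keyword = any(k in lowered for k in CARD_KEYWORDS)
    has_card = any(luhn_valid(m.group()) for m in CARD_RE.finditer(text))
    has_ssn = SSN_RE.search(text) is not None
    return has_keyword and has_card and has_ssn


def summarize(samples: dict[str, str], target: int = 150) -> dict:
    """Count matching samples and list the ones that break the pattern."""
    non_matching = [name for name, text in samples.items()
                    if not matches_positive_pattern(text)]
    matching = len(samples) - len(non_matching)
    return {"matching": matching,
            "non_matching": non_matching,
            "enough": matching >= target}
```

Calling `summarize` on a dictionary of file names mapped to extracted text gives a quick view of which candidates do not follow the intended positive pattern and whether the set has reached the rough sample count. The same idea could be applied to the negative set with a different pattern.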
The negative samples are likewise stored in a dedicated folder in a SharePoint site. Once the trainable classifier is created and fed this information, it will identify the data type, which facilitates detection and minimizes potential false positives.
1.6K Views, 0 likes, 2 Comments
Trainable Classifiers
Hello, I am testing out trainable classifiers and need to know how to investigate training failures in detail. Some errors are vague, such as "Failed due to training error" or "Invalidlocationserror", and clicking on "Review test results" does not show any details. Where are the logs stored? How can we review the errors further? Any help is appreciated.
1.1K Views, 0 likes, 3 Comments
Trainable classifier/confusion
Hi all, I need help to better understand trainable classifiers. https://learn.microsoft.com/en-us/microsoft-365/compliance/classifier-learn-about?view=o365-worldwide
The MS article says: "this method of categorization is more about using a classifier to identify an item based on what the item is, not by elements that are in the item;"
So, if I have a Word file (an item, in this case) named "Bank Statement", a PDF file named "Drivers license" and an Excel file named "Shipping orders", what exactly is the trainable classifier looking for in an item, and how does it know how to categorize it? I always thought it was by content, but according to the sentence above, the trainable classifier is not scanning the content of the file. So what is the "X" factor that the trainable classifier looks for during its scan, and how does it know how to separate those files (that they are 3 different types of files)?
KR
1.1K Views, 0 likes, 0 Comments