trainable classifier
4 Topics
Trainable Classifiers - Tips
Hello all,

Just sharing some tips to assist with data collection and the creation of trainable classifiers for labelling and data loss prevention.

- To train machine learning to recognize a certain document type, the document type must have one or more recognizable aspects. Possible recognizable aspects of the data/document type:
  - Keyword or metadata values (keyword query language)
  - Previously identified patterns of sensitive information like social security, credit card, or bank account numbers: https://learn.microsoft.com/en-us/purview/sit-sensitive-information-type-entity-definitions
  - Document fingerprinting, which recognizes an item because it's a variation on a template: https://learn.microsoft.com/en-us/purview/sit-document-fingerprinting
  - The presence of exact strings: https://learn.microsoft.com/en-us/purview/sit-learn-about-exact-data-match-based-sits#learn-about-exact-data-match-based-sensitive-information-types
- The examples here focus on document fingerprinting and previously identified sensitive information types.
- Positive samples: the sample files display a pattern. They contain credit card (CC) info (dummy data), include keywords referring to CC info such as CVV2 and AMEX, and also contain SSN information. This can be regarded as a pattern for positive detection. These samples (about 150 of a similar pattern) are stored in a folder in a dedicated SharePoint site. (In the screenshot, the same items are used as false samples for another classifier.)
- Negative samples: the same concept applies. They can also be stored in a folder in a dedicated SharePoint site and should have a unique pattern or fingerprint. For example, the negative samples here represent credential information (dummy data), and there need to be about 150 of them or so. The samples should strongly represent a uniform document/data type different from the positive samples.
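Before uploading the two sample sets, it can help to sanity-check them locally. The following is only a rough sketch, not anything Purview itself runs: it assumes the samples have been exported as plain `.txt` files into local folders, and the names `KEYWORDS`, `MIN_SAMPLES`, `matches_positive_pattern`, and `audit_folder` are hypothetical helpers made up for illustration. It checks that each folder has roughly enough samples and flags files that do not fit the expected pattern (a Luhn-valid card-like number plus a card keyword for positives, the absence of that for negatives).

```python
import re
from pathlib import Path

# Assumption: keywords taken from the post's example; extend for your own data.
KEYWORDS = ("cvv2", "amex")
# Candidate card numbers: 13-19 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
# "About 150 samples" as suggested in the post; treat it as a rule of thumb.
MIN_SAMPLES = 150


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def matches_positive_pattern(text: str) -> bool:
    """True if text has a Luhn-valid card-like number and a card keyword."""
    has_keyword = any(k in text.lower() for k in KEYWORDS)
    has_card = any(luhn_valid(m.group()) for m in CARD_RE.finditer(text))
    return has_keyword and has_card


def audit_folder(folder: str, expect_positive: bool) -> dict:
    """Count .txt samples in `folder` and list those that break expectations."""
    files = sorted(Path(folder).glob("*.txt"))
    outliers = [
        f.name
        for f in files
        if matches_positive_pattern(f.read_text(errors="ignore")) != expect_positive
    ]
    return {
        "count": len(files),
        "enough": len(files) >= MIN_SAMPLES,
        "outliers": outliers,
    }
```

For example, `audit_folder("positive_samples", expect_positive=True)` and `audit_folder("negative_samples", expect_positive=False)` (both folder names are placeholders) would each return the sample count and the names of any files that look like they belong in the other set.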
As with the positive samples, the negative samples are stored in a dedicated folder in a SharePoint site. Once the trainable classifier is created and fed this information, it should be able to identify the data type, which facilitates detection and minimizes potential false positives.
1.5K Views, 0 likes, 2 Comments
Trainable classifier/confusion
Hi all,

I need help better understanding trainable classifiers. https://learn.microsoft.com/en-us/microsoft-365/compliance/classifier-learn-about?view=o365-worldwide

The MS article says: "this method of categorization is more about using a classifier to identify an item based on what the item is, not by elements that are in the item;"

So, if I have a Word document named "Bank Statement", a PDF file named "Drivers license", and an Excel file named "Shipping orders", what exactly is a trainable classifier looking for in each item, and how does it know how to categorize it? I always thought it was by content, but according to the sentence above, a trainable classifier is not scanning the content of the file. So what is the "X" factor that a trainable classifier looks for during its scan, and how does it know to separate those files (that they are 3 different files)?

KR
1.1K Views, 0 likes, 0 Comments
Trainable Classifiers
Hello,

I am testing out trainable classifiers and need to know how to investigate training failures in detail. Some errors are vague, such as "Failed due to training error" or "Invalidlocationserror", and clicking on review test results does not show details. Where is the log stored? How can we review errors further? Any help is appreciated.
975 Views, 0 likes, 3 Comments