Microsoft Syntex AMA
Event details
1. What is the maximum number of Microsoft Syntex models that can be applied to a single document library (unstructured document processing)? I have tried with 3 and it worked well. But if I need to create ~100 models for my legal department, please confirm whether it makes more sense to divide these models into 30 logical buckets, each with 3 models, with each bucket applied to a document library.
2. How can we identify the list of documents that Syntex was not able to process or classify?
3. I assume that if an unstructured document model is already associated with 30+ libraries, a change to the model will automatically be reflected in all of those libraries.
4. The customer has a DMS where content is also uploaded in French and German (though English is the primary language of the DMS). Will models that we have trained on English training documents work on content in these other languages too (I assume not)? Or is it required to train the model using training files in the other languages (which I assume it is)? Could training a single model on files in different languages cause any issues (that is, using English, French, and German sample files to train the same model)?
- Mario_Fulan, Nov 15, 2022
Iron Contributor
To add to this: I believe freeform or structured models (using AI Builder) are currently limited to only 1 per document library, while you can have many (up to a limit?) unstructured models. Is that correct? Are there plans to allow multiple AI Builder based models on a single library in the future?
- JamesEccles, Nov 15, 2022
Microsoft
That's correct, Mario: multiple unstructured models, but only a single freeform or structured model per library. This is because only unstructured models include a classifier component. We definitely want to have a classification story that goes across all model types. Watch this space!
- JamesEccles, Nov 15, 2022
Microsoft
1. There is no hard limit, but we tend to recommend up to 10 unstructured models per library. This is because a file is evaluated against all of the models during classification, so having a large number will have a performance impact. If there is a logical categorisation you can make to split this into multiple libraries (perhaps 10), then this would be a good balance.
2. We're working on a method for you to see this. Right now, you can't distinguish between a file that can't be classified and one that is still waiting to be processed. Anything that Syntex has not classified in a library will be set to the default content type (such as 'Document').
3. This depends on the nature of the change. If you are creating new extractors, then you will have to re-sync from the content center to push out the updated columns. But if your change is just focused on training quality, then it will be pushed out automatically.
4. There is no one-size-fits-all answer to this 🙂. You could choose to have models that look at each language separately (which might be easier to train and manage), or have a single model that has samples and training for all languages (which gives a simpler output and a single content type). I would suggest you decide based on how complex the entities you want to extract are.
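To make the sizing in point 1 concrete, here is a minimal planning sketch in Python. It only does the arithmetic of splitting a list of model names into groups of at most 10 per library; it does not call Syntex or SharePoint. The model names and the `bucket_models` helper are hypothetical, and in practice you would group by a logical category rather than by list order.

```python
def bucket_models(model_names, max_per_library=10):
    """Split model names into consecutive groups of at most max_per_library."""
    if max_per_library < 1:
        raise ValueError("max_per_library must be at least 1")
    return [
        model_names[i:i + max_per_library]
        for i in range(0, len(model_names), max_per_library)
    ]


# Hypothetical names standing in for ~100 legal department models.
models = [f"LegalModel{n:03d}" for n in range(1, 101)]

# One bucket per document library, using the suggested cap of 10.
buckets = bucket_models(models)
for index, bucket in enumerate(buckets, start=1):
    print(f"Library {index}: {len(bucket)} models")
```

With 100 models and a cap of 10, this gives 10 libraries, which matches the "perhaps 10" split suggested above.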