Microsoft Syntex AMA

83 Comments

mpjjonker
Brass Contributor
Nov 03, 2022
LinkedEntities: in some documents we can annotate parties (persons, organizations), and each party has its own role in the document: seller, buyer, witness, employer, employee, etc. Another example would be a person with an address and other properties (date of birth, hobby, job title). It would be nice if we could group/collect those in one annotation with features/properties.
- JamesEccles
  Microsoft
  Nov 15, 2022
  Not something we have in Syntex today, but this is great feedback for us to look at. Thanks!
mpjjonker
Brass Contributor
Nov 03, 2022
Pre-annotation possibilities: For domain experts (who we need to label our examples) it is often easier to start with pre-annotated documents. That way they have something to (dis)agree with. One approach we have used in the past is a simple keyword (list) match to perform machine annotation, followed by manual review by domain experts. And about these domain experts: sometimes the result depends on which person annotates the content, and in other environments there is detection of 'disagreement' between annotators. Is that still needed today?
- IanStory
  Microsoft
  Nov 15, 2022
  Hi Michel! One of the great benefits of Syntex is we provide a set of prebuilt models in addition to allowing you to create your own models. I would say in the case of prebuilt models, this isn't needed today: you just take the model (like for invoices or receipts, with more coming soon), apply it, and let it do its magic. In the case of building your own models using unstructured document processing, structured document processing, or freeform document processing, it is absolutely helpful to have examples to (dis)agree with, and so I think that's still useful as a concept. However, you don't have to "pre-annotate" them; you just need a small training set ready that whoever is building the model is familiar with (and I get that perhaps you'd have a separate set of documents that a domain expert had, perhaps even graphically annotated, to help you if you're building the model but are not the expert yourself). One of the grand things we're trying to do with Syntex though is make it so that the domain expert can build the model themselves, and not need to split this into two separate roles!
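For readers unfamiliar with the keyword (list) match approach mentioned in the question above, here is a minimal sketch of what such machine pre-annotation could look like. It is not part of Syntex; the `Annotation` class, the `KEYWORDS` lists, and the `pre_annotate` function are all hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Annotation:
    label: str   # role or entity type suggested by the keyword list
    start: int   # character offset where the match begins
    end: int     # character offset just past the match
    text: str    # the matched text as it appears in the document


# Hypothetical keyword lists; a real project would maintain its own.
KEYWORDS = {
    "seller": ["seller", "vendor"],
    "buyer": ["buyer", "purchaser"],
}


def pre_annotate(text, keywords=KEYWORDS):
    """Return keyword-based candidate annotations, ordered by position."""
    annotations = []
    for label, terms in keywords.items():
        # Try longer terms first so the alternation prefers them.
        ordered = sorted(terms, key=len, reverse=True)
        pattern = r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append(
                Annotation(label, match.start(), match.end(), match.group(0))
            )
    return sorted(annotations, key=lambda a: a.start)


if __name__ == "__main__":
    sample = "The Seller agrees to deliver goods to the Purchaser."
    for annotation in pre_annotate(sample):
        print(annotation)
```

The output is only a list of candidates. A domain expert would still accept or reject each one, which is the "(dis)agree" step described above.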
mpjjonker
Brass Contributor
Nov 03, 2022
Question about language support. The support page mentions: "This model supports all of the Latin-based languages." How can we leverage the language understanding of our native language? For example: tokenization (of composite words), negation, SVO vs SOV, part of speech. Is the https://turing.microsoft.com/ project being used 'under the hood'?
- JamesEccles
  Microsoft
  Nov 15, 2022
  The Unstructured model type supports any Latin character set language. This model type is not using Turing under the hood; it ultimately has no semantic understanding of content. Rather, it understands the text in a document as a series of tokens. It's the patterns within/between/around these tokens that the model is building on. https://learn.microsoft.com/en-gb/microsoft-365/contentunderstanding/explanation-types-overview#what-are-tokens
  - mpjjonker
    Brass Contributor
    Nov 15, 2022
    In an earlier stage of NLP we used UIMA, where tokenization and sentence segmentation were an important part of the understanding. Should I compare this with that?
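As a rough illustration of the "series of tokens" view described in this thread, the sketch below splits text into tokens and shows the tokens immediately before and after a phrase. It is not Syntex's implementation. The token definition used here (a continuous run of letters and digits, with spaces and punctuation dropped) is an assumption, and `tokenize` and `context_window` are hypothetical helper names.

```python
import re

# Assumption: a token is a continuous run of letters and digits.
# [^\W_] matches any Unicode word character except the underscore.
TOKEN_RE = re.compile(r"[^\W_]+")


def tokenize(text):
    """Split text into tokens, discarding spaces and punctuation."""
    return TOKEN_RE.findall(text)


def context_window(tokens, phrase, before=2, after=2):
    """Return (tokens before, tokens after) the first match of phrase, or None."""
    target = [t.lower() for t in tokenize(phrase)]
    if not target:
        return None
    lowered = [t.lower() for t in tokens]
    size = len(target)
    for i in range(len(lowered) - size + 1):
        if lowered[i:i + size] == target:
            return tokens[max(0, i - before):i], tokens[i + size:i + size + after]
    return None


if __name__ == "__main__":
    tokens = tokenize("Invoice date: 03 Nov 2022, due in 30 days.")
    print(tokens)
    print(context_window(tokens, "Nov 2022"))
```

Nothing here involves parts of speech, negation, or word order. Only the sequence of tokens and their neighbours is used, which matches the statement above that the model has no semantic understanding of the content.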
DebraCurrieAustralia
Copper Contributor
Nov 02, 2022
Whilst I understand pricing rates will vary by country/region, I would like to understand the pricing mechanisms. Is Syntex priced per user and, if so, what constitutes a user: is it the person creating the 'sets', running the models, uploading a document to a library that will be "syntexed" 🙂, or something else? Also, what permission levels do I need to implement Syntex? Our IT is pretty locked down and I'd like to do some POCs before building a case. Thank you
- Chris McNulty
  Silver Contributor
  Nov 15, 2022
  Most of Microsoft Syntex will require no upfront seat licensing and will be available to almost all M365 commercial plans. Once activated, you will be able to classify and extract content and use content assembly, eSignature, OCR, image tagging, translation, summarization, etc., priced per page or per doc, generally on a pay-as-you-go basis. Unstructured doc processing and prebuilt models will be launched this month as metered services in preview with an initial cost of free (pricing to be disclosed for GA).
- JamesEccles
  Microsoft
  Nov 15, 2022
  This article explains how the licensing works, including which specific actions would require a user to have a license - https://learn.microsoft.com/en-gb/microsoft-365/contentunderstanding/syntex-licensing We also announced at Ignite that we will be moving many services in Syntex from a user-based license to a pay-as-you-go model over the coming months. This means you wouldn't need a user license, and instead would pay for what you consume. To do some testing, the minimum you would need is for your IT to create a Content Center site and give you permissions to it - https://learn.microsoft.com/en-gb/microsoft-365/contentunderstanding/create-a-content-center Better yet, have them start a free trial so you can get hands-on with more of the capabilities - https://learn.microsoft.com/en-gb/microsoft-365/contentunderstanding/trial-syntex
Mario_Fulan
Iron Contributor
Nov 02, 2022
Can you explain a bit more about the differences in functionality for Freeform documents (using AI Builder) and Unstructured Documents (using doc understanding models)? I know the training is different, freeform doesn't do classification, and a few other things. One question I have is whether both can do the "deskew" of PDF or scanned images before processing. Freeform seems to handle rotated documents but unstructured documents have trouble with OCR text extraction positioning if the image is rotated
- JamesEccles
  Microsoft
  Nov 15, 2022
  This article gives a good overview of the different model types - https://learn.microsoft.com/en-gb/microsoft-365/contentunderstanding/difference-between-document-understanding-and-form-processing-model Both Unstructured and Freeform models can be used to tackle similar use cases. Limitations on file format and language may push you in one direction or the other. But assuming both are possibilities for your use case, I would start with Freeform. It has a lower bar for effort during training. If that doesn't get at the right data, then shift to Unstructured, which has more of a teaching element and more need for human training. Both model types use the same OCR engine, so they should behave broadly the same for skewed docs. One thing to note though is that Freeform does factor layout into the model, whereas Unstructured has to restructure the document into linear text.
Mario_Fulan
Iron Contributor
Nov 02, 2022
Question: If I have content composition, is there a way to have the finished document land in a different folder or document library than the template? I can see using content routing, but that feature is not yet available
- Wayne_Addison
  Copper Contributor
  Nov 04, 2022
  Hi Mario, I trigger the content creation via a 'For a selected item' Power Automate trigger, then a 'Generate document using SharePoint Syntex (preview)' action. It still places the composed doc in the template folder, but I use downstream actions to move the file and convert it to PDF. * I'm assuming 'content composition' is the same as 'content assembly'.
  - Mario_Fulan
    Iron Contributor
    Nov 04, 2022
    This is the action I was referring to. I know I can use a downstream action after the document has generated. Was wondering about the addition of the redirect target location so that the doc wasn't generated (even briefly) in the source location. There is a concern about permissions and directing to the target with permissions intact. Thanks for the fast reply. I'll continue to use this approach for now.
Mario_Fulan
Iron Contributor
Nov 02, 2022
Question: For content composition with Syntex, are there APIs that can be called to initiate a composition using a template and if not do we have an ETA for those?
- JamesEccles
  Microsoft
  Nov 15, 2022
  We currently have a Power Automate action in the SharePoint connector for automated content assembly. It's called "Generate documents using Syntex (preview)". We're working on APIs that give the same experience programmatically, likely to be released next year.
4BobRandall
Brass Contributor
Nov 02, 2022
Is this a service that will be made available to those of us in the Government Community Cloud as well as the commercial cloud?
- JamesEccles
  Microsoft
  Nov 15, 2022
  Syntex is available today in GCC. Right now Syntex is not available in GCCH or DoD. There's no timeline we can share on these clouds right now.
Wayne_Addison
Copper Contributor
Oct 30, 2022
When classifying documents, is there a way to tell if a classification attempt has been made but none of the models (deployed to the library) succeeded in classifying some of the docs? Something like setting the classification attempt time/date and a content type of 'Unknown' would be good. Thanks.
- JamesEccles
  Microsoft
  Nov 15, 2022
  Great feedback. We're looking at something along these lines to give you better indications of where Syntex has processed a document. Look out for a change in this direction coming soon.
