document intelligence
3 Topics
Doc Intelligence: Custom Extraction model | Confidence score deterioration with new formats/layouts
Hi everyone,

This is my first time using custom extraction models on the Document Intelligence service, and I would appreciate your input on an experiment I am conducting. I wanted to investigate how these models' confidence scores behave when documents with a significantly different format/layout are introduced later in the training phase.

I started by training models on documents that all shared the same format (some of poorer image quality and slightly rotated), adding more samples in increments of 5 and training a new model after each addition. After each new model was trained, I checked its scores against the same holdout set, which the model had never seen and which had the same format as the training documents.

After training the final model on 35 identically formatted documents, I started introducing documents with a significantly different format/layout and retraining, in increments of 10. Confidence scores against the unchanged holdout set dropped after that and did not recover to their previous levels. See the graph below, which shows how confidence scores evolved after every training step (new documents were added at every step).

Any insights as to why this has happened?
169 Views, 1 like, 2 Comments
Extracting data from unstructured forms using Azure AI Document Intelligence.
In our latest blog post, we delve into a scenario where our B2B product helps businesses extract data from messy PDFs, emails, and websites. Say goodbye to manual extraction: Azure AI Document Intelligence does the heavy lifting. Let’s explore how it works.
5.9K Views, 1 like, 0 Comments