SharePoint Syntex uses AI to organize & manage content, optimize search and compliance, and automate and improve your most critical business processes.
To improve the quality and consistency of text extraction from files, Syntex now employs a more natural reading order, which also improves language support. We are optimizing the optical character recognition (OCR) service used by Syntex document understanding models. As a result of this enhancement, if you trained a document understanding model with PDF example files that rely on OCR, you should test the model to confirm that it still extracts data as expected.
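One way to run such a check is to compare the values a model extracted before the change with what it extracts now. The sketch below is a minimal illustration only: it does not call any Syntex API, and the function name `find_mismatches`, the field names, and the sample values are all made up for the example. It assumes you have already collected the expected and extracted values as plain dictionaries.

```python
def find_mismatches(expected, extracted):
    """Return {field: (expected_value, extracted_value)} for fields that differ.

    Hypothetical helper: both arguments are plain dicts mapping a field
    name to the text extracted for it.
    """
    mismatches = {}
    for field, want in expected.items():
        got = extracted.get(field)
        # Normalise whitespace so that a changed line break alone
        # is not reported as a different value.
        if got is None or " ".join(got.split()) != " ".join(want.split()):
            mismatches[field] = (want, got)
    return mismatches


# Made-up sample values, for illustration only.
expected = {"Codes": "A B C", "Colors": "Red Blue Green"}
extracted = {"Codes": "A 1 Red", "Colors": "Red\nBlue\nGreen"}

for field, (want, got) in find_mismatches(expected, extracted).items():
    print(f"{field}: expected {want!r}, got {got!r}")
```

Whitespace is normalised before comparing so that only real changes in the extracted words are flagged. Tighten the comparison if line breaks matter for your content.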
The new reading order helps Syntex extract multiline values inside table cells as a unit, rather than reading the generated text line by line from top to bottom. Consider this sample table:
A | 1 | Red
B | 2 | Blue
C | 3 | Green
--+---+-------
D | 4 | Apple
E | 5 | Orange
F | 6 | Banana
Our previous OCR model “read” the text stream as:
A 1 Red, B 2 Blue, C 3 Green…
This optimization helps Syntex read your tables more naturally, one cell at a time:
A B C, 1 2 3, Red Blue Green…
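The difference between the two reading orders can be sketched in a few lines of Python. This is only an illustration of the idea, not how the OCR service is implemented: the `table` structure and both function names are assumptions made for the example, with each cell modelled as a list of the text lines it contains.

```python
# Hypothetical model of the sample table: a list of rows, each row a list
# of cells, each cell a list of the text lines inside it.
table = [
    [["A", "B", "C"], ["1", "2", "3"], ["Red", "Blue", "Green"]],
    [["D", "E", "F"], ["4", "5", "6"], ["Apple", "Orange", "Banana"]],
]


def read_line_by_line(table):
    """Previous order: read each physical line straight across the row."""
    chunks = []
    for row in table:
        height = max(len(cell) for cell in row)
        for i in range(height):
            # Take line i of every cell that has at least i + 1 lines.
            chunks.append(" ".join(cell[i] for cell in row if i < len(cell)))
    return chunks


def read_cell_by_cell(table):
    """New order: read every line of a cell before moving to the next cell."""
    return [" ".join(cell) for row in table for cell in row]


print(", ".join(read_line_by_line(table)))
# A 1 Red, B 2 Blue, C 3 Green, D 4 Apple, E 5 Orange, F 6 Banana
print(", ".join(read_cell_by_cell(table)))
# A B C, 1 2 3, Red Blue Green, D E F, 4 5 6, Apple Orange Banana
```

Reading cell by cell keeps a multiline value together, which is what a model needs when the value to extract spans several lines inside one cell.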
You should check your models, and if you have text stored in a tabular layout in PDF files, you can take advantage of this update now. Retraining models is simple.
We welcome your comments and feedback here on the Tech Community. Thank you.