April 2022 – SharePoint Syntex AI Optimizations

Published Apr 06 2022 02:46 PM
Microsoft

SharePoint Syntex uses AI to organize & manage content, optimize search and compliance, and automate and improve your most critical business processes.

 

To improve the quality and consistency of text extraction from files, Syntex now employs a more natural reading order, which also improves language support. The change comes from optimizing the optical character recognition (OCR) service used by Syntex document understanding models. If a document understanding model was trained using PDF example files for OCR, test the model to confirm it still extracts data as expected.

 

This enhancement helps Syntex extract multiline values inside tables or cells rather than reading the generated text top to bottom.  Consider this sample table:

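+---+---+--------+
| A | 1 | Red    |
| B | 2 | Blue   |
| C | 3 | Green  |
+---+---+--------+
| D | 4 | Apple  |
| E | 5 | Orange |
| F | 6 | Banana |
+---+---+--------+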


Our previous OCR model read the text stream line by line across the cells:

A 1 Red, B 2 Blue, C 3 Green…

With this optimization, Syntex reads each cell in full before moving to the next, which is a more natural order for tables:

ABC, 123, Red Blue Green…
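
The difference between the two reading orders can be sketched in a few lines of Python. This is only an illustration, not the OCR service's actual implementation: the nested-list table representation and the function names `read_line_by_line` and `read_cell_by_cell` are assumptions made for this example.

```python
# Sample table from this post: two rows, three cells per row,
# and every cell holds three lines of text.
TABLE = [
    [["A", "B", "C"], ["1", "2", "3"], ["Red", "Blue", "Green"]],
    [["D", "E", "F"], ["4", "5", "6"], ["Apple", "Orange", "Banana"]],
]


def read_line_by_line(table):
    """Previous order: line 1 of every cell in a row, then line 2, and so on.

    zip() stops at the shortest cell, which is fine here because all
    cells in the sample have the same number of lines.
    """
    stream = []
    for row in table:
        for lines in zip(*row):
            stream.append(" ".join(lines))
    return stream


def read_cell_by_cell(table):
    """New order: read all lines of a cell before moving to the next cell."""
    stream = []
    for row in table:
        for cell in row:
            stream.append(" ".join(cell))
    return stream


if __name__ == "__main__":
    print(", ".join(read_line_by_line(TABLE)))
    print(", ".join(read_cell_by_cell(TABLE)))
```

The first line printed corresponds to the previous stream and the second to the new one, in which the lines of each cell stay together (printed here with spaces between them).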

 

You should check your models – and if you have text stored in a tabular layout in PDF files, you can take advantage of this update now.  Retraining models is simple:

  1. Go to the Content Center and click the Models link
  2. In the Models section, click on the name of the model to be reviewed/updated as needed
  3. In the Entity Extractor section of the page, click on one of the Entity Names
  4. Go to the Test tab
  5. Click on Add example files
  6. Upload a few new training files. These can be duplicates of existing documents or entirely new files. Note: if you upload duplicate copies of existing files, select the Keep both versions option so that each file gets a slightly different name.
  7. Review the Predictions column for extracted text to determine whether the right content is still being extracted. Check each entity extractor.
  8. If there are discrepancies, then select Exit Training and go back through the model training process, starting with the Train Classifier step.
  9. Once the model updates are completed, go to the model’s main page and in the Where the model is applied section, select the Sync link to ensure the updates are published out to all document libraries where the model is applied.
  10. If needed, run the updated model on the local document library’s existing documents by selecting all documents that need to be re-run and clicking on Classify and extract.
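
To see why each entity extractor needs to be rechecked in step 7, consider a minimal sketch of a rule that picks the value following a label. It is a hypothetical simplification, not how Syntex evaluates its rules: `value_after`, `find_discrepancies`, and the expected values are invented for illustration, and the token streams cover only the first row of the sample table.

```python
# Token streams for the first row of the sample table.
OLD_STREAM = "A 1 Red B 2 Blue C 3 Green".split()  # previous reading order
NEW_STREAM = "A B C 1 2 3 Red Blue Green".split()  # new reading order

# Values a hypothetical extractor returned while the old order was in use.
EXPECTED = {"A": "1", "B": "2", "C": "3"}


def value_after(tokens, label):
    """Hypothetical rule: return the token right after `label`, or None."""
    if label in tokens:
        position = tokens.index(label)
        if position + 1 < len(tokens):
            return tokens[position + 1]
    return None


def find_discrepancies(expected, tokens):
    """Return {label: (expected, predicted)} for every label that mismatches."""
    mismatches = {}
    for label, wanted in expected.items():
        predicted = value_after(tokens, label)
        if predicted != wanted:
            mismatches[label] = (wanted, predicted)
    return mismatches


if __name__ == "__main__":
    print("old order:", find_discrepancies(EXPECTED, OLD_STREAM))
    print("new order:", find_discrepancies(EXPECTED, NEW_STREAM))
```

A rule of this kind finds no mismatches against the old stream but returns the neighboring label or an unrelated value against the new one, which is the sort of discrepancy that step 8 says should send you back through training.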

We welcome your comments and feedback here on the Tech Community. Thank you.

5 Comments

Re: April 2022 – SharePoint Syntex AI Optimizations

Hi, is there a way to use the old method for some models? We had some models working well but now producing unexpected/wrong results due to the change in reading tables.

Re: April 2022 – SharePoint Syntex AI Optimizations

I've been using SharePoint Syntex since pre-release with some good results.

Last week we had a batch of new documents and we noticed many unexpected results.

We don’t control the PDF format that we receive but we were getting results that parsed as the below, I have been using the "After Text" rule to identify the numbers to extract, example:

[Image: peb71b_0-1652443673400.png]

Unfortunately the PDF table is now parsed differently (below example of same form):

[Image: peb71b_1-1652443673404.png]

Can we use the previous method of Syntex parsing PDFs, as the new method does not work in many of our scenarios.

Thanks

Re: April 2022 – SharePoint Syntex AI Optimizations

You should take a look at azure cognitive text extraction for OCR to see what it extracts via an API call. That is one thing I have done to determine if the issue is Syntex or the underlying Azure Cognitive. If it is cognitive, not sure what solutions the product group may have other than to use their influence on the underlying technical team.

Re: April 2022 – SharePoint Syntex AI Optimizations

Thanks @Mario Fulan, I'll give that a go to confirm. It was puzzling why they just stopped working but this post does explain it was a "feature" change, which I'm sure helps some but not my case. As a quick alternative, I switched to Power Automate Desktop which does the job by extracting tables (which come out as I expected). Shame as I like Syntex.

Re: April 2022 – SharePoint Syntex AI Optimizations

Azure Cognitive does a text extraction. It works pretty well usually for text-backed PDF forms but not so much it the PDF is simply a tiff image. Not sure which type you are using, but it is sometimes a puzzle when things just change but when it does it generally is because someone else complained and they fixed it for their use case :)

Last update: Apr 06 2022 02:46 PM