Forum Discussion
AustinRP08
Dec 02, 2021 · Copper Contributor
Non-greedy extractors
Hi everyone, do SharePoint Syntex extractors for document understanding support non-greedy regex expressions? For example, I'm trying to pull a contract term, but throughout the file it has m...
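The greedy versus non-greedy distinction is easiest to see on a small sample. This is a minimal sketch using Python's `re` module purely as an illustration. Syntex's own regex engine is not shown here and may behave differently, and the sample text is made up.

```python
import re

# Hypothetical sample text: the label "Term:" appears more than once.
text = "Term: 12 months. Payment due monthly. Term: see section 4."

# Greedy: .* grabs as much as possible, then backtracks only to the LAST period.
greedy = re.search(r"Term: (.*)\.", text).group(1)

# Non-greedy: .*? stops at the FIRST period after "Term: ".
lazy = re.search(r"Term: (.*?)\.", text).group(1)

print(greedy)  # 12 months. Payment due monthly. Term: see section 4
print(lazy)    # 12 months
```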
Mario_Fulan
Dec 02, 2021 · Iron Contributor
Unless it is supported directly in the RegEx expression, I think the answer is no. I've done some work with clients using separate classifications (e.g., StyleAContract, StyleBContract) so my models can be a bit smarter. You can still reuse the evaluations, but then use better proximity or location in the file to assist the AI in each model. If everything is just "Contract", then it is up to you to make the RegEx find only one instance.
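One way to make a pattern find only one instance is to tie it to a label that occurs once in the document rather than matching the value on its own. The sketch below uses Python's `re.findall` to count matches on made-up contract text; the label "Initial Term:" is a hypothetical example, and how Syntex reports multiple matches is not modeled here.

```python
import re

# Hypothetical contract text: "N months" durations appear several times.
text = (
    "Initial Term: 24 months from the Effective Date. "
    "Notice must be given 3 months before renewal. "
    "Renewal periods last 12 months."
)

# Loose pattern: matches every duration anywhere in the document.
loose = re.findall(r"\d+ months", text)

# Tighter pattern: requires the (hypothetical) unique label, so only one instance matches.
tight = re.findall(r"Initial Term: (\d+ months)", text)

print(loose)  # ['24 months', '3 months', '12 months']
print(tight)  # ['24 months']
```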
I like the idea in general of non-repeating extractors as a feature (not just for RegEx, but for all extractors). For example, if there is a PO and "Total" appears both at the top and in the table of details, I only want it extracted once, so I get "$3,005" and not "$3,005, $3005" as a result.
The other option is to use Flow and do some post-processing to remove duplicates (I've done that for some clients). This takes extra effort but is very effective.
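The duplicate-removal logic itself is simple. The following is a minimal Python sketch of it, not a Flow definition: it assumes the extracted values arrive as a list of strings, and the function name `dedupe_extracted` is hypothetical. Values are compared after stripping "$", thousands separators, and whitespace, so "$3,005" and "$3005" from the PO example count as the same total.

```python
def dedupe_extracted(values):
    """Remove duplicate extracted values, keeping the first-seen form and order.

    Comparison ignores "$", commas, and surrounding whitespace, so
    "$3,005" and "$3005" are treated as the same value.
    """
    seen = set()
    result = []
    for value in values:
        key = value.strip().replace("$", "").replace(",", "")
        if key not in seen:
            seen.add(key)
            result.append(value.strip())
    return result


# Hypothetical extractor output: the PO total found at the top and in the table.
print(dedupe_extracted(["$3,005", "$3005"]))  # ['$3,005']
```

In a real flow the same idea (normalize, then keep the first occurrence) would be expressed with Flow's own actions and expressions.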