SGRoborana
Jun 10, 2026, Copper Contributor
Date extraction regression: 2025-05-01-preview vs 2025-11-01 (GA) in Azure Content Understanding
Issue: When using the documentFieldExtraction scenario in Azure Content Understanding, the GA version (2025-11-01) produces significantly worse results compared to the preview version (2025-05-01-preview) for date field extraction on scanned (Dutch) medical documents.
Observed behavior:
- With 2025-05-01-preview: all date fields are extracted correctly, including dates that are split across three separate handwritten fields (day, month, year).
- With 2025-11-01 (GA): multiple date fields are either not found, returned as null, or extracted with day and year swapped (e.g. 2027-12-01 instead of 2001-12-27).
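As a stopgap, the day/year swap can be checked for after extraction. This is a minimal Python sketch, not part of Azure Content Understanding: the function names and the `latest_plausible_year` cutoff are hypothetical. It assumes the extracted value has already been parsed into a `datetime.date`, and that any year later than the cutoff is suspect.

```python
from datetime import date
from typing import Optional


def is_suspect(extracted: date, latest_plausible_year: int = 2026) -> bool:
    """Flag dates whose year lies after the cutoff (assumed heuristic)."""
    return extracted.year > latest_plausible_year


def undo_day_year_swap(extracted: date,
                       latest_plausible_year: int = 2026) -> Optional[date]:
    """Return the candidate date with day and 2-digit year exchanged.

    Returns None when the exchanged values do not form a valid date.
    """
    year_digits_as_day = extracted.year % 100   # e.g. 2027 -> 27
    year = 2000 + extracted.day                 # e.g. day 1 -> "01" -> 2001
    if year > latest_plausible_year:
        year -= 100                             # fall back to the 1900s
    try:
        return date(year, extracted.month, year_digits_as_day)
    except ValueError:
        return None


# The example from the post: 2027-12-01 should have been 2001-12-27.
wrong = date(2027, 12, 1)
if is_suspect(wrong):
    print(undo_day_year_swap(wrong))
```

This only catches swaps that produce a year after the cutoff. A swapped date that still looks plausible passes unnoticed, so flagged values are better routed to manual review than corrected silently.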
Document characteristics:
- Scanned PDF (not native digital)
- Dutch language
- 4-16 pages
- Dates are handwritten and split across three separate labeled fields: dag (day), maand (month), jaar (year)
- Year is written as 2 digits (e.g. "26" for 2026, "01" for 2001)
Schema used: documentFieldExtraction with type: date fields and explicit descriptions instructing the model to read day → month → year in order and expand 2-digit years to 4 digits.
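One workaround under consideration is to extract dag, maand and jaar as three separate plain fields and assemble the date client-side instead of relying on a single `date` field. This is a minimal Python sketch of that assembly step only. The function names and the `latest_plausible_year` pivot are assumptions for illustration, and it assumes the three raw values arrive as strings.

```python
from datetime import date
from typing import Optional


def expand_year(year_value: int, latest_plausible_year: int = 2026) -> int:
    """Expand a 2-digit year to 4 digits using a pivot (assumed rule)."""
    if year_value >= 1000:
        return year_value            # already a 4-digit year
    if not 0 <= year_value <= 99:
        raise ValueError(f"unexpected year value: {year_value}")
    year = 2000 + year_value
    if year > latest_plausible_year:
        year -= 100                  # e.g. 85 -> 1985 rather than 2085
    return year


def assemble_date(dag: str, maand: str, jaar: str,
                  latest_plausible_year: int = 2026) -> Optional[date]:
    """Combine day, month and year strings; return None if unusable."""
    try:
        day = int(dag.strip())
        month = int(maand.strip())
        year = expand_year(int(jaar.strip()), latest_plausible_year)
        return date(year, month, day)
    except ValueError:
        # Empty or non-numeric input, or an impossible calendar date.
        return None


print(assemble_date("27", "12", "01"))
print(assemble_date("1", "5", "26"))
```

Returning `None` instead of guessing keeps unreadable handwriting visible downstream. The pivot year would need adjusting if the documents can contain future dates.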
Expected behavior: The GA version should perform at least on par with the preview version when using the exact same prompts. Is this a known regression? Any recommended workarounds while waiting for a fix?