SGRoborana
Copper Contributor
Jun 10, 2026

Date extraction regression: 2025-05-01-preview vs 2025-11-01 (GA) in Azure Content Understanding

Issue: When using the documentFieldExtraction scenario in Azure Content Understanding, the GA version (2025-11-01) produces significantly worse results than the preview version (2025-05-01-preview) for date field extraction on scanned Dutch medical documents.

Observed behavior:

  • With 2025-05-01-preview: all date fields are extracted correctly, including dates that are split across three separate handwritten fields (day, month, year).
  • With 2025-11-01 (GA): multiple date fields are either not found, returned as null, or extracted with day and year swapped (e.g. 2027-12-01 instead of 2001-12-27).
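
To make the difference between the two versions easier to quantify, here is a minimal sketch that compares the date fields from both runs and labels each mismatch. It assumes the results have already been parsed into plain dictionaries mapping a field name to a `datetime.date` or `None`; the function names and field names are hypothetical and are not part of the Content Understanding API.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple


def looks_day_year_swapped(expected: date, actual: date) -> bool:
    """Return True if `actual` is `expected` with the day and the
    2-digit year exchanged (e.g. 2001-12-27 read as 2027-12-01)."""
    return (
        expected.month == actual.month
        and expected.day == actual.year % 100
        and expected.year % 100 == actual.day
    )


def diff_date_fields(
    preview: Dict[str, Optional[date]],
    ga: Dict[str, Optional[date]],
) -> List[Tuple[str, str]]:
    """Compare GA output against preview output, field by field.

    Both inputs are assumed to be already-parsed dictionaries;
    the preview result is treated as the reference.
    """
    issues: List[Tuple[str, str]] = []
    for name, ref in preview.items():
        got = ga.get(name)
        if got == ref:
            continue  # identical, nothing to report
        if got is None:
            issues.append((name, "missing"))
        elif ref is not None and looks_day_year_swapped(ref, got):
            issues.append((name, "day/year swapped"))
        else:
            issues.append((name, "different"))
    return issues


# Hypothetical field names, for illustration only.
preview_result = {"admission_date": date(2001, 12, 27), "discharge_date": date(2026, 1, 5)}
ga_result = {"admission_date": date(2027, 12, 1), "discharge_date": None}
report = diff_date_fields(preview_result, ga_result)
```

Running this over a batch of documents gives a per-field count of missing versus swapped dates, which may be useful when reporting the regression.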

Document characteristics:

  • Scanned PDF (not native digital)
  • Dutch language
  • 4-16 pages
  • Dates are handwritten and split across three separate labeled fields: dag (day), maand (month), jaar (year)
  • Year is written as 2 digits (e.g. "26" for 2026, "01" for 2001)
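
One workaround under consideration is to extract dag, maand, and jaar as three separate string fields and assemble the date client-side. A minimal sketch of that idea follows. The pivot rule for 2-digit years (values up to 30 map to 20xx, the rest to 19xx) is an assumption chosen to match the examples above and would need tuning, for instance for birth dates.

```python
from datetime import date
from typing import Optional


def expand_year(two_digit: int, pivot: int = 30) -> int:
    """Expand a 2-digit year to 4 digits.

    Assumed rule: values <= pivot become 20xx, all others 19xx.
    """
    if not 0 <= two_digit <= 99:
        raise ValueError(f"not a 2-digit year: {two_digit}")
    return 2000 + two_digit if two_digit <= pivot else 1900 + two_digit


def assemble_date(dag: str, maand: str, jaar: str) -> Optional[date]:
    """Build a date from separately extracted day, month, and year strings.

    Returns None when any part is empty, non-numeric, or the
    combination is not a valid calendar date.
    """
    try:
        d, m, y = int(dag.strip()), int(maand.strip()), int(jaar.strip())
        year = y if y >= 100 else expand_year(y)
        return date(year, m, d)
    except ValueError:
        return None


example = assemble_date("27", "12", "01")  # day 27, month 12, year "01"
```

Keeping the three parts separate means the day → month → year order is enforced in code rather than by the model, at the cost of three schema fields per date.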

Schema used: documentFieldExtraction with type: date fields and explicit descriptions instructing the model to read day → month → year in order and expand 2-digit years to 4 digits.

Expected behavior: The GA version should perform at least on par with the preview version when using the exact same prompts. Is this a known regression? Any recommended workarounds while waiting for a fix?
