
SGRoborana
Copper Contributor
Jun 10, 2026

Date extraction regression: 2025-05-01-preview vs 2025-11-01 (GA) in Azure Content Understanding

Issue: When using the documentFieldExtraction scenario in Azure Content Understanding, the GA version (2025-11-01) produces significantly worse date field extraction than the preview version (2025-05-01-preview) on scanned Dutch medical documents.

Observed behavior:

  • With 2025-05-01-preview: all date fields are extracted correctly, including dates that are split across three separate handwritten fields (day, month, year).
  • With 2025-11-01 (GA): multiple date fields are either not found, returned as null, or extracted with day and year swapped (e.g. 2027-12-01 instead of 2001-12-27).

Document characteristics:

  • Scanned PDF (not native digital)
  • Dutch language
  • 4-16 pages
  • Dates are handwritten and split across three separate labeled fields: dag (day), maand (month), jaar (year)
  • Year is written as 2 digits (e.g. "26" for 2026, "01" for 2001)

Schema used: documentFieldExtraction with fields of type date, each with an explicit description instructing the model to read the fields in day → month → year order and to expand 2-digit years to 4 digits.

Expected behavior: The GA version should perform at least on par with the preview version when using the exact same prompts. Is this a known regression? Any recommended workarounds while waiting for a fix?

1 Reply

  • If the same samples worked before and now fail consistently, I would treat this as either a model behavior change or a service regression and open a support case that includes the project, model deployment, region, and request timestamps, plus a few minimal sample documents.

     

    As a workaround, I would make the output schema more defensive: ask for the raw date text exactly as written on the document (here, the separate dag, maand, and jaar values), then normalize it in your own code using the expected locale and date-format rules. Dates are a place where relying on the model to infer the final normalized value can be risky, especially when the order of day, month, and year may be ambiguous.
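A minimal sketch of that client-side normalization, assuming you add three raw string fields (for example dag, maand, jaar) to the schema and that the GA model's normalized date comes back as an ISO "YYYY-MM-DD" string. The function names and the two-digit-year pivot (years up to the reference year's last two digits map to 20xx, later ones to 19xx) are hypothetical choices, not part of the Content Understanding API; adjust the pivot to your documents (e.g. birth dates). The second function compares the model's date with the one rebuilt from the raw parts and flags the day/year swap described above.

```python
import re
from datetime import date
from typing import Optional


def expand_two_digit_year(yy: int, reference_year: int = 2026) -> int:
    """Expand a 2-digit year with a pivot (assumption: 26 -> 2026, 01 -> 2001, 85 -> 1985)."""
    century = reference_year // 100 * 100
    pivot = reference_year % 100
    return century + yy if yy <= pivot else century - 100 + yy


def normalize_split_date(dag: str, maand: str, jaar: str,
                         reference_year: int = 2026) -> Optional[date]:
    """Build a date from raw day/month/year strings; return None if unusable."""
    # Keep only digits, so OCR noise such as "27." or " 12" still parses.
    parts = [re.sub(r"\D", "", p or "") for p in (dag, maand, jaar)]
    if not all(parts):
        return None
    day, month, year = (int(p) for p in parts)
    if len(parts[2]) == 2:
        year = expand_two_digit_year(year, reference_year)
    elif len(parts[2]) != 4:
        return None
    try:
        return date(year, month, day)  # rejects impossible dates like 31-02
    except ValueError:
        return None


def check_model_date(model_value: Optional[str], dag: str, maand: str, jaar: str,
                     reference_year: int = 2026) -> str:
    """Compare the model's ISO date with the date rebuilt from the raw parts."""
    expected = normalize_split_date(dag, maand, jaar, reference_year)
    if expected is None:
        return "unreadable"
    if model_value is None:
        return "missing"
    try:
        got = date.fromisoformat(model_value)
    except ValueError:
        return "malformed"
    if got == expected:
        return "ok"
    # Day and 2-digit year swapped, e.g. 2027-12-01 instead of 2001-12-27.
    if (got.year % 100, got.month, got.day) == (expected.day, expected.month, expected.year % 100):
        return "day_year_swapped"
    return "mismatch"


print(check_model_date("2027-12-01", "27", "12", "01"))  # day_year_swapped
```

With this split, the raw fields are what you trust, and the model's normalized date becomes a cross-check you can log or route to manual review when it disagrees.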