SharePoint Document-ID and Chinese characters breaking search

MCT

Dec 01, 2025

This typically happens when the PDF iFilter / text extraction pipeline cannot normalize certain CJK characters consistently.
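To see why inconsistent normalization matters for matching, here is a small Python illustration. It is not SharePoint's actual pipeline, which is not described here. Some PDFs yield Kangxi radical code points instead of the ordinary CJK ideographs, so text that looks identical compares as unequal until it is NFKC-normalized:

```python
import unicodedata


def normalize_for_match(text: str) -> str:
    """Fold compatibility variants (Kangxi radicals, CJK compatibility
    ideographs, full-width forms) to their standard code points with NFKC."""
    return unicodedata.normalize("NFKC", text)


# "人工" as typed vs. the same-looking text as it can come out of some PDFs.
typed = "\u4EBA\u5DE5"      # CJK UNIFIED IDEOGRAPH-4EBA + CJK UNIFIED IDEOGRAPH-5DE5
extracted = "\u2F08\u5DE5"  # KANGXI RADICAL MAN + CJK UNIFIED IDEOGRAPH-5DE5

print(typed == extracted)                                            # False
print(normalize_for_match(typed) == normalize_for_match(extracted))  # True
```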

What you’re seeing is common:

Full-text search works (because the content extractor can parse the text).
Document-ID search fails, however, because the metadata indexer treats the extracted metadata as malformed when the original PDF contains mixed encodings.
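One concrete way PDF metadata ends up with mixed encodings: the PDF specification stores Info-dictionary text strings (Title, Author, Subject, ...) either as UTF-16BE prefixed with the byte-order mark FE FF, as UTF-8 prefixed with EF BB BF (PDF 2.0 only), or in PDFDocEncoding. A producer that writes raw GBK bytes without a BOM creates a string that a spec-following reader decodes as Latin-style garbage. A minimal Python sketch of that decoding rule, with PDFDocEncoding approximated as Latin-1 (the two are close but not identical):

```python
def decode_pdf_text_string(raw: bytes) -> str:
    """Decode the bytes of a PDF text string (after literal/hex escapes are resolved)."""
    if raw.startswith(b"\xfe\xff"):
        # UTF-16BE with byte-order mark: the usual way CJK text is stored.
        return raw[2:].decode("utf-16-be")
    if raw.startswith(b"\xef\xbb\xbf"):
        # UTF-8 with byte-order mark (allowed from PDF 2.0 on).
        return raw[3:].decode("utf-8")
    # No BOM: the spec says PDFDocEncoding, approximated as Latin-1 in this sketch.
    return raw.decode("latin-1")


title = "文档"  # "document"
good = b"\xfe\xff" + title.encode("utf-16-be")
bad = title.encode("gbk")  # hypothetical producer writing GBK bytes with no BOM

print(decode_pdf_text_string(good) == title)  # True
print(decode_pdf_text_string(bad) == title)   # False
```

Whether SharePoint's indexer rejects such a value or simply indexes the garbled text is not something this sketch can show; it only illustrates how the metadata becomes inconsistent.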

A few things you can verify:

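Ensure the PDF's text layer (whether it comes from OCR or from the original generator) is Unicode-compliant. Some older PDF generators embed non-standard glyph maps (for example, a missing or incorrect ToUnicode CMap), so the extracted characters do not match the code points you would type in a search.
Re-upload the file after re-processing the PDF. This forces SharePoint to re-crawl the item and repopulate its managed properties, which can fix indexing inconsistencies. Upload it over the existing file (as a new version) if you need to keep the same Document ID, since a brand-new item is assigned a new ID.
Check that the Document ID crawled property (ows_DocId) is mapped to a managed property that is queryable and retrievable. If ingestion of that property fails for a specific file, SharePoint simply cannot match its ID.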
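To test Document-ID lookup directly rather than through the search box, you can build a query against the SharePoint search REST endpoint. A minimal Python sketch, assuming you already know which managed property holds the Document ID in your tenant's search schema. The property name, site URL and ID below are placeholders, not values from this thread:

```python
from urllib.parse import quote


def build_docid_search_url(site_url: str, doc_id: str, managed_property: str) -> str:
    """Build a GET URL for the SharePoint search REST endpoint that looks up one ID.

    `managed_property` is an assumption: pass whichever managed property your
    search schema maps the Document ID crawled property to.
    """
    # KQL property restriction; the value is quoted so it is matched as a phrase.
    kql = f'{managed_property}:"{doc_id}"'
    # The endpoint takes string parameters as single-quoted literals,
    # so any apostrophe inside the query must be doubled.
    querytext = "'" + kql.replace("'", "''") + "'"
    selectprops = "'" + ",".join(["Title", "Path", managed_property]) + "'"
    return (
        f"{site_url.rstrip('/')}/_api/search/query"
        f"?querytext={quote(querytext, safe='')}"
        f"&selectproperties={quote(selectprops, safe='')}"
    )


# Hypothetical values: replace with your site, a known Document ID and your property.
print(build_docid_search_url(
    "https://contoso.sharepoint.com/sites/docs",
    "ABC-123-4",
    "MyDocIdProperty",
))
```

Open the resulting URL in a browser where you are signed in to the site. If the PDF with Chinese characters is missing from the results while other documents are found by their IDs, the property was most likely not ingested for that file.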

If the issue only affects PDFs with Chinese characters, the root cause is usually the PDF encoding rather than SharePoint itself. Regenerating the PDF with a modern, Unicode-aware encoder usually restores Document-ID search.
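Before regenerating the PDF, you can check whether its text layer actually shows these symptoms. A rough Python sketch that audits text already extracted from the file; the categories, and the idea that any non-zero count is suspicious, are heuristics of this sketch, not something SharePoint reports:

```python
import unicodedata


def audit_text_layer(text: str) -> dict[str, int]:
    """Count characters that suggest a broken or non-standard CJK text layer."""
    counts = {"cjk": 0, "replacement": 0, "private_use": 0, "compat_variants": 0}
    for ch in text:
        name = unicodedata.name(ch, "")
        if ch == "\ufffd":
            # U+FFFD: typically inserted when something could not be mapped to a character.
            counts["replacement"] += 1
        elif unicodedata.category(ch) == "Co":
            # Private Use Area: often glyph codes passed through without a Unicode mapping.
            counts["private_use"] += 1
        elif name.startswith(("CJK", "KANGXI")) and unicodedata.normalize("NFKC", ch) != ch:
            # Kangxi radicals / CJK compatibility ideographs that look like ordinary hanzi.
            counts["compat_variants"] += 1
        elif name.startswith("CJK UNIFIED IDEOGRAPH"):
            counts["cjk"] += 1
    return counts


def looks_suspect(counts: dict[str, int]) -> bool:
    """True if any of the problem categories occurred at least once."""
    return any(counts[k] > 0 for k in ("replacement", "private_use", "compat_variants"))
```

Run it on text from an extractor such as Poppler's `pdftotext -enc UTF-8 file.pdf out.txt` (read back as UTF-8), and compare a PDF that fails Document-ID search with one that works.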
