Forum Discussion
guidopeter
Nov 12, 2025 · Copper Contributor
SharePoint Document-ID and Chinese characters breaking search
I am struggling with a strange phenomenon. We use SharePoint Document IDs extensively, especially for search: we search specifically for SharePoint Document IDs, e.g., EMTS-1223334444-123...
rogerval
Dec 02, 2025 · MCT
This typically happens when the PDF iFilter (the text-extraction pipeline) cannot consistently normalize certain CJK (Chinese, Japanese, Korean) characters.
What you’re seeing is common:
- Full-text search works (because the content extractor can parse the text).
- But Document-ID search fails because the metadata indexer treats the extracted metadata as malformed when the original PDF contains mixed encodings.
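The normalization problem can be illustrated with a short Python sketch. It shows the general Unicode issue only, not how SharePoint's indexer works internally. It lists the characters in some extracted text that NFKC normalization would rewrite: full-width Latin letters, digits and hyphen, plus a Kangxi radical that looks like a common ideograph but is a different code point. It also shows that an ASCII pattern for an ID-like string matches only after normalization. The sample text, the ID `ABCD-1000-1` and the ID pattern (modelled loosely on the shape of the ID in the question) are hypothetical.

```python
import re
import unicodedata

# Hypothetical Document-ID shape (PREFIX-number-number); ASCII-only on purpose.
DOC_ID = re.compile(r"[A-Z0-9]+-[0-9]+-[0-9]+")


def nfkc_changes(text: str) -> list[tuple[str, str, str]]:
    """List (character, code point, NFKC form) for every character NFKC would rewrite."""
    changes = []
    for ch in sorted(set(text)):
        normalized = unicodedata.normalize("NFKC", ch)
        if normalized != ch:
            changes.append((ch, f"U+{ord(ch):04X}", normalized))
    return changes


def find_doc_ids(text: str, normalize: bool) -> list[str]:
    """Find ID-like strings, optionally after NFKC normalization."""
    if normalize:
        text = unicodedata.normalize("NFKC", text)
    return DOC_ID.findall(text)


# Hypothetical extracted text: full-width "ＡＢＣＤ－１０００－１" plus U+2F00
# KANGXI RADICAL ONE, which renders like the ideograph U+4E00 but is a separate character.
extracted = "文档编号 ＡＢＣＤ－１０００－１ ⼀"

for ch, cp, norm in nfkc_changes(extracted):
    print(cp, repr(ch), "->", repr(norm))

print(find_doc_ids(extracted, normalize=False))  # []
print(find_doc_ids(extracted, normalize=True))   # ['ABCD-1000-1']
```

An index that compares code points without normalizing would treat the two spellings as different strings, even though they look the same on screen.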
A few things you can verify:
- Make sure the PDF has a Unicode-compliant text layer (for scanned files, that the OCR step produced one). Some older PDF generators embed non-standard glyph maps.
- Re-upload the file after re-processing the PDF. This forces SharePoint to re-crawl the file and rebuild its managed properties, which can fix indexing inconsistencies.
- Check that the Document ID column is mapped to a queryable and retrievable managed property (ows_DocId). If that property fails ingestion for a specific file, SharePoint has no value to match for that file.
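One way to check the managed-property point outside the search UI is to query the SharePoint Search REST endpoint (`/_api/search/query`) for the file's Document ID and ask for the property back with `selectproperties`. The Python sketch below only builds the URL; authentication and sending the request are left out. The site URL, the ID `ABCD-1000-1` and the property name `DocIdManagedProperty` are placeholders, so substitute the managed property your search schema actually maps the Document ID column to.

```python
from urllib.parse import quote


def build_search_url(site_url: str, managed_property: str, doc_id: str,
                     select: tuple[str, ...] = ("Title", "Path")) -> str:
    """Build a SharePoint Search REST URL that looks up one ID via a managed property."""
    # KQL property restriction: Property:"value" (the property must be Queryable).
    kql = f'{managed_property}:"{doc_id}"'
    # REST string parameters are wrapped in single quotes; embedded single quotes are doubled.
    querytext = "'" + kql.replace("'", "''") + "'"
    # Request the property back so you can see whether a value was indexed (needs Retrievable).
    selectproperties = "'" + ",".join((managed_property, *select)) + "'"
    return (f"{site_url.rstrip('/')}/_api/search/query"
            f"?querytext={quote(querytext, safe='')}"
            f"&selectproperties={quote(selectproperties, safe='')}")


# Hypothetical site, property name and ID.
url = build_search_url("https://contoso.sharepoint.com/sites/docs",
                       "DocIdManagedProperty", "ABCD-1000-1")
print(url)
```

Running the same query for a working file and for an affected PDF side by side helps narrow things down: if the working file comes back with its ID filled in and the affected one does not come back at all, that fits a property that was never ingested for that file.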
If the issue affects only PDFs with Chinese characters, the root cause is usually the PDF's encoding rather than SharePoint itself. Regenerating the PDF with a modern generator that writes proper Unicode text mappings usually restores Document-ID search.
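To test the encoding theory before regenerating anything, a rough heuristic is to inspect the PDF's font dictionaries. Composite fonts (`/Subtype /Type0`) with `/Identity-H` or `/Identity-V` encoding are common for CJK text and generally need a `/ToUnicode` CMap so extractors can map glyphs back to Unicode. The Python sketch below scans raw PDF bytes for such fonts that lack one. It is a heuristic, not a PDF parser. It only sees objects stored directly in the file, so fonts inside compressed object streams (common in PDF 1.5 and later) are missed. An empty result therefore means "nothing found", not "file is fine", and a proper PDF library is the more reliable tool.

```python
import re

# Heuristic: matches "N G obj ... endobj" for objects stored directly in the file.
# Objects packed inside compressed object streams (/ObjStm) are NOT visible to this.
OBJ = re.compile(rb"(\d+)\s+\d+\s+obj\b(.*?)\bendobj", re.DOTALL)
TYPE0 = re.compile(rb"/Subtype\s*/Type0\b")
IDENTITY = re.compile(rb"/Encoding\s*/Identity-[HV]\b")


def scan_type0_fonts(pdf_bytes: bytes) -> tuple[list[int], list[int]]:
    """Return (Type0 font object numbers, Identity-encoded ones lacking /ToUnicode)."""
    found, missing = [], []
    for match in OBJ.finditer(pdf_bytes):
        number, body = int(match.group(1)), match.group(2)
        if not TYPE0.search(body):
            continue
        found.append(number)
        # Identity-H/V maps 2-byte codes straight to CIDs, which carry no Unicode
        # meaning by themselves; without ToUnicode an extractor has to guess.
        if IDENTITY.search(body) and b"/ToUnicode" not in body:
            missing.append(number)
    return found, missing


# Tiny synthetic example (not a complete PDF): object 8 lacks /ToUnicode.
sample = (b"5 0 obj\n<< /Type /Font /Subtype /Type0 /Encoding /Identity-H /ToUnicode 7 0 R >>\nendobj\n"
          b"8 0 obj\n<< /Type /Font /Subtype /Type0 /Encoding /Identity-H >>\nendobj\n"
          b"10 0 obj\n<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>\nendobj\n")
print(scan_type0_fonts(sample))  # ([5, 8], [8])
```

On real files you could compare an affected PDF and a working one, e.g. `scan_type0_fonts(pathlib.Path("affected.pdf").read_bytes())` (hypothetical file name). Object numbers in the second list point to fonts whose text may not map back to Unicode reliably.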