SharePoint Document-ID and Chinese characters breaking search

Question

I am struggling with a strange phenomenon. We use the Document-ID on SharePoint extensively, especially for searching. So we search specifically for SharePoint Document-IDs, e.g., EMTS-1223334444-123, to find the document.

This works fine with 10 million documents, except for PDF documents with Chinese characters. I can search for phrases in the text content or other properties, and everything is found. Only the search by Document-ID does not work.

virendrak · Answer

When searching by Document ID, please search against the managed property DlcDocId; it works even for PDFs with Chinese characters.

Use the property DlcDocId in your query:

DlcDocId:EMTS-1223334444-123
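To run the same DlcDocId query from a script instead of the search box, you can call the SharePoint Search REST endpoint (/_api/search/query), which accepts a KQL string in its querytext parameter. The Python sketch below only builds the request URL. The site URL is a placeholder, build_docid_search_url is a hypothetical helper, and authentication and the HTTP call itself are left out. Adding DlcDocId to selectproperties also shows whether the property is actually populated on the results that do come back.

```python
from urllib.parse import quote


def build_docid_search_url(
    site_url: str,
    doc_id: str,
    select_properties: tuple[str, ...] = ("Title", "Path", "DlcDocId"),
) -> str:
    """Build a SharePoint Search REST URL restricting on DlcDocId (hypothetical helper)."""
    # Assumption: Document IDs never contain single quotes, so we refuse them
    # rather than guess at an escaping rule.
    if "'" in doc_id:
        raise ValueError("unexpected single quote in Document ID")

    # KQL property restriction, same form as the query above.
    kql = f"DlcDocId:{doc_id}"

    # String parameters on the REST endpoint are wrapped in single quotes,
    # then percent-encoded for the URL.
    querytext = quote(f"'{kql}'", safe="")
    select = quote("'" + ",".join(select_properties) + "'", safe="")

    return (
        f"{site_url.rstrip('/')}/_api/search/query"
        f"?querytext={querytext}&selectproperties={select}"
    )


if __name__ == "__main__":
    # Placeholder tenant and site; replace with your own.
    print(build_docid_search_url(
        "https://contoso.sharepoint.com/sites/docs/", "EMTS-1223334444-123"))
```

If this query returns the non-Chinese documents but not the Chinese PDFs, the DlcDocId value for those files is probably missing from the index, not just unmatched by the search box.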

Please refer to the articles below for more details:

How to Search by Document ID in SharePoint | Blog about anything related to my learnings

Search doesn't provide results from another language - SharePoint | Microsoft Learn

rogerval · Answer

This typically happens when the PDF iFilter / text extraction pipeline cannot normalize certain CJK characters consistently. What you're seeing is common:

- Full-text search works, because the content extractor can parse the text.
- Document-ID search fails, because the metadata indexer treats the extracted metadata as malformed when the original PDF contains mixed encodings.

A few things you can verify:

- Ensure the PDF was OCR-processed using Unicode-compliant text layers. Some older PDF generators embed non-standard glyph maps.
- Re-upload the file after re-processing the PDF. This forces SharePoint to rebuild the managed property and can fix indexing inconsistencies.
- Check whether Document ID is mapped to a retrievable managed property (ows_DocId). If the property fails ingestion for a specific file, SharePoint simply cannot match it.

If the issue only affects PDFs with Chinese characters, the root cause is usually the PDF encoding rather than SharePoint itself. Regenerating the PDF with a modern Unicode encoder almost always restores Document-ID search.
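One way to test the normalization theory on a problem file is to extract its text with whatever PDF tool you already use and look for characters that change under Unicode NFKC normalization. PDFs with non-standard glyph maps can yield Kangxi radicals or other compatibility characters (for example U+2F00) where ordinary unified ideographs (U+4E00) were intended. This is a minimal sketch that assumes you already have the extracted text as a string. non_nfkc_chars is a hypothetical helper, and it does not reproduce what SharePoint's own extractor does.

```python
import unicodedata


def non_nfkc_chars(text: str) -> list[tuple[str, str, str]]:
    """Return (char, code point, NFKC form) for each character NFKC would change."""
    findings = []
    for ch in text:
        # Normalize one character at a time, which is enough to spot
        # compatibility characters such as Kangxi radicals or full-width forms.
        normalized = unicodedata.normalize("NFKC", ch)
        if normalized != ch:
            findings.append((ch, f"U+{ord(ch):04X}", normalized))
    return findings


if __name__ == "__main__":
    # "文档" uses unified ideographs; U+2F00 is the Kangxi radical for "one".
    sample = "文档\u2F00"
    for ch, cp, norm in non_nfkc_chars(sample):
        print(cp, repr(ch), "->", repr(norm))
```

Normalizing each character on its own is a simplification. It catches compatibility characters like these but ignores sequences that only change when normalized together. An empty result on the extracted text makes the encoding explanation less likely for that file.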
