Word search through thousands of PDFs?


Is this the appropriate product to use if I want to create a word search index for thousands of PDF files, and then query it from my ASP.NET application?

 
Yes, absolutely. You could put your thousands of PDF files in a repository like blob storage, index them, and then query for the information in those PDFs. Note that there are many different types of PDFs: if you have scanned PDFs, you may want to add a skillset to your indexer that extracts text from the images embedded in them.

The easiest way to do all of this (in just a couple of minutes) is to follow this tutorial: https://docs.microsoft.com/en-us/azure/search/cognitive-search-quickstart-blob

By the end of that quickstart, you will have an index you can query to find any PDFs that match your search terms.

Thanks,
Luis Cabrera, Azure Cognitive Search Team
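For the "query it from my ASP.NET application" part, a minimal sketch with the Azure.Search.Documents client library for .NET might look like the following. The endpoint, key, index name, and query string are placeholders, and metadata_storage_name is one of the fields the blob indexer typically populates, so adjust all of these to your own index.

```csharp
using System;
using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Models;

class SearchExample
{
    static void Main()
    {
        // Hypothetical service endpoint, query key, and index name; replace with your own.
        var endpoint = new Uri("https://<your-search-service>.search.windows.net");
        var credential = new AzureKeyCredential("<query-api-key>");
        var client = new SearchClient(endpoint, "pdf-index", credential);

        // Full-text search across the content the indexer extracted from the PDFs.
        SearchResults<SearchDocument> results = client.Search<SearchDocument>("students from China");

        foreach (SearchResult<SearchDocument> result in results.GetResults())
        {
            // metadata_storage_name is one of the metadata fields the blob indexer can populate.
            Console.WriteLine(result.Document["metadata_storage_name"]);
        }
    }
}
```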
Last time I checked, though, the system only indexed the first 30k or so of each document. I need the entire PDF (10 KB to 2 MB each) searchable. These are all text PDFs, by the way; no images.
I have similar requirements: a client subscribes to a number of market research (in education) sources that provide regular PDFs containing text and images. My client wishes to search across those documents, e.g. for all research documents related to students coming from China. The documentation I have read seems to refer to the same text limit mentioned by @Wavel. How can we execute such an indexing operation? All of the files currently reside in SharePoint.

@Wavel You should be able to index the full content of each document.

 

Note that different tiers have different limits on the number of characters pushed into the index; see Service limits for tiers and skus - Azure Cognitive Search | Microsoft Docs. Note that this is not a limit on the size of the actual PDF file.

 

If you have an S2 tier, for instance, 8 MB of characters get indexed, but it would take a pretty big PDF to generate 8 MB of characters: at roughly 3,000 characters per page, that is about 2,700 pages full of text.

@David Duncan 

Azure Cognitive Search should work for this.
If all your content is in SharePoint, though, I would first check whether SharePoint search meets your needs.

 

If you need the additional flexibility that Azure Cognitive Search provides, you could either use the new SharePoint indexer (in preview: Configure a SharePoint Online indexer (preview) - Azure Cognitive Search | Microsoft Docs) or simply copy all your files to blob storage and use the blob storage indexer (Search over Azure Blob storage content - Azure Cognitive Search | Microsoft Docs).
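For reference, the blob storage route can also be set up from code instead of the portal. Here is a rough sketch using the Azure.Search.Documents.Indexes client for .NET; it assumes the target index (called "pdf-index" here) already exists, and the service, storage, and resource names are placeholders to replace with your own.

```csharp
using System;
using Azure;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

class IndexerSetup
{
    static void Main()
    {
        // Hypothetical endpoint, admin key, and resource names; replace with your own.
        var endpoint = new Uri("https://<your-search-service>.search.windows.net");
        var credential = new AzureKeyCredential("<admin-api-key>");
        var indexerClient = new SearchIndexerClient(endpoint, credential);

        // Point a data source at the blob container that holds the PDFs.
        var dataSource = new SearchIndexerDataSourceConnection(
            "pdf-blob-datasource",                            // data source name
            SearchIndexerDataSourceType.AzureBlob,            // data source type
            "<storage-connection-string>",                    // blob storage connection string
            new SearchIndexerDataContainer("pdf-container")); // container holding the PDFs
        indexerClient.CreateOrUpdateDataSourceConnection(dataSource);

        // The indexer cracks the PDFs and pushes the extracted text into the existing index.
        var indexer = new SearchIndexer(
            "pdf-blob-indexer",   // indexer name
            dataSource.Name,      // data source to read from
            "pdf-index");         // existing target index
        indexerClient.CreateOrUpdateIndexer(indexer);
    }
}
```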

 

I hope this was helpful,

 

Luis Cabrera, Azure Cognitive Search team.

So the issue becomes pricing. S2 is almost $1,000/month, and in my case we're talking about 20k documents. I think the pricing doesn't work for a small number of documents when some of them are larger than 30k or so.
My total PDF size is under the 2 GB limit of the Basic tier, but because some individual files are large, I'm forced into S1 or even S2. Not affordable in this case.

@Wavel, I am not sure what your budget is, but here is an idea: use S1, where the limit will be 4 MB of text (about 1,300 pages per document). That will only cost $250 per month.

For the bigger documents, it may not be worth paying 4X (I imagine you probably have a few outliers that are bigger than 1,300 pages). In that case, maybe just take the first 1,000 pages or so of content. That may be the best bang for the buck given your needs.
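If trimming (or splitting) the outliers before uploading them is acceptable, that step can be automated. A rough sketch using the open-source PdfSharp library (an assumption on my part, not something Azure Cognitive Search requires), which copies only the first 1,000 pages of a hypothetical oversized file:

```csharp
using System;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

class PdfTrimmer
{
    static void Main()
    {
        // Hypothetical file names; replace with your own paths.
        PdfDocument source = PdfReader.Open("big-report.pdf", PdfDocumentOpenMode.Import);

        var trimmed = new PdfDocument();
        int pagesToKeep = Math.Min(1000, source.PageCount);

        // Copy the first 1,000 pages; the same loop can split a file into several chunks instead.
        for (int i = 0; i < pagesToKeep; i++)
        {
            trimmed.AddPage(source.Pages[i]);
        }

        trimmed.Save("big-report-first-1000.pdf");
    }
}
```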

You are correct that there are only a few outliers. However, I have to index the entire document; we can't miss any content when our subscribers run a search.

My suggestion is to rethink the pricing structure: base it on the total number of bytes being indexed, not on individual document size. Indexing 100 5 MB files shouldn't cost so much more than indexing 5,000 2 KB files (or whatever math makes my argument work ;)

@Wavel @Luis Cabrera-Cordon
Is it possible to split those outliers into smaller ones and index them?