Forum Discussion
Wavel
Mar 10, 2021 · Copper Contributor
Word search through thousands of PDFs?
Is this the appropriate product to use if I want to create a word search index for thousands of PDF files and then query it from my ASP.NET application?
Wavel
Mar 10, 2021 · Copper Contributor
Last time I checked, though, the system indexed only the first 30k or so of each document. I need the entire PDF (10 KB to 2 MB) to be searchable. These are all text PDFs, by the way, with no images.
David_Bluebox
Mar 10, 2021 · Copper Contributor
I have similar requirements: a client subscribes to a number of market research sources (in education) that regularly provide PDFs containing text and images. My client wishes to search across those documents, for example to find all research documents related to students coming from China. The documentation I have read also seems to refer to the text limit that Wavel mentioned. How can we execute such an indexing operation? All of the files currently reside in SharePoint.
Luis Cabrera-Cordon
Mar 10, 2021 · Former Employee
Azure Cognitive Search should work for this.
If all your content is in SharePoint, though, I would first check whether SharePoint search meets your needs. If you need the additional flexibility that Azure Cognitive Search provides, you could either use the new SharePoint indexer (in preview; see "Configure a SharePoint Online indexer (preview) - Azure Cognitive Search | Microsoft Docs") or simply copy all your files to blob storage and use the blob storage indexer (see "Search over Azure Blob storage content - Azure Cognitive Search | Microsoft Docs").
I hope this was helpful,
Luis Cabrera, Azure Cognitive Search team.
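For anyone following the blob storage route suggested above, here is a rough sketch of what the REST requests might look like. It only builds the URLs and JSON payloads and does not call the service. The service, container, index, and key values are hypothetical placeholders, the helper function names are made up for illustration, and the API version shown is an assumption that should be checked against the current documentation. It also assumes a target index already exists.

```python
import json
from urllib.parse import quote

# Assumed REST API version; verify against the current Azure docs.
API_VERSION = "2020-06-30"


def service_url(service_name: str, path: str) -> str:
    """Build a REST endpoint URL for a (hypothetical) search service."""
    return f"https://{service_name}.search.windows.net/{path}?api-version={API_VERSION}"


def blob_data_source(name: str, connection_string: str, container: str) -> dict:
    """Payload for POST /datasources: points the service at a blob container."""
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }


def blob_indexer(name: str, data_source_name: str, index_name: str) -> dict:
    """Payload for POST /indexers: crawls the container and fills the index."""
    return {
        "name": name,
        "dataSourceName": data_source_name,
        "targetIndexName": index_name,
        # Restrict the crawl to PDF files only.
        "parameters": {"configuration": {"indexedFileNameExtensions": ".pdf"}},
    }


def search_request(service_name: str, index_name: str, api_key: str,
                   text: str, top: int = 10):
    """URL, headers, and body for a full-text query (POST .../docs/search)."""
    url = service_url(service_name, f"indexes/{quote(index_name)}/docs/search")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    body = json.dumps({"search": text, "top": top, "count": True})
    return url, headers, body


if __name__ == "__main__":
    # All names below are placeholders, not real resources.
    print(service_url("my-search-service", "datasources"))
    print(json.dumps(blob_data_source("pdf-source", "<connection string>", "pdfs"), indent=2))
    print(json.dumps(blob_indexer("pdf-indexer", "pdf-source", "pdf-index"), indent=2))
    print(search_request("my-search-service", "pdf-index", "<query key>", "students from China"))
```

From an ASP.NET application the same three requests could be sent with any HTTP client or with the official SDK. Note that the number of characters extracted per document depends on the service tier, so the truncation concern raised earlier in the thread is worth checking against the published limits for the tier you choose.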