Forum Discussion
Wavel
Mar 10, 2021 | Copper Contributor
Word search through thousands of PDFs?
Is this the appropriate product to use if I want to create a word search index for thousands of PDF files and then query it from my ASP.NET application?
David_Bluebox
Mar 10, 2021 | Copper Contributor
I have similar requirements: a client subscribes to a number of market research sources (in education) that regularly provide PDFs containing text and images. My client wishes to search across those documents, e.g. to find all research documents related to students coming from China. The documentation I have read also seems to refer to the text limit mentioned by Wavel. How can we execute such an indexing operation? All of the files currently reside in SharePoint.
Luis Cabrera-Cordon
Mar 10, 2021 | Former Employee
Azure Cognitive Search should work for this.
If all your content is in SharePoint, though, I would first check whether SharePoint search meets your needs.
If you need the additional flexibility that Azure Cognitive Search provides, you could either use the new SharePoint indexer (in preview; see "Configure a SharePoint Online indexer (preview) - Azure Cognitive Search | Microsoft Docs") or simply copy all your files to blob storage and use the blob storage indexer (see "Search over Azure Blob storage content - Azure Cognitive Search | Microsoft Docs").
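For readers who want to see roughly what the blob storage route looks like, here is a minimal sketch in Python that builds the REST payloads involved. It is an illustration under assumptions, not a tested recipe: the service, index, container, and data source names are hypothetical placeholders, the API version shown is one that was generally available around the time of this thread, and the target index is assumed to already exist with a searchable `content` field (the field the blob indexer fills with extracted document text). The code only constructs the requests; nothing is sent.

```python
import json
import urllib.request

# Assumption: a generally available REST API version from around this thread's date.
API_VERSION = "2020-06-30"


def blob_data_source(name, connection_string, container):
    """Body for creating a data source that points at a blob container."""
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }


def blob_indexer(name, data_source_name, index_name):
    """Body for creating an indexer that crawls the data source into an index."""
    return {
        "name": name,
        "dataSourceName": data_source_name,
        "targetIndexName": index_name,
    }


def search_request(service, index_name, api_key, text, top=10):
    """Build (but do not send) a full-text query against the index."""
    url = (
        f"https://{service}.search.windows.net/indexes/{index_name}"
        f"/docs/search?api-version={API_VERSION}"
    )
    body = json.dumps({"search": text, "top": top}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "api-key": api_key},
    )


# Hypothetical names, for illustration only.
source = blob_data_source("research-pdfs-ds", "<connection string>", "research-pdfs")
indexer = blob_indexer("research-pdfs-indexer", source["name"], "research-index")
query = search_request("my-service", "research-index", "<query key>", "students China")
```

Passing `query` to `urllib.request.urlopen` would run the search and return JSON results. An ASP.NET application could issue the same REST call, or use the Azure.Search.Documents client library instead.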
I hope this was helpful,
Luis Cabrera, Azure Cognitive Search team.