Forum Discussion
Wavel
Mar 10, 2021 | Copper Contributor
Word search through thousands of PDFs?
Is this the appropriate product to use if I want to create a word search index for thousands of PDF files and then query it from my ASP.NET application?
Luis Cabrera-Cordon
Mar 10, 2021 | Former Employee
Yes, absolutely. You could put your thousands of PDF files in a repository such as Azure Blob Storage, index them, and then query for the information in those PDFs.
Note that there are many different types of PDFs. If you have scanned PDFs, you may want to add a skillset to your indexer that extracts text from the images embedded in the PDFs.
The easiest way to do all of this (in just a couple of minutes) is to follow this tutorial:
https://docs.microsoft.com/en-us/azure/search/cognitive-search-quickstart-blob
By the end of that quickstart, you will have an index that you can query so you can find any PDFs that meet the query requirements.
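As a rough illustration of the query step, here is a minimal sketch that calls the service's REST "Search Documents" endpoint from Python. The service name, index name, key, and search text are placeholders, and the API version shown is an assumption; use whichever version your service supports. The helper names `build_search_request` and `search` are made up for this example.

```python
import json
import urllib.request

# Assumed API version; check which versions your search service supports.
API_VERSION = "2020-06-30"


def build_search_request(service, index, api_key, text, top=10):
    """Build (but do not send) a full-text search request for one index.

    service, index, and api_key are placeholders for your own values.
    """
    url = (
        f"https://{service}.search.windows.net"
        f"/indexes/{index}/docs/search?api-version={API_VERSION}"
    )
    # "search" is the query text, "top" caps the number of hits,
    # and "count" asks the service to report the total match count.
    body = json.dumps({"search": text, "top": top, "count": True}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "api-key": api_key},
    )


def search(service, index, api_key, text, top=10):
    """Send the request and return the list of matching documents."""
    request = build_search_request(service, index, api_key, text, top)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["value"]
```

Calling `search("my-service", "my-pdf-index", "<query-key>", "contract renewal")` would return the matching documents as dictionaries; which fields each one contains depends on how the index was defined. From an ASP.NET application, the same request can be issued through the Azure.Search.Documents client library rather than raw HTTP.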
Thanks,
Luis Cabrera, Azure Cognitive Search Team
Wavel
Mar 10, 2021 | Copper Contributor
Last time I checked, though, the system only indexed the first 30k or so of each document. I need each entire PDF (10 KB to 2 MB) to be searchable. These are all text PDFs, by the way, with no images.