Searchable PDFs not searchable within SharePoint document library


We are scanning invoices and creating searchable PDFs. I've searched within the PDFs themselves and the text is found. However, when I upload them to the document library and search the library for the same contents, no results come back. Why not?

 

I've confirmed the document library settings described in https://docs.microsoft.com/en-us/sharepoint/troubleshoot/search/search-results-missing:

 

  • Allow this site to appear in Search results is set to Yes.
  • Allow items from this document library to appear in search results
  • Any user who can read draft items

Any other ideas as to what the problem could be?

 

 

 

8 Replies

@alex_k60 

Hi. Are you talking about SharePoint Online?

Just to be sure, when you are in a library, could you type the following in the search box:

 

content:<a word present in your pdf> filetype:pdf

 

This looks specifically for that content in all PDF files. If you see results, your PDFs are searchable and the issue is somewhere else.

Are you using Information Rights Management? I think that in that case the content is not searchable yet (it's something on the roadmap).
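
A minimal sketch of composing that kind of query string, in case it helps avoid syntax slips. In KQL a property restriction is only recognized when there is no whitespace between the property name, the colon, and the value, and a multi-word value has to be wrapped in double quotes. The helper names below are hypothetical, and `content` is simply the property name used in this thread, not one verified against the search schema.

```python
def kql_term(value: str) -> str:
    """Quote a value that contains whitespace so KQL treats it as one phrase."""
    value = value.strip()
    if any(ch.isspace() for ch in value):
        # Drop embedded double quotes so the phrase stays well-formed.
        return '"' + value.replace('"', "") + '"'
    return value


def kql_restriction(prop: str, value: str) -> str:
    """Build property:value with no whitespace around the colon."""
    return f"{prop.strip()}:{kql_term(value)}"


def pdf_content_query(text: str, content_property: str = "content") -> str:
    # "content" is the property name used in this thread (an assumption).
    return " ".join([
        kql_restriction(content_property, text),
        kql_restriction("filetype", "pdf"),
    ])


if __name__ == "__main__":
    print(pdf_content_query("invoice"))      # content:invoice filetype:pdf
    print(pdf_content_query("ABC company"))  # content:"ABC company" filetype:pdf
```

The resulting string can be pasted into the library search box as-is.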

 

@Vertebre85 

Thanks for answering. We're using SharePoint Online, and we're not using Information Rights Management. What else could it be? I've even saved a Word document as a PDF and still can't search its contents.

 

Thanks 

@alex_k60 

Could you go to the main page of SharePoint Online (https://<yourdomain>.sharepoint.com) and try the search with the keyword "content:..." there, to see whether it's a domain-wide issue or specific to one site?
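
If it is easier to compare the two scopes side by side, the same check can be sketched against the SharePoint search REST endpoint (`/_api/search/query`). This only builds the URLs; opening them requires an authenticated session. The function names and the `companyname` tenant are placeholders, and the `path:` restriction is one assumed way of narrowing a query to a single site.

```python
from urllib.parse import quote


def search_url(site_url: str, kql: str) -> str:
    """Return a search REST URL that runs the given KQL text."""
    # Single quotes inside the query text are doubled (OData string literal rule).
    escaped = kql.replace("'", "''")
    encoded = quote(escaped, safe="")
    return f"{site_url.rstrip('/')}/_api/search/query?querytext='{encoded}'"


def scoped(kql: str, path: str) -> str:
    """Restrict a query to items whose URL sits under the given path."""
    return f'{kql} path:"{path}"'


if __name__ == "__main__":
    tenant = "https://companyname.sharepoint.com"  # placeholder tenant
    word = "invoice"                               # a word known to be in a PDF
    print(search_url(tenant, word))
    print(search_url(tenant, scoped(word, tenant + "/Finance")))
```

If the first URL returns the document and the second does not, the problem is more likely tied to the site than to the tenant.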

 

If it has never worked, I would advise contacting Microsoft support. On SharePoint Server, this is often due to an issue with the search crawl.

 

If you have PnP PowerShell, you can look at the crawl log: https://www.sharepointdiary.com/2019/07/get-search-crawl-log-in-sharepoint-online-using-powershell.h...

It's officially not possible to start or restart a manual crawl in SharePoint Online. I've seen some "hacks" but have never tested them.

I've done the above and can see a few items in the log. There is nothing for the document library in particular at https://companyname.sharepoint.com/Finance/company1_invoices

However, there is the entry below. Does it cover the whole of Finance, or only the root of the site and not the document library "finance/company1_invoices"?


"Url : https://companyname.sharepoint.com/Finance
CrawlTime : 03/10/2021 15:02:38
ItemTime : 01/01/0001 00:00:00
LogLevel : Success
Status :
ItemId : 11257
ContentSourceId : 1"
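
A small sketch for checking exported crawl log entries against the library URL, assuming the entries are saved as "Key : Value" text like the one above. The function names are hypothetical. It only tells you whether a logged URL is at or below a given path; it cannot tell you whether an entry for the site itself implies that the library was crawled.

```python
def parse_entry(text: str) -> dict:
    """Parse 'Key : Value' lines into a dict, splitting on the first colon only."""
    entry = {}
    for line in text.strip().strip('"').splitlines():
        key, sep, value = line.partition(":")
        if sep:
            entry[key.strip()] = value.strip()
    return entry


def is_under(url: str, prefix: str) -> bool:
    """True when url is the prefix itself or something below it."""
    url = url.rstrip("/").lower()
    prefix = prefix.rstrip("/").lower()
    return url == prefix or url.startswith(prefix + "/")


SAMPLE = """Url : https://companyname.sharepoint.com/Finance
CrawlTime : 03/10/2021 15:02:38
LogLevel : Success
ItemId : 11257"""

if __name__ == "__main__":
    entry = parse_entry(SAMPLE)
    library = "https://companyname.sharepoint.com/Finance/company1_invoices"
    # The logged URL is the site, not an item inside the library.
    print(entry["Url"], is_under(entry["Url"], library))
```

Running the same check over every exported entry shows quickly whether any logged URL falls inside the library.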

@alex_k60 

Hi
Seeing your URL, "Finance" seems to be a subsite and not a separate site.
Could you go to "https://companyname.sharepoint.com", click the cog wheel, and check "Search and offline availability"? If the search option is turned off there, it applies to the subsites as well.

Vertebre85_0-1633343280114.png

Then verify all the settings on the parent sites first.

A small note: subsites are no longer recommended. Unless you have a specific requirement, it's best to create separate sites and link them to a hub (hubs of hub sites are also currently rolling out).

 

@Vertebre85 

 

All sites are enabled:

 

alex_k60_0-1633351112134.png

 

 

Thanks for the tips on subsites :) 

OCR is now working within SharePoint. There was a search option within the scanning software that I enabled. The confusing thing is that before the option was enabled, I was already able to search the PDF for content using Foxit. The other weird problem is that even after the option was enabled, I could search in the SharePoint search box for, say, "ABC company" and it would find text, but if I used "content: ABC company filetype:pdf" it wouldn't find anything.

@alex_k60 Glad you solved it. I didn't expect the cause to be in the OCR; I'm not familiar with PDFs, to be honest.

Concerning the properties of the PDF, SharePoint is able to find them.

Vertebre85_0-1633604867694.png

 

For example, all the properties are treated as metadata, so if you have set your company name as metadata, you can search for it and SharePoint will return results.