Sep 29 2021 04:40 AM
We are scanning invoices and creating searchable pdfs. I've searched the contents of the invoices and they are searchable. However, when I upload them to the document library and search the document library for the contents, it doesn't find any results. Why not?
I've confirmed the document library settings as described in https://docs.microsoft.com/en-us/sharepoint/troubleshoot/search/search-results-missing.
Any other ideas as to what the problem could be?
Oct 03 2021 07:52 AM - edited Oct 03 2021 07:53 AM
Hi, are you talking about SharePoint Online?
Just to be sure: when you are in a library, could you type the following in the search box?
content:<a word present in your PDF> filetype:pdf
This looks specifically for that content in all the PDF files. If you see results, your PDFs are searchable and the issue is somewhere else.
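If you would rather run the same check outside the browser, here is a minimal sketch in Python that only builds the request URL for the SharePoint search REST endpoint (`/_api/search/query`). The query text follows the form suggested above; the function name, the site URL, and the keyword "invoice" are placeholders I made up, and the keyword is assumed to be a single plain word. The sketch does not send the request or handle authentication.

```python
from urllib.parse import quote


def build_search_url(site_url, keyword, filetype="pdf"):
    """Build a search REST URL for the query suggested in this thread.

    build_search_url is a hypothetical helper, not part of any SharePoint SDK.
    """
    # Same query text as typed in the library search box above.
    query = f"content:{keyword} filetype:{filetype}"
    # The REST endpoint expects the query text wrapped in single quotes.
    return f"{site_url.rstrip('/')}/_api/search/query?querytext='{quote(query)}'"


# Placeholder tenant and keyword, for illustration only.
url = build_search_url("https://companyname.sharepoint.com/Finance", "invoice")
print(url)
```

Opening the resulting URL while signed in to the tenant should return the raw search results, which makes it easier to see whether anything matches at all.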
Are you using Information Rights Management? I think that in that case the content is not searchable yet (it is something on the roadmap).
Oct 03 2021 08:12 AM
Thanks for answering. We're using SharePoint Online. We're not using Information Rights Management. What else could it be? I've even saved a Word document as a PDF and still can't search the contents.
Thanks
Oct 03 2021 08:40 AM
Could you go to the main page of SharePoint Online (https://<yourdomain>.sharepoint.com) and try the search with the keyword "content:..." to see whether it's a domain-wide issue or specific to one site?
If it has never worked, I would advise contacting Microsoft support. On SharePoint Server, this is often due to an issue with the search crawl.
If you have PnP PowerShell, you can look at the crawl log: https://www.sharepointdiary.com/2019/07/get-search-crawl-log-in-sharepoint-online-using-powershell.h...
Officially, it's not possible to start or restart a manual crawl in SharePoint Online. I've seen some "hacks" but never tested them.
Oct 04 2021 03:00 AM - edited Oct 04 2021 03:00 AM
I've done the above and can see a few items in the log. There is nothing for the document library in particular at https://companyname.sharepoint.com/Finance/company1_invoices
However, there is the entry below. Does it cover the whole of Finance, or only the root of the site and not the document library "Finance/company1_invoices"?
"Url : https://companyname.sharepoint.com/Finance
CrawlTime : 03/10/2021 15:02:38
ItemTime : 01/01/0001 00:00:00
LogLevel : Success
Status :
ItemId : 11257
ContentSourceId : 1"
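For anyone who wants to check a batch of crawl log entries against a library URL, here is a minimal sketch in Python. It assumes the entries were copied as plain "Key : Value" text like the one quoted above, and that CrawlTime is in day/month/year order (a guess based on the date of this post). The function names are made up for illustration. It only tells you whether a logged URL sits at or under the library, not what the crawler actually covered.

```python
from datetime import datetime

# The entry quoted above, without the surrounding quotation marks.
SAMPLE_ENTRY = """\
Url : https://companyname.sharepoint.com/Finance
CrawlTime : 03/10/2021 15:02:38
ItemTime : 01/01/0001 00:00:00
LogLevel : Success
Status :
ItemId : 11257
ContentSourceId : 1
"""


def parse_crawl_entry(text):
    """Turn one 'Key : Value' block into a dict of strings."""
    entry = {}
    for line in text.splitlines():
        # Split on the first colon only, so URLs and times stay intact.
        key, sep, value = line.partition(":")
        if sep:
            entry[key.strip()] = value.strip()
    return entry


def is_under(url, library_url):
    """True if url is the library itself or something inside it."""
    url = url.rstrip("/").lower()
    library_url = library_url.rstrip("/").lower()
    return url == library_url or url.startswith(library_url + "/")


def crawl_time(entry):
    """Parse CrawlTime, assuming day/month/year order."""
    return datetime.strptime(entry["CrawlTime"], "%d/%m/%Y %H:%M:%S")


entry = parse_crawl_entry(SAMPLE_ENTRY)
library = "https://companyname.sharepoint.com/Finance/company1_invoices"
print(entry["Url"], is_under(entry["Url"], library))
```

For the entry above this reports that the logged URL is the Finance site itself and not something inside the invoices library, which matches what I saw by eye.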
Oct 04 2021 03:29 AM
Hi
Seeing your URL, "Finance" seems to be a subsite and not a separate site.
Could you go to https://companyname.sharepoint.com, click the cog wheel, and check "Search and offline availability"? If the search option is turned off there, that also applies to the subsites.
So verify all the settings on the parent sites first.
A small note: subsites are no longer recommended. Unless you have a specific requirement, it's best to create separate sites and link them to a hub (hub-to-hub association is also currently rolling out).
Oct 04 2021 05:39 AM
Oct 07 2021 04:02 AM
Oct 07 2021 04:08 AM
@alex_k60 Glad you solved it. I didn't expect the cause to be in the OCR; I'm not familiar with PDFs, to be honest.
Concerning the properties of the PDF, SharePoint is able to find them.
For example, all the properties are treated as metadata, so if you have set your company as metadata, you can search for it and SharePoint will return results.