Forum Discussion
Is there any way at all to search for PDF files using PDF keywords in SharePoint Online?
- Oct 14, 2017
SharePoint can query the properties (i.e. metadata) of a document only if a document parser "promotes" those properties into library columns when the document is uploaded.
Unfortunately, SPO does not include an out-of-the-box document parser for PDF files, so PDF properties are not "promoted" (i.e. they are ignored).
So, if you want to query PDF properties in SPO, you have to populate the corresponding columns in the document library where the PDF is stored yourself, either manually or automatically.
See https://blogs.technet.microsoft.com/wbaer/2014/08/29/document-property-promotion-and-demotion-overview-and-considerations/
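For the "automatically" option, a script could write the value through the SharePoint REST API. The TypeScript sketch that follows only builds the request for the `ValidateUpdateListItem` method on a list item. The site URL, the library title and the column name (e.g. a hypothetical `PdfKeywords` text column) are assumptions supplied by the caller, and so is authentication (for example an `X-RequestDigest` header in a browser session, or an `Authorization: Bearer` token for a remote job).

```typescript
// Shape of a request that can be handed to fetch().
interface UpdateRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds a REST request that sets one column on one library item.
// listTitle, itemId and fieldName are caller-supplied assumptions (e.g. "PdfKeywords").
function buildColumnUpdate(
  siteUrl: string,
  listTitle: string,
  itemId: number,
  fieldName: string,
  value: string,
  authHeaders: Record<string, string> = {}
): UpdateRequest {
  const base = siteUrl.replace(/\/+$/, "");
  // OData string literals escape a single quote by doubling it.
  const title = listTitle.replace(/'/g, "''");
  return {
    url: `${base}/_api/web/lists/getbytitle('${title}')/items(${itemId})/ValidateUpdateListItem`,
    method: "POST",
    headers: {
      Accept: "application/json;odata=nometadata",
      "Content-Type": "application/json;odata=nometadata",
      // e.g. X-RequestDigest in a page, or Authorization: Bearer ... in a remote job
      ...authHeaders,
    },
    body: JSON.stringify({
      // Values are passed as strings and validated like a form submission.
      formValues: [{ FieldName: fieldName, FieldValue: value }],
      bNewDocumentUpdate: false,
    }),
  };
}
```

The returned object could then be sent with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. `ValidateUpdateListItem` is used here because it takes plain string values and reports per-field validation errors, which keeps the sketch independent of the column's internal type.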
The PDF Keywords property is not searchable in SharePoint Online. The only alternative is a custom solution (which can be built in JavaScript) that extracts the Keywords property value from PDF files and stores it in a SharePoint column. That makes the keywords usable not only in searches but also in views.
Because it uses JavaScript, it also works on SharePoint Online and can be packaged in different ways (e.g. as a provider-hosted app). Such a custom solution can read all the properties in PDF files, such as XMP fields, the modification date and custom properties. As far as I know, no free solution offers this capability, and one would benefit a wide audience because PDF is such a common format.
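To make the extraction step concrete, here is a minimal TypeScript sketch that pulls the Keywords entry out of a PDF's document information dictionary. It is deliberately naive: it assumes the dictionary is stored uncompressed and that `/Keywords` is a literal string in parentheses. Hex strings, indirect references and dictionaries inside compressed object streams are not handled and simply yield `null`; a real solution would rather use a proper PDF parsing library.

```typescript
// Escape sequences allowed inside PDF literal strings (backslash + char).
const ESCAPES: Record<string, string> = {
  n: "\n", r: "\r", t: "\t", b: "\b", f: "\f",
  "(": "(", ")": ")", "\\": "\\",
};

// Map each byte to one UTF-16 code unit (Latin-1), since PDF syntax is byte-oriented.
function bytesToLatin1(bytes: Uint8Array): string {
  let s = "";
  for (let i = 0; i < bytes.length; i++) s += String.fromCharCode(bytes[i]);
  return s;
}

// Reads a literal string whose opening "(" sits just before `start`.
// Handles nested balanced parentheses, escapes, octal codes and line continuations.
function readLiteralString(src: string, start: number): string | null {
  let depth = 1;
  let out = "";
  for (let i = start; i < src.length; i++) {
    const c = src[i];
    if (c === "\\") {
      i++;
      if (i >= src.length) return null;
      const n = src[i];
      if (n in ESCAPES) {
        out += ESCAPES[n];
      } else if (/[0-7]/.test(n)) {
        // Octal escape of one to three digits, e.g. \101 is "A".
        let oct = n;
        while (oct.length < 3 && i + 1 < src.length && /[0-7]/.test(src[i + 1])) oct += src[++i];
        out += String.fromCharCode(parseInt(oct, 8) & 0xff);
      } else if (n === "\r" || n === "\n") {
        // Backslash at end of line: the line break is not part of the string.
        if (n === "\r" && src[i + 1] === "\n") i++;
      } else {
        out += n; // Unknown escape: the backslash is dropped.
      }
    } else if (c === "(") {
      depth++;
      out += c;
    } else if (c === ")") {
      depth--;
      if (depth === 0) return out;
      out += c;
    } else {
      out += c;
    }
  }
  return null; // Unterminated string.
}

function extractPdfKeywords(bytes: Uint8Array): string | null {
  const src = bytesToLatin1(bytes);
  const key = "/Keywords";
  const idx = src.lastIndexOf(key);
  if (idx < 0) return null;
  let i = idx + key.length;
  while (i < src.length && /\s/.test(src[i])) i++;
  if (src[i] !== "(") return null; // Hex strings and indirect references are not handled.
  const raw = readLiteralString(src, i + 1);
  if (raw === null) return null;
  // A text string starting with the bytes FE FF is UTF-16BE.
  if (raw.charCodeAt(0) === 0xfe && raw.charCodeAt(1) === 0xff) {
    let s = "";
    for (let k = 2; k + 1 < raw.length; k += 2) {
      s += String.fromCharCode((raw.charCodeAt(k) << 8) | raw.charCodeAt(k + 1));
    }
    return s;
  }
  return raw; // Otherwise PDFDocEncoding, approximated here as Latin-1.
}
```

Taking the last `/Keywords` occurrence is a heuristic: incremental saves append a new information dictionary at the end of the file, so the last one is usually the current one.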
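XMP metadata is stored as an XML packet inside the PDF, so when that metadata stream is not compressed it can be read without a full PDF parser. The sketch that follows assumes exactly that, plus the conventional `x:xmpmeta`, `pdf:`, `xmp:` and `dc:` prefixes. It reads `pdf:Keywords`, `xmp:ModifyDate` and the `dc:subject` bag with regular expressions instead of a namespace-aware XML parser, so treat it as an illustration only.

```typescript
interface XmpFields {
  keywords: string | null;   // pdf:Keywords
  modifyDate: string | null; // xmp:ModifyDate
  subjects: string[];        // dc:subject (rdf:Bag items)
}

function decodeXmlEntities(s: string): string {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // Last, so "&amp;lt;" becomes "&lt;" and not "<".
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Reads a simple property written either as <ns:Name>value</ns:Name> or as ns:Name="value".
function readProperty(packet: string, qname: string): string | null {
  const q = escapeRegExp(qname);
  const elem = new RegExp(`<${q}>([\\s\\S]*?)</${q}>`).exec(packet);
  if (elem) return decodeXmlEntities(elem[1].trim());
  const attr = new RegExp(`\\s${q}="([^"]*)"`).exec(packet);
  return attr ? decodeXmlEntities(attr[1]) : null;
}

function extractXmp(bytes: Uint8Array): XmpFields | null {
  // XMP is UTF-8 XML; bytes elsewhere in the PDF that are not valid UTF-8 decode to U+FFFD.
  const text = new TextDecoder("utf-8").decode(bytes);
  // Heuristic: take the last packet in the file.
  const start = text.lastIndexOf("<x:xmpmeta");
  if (start < 0) return null;
  const end = text.indexOf("</x:xmpmeta>", start);
  if (end < 0) return null;
  const packet = text.slice(start, end);

  // dc:subject holds keywords as an unordered rdf:Bag of rdf:li items.
  const subjects: string[] = [];
  const bag = /<dc:subject>([\s\S]*?)<\/dc:subject>/.exec(packet);
  if (bag) {
    const li = /<rdf:li>([\s\S]*?)<\/rdf:li>/g;
    let m: RegExpExecArray | null;
    while ((m = li.exec(bag[1])) !== null) subjects.push(decodeXmlEntities(m[1].trim()));
  }
  return {
    keywords: readProperty(packet, "pdf:Keywords"),
    modifyDate: readProperty(packet, "xmp:ModifyDate"),
    subjects,
  };
}
```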
Paul
> The only alternative is to use a custom solution (can be build in JavaScript) […]
I’m not quite familiar with the SharePoint Online architecture. I took a quick look at the article https://docs.microsoft.com/en-us/sharepoint/dev/general-development/sharepoint-add-ins-compared-with-sharepoint-solutions and my understanding is that my custom solution will have to either run in an active browser session or be hosted and run somewhere else, and either way it’s going to have to use the standard, client-facing APIs to fetch and parse the PDF files and update their columns. Is that right?
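A sketch of that fetch, parse and update loop, written so that the same code could run either in a page or in a remotely hosted process. It downloads a file through `GetFileByServerRelativeUrl('<path>')/$value`, hands the bytes to a PDF parser and passes the result to a write-back step. The parser and the write-back function are injected, and `processPdf`, `FetchLike` and the other names are hypothetical. Only the way credentials are supplied differs: in a browser session on the SharePoint origin the authentication cookie is normally sent automatically, so `authHeaders` can be empty, whereas a remotely hosted job would pass something like an `Authorization: Bearer` header with an access token it obtained separately (token acquisition is not shown).

```typescript
// Minimal shapes of what is needed from fetch; the global fetch satisfies them.
interface MinimalResponse {
  ok: boolean;
  status: number;
  arrayBuffer(): Promise<ArrayBuffer>;
}
type FetchLike = (url: string, init?: { headers?: Record<string, string> }) => Promise<MinimalResponse>;

// Builds the REST URL that returns a file's raw bytes.
function fileContentUrl(siteUrl: string, serverRelativeUrl: string): string {
  const base = siteUrl.replace(/\/+$/, "");
  // Single quotes inside an OData string literal are escaped by doubling them.
  const literal = serverRelativeUrl.replace(/'/g, "''");
  return `${base}/_api/web/GetFileByServerRelativeUrl('${literal}')/$value`;
}

async function processPdf(
  fetchFn: FetchLike,
  siteUrl: string,
  serverRelativeUrl: string,
  authHeaders: Record<string, string>,
  extract: (bytes: Uint8Array) => string | null,
  writeBack: (serverRelativeUrl: string, value: string) => Promise<void>
): Promise<string | null> {
  // Step 1: download the PDF through the same REST API a browser page would use.
  const res = await fetchFn(fileContentUrl(siteUrl, serverRelativeUrl), { headers: authHeaders });
  if (!res.ok) throw new Error(`Download failed with HTTP ${res.status}`);
  // Step 2: parse it client-side, since SharePoint Online does not promote PDF properties.
  const value = extract(new Uint8Array(await res.arrayBuffer()));
  // Step 3: store the value in a library column so search and views can use it.
  if (value !== null) await writeBack(serverRelativeUrl, value);
  return value;
}
```

Injecting `fetchFn` keeps the loop testable and lets the caller decide where it runs; `writeBack` could, for instance, issue a list-item update request through the same REST API.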