Mar 15 2023 08:17 PM
I'm looking for a way to populate metadata automatically when a PDF document is uploaded to a SharePoint document library. The values should be taken directly from the last page of the document, without any user input.
Here is what I have so far:
I'm using the Python tabula package to extract the metadata from the PDF document into an Excel file.
I'm now looking at populating the SharePoint document library metadata from this file.
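For illustration, here is a rough sketch of how those two steps could fit together without the intermediate Excel file. It is not a tested solution and it rests on several assumptions: the last page holds a two-column label/value table, the page count is read with pypdf (tabula has no "last page" option that I know of), and the columns are written through the Microsoft Graph list item fields endpoint. The helper names, the column mapping and the site/list/item IDs are all hypothetical placeholders.

```python
import json
import urllib.request


def last_page_tables(pdf_path):
    """Return the tables tabula finds on the last page (list of DataFrames)."""
    # Third-party imports are local so the helpers below work without them.
    from pypdf import PdfReader
    import tabula

    last_page = len(PdfReader(pdf_path).pages)  # tabula page numbers are 1-based
    # header=None: assume the table has no header row, only label/value rows.
    return tabula.read_pdf(
        pdf_path,
        pages=last_page,
        multiple_tables=True,
        pandas_options={"header": None},
    )


def rows_to_fields(rows, column_map):
    """Map (label, value) rows to SharePoint internal column names.

    column_map is a hypothetical mapping such as {"Document No": "DocNumber"}.
    Rows whose label is not in the mapping are skipped.
    """
    fields = {}
    for label, value in rows:
        key = column_map.get(str(label).strip())
        if key is not None:
            fields[key] = str(value).strip()
    return fields


def build_update_request(site_id, list_id, item_id, fields, token):
    """Build (but do not send) a Graph PATCH request for the item's columns."""
    url = (
        "https://graph.microsoft.com/v1.0"
        f"/sites/{site_id}/lists/{list_id}/items/{item_id}/fields"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(fields).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",  # token acquisition not shown
            "Content-Type": "application/json",
        },
    )
```

In use, the rows of the first table could come from `last_page_tables(path)[0].itertuples(index=False)` (assuming exactly two columns), and the request would be sent with `urllib.request.urlopen(req)`. To make this run with no user input, something still has to trigger it on upload, which this sketch does not cover.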
Appreciate any answers and thoughts!!
Thanks, in advance!
Mar 16 2023 01:52 AM
Are you storing the Excel file, with the metadata already populated, somewhere in SharePoint?
Mar 16 2023 02:57 AM
"metadata needs to be populated directly from the last page of the document without any user input."
Is the data you are interested in stored on the last page of the PDF document, or is it stored in the PDF properties (like Title, Author, Subject, Keywords, ...)?
In the latter case there are not many alternatives. The property promotion mechanism that allows for bi-directional transfer of metadata between Office files and SharePoint columns does not work for PDF files. There are tools (example) that support one-way sync (from PDF to SharePoint columns) during upload. I'm not aware of any tools that support bi-directional sync.
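If the values do live in the PDF properties, a one-way read could be sketched in Python roughly as below. This is only an illustration of the idea, not one of the tools mentioned above. It assumes the pypdf package, and the SharePoint column names on the right-hand side of the mapping are hypothetical.

```python
# Standard PDF document-information keys mapped to hypothetical
# SharePoint internal column names.
PROPERTY_TO_COLUMN = {
    "/Title": "Title",
    "/Author": "DocAuthor",
    "/Subject": "DocSubject",
    "/Keywords": "DocKeywords",
}


def info_to_fields(info):
    """Turn a PDF document-information mapping into column values.

    `info` is any dict-like object keyed by names such as "/Title";
    properties that are missing or empty are left out.
    """
    fields = {}
    for pdf_key, column in PROPERTY_TO_COLUMN.items():
        value = info.get(pdf_key) if info is not None else None
        if value:
            fields[column] = str(value)
    return fields


def read_pdf_properties(pdf_path):
    """Read the document properties with pypdf (imported lazily)."""
    from pypdf import PdfReader

    # metadata can be None when the PDF has no document-information dictionary.
    return info_to_fields(PdfReader(pdf_path).metadata)
```

The resulting dictionary would still have to be written to the library item by some other means, so this only covers the PDF-to-column direction.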