Populating SharePoint document library metadata from a PDF file

I am looking for a way to populate metadata automatically when a PDF document is uploaded to a SharePoint document library. The metadata should come directly from the last page of the document, without any user input.

 

Here is what I have so far:

I am using the Python tabula package to extract the metadata from the PDF document into an Excel file.

I now want to populate the SharePoint document library metadata from this file.
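
For anyone following along, the extraction step described above might look roughly like the sketch below. It is only an illustration under several assumptions: the last page holds a two-column table of label/value pairs, tabula-py (with a Java runtime) is installed, and pypdf is used purely to count pages. The function names are hypothetical and not from the original post.

```python
from typing import Dict, Iterable, Sequence


def rows_to_metadata(rows: Iterable[Sequence[object]]) -> Dict[str, str]:
    """Turn (label, value) rows into a {label: value} dict, skipping blanks."""
    metadata: Dict[str, str] = {}
    for row in rows:
        if len(row) < 2:
            continue  # not a label/value pair
        label = str(row[0]).strip().rstrip(":").strip()
        value = str(row[1]).strip()
        if label and value:
            metadata[label] = value
    return metadata


def extract_last_page_metadata(pdf_path: str) -> Dict[str, str]:
    """Read the first table on the last page of the PDF with tabula-py."""
    # Third-party imports are local so rows_to_metadata has no dependencies.
    import tabula  # pip install tabula-py (needs a Java runtime)
    from pypdf import PdfReader  # pip install pypdf; used only to count pages

    last_page = len(PdfReader(pdf_path).pages)
    tables = tabula.read_pdf(
        pdf_path,
        pages=last_page,
        multiple_tables=True,
        pandas_options={"header": None},  # treat every row as data
    )
    if not tables:
        return {}
    # Assumption: the first table found holds the label/value pairs.
    rows = tables[0].fillna("").values.tolist()
    return rows_to_metadata(rows)
```

The resulting dictionary could be written to an Excel file with pandas, as in the workflow described here, or passed straight to whatever step updates the library columns.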

 

Appreciate any answers and thoughts!!

Thanks in advance!

4 Replies

@Venugs 

 

Are you storing the Excel file, in which you have already populated the metadata, somewhere in SharePoint?

@Venugs 

"metadata needs to be populated directly from the last page of the document without any user input."

Is the data you are interested in stored on the last page of the PDF document, or is it stored in the PDF properties (such as Title, Author, Subject, Keywords, ...)?

In the latter case there are not many alternatives. The property promotion mechanism that allows bi-directional transfer of metadata between Office files and SharePoint columns does not work for PDF files. There are tools (example) that support one-way sync (from PDF to SharePoint columns) during upload. I am not aware of any tools that support bi-directional sync.

 

The Excel file is stored in OneDrive.
So, the data is extracted from the last page of the PDF file, using Python packages, into an Excel file stored in OneDrive.
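
One possible way to push the extracted values into the library columns is the Microsoft Graph endpoint that updates a list item's fields. The sketch below only builds the request and is not a tested end-to-end solution. It assumes you already have an access token, the site, list and item IDs (all placeholders here), and that the dictionary keys match the internal names of the library columns. Reading the Excel file from OneDrive and acquiring the token are not shown, and `build_fields_update` is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Dict

GRAPH_ROOT = "https://graph.microsoft.com/v1.0"


def build_fields_update(
    site_id: str,
    list_id: str,
    item_id: str,
    fields: Dict[str, str],
    access_token: str,
) -> urllib.request.Request:
    """Build a PATCH request that sets column values on one list item."""
    url = f"{GRAPH_ROOT}/sites/{site_id}/lists/{list_id}/items/{item_id}/fields"
    return urllib.request.Request(
        url,
        data=json.dumps(fields).encode("utf-8"),  # column name -> new value
        method="PATCH",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it would then be a matter of calling `urllib.request.urlopen(request)` on the returned object, with error handling added for failed or throttled calls.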