Looking at Populating SharePoint document library metadata from pdf file

Venugs · Mar 15 2023

I'm looking for a way to populate metadata automatically when a PDF document is uploaded to a SharePoint document library. The metadata should be taken directly from the last page of the document, without any user input.

Here is what I have so far:

I'm using the Python tabula package to extract the metadata from the PDF document into an Excel file.

I'm now looking at populating the SharePoint document library metadata from this file.

Appreciate any answers and thoughts!!

Thanks, in advance!

kalpeshvaghela · Mar 16 2023

@Venugs

Are you storing the Excel file, with the metadata already populated, somewhere in SharePoint?

Paul de Jong · Mar 16 2023

@Venugs

"metadata needs to be populated directly from the last page of the document without any user input."

Is the data you are interested in stored on the last page of the PDF document, or is it stored in the PDF properties (like Title, Author, Subject, Keywords, ...)?

In the latter case there are not many alternatives. The property promotion mechanism that allows bi-directional transfer of metadata between Office files and SharePoint columns does not work for PDF files. There are tools (example) that support one-way sync (from PDF to SharePoint columns) during upload. I'm not aware of any tools that support bi-directional sync.

Venugs · Mar 16 2023

I have the Excel file stored in OneDrive.

Venugs · Mar 16 2023

So, the data is extracted from the last page of the PDF file using Python packages into an Excel file stored in OneDrive.
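One possible way to push the extracted values into the library columns is the Microsoft Graph endpoint that updates a list item's fields (`PATCH /sites/{site-id}/lists/{list-id}/items/{item-id}/fields`). The sketch below is a minimal illustration under several assumptions, not a confirmed solution from this thread: the site, list and item IDs and the access token are obtained elsewhere, the column names in `column_map` are hypothetical internal column names, and the helper names are invented for the example. Reading the values back out of the Excel file is left out; the code just takes a dict.

```python
import json
import urllib.request
from typing import Dict

GRAPH_ROOT = "https://graph.microsoft.com/v1.0"


def map_to_columns(metadata: Dict[str, str], column_map: Dict[str, str]) -> Dict[str, str]:
    """Rename extracted labels to SharePoint internal column names.

    Labels without an entry in column_map are dropped.
    """
    return {
        column_map[label]: value
        for label, value in metadata.items()
        if label in column_map
    }


def build_fields_update(
    site_id: str,
    list_id: str,
    item_id: str,
    fields: Dict[str, str],
    access_token: str,
) -> urllib.request.Request:
    """Build (but do not send) the PATCH request that sets the item's columns."""
    url = f"{GRAPH_ROOT}/sites/{site_id}/lists/{list_id}/items/{item_id}/fields"
    return urllib.request.Request(
        url,
        data=json.dumps(fields).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


def update_item_fields(request: urllib.request.Request) -> dict:
    """Send the request and return the updated field set as parsed JSON."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call would look something like `update_item_fields(build_fields_update(site_id, list_id, item_id, map_to_columns(metadata, {"Title": "DocTitle"}), token))`, where `DocTitle` stands in for a real column in the library. Keeping the request building separate from the sending makes it easy to inspect the payload before anything is written to SharePoint.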
