Document Library Metadata from file contents automatically


One particular team creates 30-40 documents per day, which need to be kept for inspection for up to 7 years and must be fully auditable. Each document has fields within it relating to specific customer names and reference numbers, as well as assets and locations.

I am hoping to find a way whereby, when a document is uploaded to one of our SharePoint document libraries, metadata is populated directly from the document without additional user input.

 

Could the community advise me on how to make this happen, and on any preparatory points I should consider?

 

Many thanks in advance

4 Replies

Hi @Noobie999 ,

 

Are we talking about Word documents? In that case you can use document properties that sync themselves to columns on the library:

You need to create a document template with the necessary content controls first and then create your documents from that template.

1) Go to your document library and create the metadata column you want to extract (e.g. CustomerName as a text field)
2) Upload a document you want to use as template to the document library
3) From there open the document in Word (not Word Online)
4) Go to the position in your document where the customer name appears
5) Select that name and choose Insert->Quick Parts->Document Property->CustomerName. All columns you added in step 1 should also be available here.
6) Now, instead of plain text, you have a content control where you can enter data.
7) Save your document and download it
8) Go to "New" and click "+Add Template"
9) Upload your document

Now you can click "New-><YourDocumentName>" and a new document is created from your template. In it, you can enter the customer name into that content control.
On save, this data is stored in the document library column, where you can filter by the metadata.

But this only works for documents that you create from that template in the future.
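
As a rough way to check what a template-based document carries before it is uploaded, the sketch below lists the content controls in a .docx file together with their current text. It uses only the Python standard library. The function name read_content_controls is made up for illustration, and it assumes that each Quick Parts control has a title (the w:alias element) matching the column name, which is worth verifying against one of your own files.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def read_content_controls(docx):
    """Return {title: text} for every titled content control in a .docx.

    `docx` may be a file path or a binary file-like object.
    Assumption: the control title (w:alias) equals the column name.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    values = {}
    for sdt in root.iter(f"{{{W}}}sdt"):
        alias = sdt.find("w:sdtPr/w:alias", NS)
        if alias is None:
            continue  # untitled control, nothing to key it by
        content = sdt.find("w:sdtContent", NS)
        if content is None:
            text = ""
        else:
            # A value can be split over several runs, so join all text nodes
            text = "".join(t.text or "" for t in content.iter(f"{{{W}}}t"))
        values[alias.get(f"{{{W}}}val")] = text
    return values
```

Calling read_content_controls("report.docx") on a document created from the template should return something like a CustomerName entry with the text that was typed into the control; "report.docx" is a placeholder file name.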

If you already have existing documents, you could consider SharePoint Syntex (https://www.microsoft.com/en-us/microsoft-365/enterprise/microsoft-syntex) or a Power Automate flow that uses an AI model to extract information from your documents (https://www.youtube.com/watch?v=PD2eKTzkZ70).
For both, however, you will most likely need additional licenses.

Best Regards,
Sven

 

@Noobie999 

If the metadata is located in the content then the quick parts approach from @SvenSieverding is the way forward.
If the metadata is present in the Office file's properties, then you can leverage the out-of-the-box (OOTB) property promotion mechanism to extract properties from Office files into SharePoint columns. The main constraint is that the property names need to match the SharePoint column names exactly.
You need to be aware that changing the SharePoint column values will also change the Office files (it works bi-directionally).
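
To see which properties a file actually carries, and which of them would have no matching column, something like the following sketch may help. It reads the custom properties part of an Office Open XML file (docx, xlsx, pptx) with the Python standard library. The function names are hypothetical, and the exact, case-sensitive string comparison is my assumption about how strict the name matching is, so treat the result as a hint rather than a guarantee.

```python
import zipfile
import xml.etree.ElementTree as ET

CUSTOM_PART = "docProps/custom.xml"
# Namespace of the custom properties part in Office Open XML packages
CP = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"


def read_custom_properties(office_file):
    """Return {property name: value as text} from an Office Open XML file."""
    with zipfile.ZipFile(office_file) as zf:
        if CUSTOM_PART not in zf.namelist():
            return {}  # the file has no custom properties at all
        root = ET.fromstring(zf.read(CUSTOM_PART))

    props = {}
    for prop in root.findall(f"{{{CP}}}property"):
        # Each property holds one typed child element, e.g. <vt:lpwstr>
        value = prop[0].text if len(prop) else None
        props[prop.get("name")] = value or ""
    return props


def names_without_column(props, column_names):
    """Property names that have no identically named column.

    Assumption: matching is an exact, case-sensitive string comparison.
    """
    return sorted(set(props) - set(column_names))
```

For example, a property called "Ref No" would be reported if the library column is named "RefNo".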

An alternative approach is to use tools (example) that extract properties from Office files during upload and write the values into SharePoint columns. These tools use a mapping table, so the names of the properties and the SharePoint columns do not need to match. They are typically one-way: from the Office file to the SharePoint column only.
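
The mapping-table idea can be illustrated with a small Python sketch. This is not how any particular tool works internally; the function name, the property names and the column names are all invented for the example.

```python
def map_to_columns(properties, mapping):
    """Translate file properties into column values, one-way.

    `properties` is {property name: value} as read from the file.
    `mapping` is {property name: column name}; names need not match.
    Properties without a mapping entry are ignored, and the input
    dictionary is never modified.
    """
    return {
        column: properties[prop]
        for prop, column in mapping.items()
        if prop in properties
    }


# Hypothetical mapping table: file property -> SharePoint column
MAPPING = {
    "Customer": "CustomerName",
    "Ref": "ReferenceNumber",
    "Site": "Location",
}
```

With properties such as Customer and Ref present in the file, only CustomerName and ReferenceNumber would be filled; Location stays empty because the file has no Site property.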

Hi @SvenSieverding 

 

Thank you for what appears to be such a complete answer. I will certainly be giving this approach a go. It may be a few days before I get my head fully into this space, but once I can try your suggestions I will come back with some feedback.

 

I really appreciate you taking the time to offer your advice and expertise.

Hi Paul,

It is really helpful to have another contributor concur with an earlier responder - this gives me extra confidence that I am going to be on the right track.

Your extra comments on the bi-directionality between SharePoint metadata and document properties are really helpful - potentially not in this instance, which should be very simple as the documents are static once published, but definitely for some other projects I am considering.

Thank you for your contribution