Using AI to calculate a content value score: 0 for low-value test content, 1 for high-value content

We have a large number of documents held in M365 SharePoint. By large, I mean the document count is in the tens of millions. We need to delete documents that are really old and have low value. Given the volume of data, I was wondering whether AI could help here, e.g. by separating junk docs from real data. A junk doc is something that contains placeholder text such as 'testing testing' or 'Lorem ipsum'.


Maybe an AI service could place a score on each document: 0 for very low value (contains 'Lorem ipsum') and 1 for high value (contains certain keywords such as customer names, etc.). Could anyone please give any pointers to such a service?
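
To make the idea concrete, here is a rough rule-based sketch of the scoring I have in mind. It is not an AI service, just a baseline to show the intended 0-to-1 output. The function name, the junk patterns, the placeholder customer names and the 0.5 "unknown" score are all my own assumptions, and it assumes the document text has already been extracted from SharePoint by some other means.

```python
import re

# Hypothetical lists for illustration only; real ones would come from the business.
JUNK_PATTERNS = [r"lorem ipsum", r"\btesting testing\b"]
VALUE_KEYWORDS = ["contoso", "fabrikam"]  # placeholder customer names


def content_value_score(text, junk_patterns=JUNK_PATTERNS, value_keywords=VALUE_KEYWORDS):
    """Return a score in [0, 1]: 0 = likely junk, 1 = likely valuable."""
    lowered = text.lower()
    # Empty or whitespace-only documents are treated as junk.
    if not lowered.strip():
        return 0.0
    # Any junk marker wins, even if a customer name is also present.
    if any(re.search(pattern, lowered) for pattern in junk_patterns):
        return 0.0
    # A known keyword marks the document as high value.
    if any(keyword in lowered for keyword in value_keywords):
        return 1.0
    # No signal either way: leave it in the middle for manual or AI review.
    return 0.5


if __name__ == "__main__":
    for sample in ["Lorem ipsum dolor sit amet", "Renewal contract for Contoso Ltd", "Meeting notes"]:
        print(sample, "->", content_value_score(sample))
```

Ideally an AI service would replace the middle case, scoring the documents that simple rules like these cannot decide.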

