Bulk update performance when setting SharePoint metadata
Hi,
I am currently working on a migration project from SQL Server to SharePoint lists and SharePoint document libraries. The SQL Server database stores around 1.29 million files (1.2 TB) as FILESTREAM objects, organised in a folder structure of approximately 150 thousand folders.
I was able to extract the FILESTREAM objects to a local HDD and use the SharePoint Migration Tool to upload the files to the SharePoint libraries. This took approximately 10 hours.
Now, my next job is to update a custom attribute on each of the subfolders. For this I have written Python code in a Jupyter notebook. Here is a snippet -
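(Simplified to show the pattern, using the Office365-REST-Python-Client library; the site URL, credentials, folder URLs, and column name below are placeholders rather than my real values.)

```python
from office365.runtime.auth.client_credential import ClientCredential
from office365.sharepoint.client_context import ClientContext

# Placeholder site URL and app credentials
site_url = "https://contoso.sharepoint.com/sites/Migration"
ctxFolderItem = ClientContext(site_url).with_credentials(
    ClientCredential("client-id", "client-secret")
)

# folder_rows: (server-relative folder URL, value for the custom column)
folder_rows = [
    ("/sites/Migration/Shared Documents/Folder001", "ValueA"),
    # ... roughly 15,000 rows per notebook copy
]

for folder_url, custom_value in folder_rows:
    folder = ctxFolderItem.web.get_folder_by_server_relative_url(folder_url)
    item = folder.list_item_all_fields
    item.set_property("MyCustomColumn", custom_value)  # placeholder column name
    item.update()
    # One HTTP round trip per record; this is where the time goes.
    ctxFolderItem.execute_query()
```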
I have made 10 copies of the Jupyter notebook and split the load so that each notebook updates 15,000 records. The total time per notebook was around 400 minutes, and almost all of it was spent in the API call ctxFolderItem.execute_query(), which is called once per record update. Is there an API to update multiple records in a single call? The current performance is abysmal, and I need a better approach, because my next task is to update custom columns on all 1.29 million files as well. If this were a migration from one database to another, the entire job would be completed in under 20 minutes. Is there a better approach that I could investigate?
Any feedback is appreciated.