Forum Discussion
prathyoo
Jan 28, 2024 · Copper Contributor
Bulk updating performance
Hi,
I am currently working on a migration project from SQL Server to SharePoint lists and SharePoint document libraries. The SQL Server stores around 1.29 million files (1.2 TB) as filestream objects, organised in a folder structure of approximately 150 thousand folders.
I was able to extract the filestream objects to an HDD and use the SharePoint Migration Tool to upload the files to SharePoint libraries. This took approximately 10 hours.
Now, my next job is to update a custom attribute on each of the subfolders. For this I have written Python code in a Jupyter notebook. Here is a snippet -
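(Simplified sketch of the loop; the site URL, credentials, and column name below are placeholders rather than the real values, and folder_urls is the list of server-relative folder paths assigned to this notebook.)

```python
from office365.runtime.auth.user_credential import UserCredential
from office365.sharepoint.client_context import ClientContext

# Placeholder site and credentials
ctx = ClientContext("https://contoso.sharepoint.com/sites/migration").with_credentials(
    UserCredential("user@contoso.com", "password")
)

folder_urls: list[str] = []  # server-relative folder paths, ~15,000 per notebook

for folder_url in folder_urls:
    folder = ctx.web.get_folder_by_server_relative_url(folder_url)
    ctxFolderItem = folder.list_item_all_fields
    ctxFolderItem.set_property("CustomAttribute", "new value")  # placeholder column name
    ctxFolderItem.update()
    ctxFolderItem.execute_query()  # one HTTP round trip per folder - this is where the time goes
```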
I have made 10 copies of the Jupyter notebook and split the load so that each notebook updates 15,000 records. The total time per notebook was around 400 minutes, almost all of it spent in the API call ctxFolderItem.execute_query(), which is called once per record update. Is there an API to update records in a single call? The current performance is abysmal, and I need a better approach because my next task is to update custom columns on all 1.29 million files. If this were a migration from one database to another, the entire job would be completed in under 20 minutes. Is there a better approach that I could investigate?
Any feedback is appreciated.
- That's because you're being throttled. Check this article on how to mitigate it:
https://learn.microsoft.com/en-us/sharepoint/dev/general-development/how-to-avoid-getting-throttled-or-blocked-in-sharepoint-online
Also make use of the Graph API to perform this task.
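For example, Graph JSON batching lets you pack up to 20 field updates into a single request. A rough sketch (the site id, list id, column name, and token acquisition are placeholders you would fill in yourself):

```python
import requests

GRAPH = "https://graph.microsoft.com/v1.0"
token = "<access token with Sites.ReadWrite.All>"  # e.g. acquired via MSAL
site_id = "<site-id>"
list_id = "<list-id>"

def batch_update(item_ids, field_values):
    """Update the same custom column on up to 20 list items in one $batch call."""
    batch = {
        "requests": [
            {
                "id": str(i),
                "method": "PATCH",
                "url": f"/sites/{site_id}/lists/{list_id}/items/{item_id}/fields",
                "headers": {"Content-Type": "application/json"},
                "body": field_values,  # e.g. {"CustomAttribute": "new value"}
            }
            for i, item_id in enumerate(item_ids, start=1)
        ]
    }
    resp = requests.post(
        f"{GRAPH}/$batch",
        headers={"Authorization": f"Bearer {token}"},
        json=batch,
    )
    resp.raise_for_status()
    return resp.json()["responses"]  # inspect individual responses for 429s / Retry-After
```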
prathyoo · Copper Contributor
I switched execute_query() to execute_query_with_incremental_retry(), but that doesn't seem to have improved performance. I noticed that the Python library also has an execute_batch() function, but I have not been able to figure out the structure of the list or dict that needs to be the input to the load function preceding the call to execute_batch().
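Looking at the library samples, my current understanding (not verified at scale yet) is that load() may not be needed for updates at all: you just queue a batch of update() calls on the context and then call execute_batch() to flush them in one request, roughly like this (same placeholder context and folder list as above):

```python
from office365.runtime.auth.user_credential import UserCredential
from office365.sharepoint.client_context import ClientContext

ctx = ClientContext("https://contoso.sharepoint.com/sites/migration").with_credentials(
    UserCredential("user@contoso.com", "password")
)
folder_urls: list[str] = []  # server-relative folder paths, as before

BATCH_SIZE = 50  # a guess; would need tuning against throttling limits

for start in range(0, len(folder_urls), BATCH_SIZE):
    for folder_url in folder_urls[start:start + BATCH_SIZE]:
        item = ctx.web.get_folder_by_server_relative_url(folder_url).list_item_all_fields
        item.set_property("CustomAttribute", "new value")  # placeholder column name
        item.update()        # only queues the update on the context, no HTTP call yet
    ctx.execute_batch()      # sends the queued updates as a single $batch request
```

Does that look like the intended usage?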