How to import a large dataset into a SharePoint list using azure webjobs

%3CLINGO-SUB%20id%3D%22lingo-sub-152238%22%20slang%3D%22en-US%22%3EHow%20to%20import%20a%20large%20dataset%20into%20a%20SharePoint%20list%20using%20azure%20webjobs%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-152238%22%20slang%3D%22en-US%22%3E%3CP%3EI%20need%20to%20import%20200%2C000%20rows%20into%20a%20SharePoint%20online%20list.%20The%20rows%20currently%20exist%20in%20an%20access%20database.%20This%20process%20needs%20to%20be%20run%20several%20times%20per%20year%20by%20an%20end%20user%20(uploading%20a%20large%20list%20to%20a%20different%20site).%20When%20I%20try%20to%20export%20from%20excel%20to%20%3CSPAN%3ESharePoint%26nbsp%3B%3C%2FSPAN%3Ethe%20export%20runs%20for%20hours%20and%20then%20typically%20fails%20with%20and%20%E2%80%98out%20of%20resources%E2%80%99%20exception.%20Also%2C%20one%20of%20the%20columns%20in%20the%20access%20database%20is%20an%20email%20address%2C%20which%20needs%20to%20be%20converted%20to%20a%20%E2%80%98person%20or%20group%E2%80%99%20column%20in%20SharePoint%2C%20so%20the%20access%20export%20won%E2%80%99t%20really%20give%20me%20what%20I%20need%20anyway.%3C%2FP%3E%0A%3CP%3E%26nbsp%3B%3C%2FP%3E%0A%3CP%3ESo%20I%20exported%20the%20data%20from%20Access%20to%20a%20csv%2C%20and%20created%20an%20SPFX%2Fpnp-js-core%20application%20that%20parses%20the%20CSV%20and%20uploads%20the%20data%20to%20the%20SharePoint%20list%20while%20converting%20the%20email%20address%20to%20a%20%E2%80%98person%20or%20group%E2%80%99%20column.%20This%20works%2C%20but%20it%20takes%20many%20hours%20to%20upload%20200%2C000%20rows.%3C%2FP%3E%0A%3CP%3E%26nbsp%3B%3C%2FP%3E%0A%3CP%3EIt%20was%20suggested%20that%20I%20upload%20the%20CSV%20files%20to%20SharePoint%2C%20and%20then%20have%20an%20azure%20webjob%20parse%20the%20CSV%20and%20save%20the%20rows%20in%20SharePoint.%20%26nbsp%3BI%20created%20the%20webjob%20%26nbsp%3Bthat%20reads%20the%20200%2C000%20lines%20from%20the%20csv%20file%2C%20and%20saves%20the%20rows%20in%20SharePoint.%20The%20webjob%20is%20run%20based%20on%20a%20message%20being%20added%20to%20a%20queue%2C%20I%20found%20that%205%20minutes%20after%20the%20webjob%20has%20started%2C%20a%20second%20instance%20of%20the%20webjob%20starts%2C%20and%20it%20too%20reads%20the%20lines%20from%20the%20csv%20and%20starts%20adding%20rows%20to%20SharePoint%20.%26nbsp%3B%3C%2FP%3E%0A%3CP%3E%26nbsp%3B%3C%2FP%3E%0A%3CP%3ESo%20my%20questions%20are%3A%20How%20do%20I%20stop%20that%20second%20instance%20of%20the%20webjob%20from%20starting%3F%26nbsp%3B%20I%20have%20set%20the%20isSingleton%20to%20true%2C%20but%20that%20did%20not%20help.%20There%20must%20be%20some%20flag%20that%20says%20how%20long%20a%20webjob%20should%20be%20allowed%20to%20run%20for.%26nbsp%3B%3C%2FP%3E%0A%3CP%3E%26nbsp%3B%3C%2FP%3E%0A%3CP%3ESecondly%20%2C%20is%20this%20the%20right%20approach%3F%20Or%20should%20the%20first%20webjob%20just%20parse%20the%20csv%20and%20write%20the%20rows%20out%20to%20a%20queue%2C%20and%20then%20have%20a%20second%20webjob%20that%20listens%20on%20that%20queue%20and%20does%20the%20insert%20into%20SharePoint.%20It%20seems%20this%20might%26nbsp%3B%20be%20more%20in%20keeping%20with%20the%20architecture%20of%20azure%20webjobs%2C%20but%20it%20will%20likely%20be%20super%20slow%20as%20the%20second%20webjob%20will%20need%20to%20connect%20to%20SharePoint%20separately%20for%20each%20of%20the%20200%2C000%20rows%20to%20be%20inserted.%26nbsp%3B%20Are%20webjobs%20even%20the%20right%20toll%20for%20this%3F%26nbsp%3B%20(would%20Logic%20Apps%2C%20Azure%20FUnctions%2C%20or%20azure%20Batch%20be%20better%3F)%3C%2FP%3E%0A%3CP%3E%26nbsp%3B%3C%2FP%3E%0A%3CP%3EI%20am%20new%20to%20azure.%20It%20seems%20like%20this%20must%20be%20a%20common%20use%20case.%20Is%20there%20any%20guidance%20available%20on%20how%20to%20do%20it.%3C%2FP%3E%0A%3CP%3E%26nbsp%3B%3C%2FP%3E%0A%3CP%3E%26nbsp%3B%3C%2FP%3E%0A%3CP%3E%26nbsp%3B%3C%2FP%3E%0A%3CP%3E%26nbsp%3B%3C%2FP%3E%3C%2FLINGO-BODY%3E%3CLINGO-SUB%20id%3D%22lingo-sub-169457%22%20slang%3D%22en-US%22%3ERe%3A%20How%20to%20import%20a%20large%20dataset%20into%20a%20SharePoint%20list%20using%20azure%20webjobs%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-169457%22%20slang%3D%22en-US%22%3E%3CP%3EMake%20sure%20that%20you%20create%20indexes%20before%20you%20get%20to%205%2C000%20items%20and%20if%20you%20get%20to%2020%2C000%20items%20you%20won't%20be%20able%20to%20add%20new%20indexes%20at%20all.%26nbsp%3B%3C%2FP%3E%3C%2FLINGO-BODY%3E%3CLINGO-SUB%20id%3D%22lingo-sub-159180%22%20slang%3D%22en-US%22%3ERe%3A%20How%20to%20import%20a%20large%20dataset%20into%20a%20SharePoint%20list%20using%20azure%20webjobs%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-159180%22%20slang%3D%22en-US%22%3EI%20would%20tackle%20this%20with%20Microsoft%20Flow.%3CBR%20%2F%3E%3CBR%20%2F%3EBuilt%20in%20Excel%2C%20CSV%20connectors.%3CBR%20%2F%3EParallel%20(up%20to%2050)%20for-each%20construct%20(up%20to%205000%20per%20for-each%2C%20we'll%20have%20to%20process%20the%20file%20in%20chunks).%3CBR%20%2F%3EPause%2C%20resume%2C%20error-retry.%3CBR%20%2F%3EEmail%20field%20conversion.%3CBR%20%2F%3ESchedule%20trigger%20or%20File%20upload%20trigger%20or%20Message%20Queue.%20Trigger%20restriction%20on%20concurrency%3A%201%3CBR%20%2F%3E%3CBR%20%2F%3EThough%2C%20I'm%20more%20likely%20to%20have%20the%20first%20part%20of%20the%20Flow%20split%20the%20CSV%20into%205000-row%20chunks.%20Place%20them%20into%20a%20separate%20queue.%3CBR%20%2F%3E%3CBR%20%2F%3EThen%20have%20a%20second%20Flow%20watching%20the%20queue%20to%20fire%20off%20and%20insert%20rows%20in%20parallel%2C%20per%20file.%3C%2FLINGO-BODY%3E
Highlighted
Frequent Contributor

I need to import 200,000 rows into a SharePoint online list. The rows currently exist in an access database. This process needs to be run several times per year by an end user (uploading a large list to a different site). When I try to export from excel to SharePoint the export runs for hours and then typically fails with and ‘out of resources’ exception. Also, one of the columns in the access database is an email address, which needs to be converted to a ‘person or group’ column in SharePoint, so the access export won’t really give me what I need anyway.

 

So I exported the data from Access to a csv, and created an SPFX/pnp-js-core application that parses the CSV and uploads the data to the SharePoint list while converting the email address to a ‘person or group’ column. This works, but it takes many hours to upload 200,000 rows.

 

It was suggested that I upload the CSV files to SharePoint, and then have an azure webjob parse the CSV and save the rows in SharePoint.  I created the webjob  that reads the 200,000 lines from the csv file, and saves the rows in SharePoint. The webjob is run based on a message being added to a queue, I found that 5 minutes after the webjob has started, a second instance of the webjob starts, and it too reads the lines from the csv and starts adding rows to SharePoint . 

 

So my questions are: How do I stop that second instance of the webjob from starting?  I have set the isSingleton to true, but that did not help. There must be some flag that says how long a webjob should be allowed to run for. 

 

Secondly , is this the right approach? Or should the first webjob just parse the csv and write the rows out to a queue, and then have a second webjob that listens on that queue and does the insert into SharePoint. It seems this might  be more in keeping with the architecture of azure webjobs, but it will likely be super slow as the second webjob will need to connect to SharePoint separately for each of the 200,000 rows to be inserted.  Are webjobs even the right toll for this?  (would Logic Apps, Azure FUnctions, or azure Batch be better?)

 

I am new to azure. It seems like this must be a common use case. Is there any guidance available on how to do it.

 

 

 

 

2 Replies
Highlighted
I would tackle this with Microsoft Flow.

Built in Excel, CSV connectors.
Parallel (up to 50) for-each construct (up to 5000 per for-each, we'll have to process the file in chunks).
Pause, resume, error-retry.
Email field conversion.
Schedule trigger or File upload trigger or Message Queue. Trigger restriction on concurrency: 1

Though, I'm more likely to have the first part of the Flow split the CSV into 5000-row chunks. Place them into a separate queue.

Then have a second Flow watching the queue to fire off and insert rows in parallel, per file.
Highlighted

Make sure that you create indexes before you get to 5,000 items and if you get to 20,000 items you won't be able to add new indexes at all.