Azure Synapse Analytics
ForEach activity switches from parallel to serial execution after an hour or more
This is specifically about the ForEach activity in an Azure Synapse pipeline. With the settings configured so that the inner activities run in parallel, the pipeline behaves as expected for some time. Later in the run, even though a handful of tasks are still waiting and parallel slots are available, the remaining iterations run one at a time, which makes the job take much longer. The issue is reproducible: I see it every time the production job runs. It causes a very noticeable performance drag in long-running pipelines.

Here is the scenario in detail. The customer has data in ADLS and runs a Synapse dedicated SQL pool. Say, for example, there are 50 tables to load, each table is large, and each takes 10 minutes to load (through a copy activity using PolyBase). In the Synapse pipeline, a Lookup activity fetches all the details (source file location, destination database and table, and so on) and feeds them into a ForEach activity, which contains the copy/load. We have set batch count = 5 (it could be higher) and turned the Sequential flag OFF on the ForEach, so 5 table loads can run in parallel.

This way we can finish 5 tables every 10 minutes, and it works as expected for an hour or so. Then something like a reset seems to happen in the internal scheduler of the ForEach activity (this is my guess), and the ForEach starts scheduling the table loads in series, so the remaining loads always take a long time.

Please see the screenshot; the Gantt chart shows it well. In the job below, for the last 1 hour of the run (Execute pipeline_deltaload) things ran in series, whereas for the previous 2 hours they ran in parallel.

We can talk more if further details are needed. Let me know what you think about this.
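To put rough numbers on the slowdown in the scenario above, here is a minimal Python sketch of the arithmetic. It is only a back-of-the-envelope model, not how the ForEach scheduler actually works: it assumes every load takes exactly the same time, that loads start in waves of `batch_count`, and that after a hypothetical `parallel_window` the remaining loads run one at a time. The function names and the 60-minute window are illustrative assumptions taken from "an hour or so" in the description.

```python
import math


def parallel_minutes(tables: int, minutes_per_table: int, batch_count: int) -> int:
    """Wall-clock minutes if batch_count loads always run concurrently (wave model)."""
    return math.ceil(tables / batch_count) * minutes_per_table


def degraded_minutes(tables: int, minutes_per_table: int,
                     batch_count: int, parallel_window: int) -> int:
    """Wall-clock minutes if loads run in parallel waves only during the first
    parallel_window minutes and strictly one at a time afterwards.

    parallel_window is an assumed value for illustration, not a documented limit.
    """
    # Number of full waves that fit into the parallel window.
    waves = min(parallel_window // minutes_per_table,
                math.ceil(tables / batch_count))
    loaded_in_parallel = min(tables, waves * batch_count)
    remaining = tables - loaded_in_parallel
    # The remaining tables are loaded in series.
    return waves * minutes_per_table + remaining * minutes_per_table


if __name__ == "__main__":
    # Figures from the scenario: 50 tables, 10 minutes each, batch count 5.
    print("always parallel:", parallel_minutes(50, 10, 5), "min")
    print("serial after 60 min:", degraded_minutes(50, 10, 5, 60), "min")
    print("fully serial:", parallel_minutes(50, 10, 1), "min")
```

Under these assumptions the 50 tables take 100 minutes when parallelism holds for the whole run, 260 minutes when it drops to serial after the first hour, and 500 minutes when everything runs in series.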