Today, any organization who is ready to spin Azure service always curious of performance, load, design etc. The story is no different with data. With Azure Synapse DWH organization want to make sure they choose right skus and design to benefit customer and themselves. Any organization can come up with following questions
Are we choosing right sku for current and future workload?
Are we utilizing the resources well?
What about the latency?
Based on upcoming request, what will be the impact on allocated resources aka load testing?
How to make sure queries are performing well on DWH design?
How to build baseline matrix?
Well, one of the ways is to monitor (and setup alerts) the service on production environment and adjust the sku. However this may impact the user experience and become hard practice to follow. This blog post try to overcome such problems in advance to avoid any surprises on production environment. The focus will be on building a framework to simulate the load with multiple sql queries and get answer for above questions.
In this end to end architecture, we'll simulate the load from different regions with different queries. The outcome will help us understand if we choose right sku and setup good DWH design like table distribution, index, cache etc . Here is the reference architecture of the framework
All scripts and step by step guide can be found here. However, below are the steps to build the framework along with few pictures.