Forum Discussion

SKDutta's avatar
SKDutta
Copper Contributor
Nov 03, 2021

Is general purpose serverless tier of Azure SQL DB good choice for data warehouse database ?

 

Hi,

Is General purpose serverless tier of Azure SQL DB good choice for data warehouse database ?

The ETL is impleted by data factory mapping dataflows. 

We are getting frequently and randomly connection timeout ( read failed) error. 

It seems IO properties are very less. 

 

 

  • What we typically do for these systems is use a DTU-based Azure SQL DB. Keep it at S2 instead of using serverless, then as part of your loading process, scale it up to whatever level is needed (only takes a minute or two), do the work, then scale it back down. Don't go below S2 though, or you will incur a full database copy when scaling.
  • You should do workload testing and based on that you can choose the service tier which suits best for you.
    If you do more IO intensive operations on database then choosing Business Critical or Premium Service tier as it will have Local SSD where as for GP its remote storage.
    Refer below article for more details.
    https://docs.microsoft.com/en-us/azure/azure-sql/database/high-availability-sla#basic-standard-and-general-purpose-service-tier-locally-redundant-availability

    • SKDutta's avatar
      SKDutta
      Copper Contributor
      Thanks for your reply. Nagesh_g
      We are loading around 28 GB data from multiple soure systems every day into data warehouse. We have datawarehouse stage, prep, marts all the three layers/schemas in a single datebase in Azure SQL DB. So read and write from same databases using datafactory mapping dataflows. We have tried with 10 vCore to 40 vCore of serverless. But we have received connection timeout(read) error most of the time. We have many mappings running concurrently. So when few large mappings are running/update statement is running then Log IO and Data IO are reaching 100 % and some other mapping shows this error. Do you feel that general purpose serverless would be good ? What kind of addional workloading testing do you suggest ?
      • Hi SKDutta 

         

        > But we have received connection timeout(read) error most of the time.

        Serverless automatically scales compute based on workload demand. It's also automatically pauses databases during inactive periods. This directly can lead to such issues of timeout when you need to wake the server or scale the server.

        You can configure the "Auto-pause delay" to reduce such issue. Another option is to disabled the auto-pausing but this will probably means that serverless does not fit your scenario and you are not using the full power of serverless flexibility.

        If you cannot permit your system to have such issue then maybe serverless is not the solution that fit your need. You need a solution which stay awake all the time with the same recourses.

         

        >  So when few large mappings are running/update statement is running then Log IO and Data IO are reaching 100 % and some other mapping shows this error.

         

        According to this, the issue is not with "Auto-pause delay" since you speak about a case that your database is running already. It can still be related to scaling issue even so my "feeling" is that this not the case.

         

        > Do you feel that general purpose serverless would be good ?

         

        I am "feeling" that we should not guess here and we do not have the necessarily information to make the decision. workload testing as Nagesh_g suggested is always a good idea in order to choose the right recourses.

         

        You need to monitor what is your bottleneck. For example if the issue is low memory then you might want to use M-Series which is Memory optimized. Use FSv2-series for Compute optimized (issue with IO). Note that these tiers require that you use the business critical service tier.

         

        In fact, my first thought was that maybe you should use azure synapse dedicated, which also known in the former name Azure SQL Data Warehouse.

         

        At this time, I cannot tell you which service and which tier fit for your need, but you should definitely think about (meaning testing as well) using azure synapse dedicated. In order to help you choose the best solution we will need more information about the how you use the data, what are the latency which you can allow, and some benchmark test.

Resources