Forum Discussion
Synapse Lake Database vs Synapse Serverless SQL Database
https://www.clearpeaks.com/choosing-your-data-warehouse-on-azure-synapse-dedicated-sql-pool-vs-synapse-serverless-sql-pool-vs-azure-sql-database-part-2/
and here at Microsoft Docs:
https://docs.microsoft.com/en-us/azure/synapse-analytics/metadata/database
- Geoff_Robinson (Aug 30, 2022, Copper Contributor): Thanks for the links. I'm also trying to understand whether these two options (SQL-defined vs Spark-defined tables using the serverless SQL pool) are just two flavors of the same product or fundamentally different. Are they likely to perform differently? The system I'm building has many thousands of small parquet files in the data lake, so I'd like to understand which querying method will work better and whether there is likely to be a cost difference.
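For context, the kind of ad-hoc serverless query I have in mind is roughly the following; the storage account, container, path and columns are only placeholders for my actual setup:

```sql
-- Serverless SQL pool: ad-hoc query over parquet files sitting in the data lake.
-- Storage account, container, folder layout and columns are placeholders.
SELECT TOP 100 *
FROM OPENROWSET(
        BULK 'https://mystorageaccount.dfs.core.windows.net/mycontainer/sales/year=2022/*.parquet',
        FORMAT = 'PARQUET'
    ) AS result;
```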
- TidyPyAi (Sep 28, 2022, Copper Contributor):
My functional understanding:
1. Lake Database lets you use the Synapse Studio GUI (a Databricks-style experience) to create and ETL tables easily. You will not have certain CTAS T-SQL abilities. The GUI lets you debug, and once everything is vetted it is very likely to work properly, since you are coached through the tabs from left to right. A table created this way can also be queried from the serverless SQL pool; see the first sketch after this list.
It pairs nicely with Dataverse and low-code ML. That said, it took me much longer to learn the GUI and get a table done than it did with SQL.
2. SQL here means a dedicated SQL pool, complete with most traditional capabilities: setting indexes, schemas, creating API users, DB roles (for the MS Purview setup), etc., all with syntax. If you have a SQL dev, this will be easy for them; a rough example follows after this list.
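To illustrate point 1, here is only a sketch: tables created in a Lake database (via Spark or the database designer) are shared with the serverless SQL pool, so you can query them with plain T-SQL. The database and table names (my_lake_db, dbo.sales_orders) are made up for illustration:

```sql
-- Lake database tables are synchronized to the serverless SQL pool,
-- so they can be queried with ordinary T-SQL. Names below are made up.
USE my_lake_db;

SELECT customer_id, COUNT(*) AS order_count
FROM dbo.sales_orders
GROUP BY customer_id;
```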
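And for point 2, a minimal sketch of the dedicated SQL pool style of working; the table, column, and role names are just placeholders:

```sql
-- Dedicated SQL pool: CTAS with an explicit distribution and index choice,
-- plus traditional role-based security. All names here are illustrative.
CREATE TABLE dbo.FactSales
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT CustomerKey, OrderDateKey, SalesAmount
FROM stg.SalesLoad;

-- Security handled the traditional way, with roles and grants.
CREATE ROLE reporting_reader;
GRANT SELECT ON SCHEMA::dbo TO reporting_reader;
```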
Concerning developers: you can connect to either the Lake database or the dedicated pool with Visual Studio and use most 'explorer'-type software such as SSMS or GOAT.
The security handling is more traditional with dedicated SQL.
I find it far faster and easier to paste or write a SQL script than to use the GUI.
Performance-wise, it's out of scope here, but both depend on configuration; I see no clear winner in a vacuum.
Cost-wise, it's likewise out of scope in a vacuum.
I think a mixed environment (such as one involving S3) would be more flexible from a dedicated SQL DB, since PySpark, notebooks, etc. will use the dedicated pools, but I have not developed under a Lake database yet to know if there are any game changers or hang-ups.
Caveat: if you are doing BI reporting, have Dataverse, and will do machine learning with the data, and its source is an integrated or linked dataset (like SharePoint tables or D365), then once you are done creating the Lake database table, low-code machine learning is a right-click away.
I tend to dump large datasets into dedicated SQL, create a few small sets of "Lake data", then right-click and 'Predict with a model'.