Managed and External Tables on Serverless

Published Mar 16 2021

The purpose of this article is to discuss managed and external tables when querying them from serverless SQL pool (formerly SQL on-demand).

Thanks to my colleague Dibakar Dharchoudhury for the really nice discussion related to this subject.

 

From the docs: Shared metadata tables - Azure Synapse Analytics | Microsoft Docs

 

  • Managed tables

Spark provides many options for how to store data in managed tables, such as TEXT, CSV, JSON, JDBC, PARQUET, ORC, HIVE, DELTA, and LIBSVM. These files are normally stored in the warehouse directory where managed table data is stored.

  • External tables

Spark also provides ways to create external tables over existing data, either by providing the LOCATION option or using the Hive format. Such external tables can be over a variety of data formats, including Parquet.

Azure Synapse currently only shares managed and external Spark tables that store their data in Parquet format with the SQL engines.

 

Note "The Spark created, managed, and external tables are also made available as external tables with the same name in the corresponding synchronized database in serverless SQL pool."

 

The following are examples of managed and external tables created in Spark:

 

1) Authentication:

from pyspark.sql import SparkSession

blob_account_name = "StorageAccount"    # your storage account name
blob_container_name = "ContainerName"   # your container name

spark = SparkSession.builder.getOrCreate()

# Retrieve a SAS token for the storage account through the linked service
token_library = spark._jvm.com.microsoft.azure.synapse.tokenlibrary.TokenLibrary
blob_sas_token = token_library.getConnectionString("LInkedServerName")

# Register the SAS token so Spark can read this container over wasbs://
spark.conf.set(
    'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
    blob_sas_token)


Note my linked service configuration:

LInked_server.png

 

2) Managed table:

spark.sql('CREATE DATABASE IF NOT EXISTS SeverlessDB')

# Create a managed Spark table from an existing Parquet file
filepath = 'wasbs://Container@StorageAccount.blob.core.windows.net/parquets/file.snappy.parquet'
df = spark.read.load(filepath, format='parquet')
df.write.mode('overwrite').saveAsTable('SeverlessDB.ManagedTtable')

3) I can also create a managed table stored as Parquet from the same dataset used above, as follows:

 

# Managed table
df.write.format("parquet").saveAsTable("SeverlessDB.ManagedTable")
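
To confirm that both managed tables landed in the Spark database, a quick check can be run from a Spark SQL cell (a sketch; the listing output varies by Spark version):

```sql
SHOW TABLES IN SeverlessDB;
```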

Query from Serverless:

serverless.png
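
The serverless-side query in the screenshot can be sketched roughly as follows. Per the shared-metadata docs, the synchronized database keeps the Spark database's name and exposes its tables as external tables; the dbo schema here is an assumption based on the default:

```sql
-- Run against the serverless SQL pool endpoint, not the Spark pool
USE SeverlessDB;
SELECT TOP 10 * FROM dbo.ManagedTable;
```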

 

 

Following the documentation, this is another way to achieve the same result for the managed table; in this case, however, the table is created empty:

 

CREATE TABLE SeverlessDB.myparquettable(id int, name string, birthdate date) USING Parquet
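
Because this variant creates the table empty, it has to be populated separately before there is anything to query. A minimal Spark SQL sketch with made-up values:

```sql
-- Hypothetical sample row, matching the (id, name, birthdate) schema above
INSERT INTO SeverlessDB.myparquettable
VALUES (1, 'SomeName', DATE '2000-01-01');
```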

 

4) External table:

 

CREATE TABLE SeverlessDB.myexternalparquettable
    USING Parquet
    LOCATION 'wasbs://container@StorageAccount.blob.core.windows.net/parquets/file.snappy.parquet'
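
A quick sanity check from Spark SQL that the external table resolves to the underlying Parquet data (a sketch; note that LOCATION may also point to a folder of Parquet files rather than a single file):

```sql
SELECT COUNT(*) FROM SeverlessDB.myexternalparquettable;
```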

Per the documentation, those are the supported ways to create managed and external tables in Spark so that they can be queried from serverless SQL pool.

 

 

If you check the path where your managed table was created, you will be able to see it in the data lake, as follows. For example, my workspace name is synapseworkspace12:

 

tables_lin.png

 

5) You can use the DESCRIBE command to check the type of your tables. For example:

 

DESCRIBE FORMATTED SeverlessDB.myexternalparquettable

external.png 
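
For comparison, running the same command against one of the managed tables should report MANAGED rather than EXTERNAL in the Type row (a sketch; the exact output layout varies by Spark version):

```sql
DESCRIBE FORMATTED SeverlessDB.ManagedTable;
-- Look for the "Type" row: MANAGED for managed tables, EXTERNAL for external ones
```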

 

If you want to clean up this lab, run the following Spark SQL:

 

-- Drop the database and its tables
DROP DATABASE SeverlessDB CASCADE

 

 

That is it!

 

Liliam 

UK Engineer

Last update: Sep 15 2021