
Azure Synapse Analytics Blog
1 MIN READ

Use Spark (Scala) to write data from ADLS to Synapse Dedicated Pool

Mukund_Bhashkar
Brass Contributor
Mar 15, 2021

 

In this article, I will talk about how we can write data from ADLS (Azure Data Lake Storage) to an Azure Synapse dedicated SQL pool using AAD (Azure Active Directory) authentication. We will look at sample code that achieves this.

 

1. The first step is to import the libraries for the Synapse connector. This statement is optional.
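
A minimal sketch of these imports, assuming the Azure Synapse Dedicated SQL Pool connector that ships with the Synapse Spark (Scala) runtime:

    // Imports for the Synapse Dedicated SQL Pool connector
    import com.microsoft.spark.sqlanalytics.utils.Constants
    import org.apache.spark.sql.SqlAnalyticsConnector._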

2. The next step is to initialize a variable to create/read a data frame.
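
A minimal sketch that reads the sample CSV file from ADLS Gen2 into a DataFrame (the storage path is the example used in the note below):

    // Read the sample CSV file from ADLS Gen2 into a DataFrame
    val df = spark.read
      .format("csv")
      .load("abfss://synapse@mukund.dfs.core.windows.net/100SalesRecords.csv")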

Note: The above step can also be written in the following format:

 

    // val df = spark.read.csv("abfss://synapse@mukund.dfs.core.windows.net/100SalesRecords.csv")

 

3. The next step is to use the write API in the format below.
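
A minimal sketch of the write call, where the three-part name is a placeholder for your dedicated pool's database, schema, and table; Constants.INTERNAL tells the connector to create a managed (internal) table:

    // Write the DataFrame to an internal (managed) table in the dedicated SQL pool
    df.write.synapsesql("<database-name>.<schema-name>.<table-name>", Constants.INTERNAL)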

Execute the cell, and you will be able to see the new table, populated with the data.

Observations in the driver log during this exercise:

 

We find that an external data source, an external file format, and an external table are created, and then dropped, as part of this automated process.

 

More information about other options for the dedicated pool, and about the serverless-related read/write APIs in Spark, can be found on this page.

 

Updated Sep 15, 2021
Version 2.0

2 Comments

  • If you want to use Python (PySpark) to achieve the same, please use this sample code (each block runs in its own notebook cell):

    # Cell 1 (PySpark): read the source file from ADLS Gen2 into a DataFrame
    df = spark.read.load('abfss://<container>@<storage-account>.dfs.core.windows.net/sample.snappy.parquet', format='parquet')

    %%sql
    -- Cell 2 (Spark SQL): create a database to hold the intermediate Spark table
    CREATE DATABASE sampledb;

    # Cell 3 (PySpark): save the DataFrame as a Spark table
    df.write.mode("overwrite").saveAsTable("<schema-name>.<table-name>")

    %%spark
    // Cell 4 (Scala): read the Spark table and write it to the dedicated pool
    val scala_df = spark.sqlContext.sql("select * from <table-name>")
    scala_df.write.synapsesql("<database-name>.<schema-name>.<table-name>", Constants.INTERNAL)

     

  • Right now, the same activity using SQL Server authentication may fail. This is a known issue and is being worked on by us.