How to set custom Spark / PySpark configs in a Synapse workspace Spark pool

Former Employee

Feb 05, 2021

In Azure Synapse, the system configuration of a Spark pool defines the number of executors, vCores, and memory by default.

Some users may need to change the number of executors or the memory assigned to a Spark session at execution time.

Usually, we can reconfigure them by navigating to the Spark pool in the Azure portal and uploading a text file of Spark configuration properties to the pool.
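As an illustration of what such a file might contain, the sketch below renders a few settings as whitespace-separated key/value lines in the style of Spark's `spark-defaults.conf`. The helper name `render_spark_conf` and the example values are assumptions made for illustration, not the file from the original post, so check which formats your pool accepts before uploading anything.

```python
def render_spark_conf(settings):
    """Render settings as 'key value' lines, one property per line."""
    lines = []
    for key, value in settings.items():
        # Spark property names and values are written as plain text
        lines.append(f"{key} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical example values; adjust them to your pool size
example_settings = {
    "spark.executor.instances": 4,
    "spark.executor.cores": 4,
    "spark.executor.memory": "28g",
}

config_text = render_spark_conf(example_settings)
print(config_text)
```

Generating the file from a dictionary keeps the pool-level settings in one place, which makes them easier to compare with what a session actually receives.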

But in a Synapse Spark pool, some of these user-defined configurations get overridden by the pool's default values.

How, then, can we persist these configurations at the Spark session level?

For notebooks

If we want a session to use more executors than are defined at the system level (2 executors in this example), we need to request them in the session configuration from the notebook, for example asking for 4 executors. This logically allocates more executors to that session.
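A minimal sketch of such a request, assuming the notebook supports the `%%configure` magic with Livy-style fields (`numExecutors`, `executorCores`, `executorMemory`, `driverCores`, `driverMemory`) and the `-f` flag to restart the session. The helper `build_configure_cell` is hypothetical, and the memory and core values are placeholders rather than the ones from the original post. The code only builds the cell text, which would then be pasted into the first cell of the notebook.

```python
import json


def build_configure_cell(num_executors, executor_cores, executor_memory,
                         driver_cores, driver_memory):
    """Return the text of a %%configure cell requesting the given resources."""
    body = {
        "driverMemory": driver_memory,
        "driverCores": driver_cores,
        "executorMemory": executor_memory,
        "executorCores": executor_cores,
        "numExecutors": num_executors,
    }
    # Assumption: -f restarts the session so the new settings take effect
    return "%%configure -f\n" + json.dumps(body, indent=4)


# Placeholder sizes; only numExecutors=4 comes from the example in the text
cell_text = build_configure_cell(
    num_executors=4, executor_cores=4, executor_memory="28g",
    driver_cores=4, driver_memory="28g",
)
print(cell_text)
```

Requested sizes still have to fit within what the pool's node size and node count allow.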

Once the session starts, check its configuration to confirm that the number of executors matches the value requested for the session, which is 4.
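One way to run this check is to read the session's configuration and keep only the executor-related entries. The sketch below assumes the pairs come from `spark.sparkContext.getConf().getAll()` in a live notebook. The helper `executor_settings` and the stand-in data are illustrative, so that the snippet runs without a cluster.

```python
def executor_settings(conf_pairs):
    """Keep only the spark.executor.* entries from (key, value) pairs."""
    return {
        key: value
        for key, value in conf_pairs
        if key.startswith("spark.executor.")
    }


# In a notebook, the pairs would come from the live session, e.g.:
#   conf_pairs = spark.sparkContext.getConf().getAll()
# Stand-in data is used here so the sketch runs without a cluster.
sample_pairs = [
    ("spark.app.name", "testApp"),
    ("spark.executor.instances", "4"),
    ("spark.executor.cores", "4"),
]

settings = executor_settings(sample_pairs)
print(settings["spark.executor.instances"])
```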

You can also cross-check these executors in the Spark UI.

A list of many session configs is summarized here.

We can also set up the desired session-level configuration in the Apache Spark job definition.

For Apache Spark Job:

If we want to add those configurations to our job, we have to set them when we initialize the Spark session or Spark context, for example for a PySpark job:

Spark Session:

from pyspark.sql import SparkSession

if __name__ == "__main__":
    # create Spark session with necessary configuration
    spark = SparkSession \
        .builder \
        .appName("testApp") \
        .config("spark.executor.instances", "4") \
        .config("spark.executor.cores", "4") \
        .getOrCreate()
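When a job sets many properties, it can be convenient to keep them in a dictionary and apply them in a loop. The sketch below assumes only that the builder exposes a chainable `config(key, value)` method, as `SparkSession.builder` does. The helper `apply_configs`, the `RecordingBuilder` stand-in, and `JOB_SETTINGS` are hypothetical names used so that the example runs without PySpark installed.

```python
def apply_configs(builder, settings):
    """Call builder.config(key, value) for each setting and return the builder."""
    for key, value in settings.items():
        # Values are converted to strings, matching the examples in the post
        builder = builder.config(key, str(value))
    return builder


class RecordingBuilder:
    """Stand-in for SparkSession.builder so the sketch runs without PySpark."""

    def __init__(self):
        self.options = {}

    def config(self, key, value):
        self.options[key] = value
        return self


JOB_SETTINGS = {"spark.executor.instances": 4, "spark.executor.cores": 4}

recorded = apply_configs(RecordingBuilder(), JOB_SETTINGS)
print(recorded.options)

# With PySpark available, the same helper would be used as:
#   spark = apply_configs(SparkSession.builder.appName("testApp"),
#                         JOB_SETTINGS).getOrCreate()
```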

Spark Context:

from pyspark import SparkContext, SparkConf

if __name__ == "__main__":
    # create Spark context with necessary configuration
    conf = SparkConf().setAppName("testApp") \
        .set("spark.hadoop.validateOutputSpecs", "false") \
        .set("spark.executor.cores", "4") \
        .set("spark.executor.instances", "4")

    spark = SparkContext(conf=conf)

Hope this helps you configure a job or notebook with the number of executors you need.

Updated Sep 15, 2021

Version 2.0


Mukund_Bhashkar


Azure Synapse Analytics Blog
