Jan 15 2019 10:18 AM
I am trying to read a Parquet file from Azure Data Lake using the following PySpark code:
df = sqlContext.read.format("parquet") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("adl://xyz/abc.parquet")
df = df.select("Id", "IsDeleted")
Now I would like to load this DataFrame df as a table into Azure SQL Data Warehouse using the following code:
df.write \
    .format("com.databricks.spark.sqldw") \
    .mode("overwrite") \
    .option("url", sqlDwUrlSmall) \
    .option("forward_spark_azure_storage_credentials", "true") \
    .option("dbtable", "test111") \
    .option("tempdir", tempDir) \
    .save()
This creates a table dbo.test111 in the SQL Data Warehouse with the following datatypes:
But I need these columns to have different datatypes, say char(255) and varchar(128), in SQL Data Warehouse. How can I specify the datatypes while loading the DataFrame into SQL Data Warehouse?
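One workaround worth noting (not part of the original question): since overwrite mode lets the connector create the table with its default type mappings, you can instead pre-create the table yourself with the exact T-SQL datatypes you want, and then write with mode("append") so the existing table is reused. The helper below is a hypothetical sketch that builds such a CREATE TABLE statement from the column names in the question; you would execute the DDL against the warehouse first (for example via pyodbc), then run the append write.

```python
# Sketch of a workaround: generate a CREATE TABLE statement with explicit
# datatypes, run it against SQL Data Warehouse, then append from Spark.
# build_create_table is a hypothetical helper, not part of any library.

def build_create_table(schema: str, table: str, columns: dict) -> str:
    """Build a T-SQL CREATE TABLE statement from a {column: sql_type} mapping."""
    cols = ", ".join(f"[{name}] {sql_type}" for name, sql_type in columns.items())
    return f"CREATE TABLE [{schema}].[{table}] ({cols})"

ddl = build_create_table(
    "dbo", "test111",
    {"Id": "char(255)", "IsDeleted": "varchar(128)"},  # types from the question
)
print(ddl)
# → CREATE TABLE [dbo].[test111] ([Id] char(255), [IsDeleted] varchar(128))

# After executing the DDL on the warehouse, write with append so the
# connector does not drop and recreate the table with default types:
# df.write.format("com.databricks.spark.sqldw").mode("append") ... .save()
```

With the table created up front, the connector's own type inference no longer applies, at the cost of keeping the DDL and the DataFrame schema in sync yourself.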