Forum Discussion
Ashwini_Akula
Aug 05, 2020 | Copper Contributor
Unable to write csv to azure blob storage using Pyspark
Hi there, I am trying to write a CSV to Azure Blob Storage using PySpark but receive the following error: Caused by: com.microsoft.azure.storage.StorageException: One of the request inputs is ...
Ashwini_Akula
Aug 07, 2020 | Copper Contributor
mfessalifi
Aug 09, 2020 | Copper Contributor
Hi Ashwini_Akula,
To rule out connection issues between Scala/Spark and Storage, can you test a simple connection?
- scala> val df = spark.read.format("csv").option("inferSchema", "true").load("wasbs://CONTAINER_NAME@ACCOUNT_NAME.blob.core.windows.net/<Folder>/..")
- scala> df.show()
Regards,
Faiçal (MCT, Azure Expert & Team Leader)
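Since the original question uses PySpark, the same read test can be sketched in Python. This is a minimal sketch, not a definitive implementation: the helper names `wasbs_uri`, `account_key_conf`, and `check_read` are hypothetical, and it assumes an existing `SparkSession` with the `hadoop-azure` WASB driver on the classpath and authentication by storage account key.

```python
def wasbs_uri(container, account, path=""):
    """Build a wasbs:// URI for a path inside a Blob Storage container."""
    return "wasbs://%s@%s.blob.core.windows.net/%s" % (
        container, account, path.lstrip("/"))


def account_key_conf(account):
    """Return the Hadoop configuration key that holds the account access key."""
    return "fs.azure.account.key.%s.blob.core.windows.net" % account


def check_read(spark, container, account, access_key, path):
    """Read a CSV over WASBS and print a few rows (hypothetical helper).

    `spark` is assumed to be an existing SparkSession.
    """
    spark.conf.set(account_key_conf(account), access_key)
    df = (spark.read.format("csv")
          .option("inferSchema", "true")
          .load(wasbs_uri(container, account, path)))
    df.show()
    return df
```

If `check_read` succeeds but a write to the same container fails, the connection and the account key are probably fine, and the problem is more likely in the write/commit path.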
- rajengla | Mar 01, 2021 | Copper Contributor
mfessalifi I am facing the same issue. We are able to read from Azure Blob Storage, but the error occurs when writing data with PySpark.
import pyspark
from pyspark.sql import SparkSession, SQLContext

sc = pyspark.SparkContext.getOrCreate()
# SparkSession used for the spark.conf settings below
spark = SparkSession.builder.getOrCreate()
spark.sparkContext.setLogLevel('ERROR')

storage_account_access_key = "******"
blob_account_name = "blob_account_name"
blob_container_name = "blob_container_name"

# WASB filesystem implementation and account key, set on the Spark session
spark.conf.set("spark.hadoop.fs.wasbs.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
spark.conf.set("fs.wasbs.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
spark.conf.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
spark.conf.set("fs.azure.account.key.%s.blob.core.windows.net" % (blob_account_name), storage_account_access_key)

# The same settings, applied to the Hadoop configuration of the SparkContext
sc._jsc.hadoopConfiguration().set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
sc._jsc.hadoopConfiguration().set("fs.azure.account.key.%s.blob.core.windows.net" % (blob_account_name), storage_account_access_key)
sc._jsc.hadoopConfiguration().set("fs.wasbs.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
sc._jsc.hadoopConfiguration().set("spark.hadoop.fs.wasbs.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")

sqlContext = SQLContext(sc)

# Read the "~"-delimited input file (this step works)
rel_input_path = "input_path"
input_path = "wasbs://%s@%s.blob.core.windows.net/%s" % (blob_container_name, blob_account_name, rel_input_path)
df = sqlContext.read.format("com.databricks.spark.csv").option("delimiter","~").option("header", "false").load(input_path)

# Write a single CSV file back to the same container (this step fails)
rel_output_path = "output_path"
output_path = "wasbs://%s@%s.blob.core.windows.net/%s" % (blob_container_name, blob_account_name, rel_output_path)
df.coalesce(1).write.format("com.databricks.spark.csv").option("header", "true").mode("overwrite").save(output_path)
Please find attached a stack trace of the error.
Caused by: org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: One of the request inputs is not valid.
    at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.rename(AzureNativeFileSystemStore.java:2482)
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem$FolderRenamePending.execute(NativeAzureFileSystem.java:424)
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem.rename(NativeAzureFileSystem.java:1997)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:531)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:502)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:77)
    at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:245)
    at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:79)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:275)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:281)
    ... 9 more
Caused by: com.microsoft.azure.storage.StorageException: One of the request inputs is not valid.
    at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:162)
    at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:307)
    at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:177)
    at com.microsoft.azure.storage.blob.CloudBlob.startCopyFromBlob(CloudBlob.java:764)
    at org.apache.hadoop.fs.azure.StorageInterfaceImpl$CloudBlobWrapperImpl.startCopyFromBlob(StorageInterfaceImpl.java:399)
    at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.rename(AzureNativeFileSystemStore.java:2449)
    ... 20 more
What might be the reason behind this error?
- Amrinder_Singh | Mar 17, 2021 | Microsoft
One thing to check is whether you are using a Blob Storage account or an ADLS Gen2 account (hierarchical namespace, HNS, enabled). If it is an ADLS Gen2 account, try connecting with the ABFS driver instead of the WASBS driver.
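For reference, switching to the ABFS driver mainly changes the URI scheme and endpoint (`abfss://` and `dfs.core.windows.net` instead of `wasbs://` and `blob.core.windows.net`). The sketch below is only an illustration under assumptions: the helper names `abfss_uri`, `abfs_account_key_conf`, and `write_csv_abfs` are hypothetical, account-key authentication is assumed, and the `hadoop-azure` build on the cluster must include the ABFS driver.

```python
def abfss_uri(container, account, path=""):
    """Build an abfss:// URI for a path inside an ADLS Gen2 container."""
    return "abfss://%s@%s.dfs.core.windows.net/%s" % (
        container, account, path.lstrip("/"))


def abfs_account_key_conf(account):
    """Return the configuration key that holds the account key for ABFS."""
    return "fs.azure.account.key.%s.dfs.core.windows.net" % account


def write_csv_abfs(spark, df, container, account, access_key, path):
    """Write `df` as CSV through the ABFS driver (hypothetical helper).

    `spark` is assumed to be an existing SparkSession and `df` a DataFrame.
    """
    spark.conf.set(abfs_account_key_conf(account), access_key)
    output_path = abfss_uri(container, account, path)
    # Same write as in the post above, only the target URI changes.
    (df.coalesce(1).write
       .option("header", "true")
       .mode("overwrite")
       .csv(output_path))
    return output_path
```

A call would look like `write_csv_abfs(spark, df, "blob_container_name", "blob_account_name", storage_account_access_key, "output_path")`, reusing the placeholder names from the post above. Whether this resolves the error depends on the account actually being ADLS Gen2.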