User Profile
Piyush_Thakur
Copper Contributor
Joined Aug 28, 2020
Recent Discussions
Unable to configure log directory of Spark History Server to Storage Blob when deployed on AKS
I am trying to deploy Spark History Server on AKS and want to point its log directory to a Storage Blob. To achieve this, I am putting my configs in the values.yaml file as below:

```yaml
wasbs:
  enableWASBS: true
  secret: azure-secrets
  sasKeyMode: false
  storageAccountKeyName: azure-storage-account-key
  storageAccountNameKeyName: azure-storage-account-name
  containerKeyName: azure-blob-container-name
  logDirectory: wasbs:///test/piyush/spark-history

pvc:
  enablePVC: false

nfs:
  enableExampleNFS: false
```

First, I am creating the azure-secrets secret using the command below:

```shell
kubectl create secret generic azure-secrets \
  --from-file=azure-storage-account-name \
  --from-file=azure-blob-container-name \
  --from-file=azure-storage-account-key
```

After that, I am running the following set of commands:

```shell
helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm install stable/spark-history-server --values values.yaml --generate-name
```

But while doing so, I am getting the following error:

```
2020-10-05 19:17:56 INFO HistoryServer:2566 - Started daemon with process name: 12@spark-history-server-1601925447-57c5476fb-5wh6q
2020-10-05 19:17:56 INFO SignalUtils:54 - Registered signal handler for TERM
2020-10-05 19:17:56 INFO SignalUtils:54 - Registered signal handler for HUP
2020-10-05 19:17:56 INFO SignalUtils:54 - Registered signal handler for INT
2020-10-05 19:17:56 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-10-05 19:17:56 INFO SecurityManager:54 - Changing view acls to: root
2020-10-05 19:17:56 INFO SecurityManager:54 - Changing modify acls to: root
2020-10-05 19:17:56 INFO SecurityManager:54 - Changing view acls groups to:
2020-10-05 19:17:56 INFO SecurityManager:54 - Changing modify acls groups to:
2020-10-05 19:17:56 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2020-10-05 19:17:56 INFO FsHistoryProvider:54 - History server ui acls disabled; users with admin permissions: ; groups with admin permissions
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:280)
	at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azure.NativeAzureFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:364)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
	at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:117)
	at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:86)
	... 6 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azure.NativeAzureFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
	... 16 more
```

Any type of help or suggestion would be appreciated. Thanks in advance!

Regards,
Piyush Thakur

3.8K Views · 0 likes · 4 Comments

Unable to write CSV file to Azure Blob Storage using Pyspark
Hi,

I am trying to write a CSV file to Azure Blob Storage using Pyspark, and I have installed Pyspark on my VM, but I am getting this error:

```
org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: One of the request inputs is not valid.
	at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.rename(AzureNativeFileSystemStore.java:2482)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystem$FolderRenamePending.execute(NativeAzureFileSystem.java:424)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystem.rename(NativeAzureFileSystem.java:1997)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:531)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:502)
	at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
	at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:77)
	at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:245)
	at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:79)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:275)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:281)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:205)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.microsoft.azure.storage.StorageException: One of the request inputs is not valid.
	at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:89)
	at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:307)
	at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:182)
	at com.microsoft.azure.storage.blob.CloudBlob.startCopyFromBlob(CloudBlob.java:764)
	at org.apache.hadoop.fs.azure.StorageInterfaceImpl$CloudBlobWrapperImpl.startCopyFromBlob(StorageInterfaceImpl.java:399)
	at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.rename(AzureNativeFileSystemStore.java:2449)
	... 20 more
```

My Pyspark code is as below:

```python
from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.sql import Window
from pyspark.sql.types import *
from pyspark.sql.functions import *

spark = SparkSession.builder.getOrCreate()

storage_account_name = "######"
storage_account_access_key = "####################################3"

spark.conf.set("fs.azure.account.key." + storage_account_name + ".blob.core.windows.net", storage_account_access_key)
spark._jsc.hadoopConfiguration().set("fs.wasbs.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
spark._jsc.hadoopConfiguration().set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
spark._jsc.hadoopConfiguration().set("fs.azure.account.key.bdpccadlsdev2.blob.core.windows.net", storage_account_access_key)

df = spark.read.format("csv").option("inferSchema", "true").option("header", "true").load("wasbs://<container_name>@<storage_account_name>.blob.core.windows.net/<path_to_csv>")
df.show()

df = df.withColumn("c1", lit("1"))
df.show()

df.coalesce(1).write.mode("overwrite").option("header", "true").format("csv").save("wasbs://<container_name>@<storage_account_name>.blob.core.windows.net/<path_to_write_csv>")
```

I have used the hadoop-azure-2.7.0.jar and azure-storage-2.2.0.jar JARs to read the CSV from my Blob, but I am not able to write back to the blob storage.

Thanks,
Piyush Thakur

4.5K Views · 0 likes · 2 Comments
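For context on the first discussion above (Spark History Server on AKS): the ClassNotFoundException means the wasbs:// filesystem implementation is not on the history server's classpath. org.apache.hadoop.fs.azure.NativeAzureFileSystem ships in the hadoop-azure artifact, which in turn depends on the azure-storage SDK jar. In a stock Spark install, one way to expose extra jars to daemons such as the history server is SPARK_DAEMON_CLASSPATH in spark-env.sh; this is a sketch only, and the jar paths and versions below are hypothetical, not values taken from the Helm chart or its image:

```shell
# spark-env.sh fragment (sketch): put the Azure connector and its storage SDK
# on the classpath of all Spark daemons, including the history server.
# The /opt/extra-jars paths and jar versions are assumptions for illustration.
export SPARK_DAEMON_CLASSPATH="/opt/extra-jars/hadoop-azure-2.7.3.jar:/opt/extra-jars/azure-storage-2.2.0.jar"
```

When deploying through a Helm chart, this would typically translate into baking the jars into the container image or mounting them into the pod, rather than editing spark-env.sh by hand; whether the chart already bundles them depends on the image version it pulls.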
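On the second discussion above: "One of the request inputs is not valid" is the storage service rejecting a malformed request, and one common trigger worth ruling out is an invalid container or blob name in the wasbs:// URL. Azure container names must be 3 to 63 characters of lowercase letters, digits, and hyphens, must start and end with a letter or digit, and may not contain consecutive hyphens. A small standalone checker, independent of Spark (the function name is ours; the rules follow Azure's published naming constraints):

```python
import re

def is_valid_container_name(name: str) -> bool:
    """Check Azure Blob Storage container naming rules:
    3-63 chars, lowercase letters/digits/hyphens, starting and
    ending with a letter or digit, no consecutive hyphens."""
    if not 3 <= len(name) <= 63:
        return False
    if "--" in name:
        return False
    # Must start and end with a lowercase letter or digit.
    return re.fullmatch(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?", name) is not None

print(is_valid_container_name("my-container"))   # True
print(is_valid_container_name("My_Container"))   # False: uppercase and underscore
```

If the name checks out, the failure is happening in the connector's rename step during task commit, so mismatched hadoop-azure/azure-storage jar versions are the next thing to examine.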
Recent Blog Articles
No content to show