User Profile
Piyush_Thakur
Copper Contributor
Joined Aug 28, 2020
Recent Discussions
Unable to configure log directory of Spark History Server to Storage Blob when deployed on AKS
I am trying to deploy Spark History Server on AKS and want to point its log directory to a Storage Blob. To achieve this, I am putting my configs in the values.yaml file as below:

```yaml
wasbs:
  enableWASBS: true
  secret: azure-secrets
  sasKeyMode: false
  storageAccountKeyName: azure-storage-account-key
  storageAccountNameKeyName: azure-storage-account-name
  containerKeyName: azure-blob-container-name
  logDirectory: wasbs:///test/piyush/spark-history

pvc:
  enablePVC: false

nfs:
  enableExampleNFS: false
```

First, I am creating the azure-secrets secret using the command below:

```shell
kubectl create secret generic azure-secrets \
  --from-file=azure-storage-account-name \
  --from-file=azure-blob-container-name \
  --from-file=azure-storage-account-key
```

After that, I am running the following set of commands:

```shell
helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm install stable/spark-history-server --values values.yaml --generate-name
```

But while doing so, I am getting the following error:

```
2020-10-05 19:17:56 INFO HistoryServer:2566 - Started daemon with process name: 12@spark-history-server-1601925447-57c5476fb-5wh6q
2020-10-05 19:17:56 INFO SignalUtils:54 - Registered signal handler for TERM
2020-10-05 19:17:56 INFO SignalUtils:54 - Registered signal handler for HUP
2020-10-05 19:17:56 INFO SignalUtils:54 - Registered signal handler for INT
2020-10-05 19:17:56 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-10-05 19:17:56 INFO SecurityManager:54 - Changing view acls to: root
2020-10-05 19:17:56 INFO SecurityManager:54 - Changing modify acls to: root
2020-10-05 19:17:56 INFO SecurityManager:54 - Changing view acls groups to:
2020-10-05 19:17:56 INFO SecurityManager:54 - Changing modify acls groups to:
2020-10-05 19:17:56 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2020-10-05 19:17:56 INFO FsHistoryProvider:54 - History server ui acls disabled; users with admin permissions: ; groups with admin permissions
Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:280)
	at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azure.NativeAzureFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:364)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
	at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:117)
	at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:86)
	... 6 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azure.NativeAzureFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
	... 16 more
```

Any type of help or suggestion would be appreciated. Thanks in advance!

Regards,
Piyush Thakur

3.8K Views · 0 likes · 4 Comments

Unable to write CSV file to Azure Blob Storage using Pyspark
Hi,

I am trying to write a CSV file to Azure Blob Storage using Pyspark, and I have installed Pyspark on my VM, but I am getting this error:

```
org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: One of the request inputs is not valid.
	at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.rename(AzureNativeFileSystemStore.java:2482)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystem$FolderRenamePending.execute(NativeAzureFileSystem.java:424)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystem.rename(NativeAzureFileSystem.java:1997)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:531)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:502)
	at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
	at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:77)
	at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:245)
	at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:79)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:275)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:281)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:205)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.microsoft.azure.storage.StorageException: One of the request inputs is not valid.
	at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:89)
	at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:307)
	at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:182)
	at com.microsoft.azure.storage.blob.CloudBlob.startCopyFromBlob(CloudBlob.java:764)
	at org.apache.hadoop.fs.azure.StorageInterfaceImpl$CloudBlobWrapperImpl.startCopyFromBlob(StorageInterfaceImpl.java:399)
	at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.rename(AzureNativeFileSystemStore.java:2449)
	... 20 more
```

My Pyspark code is as below:

```python
from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.sql import Window
from pyspark.sql.types import *
from pyspark.sql.functions import *

spark = SparkSession.builder.getOrCreate()

storage_account_name = "######"
storage_account_access_key = "####################################3"

spark.conf.set("fs.azure.account.key." + storage_account_name + ".blob.core.windows.net", storage_account_access_key)
spark._jsc.hadoopConfiguration().set("fs.wasbs.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
spark._jsc.hadoopConfiguration().set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
spark._jsc.hadoopConfiguration().set("fs.azure.account.key.bdpccadlsdev2.blob.core.windows.net", storage_account_access_key)

df = spark.read.format("csv").option("inferSchema", "true").option("header", "true").load("wasbs://<container_name>@<storage_account_name>.blob.core.windows.net/<path_to_csv>")
df.show()

df = df.withColumn("c1", lit("1"))
df.show()

df.coalesce(1).write.mode("overwrite").option("header", "true").format("csv").save("wasbs://<container_name>@<storage_account_name>.blob.core.windows.net/<path_to_write_csv>")
```

I have used the hadoop-azure-2.7.0.jar and azure-storage-2.2.0.jar JARs to read the CSV from my Blob, but I am not able to write back to the blob storage.

Thanks,
Piyush Thakur

4.5K Views · 0 likes · 2 Comments
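For context on the first discussion above (Spark History Server on AKS): the ClassNotFoundException means the wasbs:// filesystem implementation is not on the history server's classpath. org.apache.hadoop.fs.azure.NativeAzureFileSystem ships in the hadoop-azure artifact, which in turn depends on the azure-storage SDK jar. In a stock Spark install, one way to expose extra jars to daemons such as the history server is SPARK_DAEMON_CLASSPATH in spark-env.sh; this is a sketch only, and the jar paths and versions below are hypothetical, not values taken from the Helm chart or its image:

```shell
# spark-env.sh fragment (sketch): put the Azure connector and its storage SDK
# on the classpath of all Spark daemons, including the history server.
# The /opt/extra-jars paths and jar versions are assumptions for illustration.
export SPARK_DAEMON_CLASSPATH="/opt/extra-jars/hadoop-azure-2.7.3.jar:/opt/extra-jars/azure-storage-2.2.0.jar"
```

When deploying through a Helm chart, this would typically translate into baking the jars into the container image or mounting them into the pod, rather than editing spark-env.sh by hand; whether the chart already bundles them depends on the image version it pulls.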
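On the second discussion above: "One of the request inputs is not valid" is the storage service rejecting a malformed request, and one common trigger worth ruling out is an invalid container or blob name in the wasbs:// URL. Azure container names must be 3 to 63 characters of lowercase letters, digits, and hyphens, must start and end with a letter or digit, and may not contain consecutive hyphens. A small standalone checker, independent of Spark (the function name is ours; the rules follow Azure's published naming constraints):

```python
import re

def is_valid_container_name(name: str) -> bool:
    """Check Azure Blob Storage container naming rules:
    3-63 chars, lowercase letters/digits/hyphens, starting and
    ending with a letter or digit, no consecutive hyphens."""
    if not 3 <= len(name) <= 63:
        return False
    if "--" in name:
        return False
    # Must start and end with a lowercase letter or digit.
    return re.fullmatch(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?", name) is not None

print(is_valid_container_name("my-container"))   # True
print(is_valid_container_name("My_Container"))   # False: uppercase and underscore
```

If the name checks out, the failure is happening in the connector's rename step during task commit, so mismatched hadoop-azure/azure-storage jar versions are the next thing to examine.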
Recent Blog Articles
No content to show