Azure Data Lake
Using Visual Studio to connect to HDP 2.6
I'm trying to connect the Visual Studio HDInsight Emulator tools to a Hortonworks HDP 2.6 cluster (3 master servers and 3 data nodes), but every time I try to connect I have issues with HiveServer2 and WebHDFS. Any help would be appreciated. I'm trying to find a good GUI for writing Hive queries outside of the Hive view, since it was discontinued in future distros.

Here's the writeup I'm following: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-emulator-visual-studio

I'm using Visual Studio 2017 and HDP 2.6, and I've got my endpoints set up as:

- WebHCat: http://hdpmn03.mydomain.com:50111 (this is the node my Hive/WebHCat resides on)
- HiveServer2: http://hdpmn03.mydomain.com:10000 (same node; I also did a port scan and some googling and found that it's usually port 10000, not the default 10001 that Visual Studio tries to set)
- WebHDFS: http://hdpmn01.mydomain.com:50070 (this is my namenode; I post files via curl all the time, so I know this is correct)
- SSH: hdpmn01.mydomain.com:22 (CentOS 7 box; I assume this is right, I get green when I hit next)
- YARN Timeline: http://hdpmn01.mydomain.com:8188 (the YARN Timeline service is on this box)
- User: root
- Password: ######

When I hit next:

- WebHCat service is connected successfully.
- Failed to connect to HiveServer2 - Error Detail: Failed to open connection. Please check your connection string. See inner exception for failure details. But I have no way of seeing what the inner exception is (unless you have any ideas).
- WebHDFS shows a pause sign: could be connected but needs a config change (if I hit update, it changes to error 403 Forbidden).
- SSH service is connected successfully.

Any help getting Visual Studio connected so I can play with some queries and whatnot would be super helpful! Thanks!

P.S. I was directed to the Azure forum by some people at Hortonworks; see my post here: https://community.hortonworks.com/questions/227282/using-visualstudio-to-connect-to-hdp-26.html
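Update: since Visual Studio hides the inner exception, here's a quick Python sketch I've been using to probe the HiveServer2 and WebHDFS endpoints directly. The hostnames and the root user are from my setup above; this assumes simple (non-Kerberos) auth on the cluster and that `requests` is installed:

```python
# Minimal endpoint probe, assuming the hosts/ports from the post above
# and simple (pseudo) authentication, no Kerberos.
import socket
import requests

def check_port(host, port, timeout=5):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# HiveServer2 usually listens on 10000 (binary transport) or 10001 (HTTP).
print("HiveServer2 reachable:", check_port("hdpmn03.mydomain.com", 10000))

# WebHDFS: LISTSTATUS on the root path. With simple auth, the user.name
# query parameter identifies the caller; leaving it off (or using a user
# without access) can produce a 403 Forbidden from the NameNode.
resp = requests.get(
    "http://hdpmn01.mydomain.com:50070/webhdfs/v1/",
    params={"op": "LISTSTATUS", "user.name": "root"},
    timeout=10,
)
print("WebHDFS status:", resp.status_code)
print(resp.json() if resp.ok else resp.text)
```

If the WebHDFS call only succeeds when user.name is set, the 403 Visual Studio reports may just be a missing or mismatched user on simple auth.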
Can I change the datatype of the Spark dataframe columns being loaded to SQL Data Warehouse?
I am trying to read a Parquet file from Azure Data Lake using the following PySpark code:

```python
df = sqlContext.read.format("parquet") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("adl://xyz/abc.parquet")

df = df['Id', 'IsDeleted']
```

Now I would like to load this dataframe df as a table in SQL Data Warehouse using the following code:

```python
df.write \
    .format("com.databricks.spark.sqldw") \
    .mode("overwrite") \
    .option("url", sqlDwUrlSmall) \
    .option("forward_spark_azure_storage_credentials", "true") \
    .option("dbtable", "test111") \
    .option("tempdir", tempDir) \
    .save()
```

This creates a table dbo.test111 in the SQL Data Warehouse with datatypes:

- Id (nvarchar(256), null)
- IsDeleted (bit, null)

But I need these columns with different datatypes, say char(255) and varchar(128), in SQL Data Warehouse. How do I do this while loading the dataframe into SQL Data Warehouse?
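For example, would something like the following work: pre-creating dbo.test111 with the exact types via pyodbc, then appending from Spark so the connector doesn't regenerate the schema? This is just an untested sketch; the server, database, and credentials are placeholders, and I assume IsDeleted needs a cast from boolean before it can land in a varchar column:

```python
# Sketch (untested): pre-create the table with the desired column types,
# then append from Spark instead of letting the connector infer a schema.
from pyspark.sql.functions import col
import pyodbc

# Placeholder server/database/credentials, not real values.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;"
    "DATABASE=mydw;UID=myuser;PWD=mypassword"
)
conn.execute("""
IF OBJECT_ID('dbo.test111') IS NOT NULL DROP TABLE dbo.test111;
CREATE TABLE dbo.test111 (
    Id        char(255)    NULL,
    IsDeleted varchar(128) NULL
);
""")
conn.commit()
conn.close()

# Cast columns so the Spark types line up with the character columns.
df2 = df.withColumn("Id", col("Id").cast("string")) \
        .withColumn("IsDeleted", col("IsDeleted").cast("string"))

# Append instead of overwrite, so the pre-created table and its types are kept.
df2.write \
    .format("com.databricks.spark.sqldw") \
    .mode("append") \
    .option("url", sqlDwUrlSmall) \
    .option("forward_spark_azure_storage_credentials", "true") \
    .option("dbtable", "test111") \
    .option("tempdir", tempDir) \
    .save()
```

I also came across the connector's maxStrLength option, which apparently controls the NVARCHAR length it generates for string columns, but as far as I can tell that wouldn't give me char/varchar types per column.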