First published on MSDN on Aug 30, 2016
I have been working with Polybase in Parallel Data Warehouse for a few years now, so I was quite excited to hear about this great feature making it into SQL 2016. It is a great was to use tools and a language you already know (TSQL) to be able to access data stored in Hadoop. In this article, I am going to walk you through how I set up my own environment in Azure to test Polybase, and learn a little more about Hortonworks Hadoop.
You read correctly…..I am using Hortonworks Hadoop. Polybase is a great feature to use to bridge between SQL Server databases and Azure Blob Storage or HDInsight. Many customers already have huge investments in on prem Hadoop utilizing other Hadoop installations like Cloudera or Hortonworks. In order to provide the best guidance to my customers, I need to understand their whole environment. Therefore, I put together my own environment using my Azure benefits with my MSDN Subscription. Luckily, Azure has a VM template for a Hortonworks sandbox.
Here are some tips and resources on how to create the environment to use Polybase to connect to the Hortonworks sandbox. Once this is complete, you can use the test data in the sandbox to practice using Polybase. In my next post, I will show you how you can mimic some of the Hortonworks Sandbox labs with Polybase to get an idea of how it works.
This link from Hortonworks walks you right through the process: http://hortonworks.com/hadoop-tutorial/deploying-hortonworks-sandbox-on-microsoft-azure/
The sandbox has a maria_dev user for you to use along with the demos. Also, notice the address that I used is a 10.0.0.X address. To keep things simple and local to the Azure Resource Group, I am using the local IP addresses for the VMs.
NOTE: If you want to log into the Ambari portal as an admin, you will need to do the following:
Your command window will look something like this when it is complete:
You can then log in to the Ambari portal as the admin user. To connect to Ambari you just browse using the IP of the sandbox and specify port 8080
TIP: I downloaded Putty on my SQL 2016 VM and use that and IE within my RDP session on that VM to do anything I need on the Hadoop sandbox. That way I can use the internal IPs, which will be faster.
NOTE: If you want to have everything on premises, you can build your own SQL 2016 VM and download a Hortonworks Sandbox VM, but I have personally switched to using Azure for all my lab environment needs as it is so easy to create/destroy/recreate lab environments at will.
In my next post … I will show you how you can use the sandbox and SQL 2016 to test Polybase queries against Hadoop data sources.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.