HDInsight
Globally replicated data lakes with LiveData using WANdisco on Azure
The recent announcement of the Azure Data Lake Storage Gen2 preview and its support on Azure HDInsight is already leading partners to innovate at a global scale. WANdisco enables globally replicated data lakes on Azure, so analytics can run over the freshest data. This blog post explains how. The modern business landscape is ruled by data, and analytics and AI are now essential for driving key business transformation. Customers have benefited tremendously from the performance, flexibility, and low cost that Azure offers for analytics and AI workloads. Read more about it in the Azure blog.
Enterprises get deeper insights with Hadoop and Spark updates on Azure HDInsight
Azure HDInsight is one of the most popular services among enterprises for open source Hadoop and Spark analytics on Azure. With a price cut of more than 50 percent on HDInsight, customers moving to the cloud are saving more than ever. We are announcing updates to Apache Spark, Apache Kafka, ML Services, and Azure Data Lake Storage Gen2, as well as enhancements to the Enterprise Security Package. These new capabilities will continue to drive savings for many of our customers. In addition, Microsoft is deepening its commitment to the Apache Hadoop ecosystem and has extended its partnership with Hortonworks to bring the best of Apache Hadoop and open source big data analytics to the cloud. Read more about it in the Azure blog.
Microsoft deepens its commitment to Apache Hadoop and open source analytics
Earlier today, Microsoft Corporation deepened its commitment to the Apache Hadoop ecosystem and its partnership with Hortonworks, which has brought the best of Apache Hadoop and open source big data analytics to the cloud. Since the partnership began nearly six years ago, hundreds of the largest enterprises have chosen Azure HDInsight and Hortonworks to run Hadoop, Spark, and other open source analytics workloads on Azure. During this time, Microsoft has also become one of the leading committers to Apache projects, sharing with the open source community its experience running one of the largest data lakes on the planet. Read about it in the Azure blog.
Gain application insights for Big Data solutions using Unravel data on Azure HDInsight
Unravel on HDInsight enables developers and IT admins to manage performance, autoscaling, and cost optimization better than ever. We are pleased to announce Unravel on the Azure HDInsight Application Platform. Azure HDInsight is a fully managed, open source big data analytics service for enterprises. You can use popular open source frameworks (Hadoop, Spark, LLAP, Kafka, HBase, etc.) to cover a broad range of scenarios such as ETL, data warehousing, machine learning, IoT, and more. Unravel provides comprehensive application performance management (APM) for these scenarios. The application helps customers analyze, optimize, and troubleshoot application performance issues and meet SLAs in a seamless, easy-to-use manner. Some customers report running up to 200 percent more jobs at 50 percent lower cost using Unravel’s tuning capability on HDInsight. To learn more, join Pranav Rastogi, Program Manager at Microsoft Azure Big Data, and Shivnath Babu, CTO at Unravel, in a webinar on June 13 on how to build fast and reliable big data apps on Azure while keeping cloud expenses within your budget. Read more about it in the Azure blog.
HDInsight tools for VS Code now supports argparse and Spark 2.2
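To illustrate parameter-based PySpark job submission, here is a minimal sketch of a PySpark entry script that reads its job parameters with Python's standard argparse module. The argument names (`--input`, `--output`, `--sample-fraction`), the JSON-to-Parquet job, and the function names are hypothetical; how the tools map their JSON configuration file onto command-line arguments is not described in this post, so the sketch covers only the script side.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the job's parameters; argv=None means sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Example PySpark batch job")
    # Hypothetical parameters, chosen only for illustration.
    parser.add_argument("--input", required=True, help="path of the input JSON data")
    parser.add_argument("--output", required=True, help="path for the Parquet output")
    parser.add_argument("--sample-fraction", type=float, default=1.0,
                        help="fraction of rows to keep, in (0, 1]")
    args = parser.parse_args(argv)
    if not 0.0 < args.sample_fraction <= 1.0:
        parser.error("--sample-fraction must be in (0, 1]")  # exits with status 2
    return args


def run(args: argparse.Namespace) -> None:
    """Run the Spark part of the job with already-parsed arguments."""
    # Imported here so parse_args can be exercised without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("argparse-example").getOrCreate()
    try:
        df = spark.read.json(args.input)
        if args.sample_fraction < 1.0:
            # Positional (withReplacement, fraction) form also works on Spark 2.2.
            df = df.sample(False, args.sample_fraction, seed=42)
        df.write.mode("overwrite").parquet(args.output)
    finally:
        spark.stop()
```

When the file is submitted as a job, it would end with `if __name__ == "__main__": run(parse_args())`. Keeping parsing separate from the Spark work lets you test the arguments locally, e.g. `parse_args(["--input", "in.json", "--output", "out"])`, before submitting to a cluster.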
We are happy to announce that HDInsight Tools for VS Code now supports argparse and accepts parameter-based PySpark job submission. We have also enabled the tools to support Spark 2.2 for PySpark authoring and job submission. The argparse feature gives you great flexibility when authoring, testing, and submitting PySpark code for both batch and interactive queries. You can take full advantage of Python's argparse module in your PySpark scripts and keep your configuration and job-related arguments in a JSON-based configuration file. Read more about it in the Azure blog.
Azure Toolkit for IntelliJ and for Eclipse integrates with HDInsight Ambari and supports Spark 2.2
To provide more authentication options, Azure Toolkit for IntelliJ and Azure Toolkit for Eclipse now support integration with HDInsight clusters through Ambari for job submission, cluster resource browsing, and storage file navigation. You can easily link or unlink any cluster by using an Ambari-managed username and password, which is independent of your Azure sign-in credentials. The Ambari connection applies to standard Spark and Hive clusters hosted on HDInsight on Azure. These additions give you more flexibility in how you connect to your HDInsight clusters, in addition to your Azure subscriptions, while also simplifying the experience of submitting Spark jobs. With this release, you can benefit from the new functionality and use the new libraries and APIs from Spark 2.2 in Azure Toolkit for IntelliJ and Azure Toolkit for Eclipse. You can create, author, and submit a Spark 2.2 project to a Spark 2.2 cluster. Because Spark 2.2 is backward compatible, you can also submit your existing Spark 2.0 and Spark 2.1 projects to a Spark 2.2 cluster. Read about it in the Azure blogs.
Time series resampling with Data Factory
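A minimal sketch of the per-sensor resampling logic the question below describes, written in Python (one of the languages the poster lists) and independent of which Azure service ends up running it. The rule is an assumption, not something the poster specified: each grid slot of width `step` seconds takes the mean of the raw samples inside it (downsampling), and an empty slot is linearly interpolated between the nearest samples on either side (upsampling). The function name and the (timestamp, value) tuple format are hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple

# Hypothetical record format: (timestamp in seconds, measured value).
Sample = Tuple[float, float]


def resample(samples: Sequence[Sample], start: float, step: float,
             count: int) -> List[Optional[float]]:
    """Map one sensor's irregular samples onto the grid start + k*step.

    Slot k covers [start + k*step, start + (k+1)*step):
    - one or more samples in the slot -> their mean (downsampling);
    - empty slot -> linear interpolation at the slot start between the
      nearest samples before and after it (upsampling);
    - no neighbor on one side -> None.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    points = sorted(samples)          # sort by timestamp
    times = [t for t, _ in points]
    out: List[Optional[float]] = []
    for k in range(count):
        lo = start + k * step
        hi = lo + step
        i = bisect_left(times, lo)    # first sample with t >= lo
        j = bisect_left(times, hi)    # first sample with t >= hi
        if j > i:
            # Downsampling: average every raw sample that falls in the slot.
            values = [v for _, v in points[i:j]]
            out.append(sum(values) / len(values))
        elif 0 < i < len(points):
            # Upsampling: points[i - 1] is before lo, points[i] is at or after hi.
            (t0, v0), (t1, v1) = points[i - 1], points[i]
            out.append(v0 + (v1 - v0) * (lo - t0) / (t1 - t0))
        else:
            out.append(None)
    return out
```

For example, `resample([(0, 10.0), (30, 20.0)], start=0, step=10, count=4)` yields roughly `[10.0, 13.33, 16.67, 20.0]`. Because a job that runs every 5 minutes only sees newly arrived data, it would also need the last raw sample before its window to interpolate the first slots; slots at the trailing edge without a later neighbor come back as `None` and could be deferred to the next run.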
Dear Community,
I am currently working on a data analytics project and face the challenge of handling time series data in Azure. We have several thousand sensors that report their values to a control unit, which writes the data as JSON files into an Azure Data Lake. We are also considering writing this data directly to a database.
The sensors deliver their data at different intervals, but within our data platform we want one uniform sampling rate across all sensors. We therefore need to upsample (interpolate) a signal whose frequency is too low and downsample (average) a signal whose frequency is too high. We want to keep this as simple as possible and thought about running a Data Factory job every 5 minutes that processes the data newly arrived in the source. The actual implementation can be in C#, Python, R, or even JavaScript.
From what I have learned about Data Factory so far, there are different ways to do this:
1) Use HDInsight. I see this as quite a lot of work to set up and get familiar with, so I am looking for an easier option.
2) Use a U-SQL query with C#, Python, or R. As far as I understand, this is only possible via Data Lake Analytics, so only a Data Lake can be the input source, correct?
3) Create a custom activity. This is only available in Data Factory v2, which is currently in preview, not available in our preferred region, and barely integrated into Visual Studio 2017. Moreover, the overall setup with batch processing is quite complex.
My question now is: is there any other setup you would suggest that I have missed? If not, which of these solutions would you recommend?
Thank you in advance for any input on this.
HDInsight tools for IntelliJ & Eclipse April updates
We are pleased to announce the April updates of HDInsight Tools for IntelliJ & Eclipse. This is a quality milestone in which we focused primarily on refactoring components and fixing bugs. We also added Azure Data Lake Store support and Eclipse local emulator support in this release. HDInsight Tools for IntelliJ & Eclipse serve the open source community and are of interest to HDInsight Spark developers. The tools run smoothly on Linux, Mac, and Windows. The major improvements are code refactoring and telemetry enhancements, and more than forty bugs in job authoring, submission, and job view have been fixed to improve the quality of the tools. Read about it on the Azure blog.
Introducing H2O.ai on Azure HDInsight
We are excited to announce that H2O's AI platform is now available on the Azure HDInsight Application Platform. Users can now use H2O.ai’s open source solutions on Azure HDInsight, which enables reliable open source analytics with an industry-leading SLA. Azure HDInsight is the only fully managed cloud Hadoop offering that provides optimized open source analytical clusters for Spark, Hive, MapReduce, HBase, Storm, Kafka, and R Server, backed by a 99.9% SLA. Each of these big data technologies, as well as ISV applications such as H2O, is easily deployable as a managed cluster with enterprise-level security and monitoring. Read about it on the Azure blog.
Introducing Dataiku’s DSS on Microsoft Azure HDInsight to make data science easier
We are pleased to announce the expansion of the HDInsight Application Platform to include Dataiku. Azure HDInsight is the industry-leading, fully managed cloud Apache Hadoop and Spark offering, which allows customers to run reliable open source analytics with an industry-leading SLA. Dataiku develops Data Science Studio (DSS), a collaborative data science platform that enables companies to build and deliver their analytical solutions more efficiently. This combined offering of DSS on HDInsight enables customers to easily use data science to build big data solutions and run them at enterprise grade and scale. Read about it on the Azure blog.