HDInsight
14 Topics

Timeseries resampling with Data Factory
Dear Community,

I am currently working on a data analytics project and face the challenge of handling time series data within Azure. We have several thousand sensors that report their values to a control unit, which writes this data as JSON files into an Azure Data Lake. We're also considering writing this data directly to a database.

The sensors deliver their data at different intervals, but within our data platform we want one uniform sampling rate across all sensors. We therefore need to upsample (interpolate) a signal if its frequency is too low, or downsample (average) it if its frequency is too high.

We want to keep this as simple as possible and thought about starting a Data Factory job that performs this task every 5 minutes on the new data that has arrived in the source. The actual implementation can be in C#, Python, R or even JavaScript.

From what I've learnt about Data Factory so far, there are different ways to do that:

1) Use HDInsight. This looks like a lot of work to set up and get familiar with, so I am looking for an easier option.
2) Use a U-SQL query with C#, Python or R. As far as I understand, this is only possible via Data Lake Analytics, so only a Data Lake can serve as the input source. Is that correct?
3) Create a custom activity. This is only available in Data Factory v2, which is currently in preview, is not available in our preferred location, and has almost no integration with Visual Studio 2017. Moreover, the setup together with the batch processing is pretty complex.

My question now is: is there any other possible setup that I have not seen and that you would suggest? And if not, which of these solutions would you recommend?

Thank you in advance for any input on this.
2.3K views, 0 likes, 0 comments

Gain application insights for Big Data solutions using Unravel data on Azure HDInsight
Unravel on HDInsight enables developers and IT admins to manage performance, auto scaling and cost optimization better than ever. We are pleased to announce Unravel on the Azure HDInsight Application Platform. Azure HDInsight is a fully-managed open-source big data analytics service for enterprises. You can use popular open-source frameworks (Hadoop, Spark, LLAP, Kafka, HBase, etc.) to cover a broad range of scenarios such as ETL, data warehousing, machine learning, IoT and more. Unravel provides comprehensive application performance management (APM) for these scenarios. The application helps customers analyze, optimize, and troubleshoot application performance issues and meet SLAs in a seamless, easy-to-use manner. Some customers report up to 200 percent more jobs at 50 percent lower cost using Unravel’s tuning capability on HDInsight. To learn more, please join Pranav Rastogi, Program Manager at Microsoft Azure Big Data, and Shivnath Babu, CTO at Unravel, in a webinar on June 13 on how to build fast and reliable big data apps on Azure while keeping cloud expenses within your budget. Read more about it in the Azure blog.
1.6K views, 0 likes, 0 comments

Introducing Dataiku’s DSS on Microsoft Azure HDInsight to make data science easier
We are pleased to announce the expansion of the HDInsight Application Platform to include Dataiku. Azure HDInsight is the industry-leading fully-managed cloud Apache Hadoop & Spark offering, which allows customers to do reliable open source analytics with an industry-leading SLA. Dataiku develops Data Science Studio (DSS), a collaborative data science platform that enables companies to build and deliver their analytical solutions more efficiently. This combined offering of DSS on HDInsight enables customers to easily use data science to build big data solutions and run them at enterprise grade and scale. Read about it on the Azure blog.
1.2K views, 0 likes, 0 comments

Analyze your data with Application Insights Analytics
Analytics is a powerful search tool that lets you analyze large volumes of any JSON or CSV data. You can run a wide range of queries, including statistical and machine learning algorithms. You might be familiar with it as part of Application Insights, but you can also apply it to any stream of NoSQL data. For example, let’s suppose you receive a data feed about flights. You could automate a daily analysis of route popularity and congestion. Analytics can run complex queries, including joins, aggregations, and statistical functions, to extract the necessary results. You can view the results in the range of charts available in Analytics. Or you could have Power BI run the queries each day, plot the results on maps, and present them on a website. Read more on the Azure blog.
1.1K views, 0 likes, 0 comments

Enterprises get deeper insights with Hadoop and Spark updates on Azure HDInsight
Azure HDInsight is one of the most popular services among enterprises for open source Hadoop & Spark analytics on Azure. With the price cut of more than 50 percent on HDInsight, customers moving to the cloud are reaping more savings than ever. We are announcing updates to Apache Spark, Apache Kafka, ML Services, and Azure Data Lake Storage Gen2, as well as enhancements to the Enterprise Security Package. These new capabilities will continue to drive savings for many of our customers. In addition, Microsoft is continuing to deepen its commitment to the Apache Hadoop ecosystem and has extended its partnership with Hortonworks to bring the best of Apache Hadoop and open source big data analytics to the cloud. Read more about it in the Azure blog.
1K views, 0 likes, 0 comments

Introducing H2O.ai on Azure HDInsight
We are excited to announce that H2O's AI platform is now available on the Azure HDInsight Application Platform. Users can now use H2O.ai’s open source solutions on Azure HDInsight, which allows reliable open source analytics with an industry-leading SLA. Azure HDInsight is the only fully-managed cloud Hadoop offering that provides optimized open source analytical clusters for Spark, Hive, MapReduce, HBase, Storm, Kafka, and R Server backed by a 99.9% SLA. Each of these big data technologies and ISV applications, such as H2O, is easily deployable as a managed cluster with enterprise-level security and monitoring. Read about it on the Azure blog.
970 views, 0 likes, 0 comments

Azure Toolkit for IntelliJ and for Eclipse integrates with HDInsight Ambari and supports Spark 2.2
To provide more authentication options, Azure Toolkit for IntelliJ and Azure Toolkit for Eclipse now support integration with HDInsight clusters through Ambari for job submission, cluster resource browsing, and storage file navigation. You can easily link or unlink any cluster by using an Ambari-managed username and password, which is independent of your Azure sign-in credentials. The Ambari connection applies to normal Spark and Hive clusters hosted within HDInsight on Azure. These additions give you more flexibility in how you connect to your HDInsight clusters, in addition to your Azure subscriptions, while also simplifying the experience of submitting Spark jobs. With this release, you can benefit from the new functionality and consume the new libraries and APIs from Spark 2.2 in Azure Toolkit for IntelliJ and Azure Toolkit for Eclipse. You can create, author and submit a Spark 2.2 project to a Spark 2.2 cluster. Thanks to the backward compatibility of Spark 2.2, you can also submit your existing Spark 2.0 and Spark 2.1 projects to a Spark 2.2 cluster. Read about it in the Azure blogs here and here.
927 views, 0 likes, 0 comments

Microsoft deepens its commitment to Apache Hadoop and open source analytics
Earlier today, the Microsoft Corporation deepened its commitment to the Apache Hadoop ecosystem and its partnership with Hortonworks, which has brought the best of Apache Hadoop and open source big data analytics to the cloud. Since the start of the partnership nearly six years ago, hundreds of the largest enterprises have chosen to use Azure HDInsight and Hortonworks to run Hadoop, Spark and other open source analytics workloads on Azure. During this time, Microsoft has also become one of the leading committers to Apache projects, sharing its experience of running one of the largest data lakes on the planet with the open source community. Read about it in the Azure blog.
906 views, 0 likes, 0 comments

Use BigDL on HDInsight Spark for Distributed Deep Learning
Deep learning is impacting everything from healthcare and transportation to manufacturing and more. Companies are turning to deep learning to solve hard problems like image classification, speech recognition, object recognition, and machine translation. In this blog post, Intel’s BigDL team and the Azure HDInsight team collaborate to provide the basic steps to use BigDL on Azure HDInsight. In 2016, Intel released its BigDL distributed deep learning project to the open-source community (BigDL GitHub). It natively integrates into Spark, supports popular neural net topologies, and achieves feature parity with other open-source deep learning frameworks. BigDL also provides 100+ basic neural network building blocks, allowing users to create novel topologies to suit their unique applications. Thus, with Intel’s BigDL, users can leverage their existing Spark infrastructure to enable deep learning applications without having to invest in bringing up separate frameworks to take advantage of neural network capabilities. Read about it on the Azure blog.
884 views, 0 likes, 0 comments

Announcing general availability of Azure HDInsight 3.6
This week at DataWorks Summit, we are pleased to announce general availability of Azure HDInsight 3.6, backed by our enterprise-grade SLA. HDInsight 3.6 brings updates to various open source components in the Apache Hadoop & Spark ecosystem to the cloud, allowing customers to deploy them easily and run them reliably on an enterprise-grade platform. Azure HDInsight 3.6 is a major update to the core Apache Hadoop & Spark platform as well as to various open source components. HDInsight 3.6 includes the latest Hortonworks Data Platform (HDP) 2.6, a collaborative effort between Microsoft and Hortonworks to bring HDP to market cloud-first. Read about it on the Azure blog.
879 views, 0 likes, 0 comments