Big Data & Analytics
10 Topics

HDInsight tools for IntelliJ & Eclipse December Updates
Microsoft is pleased to announce the December updates of HDInsight Tools for IntelliJ & Eclipse. The HDInsight Tools for IntelliJ & Eclipse serve the open source community and will be of interest to HDInsight Spark developers. The tools run smoothly on Linux, Mac, and Windows. This release focuses on users’ feedback to ensure a smooth user experience on project creation and submission. The release also covers a couple of new features including Spark 2.0 support, local run, and a refined Job View & Job Graph. Updates include: Support Spark 2.0; Local Run - Use the HDInsight Tools for IntelliJ with the Hortonworks Sandbox; Job View & Job Graph; Installation. Read more on Azure Blogs.
936 Views, 0 likes, 1 Comment

Machine learning tutorial: Create your first data science experiment in Azure Machine Learning
If you've never used Azure Machine Learning Studio before, this tutorial is for you. In this tutorial, learn how to use Studio for the first time to create a machine learning experiment. The experiment will test an analytical model that predicts the price of an automobile based on different variables such as make and technical specifications. Get started.
1.6K Views, 0 likes, 0 Comments

Query Big tabular data
Hi, I have tabular data with over 6000 columns and millions of rows (over 500 MB, mostly numbers). I need to read and write to the table continuously from 2 different sources (one writes, the other reads). I also want to be able to filter and query the data by rows and/or columns, and it should be very fast and cheap. I need to run a query every 2 min, all day, every day, and I am using the R language to connect. What is the best option in Azure to deal with this type of data? What technology should I use?
623 Views, 0 likes, 0 Comments

Watch: Building Big Data Applications Using Azure HDInsight Service
Learn how to use the Azure HDInsight service to build solutions that can handle data of any shape at massive scale. We will build an end-to-end application that uses both data in motion (Streaming) and data at rest (Batch). In this session we will use Big Data technologies like Hadoop, HBase, Storm and Spark to build an IoT (Internet of Things) application, starting from data ingestion, all the way to visual dashboards. As part of the application development we will show you the developer experience in Visual Studio for writing Hive queries and building Storm topologies. The session will introduce you to various Big Data service offerings in Azure and how to apply them in your scenarios.
732 Views, 0 likes, 0 Comments

Announcing GA of Azure Data Lake!
Today Microsoft announced the general availability of Azure Data Lake, ushering in a new era of productivity for your big data developers and scientists. Fundamentally different from today’s cluster-based solutions, the Azure Data Lake services enable you to securely store all your data centrally in a “no limits” data lake, and run on-demand analytics that instantly scales to your needs. Our state-of-the-art development environment and rich and extensible U-SQL language enable you to write, debug, and optimize massively parallel analytics programs in a fraction of the time of existing solutions. Read more on the Azure Blog.
1.5K Views, 0 likes, 0 Comments

Get started with Azure Data Lake Analytics using Azure portal
Learn how to develop a job that reads a tab-separated values (TSV) file and converts it into a comma-separated values (CSV) file. To go through the same tutorial using other supported tools, click the tabs on the top of this section. Once your first job succeeds, you can start to write more complex data transformations with U-SQL. Get started creating your Data Lake Analytics account now.
919 Views, 0 likes, 0 Comments

Overview of Microsoft Azure Data Lake Analytics
Azure Data Lake Analytics is a new service, built to make big data analytics easy. This service lets you focus on writing, running and managing jobs, rather than operating distributed infrastructure. Instead of deploying, configuring and tuning hardware, you write queries to transform your data and extract valuable insights. The analytics service can handle jobs of any scale instantly by simply setting the dial for how much power you need. You only pay for your job when it is running, making it cost-effective. The analytics service supports Azure Active Directory, letting you simply manage access and roles, integrated with your on-premises identity system. It also includes U-SQL, a language that unifies the benefits of SQL with the expressive power of user code. U-SQL’s scalable distributed runtime enables you to efficiently analyze data in the store and across SQL Servers in Azure, Azure SQL Database and Azure SQL Data Warehouse. Learn about the key capabilities here.
1.7K Views, 0 likes, 0 Comments

Big Data @ Microsoft
Until recently, data was gathered for well-defined objectives such as auditing, forensics, reporting and LOB operations; now, exploratory and predictive analysis are ubiquitous, and the default is to capture and store any and all data in anticipation of potential future value. Differences in data heterogeneity, scale and usage are leading to a new generation of data management and analytic systems where the emphasis is on supporting a wide range of large datasets that are stored uniformly and analyzed seamlessly using whatever techniques are most appropriate, including traditional tools like SQL and BI and newer tools, e.g., for ML and stream analytics. These new systems are necessarily based on scale-out architectures for both storage and computation. Hadoop has become a key building block in the new generation of scale-out systems. On the storage side, HDFS has provided a cost-effective and scalable substrate for storing large heterogeneous datasets. However, as key customer and systems touch points are instrumented to log data, and IoT apps become common, enterprise data is growing at a staggering pace, and the need to leverage different storage tiers (from tape to memory) poses new challenges, leading to caching technologies such as Spark. On the analytics side, resource managers like YARN have opened the door for analytics tools to bypass Map-Reduce and directly exploit shared system resources while computing close to data copies. This trend is significant in iterative computations such as graph analytics and ML, for which Map-Reduce is seen as a poor fit. While Hadoop is widely recognized and used externally, Microsoft has long been at the forefront of Big Data analytics, with Cosmos and Scope supporting all internal customers. These internal services are a key part of our strategy going forward, now enabling new state-of-the-art external services like Azure Data Lake.
I will examine these trends and ground the talk by discussing the Microsoft Big Data stack.
1.5K Views, 0 likes, 0 Comments

Download the Azure Machine Learning Algorithm Cheat Sheet
The Microsoft Azure Machine Learning Algorithm Cheat Sheet helps you choose the right algorithm for a predictive analytics model. Azure Machine Learning Studio has a large library of algorithms from the regression, classification, clustering, and anomaly detection families. Each is designed to address a different type of machine learning problem. Download the cheat sheet: Machine Learning Algorithm Cheat Sheet (11x17 in.)
8.7K Views, 2 likes, 0 Comments

What is Stream Analytics?
Azure Stream Analytics is a fully managed, cost-effective real-time event processing engine that helps to unlock deep insights from data. Stream Analytics makes it easy to set up real-time analytic computations on data streaming from devices, sensors, web sites, social media, applications, infrastructure systems, and more. Learn more about what Stream Analytics can do here.
1.1K Views, 0 likes, 0 Comments
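
To give a feel for the kind of real-time computation mentioned in the Stream Analytics post above, here is a minimal Python sketch of one common streaming pattern: counting events per device in fixed, non-overlapping ("tumbling") time windows. This is not the Stream Analytics API or query language; the `Event` type, the `tumbling_window_counts` function, and the sample device names are all hypothetical and exist only for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Event(NamedTuple):
    """Hypothetical event record: a timestamp in seconds plus a device name."""
    timestamp: float
    device_id: str


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: float
) -> Dict[int, Counter]:
    """Count events per device in fixed, non-overlapping time windows.

    The key of the returned dict is the window index, i.e. the event
    timestamp divided by the window length, rounded down.
    """
    windows: Dict[int, Counter] = {}
    for event in events:
        index = int(event.timestamp // window_seconds)
        windows.setdefault(index, Counter())[event.device_id] += 1
    return windows


if __name__ == "__main__":
    # Made-up sample data, not taken from any real device feed.
    sample = [
        Event(0.5, "sensor-a"),
        Event(3.2, "sensor-b"),
        Event(9.9, "sensor-a"),
        Event(10.0, "sensor-a"),
    ]
    for index, counts in sorted(tumbling_window_counts(sample, 10.0).items()):
        print(index, dict(counts))
```

A managed streaming service would evaluate this kind of aggregation continuously as events arrive; the sketch processes an in-memory list only to keep the example self-contained.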