HDInsight on AKS
Run federated queries on Trino with HDInsight on AKS
HDInsight on AKS is a modern, reliable, secure, and fully managed Platform as a Service (PaaS) that runs on Azure Kubernetes Service (AKS). HDInsight on AKS allows you to deploy popular open-source analytics workloads like Apache Spark, Apache Flink, and Trino without the overhead of managing and monitoring containers.

This blog explains how you can use Trino with HDInsight on AKS to get a feel for running federated queries. Trino is an open-source distributed SQL query engine, well known for its ability to query data from multiple sources without moving your data. That's cool, isn't it? And of course, you can use Trino for other use cases as well, such as interactive and ad-hoc querying, batch ETL across heterogeneous systems, and reporting and dashboarding.

With Trino on HDInsight on AKS, you can set up your cluster quickly with a few clicks. You can also take advantage of autoscaling, an intuitive UI to configure your cluster, and monitoring tools like Azure Monitor and managed Prometheus and Grafana. Moreover, there are some exclusive features, such as a special connector for Power BI, result caching for all connectors, query scan statistics to analyze query performance, and the Shared SQL connector. There is a lot more coming your way: an intuitive UI for catalog management, fine-grained access control with Apache Ranger, and an upgraded OSS version. Stay tuned!

Let's see in action how you can run federated queries on Trino with HDInsight on AKS.

Scenario: Analyze NYC taxi data to understand the average fare and the passenger count in different zones.

Summary of steps for the demo:
1. Take NYC taxi data from the official source.
2. Land the data in ADLS Gen2 and expose it as a Hive table in Trino.
3. Prepare the zone data and land it in Azure Database for PostgreSQL.
4. Run a federated query across the two data sources, ADLS Gen2 and Azure Database for PostgreSQL.
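The federated query in the last step can be sketched in Python. This is a minimal, hypothetical sketch, not the query from the step-by-step guide: it assumes a Hive catalog named hive (over ADLS Gen2) and a PostgreSQL catalog named postgres, and the schema and table names (nyc.yellow_trips, public.taxi_zones) are made up for illustration. Column names follow the public NYC TLC trip-record and taxi-zone lookup files and are assumed to be stored in lowercase. The key idea is that Trino addresses every table as catalog.schema.table, so a single SELECT can join data in ADLS Gen2 with data in PostgreSQL. The optional run_query helper assumes the open-source trino Python client and deliberately omits the authentication an HDInsight on AKS cluster requires.

```python
def build_federated_query(
    hive_catalog: str = "hive",         # hypothetical catalog over ADLS Gen2
    hive_schema: str = "nyc",           # hypothetical schema
    trips_table: str = "yellow_trips",  # hypothetical table of taxi trips
    pg_catalog: str = "postgres",       # hypothetical catalog over PostgreSQL
    pg_schema: str = "public",
    zones_table: str = "taxi_zones",    # hypothetical zone lookup table
) -> str:
    """Return Trino SQL that averages fare and passenger count per pickup zone."""
    # Trino names tables as catalog.schema.table, so one query can span
    # two different data sources without copying data between them.
    trips = f"{hive_catalog}.{hive_schema}.{trips_table}"
    zones = f"{pg_catalog}.{pg_schema}.{zones_table}"
    return (
        "SELECT z.borough, z.zone,\n"
        "       avg(t.fare_amount) AS avg_fare,\n"
        "       avg(t.passenger_count) AS avg_passengers\n"
        f"FROM {trips} AS t\n"
        f"JOIN {zones} AS z ON t.pulocationid = z.locationid\n"
        "GROUP BY z.borough, z.zone\n"
        "ORDER BY avg_fare DESC"
    )


def run_query(host: str, user: str, sql: str) -> list:
    """Submit sql with the open-source trino client (pip install trino).

    Assumption: HTTPS on port 443. Authentication for HDInsight on AKS
    is intentionally omitted and must be added for a real cluster.
    """
    from trino.dbapi import connect  # lazy import: only needed to execute

    conn = connect(host=host, port=443, user=user, http_scheme="https")
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
    finally:
        conn.close()
```

Keeping query construction separate from execution lets you inspect or adjust the SQL (for example, build_federated_query(hive_catalog="lake") for a differently named catalog) before pasting it into any Trino client.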
To build the demo yourself and experience the power of Trino firsthand, follow the step-by-step guide here.

To learn more about HDInsight on AKS:
Read our documentation - https://aka.ms/hdionaks-docs
Join our community, share an idea, or share your success story - https://aka.ms/hdionakscommunity
Have a question on how to migrate or want to discuss a use case - https://aka.ms/askhdinsight

Migration of Apache Spark from HDInsight 5.0 to HDInsight 5.1
Azure HDInsight Spark 5.0 to HDI 5.1 Migration
A new version, HDInsight 5.1, has been released with Spark 3.3.1. This release improves join query performance via Bloom filters; increases Pandas API coverage with support for popular Pandas features such as datetime.timedelta and merge_asof; and simplifies migration from traditional data warehouses by improving ANSI compliance and supporting dozens of new built-in functions. In this article, we discuss migrating user applications from HDInsight Spark 3.1 to HDInsight Spark 3.3.

Securing Azure HDInsight: ESM Support with Ubuntu 18.04, Cluster Updates, and Best Practices
Azure HDInsight, Microsoft's cloud-based big data analytics platform, continues to advance its features to provide users with a secure and efficient environment. In this article, we explore the latest enhancements, focusing on Expanded Security Maintenance (ESM) support, the importance of regular cluster updates, and best practices recommended by Microsoft to fortify HDInsight deployments.

The foundation of a secure Azure HDInsight environment lies in its ability to address critical vulnerabilities promptly. Microsoft ensures this by shipping the latest HDInsight images with Expanded Security Maintenance (ESM) support, which provides ongoing support and stability with minimal changes, specifically targeting critical, high, and some medium-severity fixes. This ensures that HDInsight users benefit from a continuously updated and secure environment.

ESM Support in Latest Images: Azure HDInsight versions 5.0 and 5.1 use the Ubuntu 18.04 Pro image. Ubuntu Pro includes security patching for all Ubuntu packages through Expanded Security Maintenance (ESM) for Infrastructure and Applications. Ubuntu Pro 18.04 LTS will remain fully supported until April 2028. For more information on what's new in the latest HDInsight images with ESM support, refer to the official release notes in the Azure HDInsight Release Notes Archive.

Periodic Cluster Updates: Maintaining a secure HDInsight environment requires diligence in keeping clusters up to date. Microsoft facilitates this process through the HDInsight OS patching mechanism. Periodically updating clusters using the procedures outlined in the official documentation ensures that users benefit from the latest features, performance improvements, and crucial security patches. Learn more about updating HDInsight clusters in the Azure HDInsight OS Patching documentation.

ESM and HDI Release Integration: Expanded Security Maintenance is seamlessly integrated into HDInsight releases.
As part of each HDInsight release, critical fixes provided by ESM are bundled, so users get the latest security enhancements with every new release.

Customer Recommendation: Use the Latest Image: To maximize the benefits of the latest features and security updates, we strongly recommend that customers use the most recent HDInsight image. By doing so, organizations ensure that their HDInsight clusters are fortified against the latest threats and vulnerabilities.

Accessing Fixed CVE Details: For users seeking detailed information about fixed Common Vulnerabilities and Exposures (CVEs), the Ubuntu CVE site is a valuable resource. There, users can find comprehensive details about the specific vulnerabilities addressed in each release, helping them make informed decisions about their security posture.

Best Practice: Transitioning to HDInsight on AKS: In line with Microsoft's best practices, customers are encouraged to consider adopting Azure HDInsight on Azure Kubernetes Service (AKS), based on Azure Linux. This approach streamlines operations and simplifies the management of HDInsight clusters, contributing to an optimized and efficient big data processing environment. Learn more in the Microsoft Azure HDInsight on AKS documentation.

Sparking New Possibilities: Unleashing the power of HDInsight on AKS
In today's data-driven world, organizations rely on data analytics and processing to gain valuable insights that drive informed decision-making. Apache Spark has emerged as a powerful tool for big data processing, and Microsoft's HDInsight service on Azure Kubernetes Service (AKS) makes it easier than ever to harness its capabilities. In this article, we explore the convergence of HDInsight and AKS, focusing on the potential it unlocks for Apache Spark users.

HDInsight on AKS: A Brief Overview
HDInsight on AKS is a modern, reliable, secure, and fully managed Platform as a Service (PaaS) that runs on Azure Kubernetes Service (AKS). HDInsight on AKS allows you to deploy popular open-source analytics workloads like Apache Spark, Apache Flink, and Trino without the overhead of managing and monitoring containers. You can build end-to-end, petabyte-scale big data applications spanning streaming through Apache Flink, data engineering and machine learning using Apache Spark, and querying with Trino's powerful query engine. With Spark, organizations can process large volumes of data, perform complex analytics, and build machine learning models without the burden of managing the underlying infrastructure.

AKS: Revolutionizing Container Orchestration
Azure Kubernetes Service (AKS) is Microsoft's managed Kubernetes service, designed to simplify the deployment, management, and scaling of containerized applications. Kubernetes has become the de facto standard for container orchestration, and AKS makes it accessible and efficient for enterprises of all sizes. To learn more about the evolution of Kubernetes, refer to this article.

The amalgamation of analytics and containers
HDInsight on AKS represents a significant step forward in the Azure data landscape. With Spark on the new stack, users can take advantage of both Spark's data analytics power and the scalability of Kubernetes for container orchestration.
Here are some key benefits of using HDInsight on AKS with Spark:

Scalability: One of the primary advantages of AKS is its ability to automatically scale resources up or down based on demand. With Spark on AKS, you can handle varying workloads without manual intervention. Whether you have a small batch job or a massive data processing task, AKS can scale accordingly, ensuring optimal resource utilization.

Resource Efficiency: AKS provides resource isolation through Kubernetes namespaces and resource quotas. This isolation ensures that Spark applications do not interfere with each other, leading to more predictable and stable performance. You can allocate the right amount of resources to each Spark job, preventing resource contention.

Portability: Running Spark on AKS makes your Spark workloads highly portable. You can encapsulate your Spark applications in containers and deploy them, making it easier to manage dependencies and ensuring consistent behaviour across environments.

Integration with Azure Services: HDInsight on AKS integrates seamlessly with other Azure services like Azure Data Lake Storage, Azure Key Vault, and Microsoft Fabric. This means you can ingest, process, and analyse data from various sources and use Spark to gain insights and make data-driven decisions.

Cost Optimization: HDInsight on AKS provides fine-grained control over resource allocation, since users are free to integrate only the Azure technologies they choose. This lets you optimize costs by paying only for the resources you consume, which is especially valuable for organizations looking to maximize their return on investment in data analytics.

What's new?
Spark with HDInsight on AKS is a PaaS offering.
We have designed this platform to enhance productivity and improve experiences for the different personas that use Spark, such as data engineers working on ETL jobs, data scientists performing experimentation, and business analysts who like to slice and dice data. All of these personas have something to be excited about:

Script actions: Customize HDInsight on AKS clusters to extend them and perform custom installations (for example, monitoring tools or security packages).

Library management: Install, manage, and configure the Python packages your analytics work needs through a simple, intuitive interface.

Configuration management: Modify or add Spark and YARN configurations in the cluster. The Azure portal interface lets you add custom configurations and manage the cluster to fit your enterprise's needs.

Notebook experience: Submit jobs through notebooks such as Jupyter and Zeppelin; notebooks are the easiest way to submit a job. Users can also run spark-submit jobs from the WebSSH shell. Users can share notebooks for multiuser scenarios, download and upload notebooks for future use, and create interactive visualizations.

Getting Started with HDInsight on AKS
To get started with Spark with HDInsight on AKS, follow these steps:
1. Deploy the HDInsight Spark cluster on AKS.
2. Develop and run Spark applications using familiar tools like Jupyter Notebooks and Apache Zeppelin, and manage clusters and submit jobs through the SDK and ARM templates.
3. Leverage the power of Spark to analyze, process, and visualize your data.
For more information on how to create and manage an Azure HDInsight on AKS Spark cluster, click here.

Conclusion
The integration of HDInsight with AKS brings a new level of agility, scalability, and efficiency to big data analytics with Apache Spark.
This combination of two powerful Azure services empowers organizations to unlock valuable insights from their data, enabling data-driven decision-making at scale. Whether you're a data scientist, a developer, or a business leader, HDInsight on AKS with Spark provides the tools you need to succeed in today's data-driven world.

We are super excited to get you started. Here's how:
Sign up today - https://aka.ms/starthdionaks
Read our documentation - https://aka.ms/hdionaks-docs
Join our community, share an idea, or share your success story - https://aka.ms/hdionakscommunity
Have a question on how to migrate or want to discuss a use case - https://aka.ms/askhdinsight

Realize Lakehouse using best-of-breed open source with HDInsight
Author: Reems Thomas Kottackal, Product Manager

HDInsight on AKS is a modern, reliable, secure, and fully managed Platform as a Service (PaaS) that runs on Azure Kubernetes Service (AKS). HDInsight on AKS allows an enterprise to deploy popular open-source analytics workloads like Apache Spark, Apache Flink, and Trino without the overhead of managing and monitoring containers. You can build end-to-end, petabyte-scale big data applications spanning event storage using HDInsight Kafka, streaming through Apache Flink, data engineering and machine learning using Apache Spark, and Trino's powerful query engine, in combination with Azure analytics services like Azure Data Factory, Azure Event Hubs, Power BI, and Azure Data Lake Storage.

HDInsight on AKS can also connect seamlessly with HDInsight, so you can reap the benefits of using the cluster types you need in a hybrid model and interoperate with HDInsight cluster types using the same storage and metastore across both offerings.

The following diagram depicts an example of an end-to-end analytics landscape realized through HDInsight workloads.

We are super excited to get you started. Here's how:
Sign up today - https://aka.ms/starthdionaks
Read our documentation - https://aka.ms/hdionaks-docs
Join our community, share an idea, or share your success story - https://aka.ms/hdionakscommunity
Have a question on how to migrate or want to discuss a use case - https://aka.ms/askhdinsight