HDInsight on AKS
12 Topics

Securing Azure HDInsight: ESM Support with Ubuntu 18.04, Cluster Updates, and Best Practices
Azure HDInsight, Microsoft's cloud-based big data analytics platform, continues to add features that give users a secure and efficient environment. This article covers the latest enhancements: Expanded Security Maintenance (ESM) support, the importance of regular cluster updates, and the practices Microsoft recommends for hardening HDInsight deployments.

A secure Azure HDInsight environment depends on addressing critical vulnerabilities promptly. Microsoft ships the latest HDInsight images with ESM support, which provides ongoing maintenance with minimal changes, specifically targeting critical, high, and some medium-severity fixes. HDInsight users therefore get an environment that is continuously patched.

ESM Support in Latest Images:
Azure HDInsight 5.0 and 5.1 use the Ubuntu Pro 18.04 image. Ubuntu Pro includes security patching for all Ubuntu packages through Expanded Security Maintenance (ESM) for Infrastructure and Applications. Ubuntu Pro 18.04 LTS will remain fully supported until April 2028. For more information on what's new in the latest HDInsight images with ESM support, refer to the official release notes in the Azure HDInsight Release Notes Archive.

Periodic Cluster Updates:
Keeping an HDInsight environment secure requires keeping clusters up to date. Microsoft supports this through the HDInsight OS patching mechanism. Periodically updating clusters with the procedures in the official documentation delivers the latest features, performance improvements, and security patches. Learn more about updating HDInsight clusters in the Azure HDInsight OS Patching documentation.

ESM and HDI Release Integration:
Expanded Security Maintenance is integrated into HDInsight releases. Each HDInsight release bundles the critical fixes provided by ESM, so every new release carries the latest security fixes.

Customer Recommendation: Use the Latest Image:
To get the latest features and security updates, customers are strongly recommended to use the most recent HDInsight image version. Clusters created from the latest image include the fixes for the most recently addressed threats and vulnerabilities.

Accessing Fixed CVE Details:
For detailed information about fixed Common Vulnerabilities and Exposures (CVEs), see the Ubuntu CVE site. It lists the specific vulnerabilities addressed in each release, which helps users make informed decisions about their security posture.

Run federated queries on Trino with HDInsight on AKS
HDInsight on AKS is a modern, reliable, secure, and fully managed Platform as a Service (PaaS) that runs on Azure Kubernetes Service (AKS). HDInsight on AKS allows you to deploy popular open-source analytics workloads like Apache Spark, Apache Flink, and Trino without the overhead of managing and monitoring containers.

This blog explains how you can use Trino with HDInsight on AKS to get a taste of running federated queries. Trino is an open-source distributed SQL query engine, well known for its ability to query data from multiple sources without moving your data. That is cool, isn't it? And of course, you can use Trino for other use cases as well:
- Interactive and ad-hoc querying.
- Batch ETL across heterogeneous systems.
- Reporting and dashboarding.

With Trino on HDInsight on AKS, you can set up your cluster quickly with a few clicks. You also get auto scaling, an intuitive UI to configure your cluster, and monitoring tools like Azure Monitor and managed Prometheus and Grafana. Moreover, there are some exclusive features, such as a special connector for Power BI, result caching for all connectors, query scan statistics to analyze query performance, and a Shared SQL connector. There is a lot more coming your way, including an intuitive UI for catalog management, fine-grained access control with Apache Ranger, and an upgraded OSS version. Stay tuned!

Let's see in action how you can run federated queries on Trino with HDInsight on AKS.

Scenario: Analyze NYC taxi data to understand the average fare and the passenger count in different zones.

Summary of steps for the demo:
1. Take NYC taxi data from the official source.
2. Land the data in ADLS Gen2 and expose it as a Hive table in Trino.
3. Prepare zone data and land it in Azure Database for PostgreSQL.
4. Run a federated query across the two data sources, ADLS Gen2 and Azure Database for PostgreSQL.
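
As a rough illustration of the last step, the sketch below builds the kind of SQL statement a federated query might use: one table addressed through a Hive catalog and one through a PostgreSQL catalog, joined in a single statement. The catalog, schema, table, and column names (`hive.default.nyc_taxi_trips`, `postgresql.public.taxi_zones`, `pulocationid`, `locationid`, `zone_name`, `fare_amount`, `passenger_count`) are assumptions made for illustration and are not taken from the demo; adjust them to match your own cluster.

```python
def build_federated_query(
    trips_table: str = "hive.default.nyc_taxi_trips",    # hypothetical Hive table on ADLS Gen2
    zones_table: str = "postgresql.public.taxi_zones",   # hypothetical PostgreSQL table
) -> str:
    """Return a Trino SQL statement that joins tables from two catalogs.

    Trino addresses tables as catalog.schema.table, so a single statement
    can reference both sources. Column names here are assumed.
    """
    return f"""
SELECT z.zone_name,
       avg(t.fare_amount)     AS avg_fare,
       avg(t.passenger_count) AS avg_passengers
FROM {trips_table} AS t
JOIN {zones_table} AS z
  ON t.pulocationid = z.locationid
GROUP BY z.zone_name
ORDER BY avg_fare DESC
""".strip()


if __name__ == "__main__":
    # Print the statement so it can be pasted into a Trino client of your choice.
    print(build_federated_query())
```

The function only produces the query text; running it requires a Trino client connected to a cluster where both catalogs are configured.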
To build the demo yourself and experience the power of Trino firsthand, follow the step-by-step guide.

To learn more about HDInsight on AKS:
Read our documentation - https://aka.ms/hdionaks-docs
Join our community, share an idea or share your success story - https://aka.ms/hdionakscommunity
Have a question on how to migrate or want to discuss a use case - https://aka.ms/askhdinsight

Migration of Apache Spark from HDInsight 5.0 to HDInsight 5.1
Azure HDInsight Spark 5.0 to HDI 5.1 Migration

HDInsight 5.1 has been released with Spark 3.3.1. This release improves join query performance via Bloom filters, increases Pandas API coverage with support for popular Pandas features such as datetime.timedelta and merge_asof, and simplifies migration from traditional data warehouses by improving ANSI compliance and supporting dozens of new built-in functions. This article discusses the migration of user applications from HDInsight Spark 3.1 to HDInsight Spark 3.3.

Realize Lakehouse using best of breed of Open source using HDInsight
Author: Reems Thomas Kottackal, Product Manager

HDInsight on AKS is a modern, reliable, secure, and fully managed Platform as a Service (PaaS) that runs on Azure Kubernetes Service (AKS). HDInsight on AKS allows an enterprise to deploy popular open-source analytics workloads like Apache Spark, Apache Flink, and Trino without the overhead of managing and monitoring containers.

You can build end-to-end, petabyte-scale big data applications spanning event storage using HDInsight Kafka, streaming through Apache Flink, data engineering and machine learning using Apache Spark, and Trino's powerful query engine, in combination with Azure analytics services like Azure Data Factory, Azure Event Hubs, Power BI, and Azure Data Lake Storage.

HDInsight on AKS connects seamlessly with HDInsight, so you can use the cluster types you need in a hybrid model and interoperate with HDInsight cluster types using the same storage and metastore across both offerings.

The following diagram depicts an example of an end-to-end analytics landscape realized through HDInsight workloads.

We are super excited to get you started. Here is how:
Sign up today - https://aka.ms/starthdionaks
Read our documentation - https://aka.ms/hdionaks-docs
Join our community, share an idea or share your success story - https://aka.ms/hdionakscommunity
Have a question on how to migrate or want to discuss a use case - https://aka.ms/askhdinsight