There are many reasons why customers consider migrating their existing on-premises big data workloads to Azure.
EHMA enables quicker, easier, and more efficient Hadoop migrations, helping make Azure the preferred cloud for migrating Hadoop workloads.
Many customers with on-premises Hadoop face significant technical blockers, whether in designing their cloud architecture or in executing the migration itself. Assessing the on-premises Hadoop infrastructure with the help of pre-built scripts and a questionnaire sets the stage for a better-planned migration and clears roadblocks in the early phases.
EHMA provides specific guidance and considerations you can follow to move your existing platform and infrastructure, whether on-premises or in another cloud, to Azure. EHMA covers the following Hadoop ecosystem components:
Component | Description | Decision Flow/Flowcharts | Target solutions |
---|---|---|---|
Apache HDFS | Distributed file system | Planning the data migration, Pre-checks prior to data migration | Azure Data Lake Storage Gen2 |
Apache HBase | Column-oriented table service | Choosing landing target for Apache HBase, Choosing storage for Apache HBase on Azure | HBase on VM, HDInsight, Cosmos DB |
Apache Hive | Data warehouse infrastructure | Choosing landing target for Hive, Selecting target DB for Hive metadata | Hive on VM, HDInsight, Synapse |
Apache Spark | Data processing framework | Choosing landing target for Apache Spark on Azure | HDInsight, Synapse, Databricks |
Apache Ranger | Framework to monitor and manage data security | | HDInsight Enterprise Security Package, Azure AD, Ranger on VM |
Apache Sentry | Framework to monitor and manage data security | Choosing landing targets for Apache Sentry on Azure | Sentry/Ranger on VM, HDInsight Enterprise Security Package, Azure AD |
Apache MapReduce | Distributed computation framework | | MapReduce, Spark |
Apache ZooKeeper | Distributed coordination service | | ZooKeeper on VM, Built-in solution in PaaS |
Apache YARN | Resource manager for the Hadoop ecosystem | | YARN on VM, Built-in solution in PaaS |
Apache Storm | Distributed real-time computing system | Choosing landing targets for Apache Storm on Azure | Storm/Flink/etc. on VM, Stream Analytics, Spark Streaming on HDInsight/Databricks, Functions |
Apache Sqoop | Command-line tool for transferring data between Apache Hadoop clusters and relational databases | Choosing landing targets for Apache Sqoop on Azure | Sqoop on VM, Sqoop on HDInsight, Data Factory |
Apache Kafka | Highly scalable, fault-tolerant distributed messaging system | Choosing landing targets for Apache Kafka on Azure | Kafka on VM, Event Hubs for Kafka, HDInsight |
Apache Atlas | Open-source framework for data governance and metadata management | | Purview |
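To illustrate the first row of the table (HDFS to Azure Data Lake Storage Gen2), a one-time bulk copy typically uses Hadoop's DistCp with an `abfss://` destination URI. The sketch below shows how that target URI is formed; the namenode host, storage account, and container names are illustrative:

```shell
# All names below (namenode, storage account, container) are hypothetical.
ACCOUNT="contosodatalake"
CONTAINER="raw"
DEST="abfss://${CONTAINER}@${ACCOUNT}.dfs.core.windows.net/data/warehouse"

# The actual copy requires Hadoop 3.x with the ABFS driver and the storage
# credentials configured in core-site.xml, for example:
# hadoop distcp -update -skipcrccheck hdfs://namenode:8020/data/warehouse "$DEST"
echo "$DEST"
```

`-skipcrccheck` is commonly needed here because HDFS and ABFS use different checksum algorithms, so DistCp's default CRC comparison would otherwise fail.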
One of the challenges in migrating workloads from on-premises Hadoop to Azure is getting the deployment right, so that it aligns with the desired end-state architecture and the application.
The Bicep deployment template (Reference Architecture Deployment) aims to remove much of the effort that goes into deploying the PaaS services on Azure, leaving you with a production-ready architecture up and running.
The above diagram depicts the end-state architecture for big data workloads on Azure PaaS and lists all the components deployed as part of the Bicep template. Bicep also offers the additional advantage of deploying only the modules you prefer, enabling a customised architecture.
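As a minimal sketch of what one module in such a template might look like (the resource name is illustrative, and the actual EHMA template will differ), a Bicep fragment that deploys an ADLS Gen2-enabled storage account could be:

```bicep
targetScope = 'resourceGroup'

param location string = resourceGroup().location

// Hypothetical module: storage account with hierarchical namespace
// enabled, which is what makes it an ADLS Gen2 data lake.
resource dataLake 'Microsoft.Storage/storageAccounts@2022-09-01' = {
  name: 'ehmadatalake01' // illustrative name; must be globally unique
  location: location
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'StorageV2'
  properties: {
    isHnsEnabled: true
  }
}
```

A template like this is deployed into a resource group with `az deployment group create --resource-group <rg> --template-file main.bicep`.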