Leverage Azure Durable Functions to build lightweight ETL Data Jobs
This blog is co-authored by Dr. Magesh Kasthuri, Distinguished Member of Technical Staff (Wipro), and Sanjeev Radhakishin Assudani, Azure COE Principal Architect (Wipro). It aims to give you insight into how Azure Durable Functions can serve as an alternative design choice for building a lightweight, Azure-native solution for data ingestion and transformation. While the solution discussed in this blog was built for a healthcare industry customer, the design approach presented here is generic and applicable across industries.

The scenario

A leading healthcare provider planned to modernize its Medicare Auto Enrollment Engine (AEE) and premium billing capabilities to deliver a robust, scalable, and cost-effective solution across its Medicare business line. A key requirement was an integration layer between its healthcare administration platform and its database, processing benefit enrollment and maintenance data that arrives as hundreds of JSON files. The proposed solution would ingest, transform, and load the data into the database platform on a daily incremental file and monthly audit file basis. The challenge was to identify the most cost-effective ETL data engine that could handle complex processing in the integration layer while remaining lightweight. The candidate solutions were:

o Azure Databricks
o MuleSoft APIs
o Azure Logic Apps
o Azure Durable Functions

After careful evaluation, Azure Durable Functions was chosen to build the integration layer, based on the following objectives:

o Azure Durable Functions offer a modern, scalable way to build and manage serverless workflows.
o Lightweight data jobs can be implemented with Durable Functions, avoiding heavy, compute-intensive services where they are not needed.
o Performance can be optimized so that the end-to-end enrichment process completes within hours.

Solution components

In today's data-driven world, the ability to handle ETL (Extract, Transform, Load) jobs efficiently is crucial for any organization looking to gain insights from its data. Azure provides a robust platform for building native ETL solutions from a combination of Azure Data Factory (ADF) pipelines, Azure Durable Functions, Azure SQL Database, and Azure Storage. This article walks through developing an Azure-native solution for ETL jobs, covering data load, ingestion, transformation, and staging activities. The approach deliberately avoids Azure Data Lake Storage (ADLS Gen2) and Databricks to prevent cost bloat and a heavyweight architecture, and it can serve as a lightweight reference architecture for high-load data processing jobs.

Architecture Overview

The architecture for an Azure-native ETL solution involves several components working together seamlessly:

o Azure Data Factory (ADF) pipeline: orchestrates the data flow and automates the ETL process.
o Azure Durable Functions: handle ingestion and transformation tasks using C# and .NET code.
o Azure SQL Database: used for data enrichment and final storage.
o Azure Storage: stores raw feed files, manages staging activities, and holds temporary data.
o Application Insights and monitoring: provide observability and activity tracking.
o Durable Functions Monitor: provides a UI to debug, monitor, and manage orchestration instances.
o Azure Key Vault: stores secrets such as keys and connection strings.

Architecture Diagram

[Architecture diagram: an ADF pipeline triggers Durable Functions, which read raw feeds from Blob Storage, stage intermediate data in Azure Storage, and load enriched data into Azure SQL Database, with Application Insights and Key Vault alongside.]

Azure Data Factory (ADF) Pipeline

ADF serves as the backbone of the ETL process. It orchestrates the entire data flow, ensuring that data moves efficiently from one stage to the next. ADF pipelines can be scheduled to run at specific intervals or triggered by events, providing flexibility in managing ETL workflows.
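To make the hand-off from ADF concrete, here is a minimal sketch of an HTTP-triggered starter function that an ADF Web activity could call to kick off one ETL run. It assumes the in-process Durable Functions model for .NET; the function and orchestrator names (EnrollmentEtlStarter, RunEnrollmentEtl) are illustrative, not details from the original solution.

```csharp
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;

public static class EnrollmentEtlStarter
{
    // HTTP-triggered starter that an ADF Web activity (or any scheduler) can call
    // to kick off one run of the "RunEnrollmentEtl" orchestrator (illustrative name).
    [FunctionName("EnrollmentEtlStarter")]
    public static async Task<HttpResponseMessage> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestMessage req,
        [DurableClient] IDurableOrchestrationClient starter,
        ILogger log)
    {
        // Passing null as the instance id lets the runtime generate a fresh GUID per run.
        string instanceId = await starter.StartNewAsync("RunEnrollmentEtl", null);
        log.LogInformation("Started ETL orchestration {InstanceId}.", instanceId);

        // 202 Accepted with status-query URLs the caller can poll until completion.
        return starter.CreateCheckStatusResponse(req, instanceId);
    }
}
```

ADF can poll the status endpoint returned in the 202 response until the orchestration reports completion, which keeps the pipeline activity and the long-running function work loosely coupled.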
Azure Blob Storage

Azure Blob Storage acts as the initial landing zone for raw feed data. It is highly scalable and cost-effective, making it ideal for storing large volumes of data. Data is loaded into Blob Storage from various sources, ready for further processing.

Azure Durable Functions

Durable Functions are a powerful feature of Azure Functions that allows for long-running, stateful operations. Using C# and .NET code, Durable Functions can perform complex data ingestion and transformation tasks. They provide reliability and scalability, ensuring that data processing is efficient and fault tolerant.
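The following sketch shows the shape such a job might take with the in-process Durable Functions model: an orchestrator fans out one activity per JSON feed file and fans back in before the enrichment step. The function names, the daily-incremental container, and the FeedStorageConnection app setting are illustrative assumptions rather than details from the original solution, and the domain-specific parsing is elided.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class EnrollmentEtlOrchestration
{
    // Landing container for the daily incremental feed (illustrative name).
    private const string LandingContainer = "daily-incremental";

    // Orchestrator: lists the JSON feed files, fans out one transform per file,
    // and fans back in before the downstream enrichment step.
    [FunctionName("RunEnrollmentEtl")]
    public static async Task<int> RunEnrollmentEtl(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        var fileNames = await context.CallActivityAsync<List<string>>(
            "ListFeedFiles", LandingContainer);

        // Replay transient failures (throttling, timeouts) with exponential back-off.
        var retry = new RetryOptions(
            firstRetryInterval: TimeSpan.FromSeconds(10),
            maxNumberOfAttempts: 3)
        {
            BackoffCoefficient = 2.0
        };

        // Fan out: one activity invocation per feed file, all running in parallel.
        var tasks = fileNames
            .Select(f => context.CallActivityWithRetryAsync<int>("TransformFeedFile", retry, f))
            .ToList();

        // Fan in: wait for every file before reporting the total record count.
        int[] recordCounts = await Task.WhenAll(tasks);
        return recordCounts.Sum();
    }

    // Activity: enumerate the raw feed files sitting in the Blob Storage landing zone.
    [FunctionName("ListFeedFiles")]
    public static async Task<List<string>> ListFeedFiles([ActivityTrigger] string container)
    {
        var containerClient = new BlobContainerClient(
            Environment.GetEnvironmentVariable("FeedStorageConnection"), container);
        var names = new List<string>();
        await foreach (var blob in containerClient.GetBlobsAsync())
        {
            names.Add(blob.Name);
        }
        return names;
    }

    // Activity: download one raw JSON file and apply the domain-specific transform.
    [FunctionName("TransformFeedFile")]
    public static async Task<int> TransformFeedFile([ActivityTrigger] string fileName)
    {
        var blob = new BlobClient(
            Environment.GetEnvironmentVariable("FeedStorageConnection"),
            LandingContainer, fileName);
        string json = (await blob.DownloadContentAsync()).Value.Content.ToString();

        // ... parse, validate, and map enrollment records to the staging shape here ...
        return 0; // placeholder record count for the fan-in step
    }
}
```

Because CallActivityWithRetryAsync replays failed activities with back-off, a transient storage or network fault does not fail the whole run, which is a large part of what makes Durable Functions fault tolerant for this kind of job.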
Azure SQL Database

Azure SQL Database is used for data enrichment and final storage. After transformation, data is loaded into the SQL database, where it can be enriched with additional metadata and made ready for analytics and reporting. It provides high performance, security, and availability.

Azure Storage for Staging Activities

During the ETL process, intermediate data needs to be stored temporarily. Azure Storage plays a crucial role in managing these staging activities, ensuring that data is available for subsequent processing steps and maintaining the integrity and flow of the ETL pipeline.

Observability and Monitoring

Application Insights

Application Insights is an essential tool for monitoring the health and performance of your ETL solution. It provides real-time insight into application performance, helping you identify and troubleshoot issues quickly. By tracking metrics and logs, you can ensure that your ETL processes run smoothly and efficiently.

Activity Tracking

Activity tracking is crucial for understanding the flow and status of data through the ETL pipeline. Logging and monitoring tools provide detailed information about each step of the process, giving better visibility and control and ensuring that anomalies and failures are detected and addressed promptly.

Durable Functions Monitor

Durable Functions Monitor is an important tool for listing, monitoring, and debugging the orchestrations inside a Durable Functions app, and it can be configured as a Visual Studio Code extension. It shows the individual instances of orchestrators and activity functions along with their execution times, which is important for tracking the performance of each step in the ETL process. It can also render the app as a function graph.

Kudu Logs

Kudu logs trace the execution of the orchestrators, activity functions, and native functions. They help you spot raised exceptions and see whether replays are happening for orchestrators or activity functions.

Best Practices for Implementing the Solution

Here are some best practices to ensure a successful implementation of your Azure-native ETL solution:

o Design for scalability: ensure your solution can handle growing data volumes and processing demands by leveraging Azure's scalable services.
o Optimize data storage: use appropriate storage solutions for each stage of the ETL process, balancing cost and performance.
o Implement robust monitoring: use Application Insights, Durable Functions Monitor, and other monitoring tools to track performance and detect issues early.
o Ensure data security: implement strong measures to protect sensitive data at rest and in transit, such as resolving connection strings from Azure Key Vault (a minimal sketch follows this list).
o Automate and schedule pipelines: use ADF to automate and schedule ETL pipelines, reducing manual intervention and ensuring consistency.
o Use Durable Functions for complex tasks: leverage Azure Durable Functions for long-running, stateful operations, ensuring reliability and efficiency.
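To illustrate the data-security practice, here is a minimal sketch of resolving the Azure SQL connection string from Key Vault with the function app's managed identity and using it for a parameterized staging insert. The vault URL, secret name, and dbo.EnrollmentStaging table are hypothetical placeholders, not names from the original solution.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Identity;
using Azure.Security.KeyVault.Secrets;
using Microsoft.Data.SqlClient;

public static class SecureSqlLoad
{
    // Resolve the Azure SQL connection string from Key Vault at runtime, so no
    // secret lives in application settings or source code.
    public static async Task<string> GetSqlConnectionStringAsync()
    {
        var client = new SecretClient(
            new Uri("https://my-etl-vault.vault.azure.net/"), // hypothetical vault
            new DefaultAzureCredential());
        KeyVaultSecret secret = await client.GetSecretAsync("SqlConnectionString");
        return secret.Value;
    }

    // Minimal enrichment/load step: stage one transformed record into Azure SQL.
    public static async Task LoadStagedRecordAsync(string fileName, string payloadJson)
    {
        string connectionString = await GetSqlConnectionStringAsync();
        using var connection = new SqlConnection(connectionString);
        await connection.OpenAsync();

        // Parameterized insert into a hypothetical staging table.
        using var command = new SqlCommand(
            "INSERT INTO dbo.EnrollmentStaging (SourceFile, Payload, LoadedUtc) " +
            "VALUES (@file, @payload, SYSUTCDATETIME());", connection);
        command.Parameters.AddWithValue("@file", fileName);
        command.Parameters.AddWithValue("@payload", payloadJson);
        await command.ExecuteNonQueryAsync();
    }
}
```

Using DefaultAzureCredential means the same code works locally through developer credentials and in Azure through the managed identity, without any secret in configuration.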
By following these guidelines and leveraging Azure's powerful tools and services, you can develop a robust and efficient ETL solution that meets your data processing needs. Azure provides a flexible and scalable platform, enabling you to handle large data volumes and complex transformations with ease. Embrace the power of Azure to unlock the full potential of your data.

Proactively design, deploy & monitor resilient Azure workloads

Do you want to know how to get resilient and stay resilient? Explore the architectural features needed to uphold stringent uptime requirements for critical deployments. Learn how to design resiliency into workloads and environments by implementing Azure landing zones and infrastructure-as-code modules. We will demo native Bicep and Azure Verified Modules while explaining the scenarios in which you need them. We'll also show how to use the Azure Proactive Resiliency Library, Azure Advisor, and Azure Monitor baseline alerts to minimize outage impacts and increase productivity. This session is part of Tech Accelerator: Mastering Azure and AI adoption. View the full agenda for more great sessions and insights.