Bring your Mainframe DB2 z/OS data to Microsoft Fabric

Introduction

 

Db2 for z/OS is a relational database management system (RDBMS) developed by IBM for its mainframe platform. The Db2 family also spans other platforms, such as Db2 for LUW (Linux, UNIX, and Windows) and Db2 for i. It has been widely used by enterprises, including many Fortune 500 companies, for managing their data and supporting critical business applications.

From an enterprise standpoint, it is therefore very important to have seamless integration of DB2 for z/OS with distributed data sources. This enables enterprises to harness the full potential of their data, uncovering meaningful patterns, trends, and relationships. These insights empower organizations to make informed decisions, optimize processes, enhance customer experiences, and gain a competitive edge in the market.

Microsoft Fabric, the latest unified product suite, is an all-in-one analytics solution for enterprises that covers everything from data movement to data science, real-time analytics, and business intelligence.

In this technical blog, we will explore the convenience and ease of opening up DB2 for z/OS data in Microsoft Fabric. By transferring data to Fabric, data engineers and data scientists gain access to a wide range of tools and technologies to extract valuable insights from the data. To follow along with this blog, we recommend setting up and enabling Fabric by referring to Part 1 of Get Started with Microsoft Fabric – All in one place for all your Analytical needs.

 

General Guidelines

For a large enterprise with multiple disconnected systems and technologies, Microsoft Fabric makes it easier for data engineers and data scientists to unlock the potential of their data. The whole pain of data integration and analytics can be offloaded to Fabric.

Below are the high-level ideas for opening up Db2 for z/OS data to Fabric:

  • Using Dataflows as the built-in ETL to sync data from DB2 to the Lakehouse.
  • Using the Fabric product suite to draw insights from the DB2 data.

Major components of Fabric that we would be using to achieve these are:

Data Flow: Dataflows provide a low-code interface for ingesting data from hundreds of data sources and transforming it using 300+ data transformations. Read more here.

Data Pipeline: Data pipelines enable powerful workflow capabilities at cloud scale. Use data pipelines to build complex ETL and data factory workflows that can perform many different tasks at scale. Read more here.

Data Gateway: The on-premises data gateway acts as a secure bridge between on-premises data sources, such as DB2 for z/OS, and Microsoft cloud services. It enables quick and secure data transfer, ensuring seamless connectivity.

Lakehouse: Microsoft Fabric Lakehouse serves as a unified data architecture platform, allowing organizations to store, manage, and analyze structured and unstructured data in a single location. Learn more about the concept of a Lakehouse here.

Power BI: Power BI is the world's leading Business Intelligence platform. It ensures that business owners can access all the data in Fabric quickly and intuitively to make better decisions with data. For more information, see What is Power BI?

DB2 Data flows to Lakehouse

  1. Set up the Data Gateway on a machine that has access to the Db2 database.
  2. Create a workspace & Lakehouse in Microsoft Fabric.
  3. Configure a Data Flow in the workspace to synchronize data from DB2 to the Fabric Lakehouse.

Fabric native engineering capabilities   

  • View Data Lineage to differentiate the flows from different sources.
  • Use Data Pipelines to copy the DB2 data from the Lakehouse to various targets.
  • Use the read-only SQL endpoint to query the DB2 data in the Lakehouse from applications like SSMS.
  • Run Spark-based notebooks on the DB2 data in the Lakehouse.

Architecture Overview

[Image: Architecture overview – data flow from DB2 for z/OS to Microsoft Fabric]

 

This architecture diagram shows the data flow from DB2 to Fabric. We will explain the components and the setup in detail.

Data flows from DB2 to Lakehouse

Install Data Gateway

To get the data into the Lakehouse, we need to establish a connection to the on-premises mainframe and the DB2 database. We can do that using a data gateway. Follow this guide to install one on a system that has access to the DB2 for z/OS database: Install an on-premises data gateway | Microsoft Learn

 

Once done, you should see the connection status as below:

 

[Image: On-premises data gateway connection status]

 

Create a Data Flow in your Fabric Workspace

Once the gateway is installed, go to the Fabric portal, create a workspace and a Lakehouse (if not already done), and build a dataflow from the DB2 database to the Lakehouse.

[Image: Fabric portal – creating the dataflow]

 

 

Click on Data Flow, and then click on other sources when you land on the page below:

[Image: Choosing a data source – other sources]

 

 

Select IBM DB2 from all the pre-defined sources:

[Image: Selecting IBM DB2 from the pre-defined sources]

[Image: IBM DB2 connection details]

 

 

Click on learn more for detailed instructions on filling in the connection details.
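
Before filling in the form, it can help to sanity-check connectivity from the gateway machine to DB2. Below is a minimal sketch using the ibm_db Python driver; the hostname, port, database, and credentials are placeholders, and connecting to DB2 for z/OS typically also requires DB2 Connect licensing on the client side.

  # Minimal connectivity smoke test against DB2 for z/OS using the ibm_db driver.
  # All connection values are placeholders -- substitute your own environment's.
  import ibm_db

  conn_str = (
      "DATABASE=DSNDB01;"                 # DB2 location name (placeholder)
      "HOSTNAME=mainframe.example.com;"   # z/OS host (placeholder)
      "PORT=446;"                         # 446 is the common default DDF port
      "PROTOCOL=TCPIP;"
      "UID=DB2USER;PWD=secret;"           # placeholder credentials
  )

  try:
      conn = ibm_db.connect(conn_str, "", "")
      # SYSIBM.SYSDUMMY1 is DB2's built-in one-row table, handy for smoke tests.
      stmt = ibm_db.exec_immediate(
          conn, "SELECT CURRENT TIMESTAMP FROM SYSIBM.SYSDUMMY1")
      print("Connected. Server time:", ibm_db.fetch_tuple(stmt)[0])
      ibm_db.close(conn)
  except Exception as err:
      print("Connection failed:", err)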

 

Once the connection is established, select the Lakehouse as the destination, and hit the Publish button:

[Image: Selecting the Lakehouse destination and publishing the dataflow]

 

The dataflow can be set up to sync the data from the Db2 database to the Lakehouse at a regular interval. For example, we can set up the sync to run after every nightly batch run on the mainframe.

  • Go to the Dataflow settings and enable ‘Refresh’ as shown below.
  • Set the desired time & frequency of the sync.
  • Add contacts who are to be notified of any failures.

[Image: Dataflow refresh settings]

 

 

Here is a sample table that we synced from DB2 for z/OS to the Lakehouse:

[Image: Sample table synced from DB2 for z/OS to the Lakehouse]

 

 

Fabric native engineering capabilities

This space is evolving at a rapid pace, and we expect many more capabilities to be added.

 

View Data Lineage

 

This pane gives a bird's-eye view of all the data sources. At enterprise scale, with multiple data sources, it helps you understand the data and trace it back to its various sources.

 

Go to the workspace, right-click on the dataflow, and click on “View Lineage.”

 

[Image: View Lineage option in the workspace]

 

 

We can see the IBM Mainframe/DB2 for z/OS as a source.

 

[Image: Lineage view showing IBM Mainframe/DB2 for z/OS as a source]

 

 

Copy the DB2 data from Lakehouse to Azure SQL

 

We can use a Data pipeline to copy the regularly synced DB2 data in the Lakehouse to an Azure SQL target. This gives us the flexibility to branch out and feed the DB2 data from the business-critical applications running on the mainframe to multiple downstream applications.

 

  • Start a Copy Data activity and opt for Lakehouse as the source

[Image: Copy Data activity with Lakehouse as the source]

 

 

  • Keep going as directed by the UI. Feed in the connection parameters, and the infrastructure for the copy will be built transparently.

[Image: Copy Data – connection parameters]

[Image: Copy Data – connection setup (continued)]

 

 

  • You can either choose an existing Azure SQL connection or create one.

[Image: Choosing or creating an Azure SQL connection]

 

 

  • Once the connections are established, you can choose to define the target schema. Here, we will load the data into a new table. The activity can create simple schemas; for complex ones, we can opt for an existing table in Azure SQL.

[Image: Defining the target schema – loading into a new table]

 

 

  • Click Next and then “Save & Run” to run the pipeline.
  • The experience is like the ADF Copy activity, and we can schedule and scale as needed.

[Image: Running the pipeline – Save & Run]

[Image: Pipeline run output]

 

 

  • The Azure SQL Database can then take application requests as needed; a quick way to verify the copied data is sketched below.
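
Once the pipeline has run, a simple count against the target confirms the rows landed. The sketch below assumes the pyodbc package and Microsoft's ODBC Driver 18 for SQL Server; the server, database, and table names are placeholders.

  # Quick sanity check on the Azure SQL target after the pipeline run.
  # Server, database, and table names are placeholders.
  import pyodbc

  conn = pyodbc.connect(
      "Driver={ODBC Driver 18 for SQL Server};"
      "Server=myserver.database.windows.net;"       # placeholder server
      "Database=mydatabase;"                        # placeholder database
      "Authentication=ActiveDirectoryInteractive;"  # or UID/PWD for SQL auth
      "Encrypt=yes;"
  )
  count = conn.execute("SELECT COUNT(*) FROM dbo.ACCOUNTS").fetchone()[0]  # hypothetical table
  print("Rows in target:", count)
  conn.close()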

 

Use SQL endpoint to query the DB2 data in Lakehouse

 

This is a read-only but unique capability that the Lakehouse provides. We can use the built-in SQL endpoint to offload read-only applications that support T-SQL from DB2 to the Lakehouse, where the data gets synced on a nightly basis.

 

  • Go to the Lakehouse and click on the settings (blue gear icon) as shown below:

[Image: Lakehouse settings – SQL connection string]

 

 

  • Use the SQL connection string to connect and query the Lakehouse table. Here we are using SSMS.
  • Open SSMS, opt for ‘Database Engine’, paste the connection string, and opt for AAD with MFA:
    • We are showcasing SSMS here, but this also works with other products like Azure Data Studio & Power BI

[Image: SSMS connection dialog with AAD and MFA]

 

 

  • Once the connection is established, we can use T-SQL to read the table. There is no need to build or define anything. It is that simple! A programmatic version is sketched below.

[Image: Querying the Lakehouse table with T-SQL in SSMS]
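
The same read-only endpoint also works programmatically. Below is a minimal sketch using pyodbc with the SQL connection string copied from the Lakehouse settings; the endpoint address, Lakehouse name, and table name are placeholders, and the ODBC driver must support AAD authentication.

  # Query the Lakehouse's read-only SQL endpoint with T-SQL outside of SSMS.
  # The endpoint, Lakehouse, and table names are placeholders -- use the SQL
  # connection string from the Lakehouse settings page.
  import pyodbc

  conn = pyodbc.connect(
      "Driver={ODBC Driver 18 for SQL Server};"
      "Server=<sql-endpoint-from-lakehouse-settings>;"  # placeholder
      "Database=MyLakehouse;"                           # placeholder Lakehouse name
      "Authentication=ActiveDirectoryInteractive;"      # AAD with MFA, as in SSMS
      "Encrypt=yes;"
  )
  for row in conn.execute("SELECT TOP 10 * FROM dbo.ACCOUNTS"):  # hypothetical table
      print(row)
  conn.close()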

 

 

 

Important points for Consideration 

  • Microsoft Fabric is in preview, and a lot of exciting features will be added.
  • The capabilities shown here are just a few, and we expect the use cases to evolve with each new release.
  • We did not cover notebooks on the DB2 data in the Lakehouse, but you can continue your learning here; a minimal taste is sketched below.
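
As a quick taste, the cell below is a minimal PySpark sketch of reading the synced DB2 table from a Fabric notebook; it assumes the Lakehouse is attached to the notebook, and the table and column names are hypothetical.

  # Minimal Fabric notebook cell (PySpark). The 'spark' session is provided by
  # the notebook runtime; table and column names here are hypothetical.
  df = spark.read.table("ACCOUNTS")        # DB2 table synced into the Lakehouse
  df.printSchema()                         # inspect the schema carried over from DB2
  df.groupBy("REGION").count().show()      # simple aggregation on a hypothetical column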

 

Feedback and suggestions 

If you have feedback or suggestions for improving this data migration asset, please contact the Databases SQL CSE/Ninja Engineering Team (datasqlninja@microsoft.com). Thanks for your support!

Note: For additional information about migrating various source databases to Azure, see the Azure Database Migration Guide.
