Mainframe Db2 Data Virtualization using SQL Server PolyBase
Published Jul 03 2023 11:03 AM 5,074 Views
Microsoft

Introduction

In this technical blog, we will explore the technology of SQL Server PolyBase with the specific use case for seamless data integration between SQL Server and Db2 on a Mainframe using the ODBC option. PolyBase revolutionizes the way we access and query data across disparate platforms, offering unparalleled benefits over traditional approaches.

Overview

SQL Server PolyBase introduces a paradigm shift in data virtualization, redefining how we integrate and access data across disparate platforms. The specific use case highlighted in this blog describes the process of using PolyBase in SQL Server to access Db2 table data on Mainframes (this process will work for Db2 for iSeries and Db2 LUW with some minor driver /database specific configuration changes) enabling real-time and direct access to external data sources unlocking new possibilities for data integration and analysis. Unlike traditional methods that involve complex data extraction, transformation, and loading (ETL) processes, PolyBase simplifies the data integration journey by providing direct and real-time access to external data sources. SQL Server also has similar feature named Linked Servers which also enables you to read data from remote data sources. Differences between PolyBase and Linked Servers is highlighted on this documentation link.

Key Differentiators of SQL Server PolyBase for Db2

  1. Real-Time Connectivity: PolyBase establishes direct connectivity between SQL Server and Db2, enabling real-time access to data. This eliminates the need for manual data synchronization or batch processing, ensuring that you are always working with the most up-to-date information.
  2. Query Federation: With PolyBase, you can seamlessly query data residing in both SQL Server and Db2 in a single query. This eliminates the need for complex joins or data movement between platforms, simplifying your data analysis tasks and enhancing productivity.
  3. Simplified Data Virtualization: PolyBase introduces a virtualization layer that abstracts the complexities of accessing external data sources. It allows you to define external tables in SQL Server that mirror the structure of Db2 tables, enabling seamless querying and integration without physically moving or replicating the data.
  4. Unified Security and Access Control: PolyBase integrates with the existing security mechanisms of SQL Server, ensuring that access to external data sources is governed by the same authentication and authorization policies. This simplifies security management and provides a unified view of access control across both platforms.
  5. Improved Performance: PolyBase optimizes query execution by intelligently distributing the workload across SQL Server and Db2, resulting in faster response times and enhanced scalability.

Prerequisites

Before we begin, ensure that you have the following in place:

  1. SQL Server instance with PolyBase feature installed and enabled.
  2. Db2 on Mainframe installed and accessible with the necessary credentials.
  3. ODBC driver to access Db2 on Mainframe installed on the SQL Server machine.

Virtualizing Db2 data on SQL Server using PolyBase 

Below diagram provides information on components present in implementing SQL Server PolyBase for accessing Db2 data.

                      SQL Server PolyBase to Db2 z/OS High Level Architecture

Sandip_Khandelwal_0-1688380482204.png

Set of SQL queries shown below are used to connect to Db2 database from SQL Server. Below SQL queries are run using SQL Server Management Studio Software connected to SQL Server 2022.

Step 1: Create the Test Database in SQL Server

 

USE MASTER;
CREATE DATABASE POLYBASE2DB2;

 

Step 2: Create the Database Master Key

 

CREATE MASTER KEY ENCRYPTION BY PASSWORD 
= 'TestPassword@abcdef1946123456789' ;

 

Step 3: Create Database Scoped Credential

 

--DROP DATABASE SCOPED CREDENTIAL db2cred01
CREATE DATABASE SCOPED CREDENTIAL db2cred01
WITH IDENTITY = 'db2usrid', SECRET = 'db2idpwd';

 

 Step 4: Enable PolyBase Service (if not already enabled)

 

--Check if SQL Server PolyBase is installed. If not; Install SQL Serer
--PolyBase before proceeding
SELECT SERVERPROPERTY ('IsPolyBaseInstalled') AS IsPolyBaseInstalled;
--Enable SQL Server PolyBase
exec sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;

 

Query Output:

Sandip_Khandelwal_1-1687862664600.png

Step 5: Create External Data Source

Plate note here we are using Progress DataDirect ODBC Driver for Db2 z/OS. Based on availability of Db2 ODBC driver in your environment you will have to change connection options accordingly.

 

CREATE EXTERNAL DATA SOURCE DB2SRCD1
WITH (
    LOCATION = 'odbc://db2serverdnsname1',
    CONNECTION_OPTIONS = 'Driver={DataDirect 7.1 Db2 Wire Protocol};
        IPAddress=;
        TCPPort=958;
        Location=DBNAME;
        Collection=NULLID;
        PolyBaseOdbcSupportsSetDescRec=false’,
        CREDENTIAL=db2cred01
    ,PUSHDOWN = ON
);

 

Step 6: Create External Table

Source Db2 Table definition: Employee

 

CREATE TABLE Employee (
              EmployeeId INTEGER WITH DEFAULT NULL,
              LastName VARCHAR(20) WITH DEFAULT NULL,
              FirstName VARCHAR(20) WITH DEFAULT NULL,
              Title VARCHAR(30) WITH DEFAULT NULL,
              ReportsTo INTEGER WITH DEFAULT NULL,
              BirthDate DATE WITH DEFAULT NULL,
              HireDate DATE WITH DEFAULT NULL,
              Address VARCHAR(70) WITH DEFAULT NULL,
              City VARCHAR(40) WITH DEFAULT NULL,
              State VARCHAR(40) WITH DEFAULT NULL,
              Country VARCHAR(40) WITH DEFAULT NULL,
              PostalCode VARCHAR(10) WITH DEFAULT NULL,
              Phone VARCHAR(24) WITH DEFAULT NULL,
              Fax VARCHAR(24) WITH DEFAULT NULL,
              Email VARCHAR(60) WITH DEFAULT NULL
       )

 

Create External Table definition in SQL Server:

 

CREATE EXTERNAL TABLE EXTDb2SANDIPKEmployee
(
    [EmployeeId] INT,
    [LastName] NVARCHAR(20),
    [FirstName] NVARCHAR(20),
    [Title] NVARCHAR(30),
    [ReportsTo] INT,
    [BirthDate] DATE,
    [HireDate] DATE,
    [Address] NVARCHAR(70),
    [City] NVARCHAR(40),
    [State] NVARCHAR(40),
    [Country] NVARCHAR(40),
    [PostalCode] NVARCHAR(10),
    [Phone] NVARCHAR(24),
    [Fax] NVARCHAR(24),
    [Email] NVARCHAR(60)
)
WITH (
    LOCATION = 'DBNAME.SANDIPK.Employee',
    DATA_SOURCE = DB2SRCD2
);

 

 Step 7: Test PolyBase Query 1 – Get 5 rows from Db2 table

 

SELECT TOP 5 * FROM EXTDb2SANDIPKEmployee;

 

Query output:

Sandip_Khandelwal_2-1687862730150.png

Step 8: Test PolyBase Query 2 – Get 2 rows form Db2 table with EmployeeId either 1 or 2

 

SELECT TOP 2 * FROM EXTDb2SANDIPKEmployee Where EmployeeId in (1,2);

 

Query Output:

Sandip_Khandelwal_3-1687862756864.png

Step 9: Test PolyBase Query 3 – Join SQL Server Table data with Db2 Table Data

SQL Server Table definition:

 

CREATE TABLE [SANDIPK].[Title](
       [Title] [nvarchar](30) NULL,
       [Level] [int] NULL,
       [ExecutiveLevel] [nvarchar](1) NULL,
       [ManagerRole] [nvarchar](1) NULL
)

 

Data in SQL Server Title Table :

 

SELECT * FROM SANDIPK.Title;

 

Query Output:

Sandip_Khandelwal_4-1687862792583.png

Combine data from SQL Server Title table and Db2 Table Employee:

 

SELECT TOP 3 * 
  FROM EXTDb2SANDIPKEmployee Emp,
       SANDIPK.Title
 WHERE Emp.Title = Title.Title
   AND Title.ManagerRole = 'Y;

 

Query Output:

Sandip_Khandelwal_5-1687862811113.png

Conclusion

  • PolyBase represents a technology that enables seamless data integration and analysis between SQL Server and Db2 on Mainframe (or Db2 LUW / Db2i). Its real-time connectivity, query federation capabilities, simplified data virtualization, unified security, and improved performance set it apart from traditional approaches.
  • By embracing PolyBase, organizations can unlock the full potential of their data present on Mainframe, break down silos between Mainframe data sources and x86 platform data stores, and gain deeper insights. The power of PolyBase lies in its ability to provide a unified and real-time view of data, empowering businesses to make informed decisions based on comprehensive and up-to-date information.

Important points for Consideration

  • Example shown uses trial version of ODBC for Db2 z/OS Wire Protocol driver from Progress DataDirect downloaded from location: link. Based on your requirement you need to license the appropriate ODBC driver from Progress DataDirect.

  • The above technical blog provides high-level guidance with no warranties with respect to the information contained herein.

Limitations

  • The above process is tested using SQL Server PolyBase for Db2 z/OS database and Db2 z/OS ODBC Progress DataDirect driver. There will be necessary changes required to get this working when using other Db2 ODBC drivers or other Db2 sources (e.g., Db2 LUW or Db2i).
  • This process works for SQL Server on premises / SQL Server on Azure virtual machine. Polybase is currently not supported directly in Azure SQL PaaS or Azure Synapse Dedicated Pools.

Feedback and suggestions

If you have feedback or suggestions for improving this data migration asset, please contact the Databases SQL CSE/Ninja Engineering Team (datasqlninja@microsoft.com). Thanks for your support!

Note: For additional information about migrating various source databases to Azure, see the Azure Database Migration Guide.

Co-Authors
Version history
Last update:
‎Jul 03 2023 03:37 AM
Updated by: