Introduction
In this technical blog, we will explore the technology of SQL Server PolyBase with the specific use case for seamless data integration between SQL Server and Db2 on a Mainframe using the ODBC option. PolyBase revolutionizes the way we access and query data across disparate platforms, offering unparalleled benefits over traditional approaches.
Overview
SQL Server PolyBase introduces a paradigm shift in data virtualization, redefining how we integrate and access data across disparate platforms. The specific use case highlighted in this blog describes the process of using PolyBase in SQL Server to access Db2 table data on Mainframes (this process will work for Db2 for iSeries and Db2 LUW with some minor driver /database specific configuration changes) enabling real-time and direct access to external data sources unlocking new possibilities for data integration and analysis. Unlike traditional methods that involve complex data extraction, transformation, and loading (ETL) processes, PolyBase simplifies the data integration journey by providing direct and real-time access to external data sources. SQL Server also has similar feature named Linked Servers which also enables you to read data from remote data sources. Differences between PolyBase and Linked Servers is highlighted on this documentation link.
Key Differentiators of SQL Server PolyBase for Db2
Prerequisites
Before we begin, ensure that you have the following in place:
Virtualizing Db2 data on SQL Server using PolyBase
Below diagram provides information on components present in implementing SQL Server PolyBase for accessing Db2 data.
SQL Server PolyBase to Db2 z/OS High Level Architecture
Set of SQL queries shown below are used to connect to Db2 database from SQL Server. Below SQL queries are run using SQL Server Management Studio Software connected to SQL Server 2022.
Step 1: Create the Test Database in SQL Server
USE MASTER;
CREATE DATABASE POLYBASE2DB2;
Step 2: Create the Database Master Key
CREATE MASTER KEY ENCRYPTION BY PASSWORD
= 'TestPassword@abcdef1946123456789' ;
Step 3: Create Database Scoped Credential
--DROP DATABASE SCOPED CREDENTIAL db2cred01
CREATE DATABASE SCOPED CREDENTIAL db2cred01
WITH IDENTITY = 'db2usrid', SECRET = 'db2idpwd';
Step 4: Enable PolyBase Service (if not already enabled)
--Check if SQL Server PolyBase is installed. If not; Install SQL Serer
--PolyBase before proceeding
SELECT SERVERPROPERTY ('IsPolyBaseInstalled') AS IsPolyBaseInstalled;
--Enable SQL Server PolyBase
exec sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;
Query Output:
Step 5: Create External Data Source
Plate note here we are using Progress DataDirect ODBC Driver for Db2 z/OS. Based on availability of Db2 ODBC driver in your environment you will have to change connection options accordingly.
CREATE EXTERNAL DATA SOURCE DB2SRCD1
WITH (
LOCATION = 'odbc://db2serverdnsname1',
CONNECTION_OPTIONS = 'Driver={DataDirect 7.1 Db2 Wire Protocol};
IPAddress=;
TCPPort=958;
Location=DBNAME;
Collection=NULLID;
PolyBaseOdbcSupportsSetDescRec=false’,
CREDENTIAL=db2cred01
,PUSHDOWN = ON
);
Step 6: Create External Table
Source Db2 Table definition: Employee
CREATE TABLE Employee (
EmployeeId INTEGER WITH DEFAULT NULL,
LastName VARCHAR(20) WITH DEFAULT NULL,
FirstName VARCHAR(20) WITH DEFAULT NULL,
Title VARCHAR(30) WITH DEFAULT NULL,
ReportsTo INTEGER WITH DEFAULT NULL,
BirthDate DATE WITH DEFAULT NULL,
HireDate DATE WITH DEFAULT NULL,
Address VARCHAR(70) WITH DEFAULT NULL,
City VARCHAR(40) WITH DEFAULT NULL,
State VARCHAR(40) WITH DEFAULT NULL,
Country VARCHAR(40) WITH DEFAULT NULL,
PostalCode VARCHAR(10) WITH DEFAULT NULL,
Phone VARCHAR(24) WITH DEFAULT NULL,
Fax VARCHAR(24) WITH DEFAULT NULL,
Email VARCHAR(60) WITH DEFAULT NULL
)
Create External Table definition in SQL Server:
CREATE EXTERNAL TABLE EXTDb2SANDIPKEmployee
(
[EmployeeId] INT,
[LastName] NVARCHAR(20),
[FirstName] NVARCHAR(20),
[Title] NVARCHAR(30),
[ReportsTo] INT,
[BirthDate] DATE,
[HireDate] DATE,
[Address] NVARCHAR(70),
[City] NVARCHAR(40),
[State] NVARCHAR(40),
[Country] NVARCHAR(40),
[PostalCode] NVARCHAR(10),
[Phone] NVARCHAR(24),
[Fax] NVARCHAR(24),
[Email] NVARCHAR(60)
)
WITH (
LOCATION = 'DBNAME.SANDIPK.Employee',
DATA_SOURCE = DB2SRCD2
);
Step 7: Test PolyBase Query 1 – Get 5 rows from Db2 table
SELECT TOP 5 * FROM EXTDb2SANDIPKEmployee;
Query output:
Step 8: Test PolyBase Query 2 – Get 2 rows form Db2 table with EmployeeId either 1 or 2
SELECT TOP 2 * FROM EXTDb2SANDIPKEmployee Where EmployeeId in (1,2);
Query Output:
Step 9: Test PolyBase Query 3 – Join SQL Server Table data with Db2 Table Data
SQL Server Table definition:
CREATE TABLE [SANDIPK].[Title](
[Title] [nvarchar](30) NULL,
[Level] [int] NULL,
[ExecutiveLevel] [nvarchar](1) NULL,
[ManagerRole] [nvarchar](1) NULL
)
Data in SQL Server Title Table :
SELECT * FROM SANDIPK.Title;
Query Output:
Combine data from SQL Server Title table and Db2 Table Employee:
SELECT TOP 3 *
FROM EXTDb2SANDIPKEmployee Emp,
SANDIPK.Title
WHERE Emp.Title = Title.Title
AND Title.ManagerRole = 'Y;
Query Output:
Conclusion
Important points for Consideration
Example shown uses trial version of ODBC for Db2 z/OS Wire Protocol driver from Progress DataDirect downloaded from location: link. Based on your requirement you need to license the appropriate ODBC driver from Progress DataDirect.
Limitations
Feedback and suggestions
If you have feedback or suggestions for improving this data migration asset, please contact the Databases SQL CSE/Ninja Engineering Team (datasqlninja@microsoft.com). Thanks for your support!
Note: For additional information about migrating various source databases to Azure, see the Azure Database Migration Guide.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.