Mainframe Data Modernization to Azure

Microsoft

Jan 27, 2022

Objective

Data is a critical part of every business. In any modernization effort, migrating data from source to target is one of the key factors in the success of the overall workload migration.

In this blog, we outline the different layers of data on the mainframe, the considerations for data migration, and the tools and processes that can be used to move data from the mainframe to Azure.

Why should we focus on Data Modernization?
A few trends we have observed during our engagements with customers are listed below:

- Data in an application grows exponentially over time.
- Customers face many challenges while managing data on legacy systems.
- Performance and cost roadblocks make it hard to analyze data on the mainframe using currently available AI/ML techniques.
- Microsoft products and ISV tools on the marketplace can accelerate this migration.
- Enterprises incur higher costs when they continue to host data on legacy platforms rather than moving it to modern platforms.

Data on the Mainframe:

Mainframes have been one of the earliest infrastructure choices for industry. Many firms have built their workloads on the mainframe over five or more decades, creating a large number of applications and a large amount of data along the way. Over that time, multiple options for storing data on the mainframe have been developed. A bird’s-eye view of the data options on the mainframe is as below:

It predominantly consists of the flavors below:
1. Databases: There are various options for storing data on the mainframe.
  - IBM Db2: One of the most popular relational databases, and among the most frequently used and preferred databases on the mainframe.
  - IBM IMS: A hierarchical database used to store data for quick access.
  - ADABAS: An inverted-list database.
  - DATACOM: A relational database.
  - IDMS: A network database.
2. Files: Over the years, the mainframe has added many file types and access methods. Files are also known as datasets on the mainframe. A few of them are highlighted below:
  - Physical Sequential (PS): One of the basic file types on the mainframe, which stores data sequentially.
  - Generation Data Group (GDG): A group of related physical sequential files under a common name, with features such as versioning.
  - Virtual Storage Access Method (VSAM): Used for faster access. It has several sub-access methods, each suited to specific business and technological scenarios.
  - Hierarchical File System (HFS) / z/OS File System (zFS): Used by Unix System Services (USS) hosted on the mainframe, which allows Unix and Java workloads to run.
3. Streaming storage: Streaming storage is not new to the mainframe; it has been used for a long time.
  - Messaging queues (MQ): Used to store data that can feed batch jobs for end-of-day processing, and also as middleware for transaction processing.
  - Spool storage: The mainframe has a facility called the spool, which stores transient data that can be used for business processing at the end of the day.
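
As a small illustration of the GDG versioning mentioned in the list above, the sketch below parses absolute generation names of the form `BASE.GnnnnVnn` and picks the newest one, which is often the generation an inventory needs to carry over. The dataset names and the helper names `parse_generation` and `latest_generation` are hypothetical, and the sketch ignores generation-number wrap-around.

```python
import re

# Absolute GDG generation names end in .GnnnnVnn (generation and version number).
_GDG_PATTERN = re.compile(r"^(?P<base>.+)\.G(?P<gen>\d{4})V(?P<ver>\d{2})$")


def parse_generation(dataset_name):
    """Split an absolute generation name into (base, generation, version)."""
    match = _GDG_PATTERN.match(dataset_name)
    if match is None:
        raise ValueError(f"not a GDG generation name: {dataset_name!r}")
    return match["base"], int(match["gen"]), int(match["ver"])


def latest_generation(dataset_names):
    """Return the name with the highest (generation, version) pair.

    Simplification: wrap-around after G9999 is not handled.
    """
    return max(dataset_names, key=lambda name: parse_generation(name)[1:])


# Hypothetical dataset names, for illustration only.
names = ["PAYROLL.DAILY.G0011V00", "PAYROLL.DAILY.G0012V00", "PAYROLL.DAILY.G0010V00"]
newest = latest_generation(names)
```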
Migration of data from Mainframe:

Data collected over many years has to be migrated from the mainframe to Azure with the utmost care because of the various factors involved. A few of them are:
1. Volume: Mainframes generally hold a huge volume of data, usually in the range of tens of TBs.
2. Data Encoding: The mainframe stores character data in EBCDIC code pages. Most applications running on Azure operate in newer encodings such as ASCII, UTF-8, or UTF-16. Translating data from one code page to another must be done carefully. Double-byte data present on the mainframe also needs special attention.
3. Data Compression: The mainframe has a few data types that store data in compact binary or hexadecimal (packed) formats. These need additional processing and conversion to make them easily accessible in a distributed environment on Azure.
4. Data Format: The mainframe allows variable-length records, and additional care has to be taken while migrating them.
5. Business requirements: Some applications cannot have downtime while migrating to Azure. In these cases, we have to add additional layers for Change Data Capture (CDC), which keep Azure and the mainframe in sync.
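
To make the encoding and compact-data-type considerations above more concrete, here is a minimal Python sketch. It assumes the character data uses the `cp037` EBCDIC code page (the actual code page depends on the source system) and that the compact numeric field is a packed-decimal (COBOL COMP-3 style) value. The record layout and the function names are hypothetical.

```python
from decimal import Decimal


def decode_ebcdic(raw, codec="cp037"):
    """Decode an EBCDIC character field and trim trailing padding spaces.

    cp037 is an assumption; use the code page of the source system.
    """
    return raw.decode(codec).rstrip()


def unpack_comp3(raw, scale=0):
    """Convert a packed-decimal field to Decimal.

    Each byte holds two 4-bit nibbles; the last nibble is the sign
    (0xB or 0xD means negative, other sign values are treated as positive).
    """
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ValueError("invalid packed-decimal digit")
    value = int("".join(map(str, digits)))
    if sign in (0x0B, 0x0D):
        value = -value
    return Decimal(value).scaleb(-scale)


# Hypothetical record: a 10-byte name followed by a 3-byte packed amount.
record = "ALICE     ".encode("cp037") + bytes([0x12, 0x34, 0x5D])
name = decode_ebcdic(record[:10])
amount = unpack_comp3(record[10:], scale=2)
```

The point of splitting the record by field is that a code-page translation must only be applied to character fields. Running packed or binary bytes through an EBCDIC-to-ASCII conversion would corrupt them.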

Migration Process

At a high level, the migration process generally involves these steps:

1. Application Boundary identification: Identify the right set of programs and inventory that belong to an application. This involves identifying unused programs, Job Control Language (JCL) cards, etc. At the end of this exercise, we have a baselined application inventory.
2. Assessment: Find out which tables, databases, and files are connected to the baselined inventory. This gives us an extensive list of the data artifacts with which an application interacts. During this phase, we can also analyze the compatibility of the table Data Definition Language (DDL) with the chosen target database. SQL Server Migration Assistant (SSMA) generates reports that help us quickly identify gaps between the source (Db2) and target SQL databases.
3. Schema migration (Tabular data): SSMA can migrate the schema from source to target for compatible components. Incompatible components need manual intervention.
4. Data Migration: This predominantly consists of file and table migration. A few of the file types have an equivalent cloud solution and can be moved directly. A few others need some manual intervention. For example, VSAM does not have a direct one-to-one mapping on the cloud, so we have to use a slightly different approach to make VSAM work in a cloud environment. The different approaches for table migration are highlighted in the next section.
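
As a rough illustration of the kind of gap analysis done in the assessment step above, the sketch below flags Db2 column types that have no entry in a small type-mapping table. The mapping is an illustrative assumption and is not the mapping that SSMA applies. The table names, column names, and the helper `find_type_gaps` are hypothetical.

```python
# Illustrative Db2 -> SQL Server type mapping (an assumption for this sketch,
# not the mapping used by SSMA).
ILLUSTRATIVE_TYPE_MAP = {
    "SMALLINT": "smallint",
    "INTEGER": "int",
    "BIGINT": "bigint",
    "DECIMAL": "decimal",
    "CHAR": "char",
    "VARCHAR": "varchar",
    "DATE": "date",
    "TIMESTAMP": "datetime2",
}


def find_type_gaps(columns):
    """Return the (table, column, db2_type) entries that have no mapping.

    Length and precision, such as the "(9,2)" in "DECIMAL(9,2)", are ignored.
    """
    gaps = []
    for table, column, db2_type in columns:
        base_type = db2_type.split("(")[0].strip().upper()
        if base_type not in ILLUSTRATIVE_TYPE_MAP:
            gaps.append((table, column, db2_type))
    return gaps


# Hypothetical inventory of columns taken from the baselined application.
inventory = [
    ("ACCOUNT", "ACCT_ID", "INTEGER"),
    ("ACCOUNT", "BALANCE", "DECIMAL(9,2)"),
    ("ACCOUNT", "ROW_ID", "ROWID"),
]
gaps = find_type_gaps(inventory)
```

Columns that land in `gaps` are the candidates for the manual intervention mentioned in the schema migration step.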

As mentioned earlier, there are a few considerations with respect to data movement based on business constraints. A few of them are highlighted below.

| Business outage | Low data volume | Intermediate data volume | High data volume |
| --- | --- | --- | --- |
| Long business outage | Full data migration | Full data migration | Full data migration |
| Intermediate business outage | Full data migration | Full data migration | Snapshot data migration, followed by delta updates |
| No/small business outage | Snapshot data migration, followed by delta updates | One-time migration followed by CDC | One-time migration followed by CDC |
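
The matrix above can also be read as a simple lookup from outage tolerance and data volume to a migration approach. The sketch below encodes it that way. The key labels (`long`, `intermediate`, `none_or_small`, `low`, `high`) and the function name are hypothetical names chosen for this example.

```python
FULL = "Full data migration"
SNAPSHOT_DELTA = "Snapshot data migration, followed by delta updates"
ONE_TIME_CDC = "One-time migration followed by CDC"

# (tolerable business outage, data volume) -> migration approach
MIGRATION_APPROACH = {
    ("long", "low"): FULL,
    ("long", "intermediate"): FULL,
    ("long", "high"): FULL,
    ("intermediate", "low"): FULL,
    ("intermediate", "intermediate"): FULL,
    ("intermediate", "high"): SNAPSHOT_DELTA,
    ("none_or_small", "low"): SNAPSHOT_DELTA,
    ("none_or_small", "intermediate"): ONE_TIME_CDC,
    ("none_or_small", "high"): ONE_TIME_CDC,
}


def choose_approach(outage, volume):
    """Look up the migration approach for an outage/volume combination."""
    try:
        return MIGRATION_APPROACH[(outage, volume)]
    except KeyError:
        raise ValueError(f"unknown combination: {outage!r}, {volume!r}") from None


approach = choose_approach("intermediate", "high")
```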

Database Migration
As there are multiple types of database options on the mainframe, this document covers the migration of Db2 data only. The others have to be dealt with individually, as each database poses its own considerations during migration.

Db2 Data Migration:

There are multiple tools that can help in migrating data from the mainframe to Azure. We highlight below the process for full data migration. Other migration processes will be highlighted in other papers.

File Migration:

Migration of files from the mainframe to Azure involves multiple considerations, as pointed out earlier.

The list of files that an application uses or creates can be obtained from tools that perform impact analysis. As outlined earlier, there are multiple flavors of files that need to be migrated. Below we highlight two approaches that can be used to migrate files from the mainframe to Azure.

Host Integration Server (HIS) has a component called Host File Client (HFC), which can integrate with z/OS and help in transmitting files from the mainframe to Azure. Below are a few of the highlights of the solution.

File Migration using FTP:

Another approach to moving data off the mainframe is the standard File Transfer Protocol (FTP).

Azure Data Factory (ADF) also has a built-in FTP connector that can be used to pull files from the mainframe to Azure. Details of how to configure the ADF FTP connector are highlighted in this article.
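
For a scripted transfer outside ADF, a minimal sketch using Python's standard `ftplib` is shown below. It pulls a dataset in binary mode so that no code-page translation happens in transit, and then splits variable-length records using their 4-byte record descriptor words (RDWs). The assumptions here are ours: that the z/OS FTP server accepts the `SITE RDW` command to keep RDWs in the transferred data, and that the host, credentials, and dataset name are placeholders. The download function needs a live server and is not exercised in the example.

```python
import io
import struct
from ftplib import FTP


def download_dataset(host, user, password, dataset, keep_rdw=True):
    """Download a dataset in binary mode and return its bytes.

    Assumption: the server is a z/OS FTP server that supports SITE RDW.
    """
    buffer = io.BytesIO()
    with FTP(host) as ftp:
        ftp.login(user, password)
        if keep_rdw:
            ftp.sendcmd("SITE RDW")  # keep record descriptor words in the data
        # Quotes mark a fully qualified dataset name on z/OS.
        ftp.retrbinary(f"RETR '{dataset}'", buffer.write)
    return buffer.getvalue()


def split_rdw_records(data):
    """Split bytes into records using each record's 4-byte RDW.

    The first two RDW bytes hold the big-endian record length, which
    includes the RDW itself; the last two bytes are not used here.
    """
    records = []
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError(f"truncated RDW at offset {offset}")
        (length,) = struct.unpack_from(">H", data, offset)
        if length < 4 or offset + length > len(data):
            raise ValueError(f"bad record length {length} at offset {offset}")
        records.append(data[offset + 4 : offset + length])
        offset += length
    return records


# In-memory sample standing in for downloaded bytes: two records, "ABC" and "Z".
sample = b"\x00\x07\x00\x00ABC" + b"\x00\x05\x00\x00Z"
records = split_rdw_records(sample)
```

The record payloads returned here are still in the source encoding, so any character fields would need an EBCDIC decode after the split.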

Appendix
1. SSMA for Db2
  - We can download the SSMA for Db2 package from here.
2. HIS
  - We can download the HIS package from here.
3. ADF
  - ADF can also be used for a few of the steps highlighted above. One of the many use cases is highlighted here.
References
- DFSMS Using Data Sets
Feedback and Suggestions
- If you have feedback or suggestions for improving this asset, please contact the Azure Databases SQL Customer Success Engineering Team. Thanks for your support!
  Note: For additional information about migrating various source databases to Azure, see the Azure Database Migration Guide.