Database Platform CSE – Mainframe
Ingesting Mainframe File System Data (EBCDIC) into SQL DB on Fabric Using OSS Cobrix
Introduction

Mainframe and Midrange data is often stored in fixed-length format, where each record has a predetermined length, or in variable-length format, where each record's length may vary. The data is stored in binary using Extended Binary Coded Decimal Interchange Code (EBCDIC) encoding, and the metadata for the EBCDIC files is stored in a copybook file. These EBCDIC-encoded files store data in a type-specific way, which is vital for optimal storage and performance on the Mainframe file system. However, this presents a challenge when migrating data from Mainframe or Midrange systems to distributed systems: the data, originally stored in a format specific to Mainframe or Midrange systems, is not directly readable after transfer, because distributed systems only understand code pages such as American Standard Code for Information Interchange (ASCII).

To make this data readable on a distributed system, we need to perform an EBCDIC to ASCII code page conversion. This conversion can be achieved in several ways, including:

- Microsoft Host Integration Server (Host File Client)
- Logic App IBM Host File connector (our detailed blog about it is here)
- Open Source (OSS) libraries
- Third-party ISV solutions

Microsoft Host Integration Server

Microsoft Host Integration Server (HIS) has a component named Host File Client (HFC). This component helps convert Mainframe EBCDIC files to ASCII using a custom-developed C# solution. More details on this solution are provided in the HIS documentation page.

Logic App Converter

If you prefer a cloud-native solution, you can use the IBM Host File connector in Azure Logic Apps. The detailed process has been documented in this blog post.

Fabric (with Open-Source Libraries)

Microsoft Fabric is an enterprise-ready, end-to-end analytics platform. It unifies data movement, data processing, ingestion, transformation, real-time event routing, and report building. It supports these capabilities with integrated services like Data Engineering, Data Factory, Data Science, Real-Time Intelligence, Data Warehouse, and Databases. There are many open-source solutions that can help convert mainframe data to ASCII, and they can be used to convert files on Fabric, Databricks, or Synapse on Azure. This blog will focus on the OSS option.

Data Ingestion Architecture Using OSS on Fabric

There are multiple open-source libraries that can be used for this data conversion. In this article we will dive deeper into one of these solutions: Cobrix. Cobrix is an open-source library built in Scala that leverages Spark's multithreaded, distributed processing framework. This helps it convert files faster than single-threaded processes. Because it is multithreaded, it needs a pool of compute resources to perform the conversion. Cobrix can run on Spark environments such as Azure Synapse, Databricks, and Microsoft Fabric. We will dive deeper into how to set up Cobrix on Microsoft Fabric.

Download required Cobrix packages

First, download the required Cobrix packages from the right sources. Because Fabric has a particular runtime dependency, please make sure you download the right Scala build for the Fabric environment that you set up. You will have to download two JARs, named cobol-parser_xx.xx.jar and spark-cobol_xx.xx.jar.
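If you are unsure which build matches your environment, the snippet below is a minimal sketch that assumes it runs in a Fabric Spark notebook (where the spark session is predefined). It prints the Spark and Scala versions so you can pick the matching Cobrix artifacts; the Maven coordinates in the comments leave the version as a placeholder.

// Print the runtime versions so the matching Cobrix build can be chosen.
// Cobrix is published on Maven Central under the group ID za.co.absa.cobrix,
// for example the artifacts spark-cobol_2.12 and cobol-parser_2.12; pick the
// Scala suffix (and a version) that match the output below.
println(s"Spark version: ${spark.version}")
println(s"Scala version: ${scala.util.Properties.versionString}")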
Setup the Fabric Environment

1. Log in to fabric.microsoft.com.
2. Create a Fabric workspace.
3. Create an Environment in the workspace.
4. Open the Environment, click on Custom Libraries, and upload the two JARs which were downloaded earlier. Once you have uploaded them, your custom library setup should look something like this.
5. Create a new Lakehouse. Upload the COBOL copybook file as well as the Mainframe data file in binary to a particular location in the Lakehouse. At the end of this step, your Lakehouse setup should look something like this. For both of these files, copy the Azure Blob File System Secure (ABFSS) path by right-clicking on the files; this path can be used to point to the file from the Spark notebook.
6. Create a new Fabric pipeline. This pipeline will have two components: the first is a notebook, which calls the Cobrix framework to convert the file from EBCDIC to ASCII; the second is a Copy activity, which copies the contents of the output file created in the notebook to a SQL DB on Fabric.
7. Create a new notebook and attach the environment which you created earlier to it. In the notebook cell, you can use this piece of code:

//Blob access
val CopyBookName = "abfss://file1.cpy"
val DataFileName = "abfss://file1.dat"
val outputFileName = "abfss://output.txt"

//Cobrix Converter Execution
val cobolDataframe = spark
  .read
  .format("za.co.absa.cobrix.spark.cobol.source")
  .option("copybook", CopyBookName)
  .load(DataFileName)

//Display DataFrame to view conversion results
cobolDataframe.printSchema()
cobolDataframe.show()

//Write the converted data to the Lakehouse output path (CSV shown here as an example output format)
cobolDataframe.write.mode("overwrite").option("header", "true").csv(outputFileName)

Once you have set the configuration properly, you are all set to run the notebook. This converts the file from EBCDIC to ASCII and stores the result in the Lakehouse.

8. Add a Copy activity to the pipeline with the file as the source and SQL Server as the destination. At this point, your pipeline should look something like this.

Once you run this pipeline, the Mainframe EBCDIC file is converted to ASCII and then loaded into a Fabric native SQL DB table.

Third-party ISV solutions

There are many third-party ISV solutions available for EBCDIC to ASCII conversions. Please get in touch with us to help you find the right solution for your requirements.

Summary

EBCDIC to ASCII conversion is a critical piece of work during the data migration and modernization journey. Being able to do it with ease and accuracy will drive the success of the data migration. With this capability enabled in Fabric, it opens up a new set of use cases, such as Mainframe report generation, which are predominantly data warehouse driven.

Feedback and suggestions

If you have feedback or suggestions for improving this data migration asset, please send an email to the Database Platform Engineering Team.

Resolving ADF Copy Upsert Failures in SQL DB for XML Columns
Customers modernizing their Oracle, Mainframe/Midrange Db2, Sybase, and other databases to Azure SQL often use Azure Data Factory (ADF) for data transfer. For large datasets, an initial copy is made a few days prior, followed by incremental data copies during cutover. The ADF Copy activity's Upsert method is common for these incremental transfers, but it fails for tables which contain XML columns. This blog explains how to use Stored Procedure as the sink behavior in the Copy activity to perform an upsert into Azure SQL Database. The same method can be used even when the target is Azure SQL MI or an Azure SQL VM.

Migrating Mainframe RACF user identities to Microsoft Entra ID Users
RACF (Resource Access Control Facility) is one of the main security components that performs authorization and authentication on the Mainframe. While modernizing mainframe workloads to Azure, there is a need to migrate the identity information of the users to Azure. This also applies in co-existence and hybrid scenarios where both mainframe and Azure user information needs to stay in sync.

Database Schema Compare Tool
Migration to Azure SQL is far easier than before. The availability of schema migration, database migration, database comparison, and schema comparison tools from Microsoft makes migration easier and less risky. This blog is about how the Database Schema Compare tool can be used to perform Db2 and Azure SQL schema comparison.