Convert and unpack Mainframe/Midrange File System data from EBCDIC to ASCII Using Azure Data Factory
Published Sep 20 2022

Introduction

There are multiple scenarios where file data needs to be transferred from mainframe and midrange systems to Azure. A few of them are:

  • Initial data load or a one-time data sync.
  • Periodic delta updates (changes since the last sync).
  • Downstream and upstream data movement/dependencies.
  • Hybrid application architecture (part of the application on the mainframe, part on Azure).

Two major hurdles in these architectures are moving the data from one system to another and converting its codepage and datatypes. Mainframe and midrange systems use the EBCDIC codepage and have many custom datatypes, such as computational (packed) fields, that are not straightforward to convert. Microsoft's first-party tool, the Data Provider for Host Files from Host Integration Server (HIS), provides an easy way to unpack these datatypes and convert the data from the mainframe EBCDIC codepage to ASCII.
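
To see why a simple codepage swap is not enough, here is a minimal Python sketch. It is an illustration only, not part of the HIS tooling, and it assumes EBCDIC code page 037 plus a hypothetical PIC S9(5) COMP-3 field:

```python
# Minimal illustration only -- the HIS Data Provider for Host Files handles
# this (and many more datatypes) for you. EBCDIC code page 037 is assumed.
import codecs

# A plain text field (e.g., PIC X(5)) is a straight codepage decode:
ebcdic_text = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])      # "HELLO" in EBCDIC
print(codecs.decode(ebcdic_text, "cp037"))                # -> HELLO

# A computational field (COMP-3 / packed decimal) is not text: each byte
# holds two decimal digits and the final nibble is the sign, so a
# byte-for-byte codepage swap would corrupt it.
def unpack_comp3(raw: bytes) -> int:
    nibbles = [n for b in raw for n in (b >> 4, b & 0x0F)]
    sign = -1 if nibbles[-1] == 0x0D else 1               # 0xD nibble means negative
    return sign * int("".join(str(d) for d in nibbles[:-1]))

print(unpack_comp3(bytes([0x12, 0x34, 0x5C])))            # PIC S9(5) COMP-3 -> 12345
```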

 

What's new?


Along with HIS, there is now a custom Azure Data Factory (ADF) connector that helps customers convert mainframe data on Azure. Because ADF is a low-code/no-code solution, little development is needed on the customer's side to achieve the conversion. This blog describes how to set up and run the new custom ADF connector.

 

The new solution takes a file in Blob storage as input, along with its copybook (metadata) file. With the help of the metadata file, the ADF custom connector converts the data from the mainframe format to ASCII.

 

The high-level architecture of the solution looks like this:


 

Figure-1: High Level architecture

  • Data is transferred from the mainframe to Azure Blob Storage and stored in a container.
  • A self-hosted integration runtime (SHIR) machine is set up with the custom connector code.
  • The ADF pipeline reads data from the source blob and performs the necessary processing on the SHIR.
  • The converted data is stored in the sink blob.


 

 

Figure-2: Detailed Architecture

  1. The mainframe binary file is given as input to the ADF connector, along with the copybook file.
  2. When the pipeline runs, the ADF connector first establishes communication with the virtual machine where the SHIR is hosted.
  3. Control is passed from ADF to the SHIR.
  4. The custom code for the Mainframe File Data Mapper connector resides on the SHIR.
  5. It first downloads the copybook from Blob storage and runs the HIS PowerShell script to convert the copybook into an equivalent DLL file.
  6. A chunk of data is read from the input mainframe binary file and converted to ASCII using the HFC component of the HIS product.
  7. The converted data is written to the output blob. Steps 6 and 7 repeat until the end of the input file is reached (see the sketch below).
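
The loop in steps 6 and 7 can be pictured with the short Python sketch below. It is a conceptual sketch only, not the connector's code: the real conversion is done on the SHIR by the DLL generated from the copybook, and the connection string, container, and blob names are placeholders.

```python
# Conceptual sketch of the read-a-chunk / convert / write loop (steps 6-7),
# using the azure-storage-blob SDK. convert_chunk is a stand-in for the HIS
# conversion performed on the SHIR; names below are placeholders.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage-connection-string>")
source = service.get_blob_client("input-container", "input/MAINFRAME.DATA.BIN")
sink = service.get_blob_client("output-container", "converted.csv")

def convert_chunk(raw: bytes) -> bytes:
    # Placeholder: the connector converts EBCDIC records to ASCII here using
    # the DLL that the HIS PowerShell script generated from the copybook.
    return raw

sink.create_append_blob()                      # output blob is written incrementally
for chunk in source.download_blob().chunks():  # step 6: read a chunk of input
    sink.append_block(convert_chunk(chunk))    # step 7: write converted data
```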

 

Prerequisites:
   • Mainframe data transferred in binary format from the mainframe to an Azure Blob Storage container (a transfer sketch follows this list).
   • The corresponding copybook (metadata file) in the same folder as the data file.
   • A Windows virtual machine with SHIR installed on it.
   • The HIS product installed on the SHIR machine.
   • ADF configured to use the corresponding SHIR machine.
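
For the first two prerequisites, once the binary file and its copybook have landed on a machine that can reach Azure, they can be uploaded with the azure-storage-blob SDK. A minimal sketch, with placeholder file names, container name, and connection string:

```python
# Minimal sketch: upload the binary data file and its copybook into the same
# folder of the input container, as the prerequisites require. File names,
# container name, and connection string are placeholders.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = service.get_container_client("input-container")

for local_name in ("CUSTOMER.DATA.BIN", "CUSTOMER.CPY"):
    with open(local_name, "rb") as data:
        # Uploading as raw bytes keeps the EBCDIC encoding and packed fields
        # intact for the connector to convert later.
        container.upload_blob(name=f"input/{local_name}", data=data, overwrite=True)
```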

 

Setup:

Setup involves two stages:
• Stage 1: Set up the custom connector on the SHIR machine.
• Stage 2: Set up a pipeline that uses the newly installed custom connector.

 

Setup Custom Connector:

• Set up SHIR on the VM using one of the methods highlighted in this document.
• Search for “Microsoft Integration Runtime” on the VM and stop the service.
• Please write to datasqlninja@microsoft.com to get the package.
• Create a new folder named “CustomPlugins” under the folder ‘C:\Program Files\Microsoft Integration Runtime\5.0’.
• Create a new folder named “Test” under the folder ‘C:\Program Files\Microsoft Integration Runtime\5.0\CustomPlugins’.
• Place the JSON file named “CustomPlugins.json” provided in the package into the Test folder.
• Create the folder path “EBCDICtoASCII\1.0.0.0” under the Test folder.
• Copy the contents of the provided package into this folder. Effectively, all the contents should end up in ‘C:\Program Files\Microsoft Integration Runtime\5.0\CustomPlugins\Test\EBCDICtoASCII\1.0.0.0’.
• Start the “Microsoft Integration Runtime” service that was stopped earlier.

 

Setup ADF Pipeline:

  1. First, create an integration runtime with the SHIR that you set up earlier. The process to create a SHIR is highlighted here. Log in to the Azure portal using the URL https://adf.azure.com/?feature.enableDynamicPluginLayout=true&feature.shirForDynamicPluginDebug=<integrationruntimename>, where the last parameter is the integration runtime name. Please note that this ADF should be connected to the SHIR you set up earlier. Fill in the details of your ADF.
  2. Create a new pipeline in ADF and give it a name.

  3. Drag and drop a new Copy activity into the pipeline.

  4. Click on Source and select New.

  5. Search for "Mainframe File Converter" and click on the connector.

  6. Select the type Binary.

  7. Give the dataset property an appropriate name and then click on New to create a linked service.

  8. Give the linked service an appropriate name, choose the integration runtime, and then choose the connection string that has been stored in Azure Key Vault. In the Key Vault secret you will need to store your blob SAS string so that ADF has access to the blob container; read more about how to create a SAS key here (see the sketch below). In this example, "CustomConnector" is the name of the integration runtime; you might have given it a different name. Choose the integration runtime that points to the SHIR configured earlier.
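
     How the SAS string itself is produced is outside the scope of the connector; as one example, a container-level SAS can be generated with the Python SDK and then stored as the Key Vault secret. The account name, key, container, and expiry below are placeholder choices:

```python
# Minimal sketch: generate a container-level SAS token to store in Key Vault.
# Account name, key, container, and the 7-day expiry are placeholder choices.
from datetime import datetime, timedelta
from azure.storage.blob import ContainerSasPermissions, generate_container_sas

sas_token = generate_container_sas(
    account_name="<storage-account>",
    container_name="input-container",
    account_key="<storage-account-key>",
    permission=ContainerSasPermissions(read=True, write=True, list=True),
    expiry=datetime.utcnow() + timedelta(days=7),
)
print(f"https://<storage-account>.blob.core.windows.net/input-container?{sas_token}")
```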

  9. Enter the input container, folder, and file name in the given space. This is where the binary version of the input data file is present.

  10. Enter the copybook name of the data file that was provided earlier. Also set Max concurrent connections to 1.

  11. Move to the Sink tab and click on New.

  12. Search for Blob, click on Azure Blob Storage, and click Continue.

  13. Select the type Binary, then provide the details of the Azure integration runtime, along with the details of where the output needs to be stored.

  14. The pipeline is now ready to be executed. Click on Debug to start running the pipeline. Once it completes, all the necessary stats are shown in the ADF UI.

  15. You should now see the desired output, with comma-separated fields, in the configured sink.
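
To spot-check the result from outside the ADF UI, the converted file can be pulled back from the sink container and read as CSV. A minimal sketch with placeholder connection string, container, and blob names:

```python
# Minimal sketch: download the converted output from the sink container and
# print a few rows to confirm the fields are readable, comma-separated ASCII.
# Connection string, container, and blob names are placeholders.
import csv
import io
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage-connection-string>")
sink = service.get_blob_client("output-container", "converted.csv")

text = sink.download_blob().readall().decode("ascii")
for row in list(csv.reader(io.StringIO(text)))[:5]:
    print(row)                      # each row holds the unpacked field values
```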

 

Limitations

  • The connector does not support the OCCURS DEPENDING ON clause.
  • The connector does not support the REDEFINES clause.
  • An Azure integration runtime cannot be used for this connector as of this writing (September 2022).

Feedback and suggestions

If you have feedback or suggestions for improving this asset, please contact the Azure Databases SQL Customer Success Engineering Team. Thanks for your support!

Note: For additional information about migrating various source databases to Azure, see the Azure Database Migration Guide.
