Azure AI Foundry Blog
Azure Machine Learning Service Designer - Data Engineering

BalaB
Microsoft
Dec 14, 2021

How to do data engineering in Azure Machine Learning Service using the designer

Prerequisites

  • Azure Account
  • Azure Storage
  • Azure Machine Learning Service

Introduction

  • This tutorial only shows how to do data engineering in Azure Machine Learning Service using the designer.
  • The data is the Titanic dataset, a well-known open source dataset in machine learning.
  • Every task (flow item) has parameters and an output.
  • After a run, the output of every task can be visualized.
  • The output differs depending on the task or flow item.

Overall flow

 

 

  • Above is the overall experiment.
  • It is built in a low-code environment.
  • Every step is added by drag and drop.

What's done

Bring the dataset

Select columns in dataset
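For readers who think in pandas, column selection can be sketched as below. This is only an illustration of the idea, not what the designer runs internally. The sample rows and the chosen column list are assumptions; the article does not say which Titanic columns were kept.

```python
import pandas as pd

# Hypothetical sample with Titanic-style column names (assumed, not from the article)
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Name": ["A", "B", "C"],
    "Age": [22.0, 38.0, 26.0],
    "Fare": [7.25, 71.28, 7.92],
})

# Keep only the columns the later steps need (assumed selection)
selected_columns = ["Survived", "Pclass", "Age", "Fare"]
selected = df[selected_columns]
print(selected)
```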

 

 

Execute Python Script - Correlation Chart

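    import matplotlib.pyplot as plt
    import seaborn as sn
    from azureml.core import Run

    # Entry point that the designer calls for an Execute Python Script module
    def azureml_main(dataframe1=None, dataframe2=None):
        # Pairwise correlation of the numeric columns only
        corrMatrix = dataframe1.select_dtypes(include="number").corr()
        print(corrMatrix)
        sn.heatmap(corrMatrix, annot=True)

        # Save before plt.show(); show() can clear the figure and leave a blank file
        img_file = "corrchart1.png"
        plt.savefig(img_file)
        plt.show()

        # Attach the chart to the current run
        run = Run.get_context(allow_offline=True)
        run.upload_file(f"graphics/{img_file}", img_file)

        # The module expects a sequence of DataFrames back
        return dataframe1,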
 

 

  • Output

 

Execute Python Script - Covariance Chart

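  • Code

    import matplotlib.pyplot as plt
    import seaborn as sn
    from azureml.core import Run

    # Entry point that the designer calls for an Execute Python Script module
    def azureml_main(dataframe1=None, dataframe2=None):
        # Pairwise covariance of the numeric columns only
        covMatrix = dataframe1.select_dtypes(include="number").cov()
        print(covMatrix)
        sn.heatmap(covMatrix, annot=True)
        # Save before plt.show(); show() can clear the figure and leave a blank file
        img_file = "covchart1.png"
        plt.savefig(img_file)
        plt.show()
        # Attach the chart to the current run
        run = Run.get_context(allow_offline=True)
        run.upload_file(f"graphics/{img_file}", img_file)
        return dataframe1,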

 

  • Output
 

 


 

Remove duplicate rows
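As a rough pandas equivalent, removing duplicate rows might look like the sketch below. The sample rows are made up, and comparing on all columns while keeping the first occurrence is an assumption; the article does not show which key columns the step was configured with.

```python
import pandas as pd

# Hypothetical rows; the third row repeats the first exactly
df = pd.DataFrame({
    "Pclass": [3, 1, 3, 1],
    "Age": [22.0, 38.0, 22.0, 38.0],
    "Fare": [7.25, 71.28, 7.25, 80.0],
})

# Compare on all columns and keep the first occurrence of each duplicate
deduped = df.drop_duplicates(keep="first").reset_index(drop=True)
print(deduped)
```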

 

Normalize data
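The article does not say which normalization method was chosen, so the sketch below assumes min-max scaling to the range 0 to 1. The helper name `min_max_normalize` and the sample values are hypothetical.

```python
import pandas as pd

def min_max_normalize(frame, columns):
    """Return a copy of frame with each listed column rescaled to [0, 1]."""
    result = frame.copy()
    for col in columns:
        lo, hi = result[col].min(), result[col].max()
        span = hi - lo
        # A constant column has no spread; map it to 0.0 instead of dividing by zero
        result[col] = 0.0 if span == 0 else (result[col] - lo) / span
    return result

# Hypothetical sample values
df = pd.DataFrame({"Age": [20.0, 30.0, 40.0], "Fare": [10.0, 10.0, 10.0]})
normalized = min_max_normalize(df, ["Age", "Fare"])
print(normalized)
```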

 

Group data in bins
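Binning turns a continuous column into a small number of ranges. A minimal pandas sketch, assuming four equal-width bins on an age-like column (the binning mode, bin count, and values are assumptions, not taken from the article):

```python
import pandas as pd

# Hypothetical ages
ages = pd.Series([2.0, 15.0, 28.0, 40.0, 67.0, 80.0], name="Age")

# Four equal-width bins over the observed range; labels=False returns the bin index
age_bin = pd.cut(ages, bins=4, labels=False)
print(age_bin.tolist())
```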

 

Edit Metadata to convert String to Categorical column - Name

 

Edit Metadata to convert String to Categorical column - Cabin

 

Edit Metadata to convert String to Categorical column - Embarked
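The three Edit Metadata steps above mark Name, Cabin, and Embarked as categorical. In pandas terms, that roughly corresponds to a dtype conversion, sketched here on made-up rows:

```python
import pandas as pd

# Hypothetical rows using the three column names from the steps above
df = pd.DataFrame({
    "Name": ["Allen", "Bonnell", "Cumings"],
    "Cabin": ["C85", None, "E46"],
    "Embarked": ["S", "C", "S"],
})

# Convert each string column to the pandas categorical dtype
for col in ["Name", "Cabin", "Embarked"]:
    df[col] = df[col].astype("category")

print(df.dtypes)
```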

 

Clip values - limit outliers to help avoid overfitting
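Clipping replaces values beyond a threshold with the threshold itself, so a few extreme values cannot dominate training. A small sketch with hypothetical thresholds (the article does not state the ones it used):

```python
import pandas as pd

# Hypothetical fares with one extreme value
fares = pd.Series([5.0, 7.0, 8.0, 10.0, 500.0], name="Fare")

# Assumed thresholds for illustration only
lower, upper = 6.0, 100.0
clipped = fares.clip(lower=lower, upper=upper)
print(clipped.tolist())
```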

 

Clean missing data
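The cleaning mode is not stated in the article. The sketch below assumes one common choice: fill numeric gaps with the column mean and categorical gaps with the most frequent value. The sample rows are made up.

```python
import pandas as pd

# Hypothetical rows with one missing value per column
df = pd.DataFrame({
    "Age": [22.0, None, 38.0],
    "Embarked": ["S", None, "S"],
})

cleaned = df.copy()
# Numeric column: replace missing values with the mean of the observed values
cleaned["Age"] = cleaned["Age"].fillna(cleaned["Age"].mean())
# Categorical column: replace missing values with the most frequent value
cleaned["Embarked"] = cleaned["Embarked"].fillna(cleaned["Embarked"].mode().iloc[0])
print(cleaned)
```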

 

Apply math operations
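The article does not say which operation was applied. As one example of a column-wise math operation, the sketch below assumes a log(1 + x) transform on a fare-like column, which compresses large values; the column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical fares
df = pd.DataFrame({"Fare": [0.0, 7.25, 512.33]})

# log1p computes log(1 + x), so a fare of 0 maps to 0 instead of negative infinity
df["FareLog"] = np.log1p(df["Fare"])
print(df)
```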

 

Split data into training and test data
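A random row split can be sketched in pandas as below. The 70/30 ratio and the seed are assumptions; the article does not state the split fraction it used.

```python
import pandas as pd

# Hypothetical ten-row dataset
df = pd.DataFrame({"Age": range(10), "Survived": [0, 1] * 5})

# Sample 70% of the rows for training; the remaining rows form the test set
train = df.sample(frac=0.7, random_state=42)
test = df.drop(train.index)
print(len(train), len(test))
```

Fixing `random_state` makes the split reproducible between runs.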

 

Bring the model to train

 

Train model

 

Score model

 

  • Output

 

Evaluate Model

  • Output

 

  • ROC Curve

 

  • Confusion Matrix
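To make the confusion matrix concrete, the sketch below counts the four cells for a binary classifier and derives accuracy, precision, and recall from them. The labels are made up and the function names are hypothetical; this is not the designer's implementation.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(1 for a, p in pairs if a == 1 and p == 1),
        "fp": sum(1 for a, p in pairs if a == 0 and p == 1),
        "fn": sum(1 for a, p in pairs if a == 1 and p == 0),
        "tn": sum(1 for a, p in pairs if a == 0 and p == 0),
    }

def summarize(counts):
    """Derive accuracy, precision, and recall from the four counts."""
    total = sum(counts.values())
    predicted_pos = counts["tp"] + counts["fp"]
    actual_pos = counts["tp"] + counts["fn"]
    return {
        "accuracy": (counts["tp"] + counts["tn"]) / total,
        "precision": counts["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": counts["tp"] / actual_pos if actual_pos else 0.0,
    }

# Hypothetical labels: 1 = survived, 0 = did not survive
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
metrics = summarize(counts)
print(counts)
print(metrics)
```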

 

This article only shows how to do data engineering in the Azure Machine Learning designer. The model is not accurate, and an open source dataset is used here.

 

Original article: Samples2021/designerdataengg.md at main · balakreshnan/Samples2021 (github.com)

Published Dec 14, 2021
Version 1.0