Introducing the 'Data Integration in a box' solution

Published Jan 20 2022 11:08 PM
Microsoft

We are happy to introduce the ‘Data Integration in a box’ solution, which helps you get started with real-world, scenario-driven data pipelines in minutes. It does most of the heavy lifting by creating the required Azure resources and deploying the relevant data pipelines and data source connections, so you can get the pipelines running quickly.

 

Please refer to our GitHub page to get started with ‘Data Integration in a box’.

 

What’s in the solution

The Data Integration in a box solution extracts sales lead and activity data from Dynamics 365 into Azure Synapse Analytics and Azure Cosmos DB using an Azure Data Factory pipeline. As part of the transformation task, it anonymizes (masks) sensitive data using Presidio APIs. It then uses visual Azure Data Factory data flows to join the two entities and separate activities that are associated with leads from those that are not, for further analysis. Finally, it writes the two streams to the data lake in the standardized Common Data Model (CDM) format for further consumption.
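To make the join-and-split step concrete, here is a minimal Python sketch of the same logic outside Azure Data Factory. It is not the data flow the solution deploys. It assumes that activities reference a lead through a `regardingobjectid` field and that leads are keyed by `leadid`. The sample records and the `lead_subject` output column are hypothetical.

```python
def split_activities_by_lead(activities, leads):
    """Left-join activities to leads and split them into two streams.

    Returns (with_lead, without_lead). Field names are assumptions made
    for this sketch, not taken from the deployed data flow.
    """
    # Index leads by id so each activity lookup is O(1).
    leads_by_id = {lead["leadid"]: lead for lead in leads}

    with_lead, without_lead = [], []
    for activity in activities:
        lead = leads_by_id.get(activity.get("regardingobjectid"))
        if lead is None:
            # No matching lead: route to the "without leads" stream.
            without_lead.append(dict(activity))
        else:
            # Matching lead: enrich the activity with a lead column.
            with_lead.append({**activity, "lead_subject": lead["subject"]})
    return with_lead, without_lead


# Hypothetical sample data for illustration only.
leads = [{"leadid": "L1", "subject": "New store opening"}]
activities = [
    {"activityid": "A1", "regardingobjectid": "L1"},
    {"activityid": "A2", "regardingobjectid": None},
    {"activityid": "A3", "regardingobjectid": "L9"},
]

with_lead, without_lead = split_activities_by_lead(activities, leads)
print(len(with_lead), len(without_lead))
```

In the solution itself this logic lives in a visual data flow, so a sketch like this is mainly useful for reasoning about which records should land in each output stream.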

 

Scenario – Monitor and analyze your Dynamics 365 sales leads and activities using Azure Data Factory and Azure Synapse Analytics in just a few clicks.

 

  • Azure Synapse Analytics is a limitless analytics service that combines data integration, enterprise data warehousing, and big data analytics.
  • Azure Data Factory is Azure’s cloud ETL service for scale-out serverless data integration and data transformation. It offers a code-free UI for intuitive authoring and single-pane-of-glass monitoring and management.
  • Azure Cosmos DB is the operational database used during ETL as the source.
  • Azure Data Lake Storage Gen2 is the data lake for staging and storing processed data.
  • Presidio is a data protection and anonymization library (an open-source project). It allows organizations to preserve privacy more simply by democratizing de-identification technologies and introducing transparency in decisions. It facilitates fully automated and semi-automated PII de-identification flows across multiple platforms.
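As a rough illustration of what the masking step does to a record, the sketch below replaces email addresses in selected text fields with a placeholder. It is a simplified stand-in built on Python's standard `re` module, not Presidio and not the transformation the solution deploys. The regular expression, the `<EMAIL_ADDRESS>` placeholder, and the sample record are assumptions chosen for this example.

```python
import re

# Simplified email pattern for illustration only. A real de-identification
# library detects many more entity types and handles more edge cases.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text, placeholder="<EMAIL_ADDRESS>"):
    """Replace every email address found in text with a placeholder."""
    return EMAIL_RE.sub(placeholder, text)


def mask_record(record, fields):
    """Return a copy of record with the named string fields masked."""
    return {
        key: mask_emails(value) if key in fields and isinstance(value, str) else value
        for key, value in record.items()
    }


# Hypothetical lead record for illustration only.
lead = {"leadid": "L1", "description": "Reach me at jane.doe@example.com next week"}
masked = mask_record(lead, fields={"description"})
print(masked["description"])
```

Masking only the named fields leaves identifiers such as `leadid` intact, so the masked records can still be joined downstream.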

 

[Image: Azure resources, data flow, and pipelines generated by the solution]

 

Note: The picture depicts the Azure resources, data flow, and pipelines that are auto-generated as part of the solution.

 

 

This ‘Data Integration in a box’ solution uses several essential Dynamics 365 tables that track sales and user activities, which help you gauge the health of your sales pipeline. It generates the respective tables, infers the schema, and creates separate tables with cleansed data in the data warehouse.

The solution uses two different data sources:

  1. Dynamics 365 (activitypointer entity [schema], lead entity [schema])
  2. NYC Taxi data from Microsoft Open Datasets

 

 

You may modify the data flows, pipelines, and triggers, and update the sources and destinations as appropriate. With Data Integration in a box, you can jump-start your learning experience with Azure Data Factory and Azure Synapse Analytics in an easy-to-use environment for trying out the data integration capabilities on Azure.

 

Please refer to our GitHub page to get started with ‘Data Integration in a box’.

 

 

 
