How to use the Additional Sources in Map Data tool within Azure Synapse Analytics
Published Sep 30 2022 05:07 AM 1,816 Views
Microsoft

The Database Templates feature in Azure Synapse Analytics helps organizations create data models based off existing industry focused templates. After creation of the Lake Database (using a chosen database template), the Map Data tool can be used to map

attributes between source and destination. This will create data flows and pipelines in order to complete the data ingestion process.

 

When there is a need to ingest data from multiple sources into a single destination entity, these sources will need to be joined while performing the mapping. The "Additional Source" button in the Map Data tool helps achieve these joins. The documentation does not provide more details or examples on how to use the UI to create these joins which will be covered in this blog.

 

Consider the below sources and destination:

 

Source CSV files:

Sales.csv: (highlighted column will be used for ingestion)

SalesAmount

decimal

TotalProductCost

decimal

ProductStandardCost

decimal

UnitPriceDiscountPct

decimal

ExtendedAmount

decimal

UnitPrice

decimal

OrderQuantity

integer

SalesTerritoryKey

integer

ShipDateKey

integer

DueDateKey

integer

OrderDateKey

integer

ProductKey

integer

CustomerKey

integer

ResellerKey

integer

SalesOrderLineKey

integer

 

SalesOrder.csv: (highlighted column will be used for ingestion).

SalesOrderLine

string

SalesOrder

string

SalesOrderLineKey

integer

Channel

string

 

Destination Entity:

The TransactionLineItem table in the Lake Database has the following structure:

schandra_0-1664454867144.png

These above two csv files should be joined using a common column (SalesOrderLineKey) to ingest the data into TransactinonLineItem entity in the Lake Database.

 

Mapping Process:

Start with adding both the source csv files in the Map Data tool. The UI after adding the source files should resemble the below:

schandra_1-1664538882708.png

 

 

Click Continue. In the next dialog, provide a name to the mapping and click Continue.

 

In the following dialog, on the left navigation, under Target Tables, select the correct destination table. In this case – TransactionLineItem and select the Primary Source Table as Sales.

 

schandra_1-1664455541242.png

Next, click Additional Sources, and which will open a dialog similar to the below:

 

schandra_0-1664455663848.png

Notice that the left source defaults the primary source table. Select SalesOrderLineKey as the left source column.

Choose “==” as the join from the dropdown

In the right source, select SalesOrder and select the same column SalesOrderLineKey as the Right Source Column.

 

The UI should resemble the below:

schandra_0-1664455720061.png

Repeat the steps above to add more sources if there are more tables to join. Click OK to come back to the MapData tool.

 

Using the New Mapping, setup the mapping to map columns from the source to the destination. Once completed, the mapping along with preview of data should resemble the below:

 

schandra_0-1664455801114.png

Click the Create Pipeline to create the pipeline.

 

The resulting data flow will resemble the below:

schandra_1-1664456013203.png

At this stage, if everything went well, publish and run the pipeline which should execute successfully and ingest the data into the lake database.

 

Finally, running a select against the table like the below should return the results successfully

schandra_2-1664456049919.png

The CSV files used in the above example:

Data: adfdataflowdocs/AdventureWorks Data.zip at master · kromerm/adfdataflowdocs (github.com)

Schema: adfdataflowdocs/AdventureWorksSchemas.xlsx at master · kromerm/adfdataflowdocs (github.com)

Version history
Last update:
‎Sep 30 2022 05:07 AM
Updated by: