The Database Templates feature in Azure Synapse Analytics helps organizations create data models based on existing industry-focused templates. After the Lake Database is created (using a chosen database template), the Map Data tool can be used to map attributes between source and destination. The tool then generates the data flows and pipelines that complete the data ingestion process.
When data must be ingested from multiple sources into a single destination entity, those sources need to be joined while performing the mapping. The Additional Source button in the Map Data tool is used to create these joins. The documentation does not provide details or examples on how to create these joins in the UI, so this blog walks through the process.
Consider the following sources and destination:
Source CSV files:
Sales.csv (the highlighted column will be used for ingestion):
Column | Type
SalesAmount | decimal
TotalProductCost | decimal
ProductStandardCost | decimal
UnitPriceDiscountPct | decimal
ExtendedAmount | decimal
UnitPrice | decimal
OrderQuantity | integer
SalesTerritoryKey | integer
ShipDateKey | integer
DueDateKey | integer
OrderDateKey | integer
ProductKey | integer
CustomerKey | integer
ResellerKey | integer
SalesOrderLineKey | integer
SalesOrder.csv (the highlighted column will be used for ingestion):
Column | Type
SalesOrderLine | string
SalesOrder | string
SalesOrderLineKey | integer
Channel | string
Destination Entity:
The TransactionLineItem table in the Lake Database has the following structure:
These two CSV files should be joined on a common column (SalesOrderLineKey) to ingest the data into the TransactionLineItem entity in the Lake Database.
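Conceptually, the join that the Map Data tool will generate is equivalent to the short PySpark sketch below. This is only an illustration of the logic, not the code the tool produces; the storage paths are placeholders, and the inner join type is an assumption.

# Illustrative sketch only -- the Map Data tool generates this logic as a
# mapping data flow, not as Spark code. The paths below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read both source CSV files (header row assumed).
sales = spark.read.option("header", True).csv(
    "abfss://<container>@<storage>.dfs.core.windows.net/Sales.csv")
sales_order = spark.read.option("header", True).csv(
    "abfss://<container>@<storage>.dfs.core.windows.net/SalesOrder.csv")

# Join the two sources on the common key column.
joined = sales.join(sales_order, on="SalesOrderLineKey", how="inner")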
Start by adding both source CSV files in the Map Data tool. After adding the source files, the UI should resemble the following:
Click Continue. In the next dialog, provide a name for the mapping and click Continue.
In the following dialog, in the left navigation under Target Tables, select the correct destination table (in this case, TransactionLineItem) and set the Primary Source Table to Sales.
Next, click Additional Sources, which will open a dialog similar to the following:
Notice that the left source defaults to the primary source table. Select SalesOrderLineKey as the Left Source Column.
Choose “==” as the join condition from the dropdown.
For the right source, select SalesOrder, and select the same column, SalesOrderLineKey, as the Right Source Column.
The UI should now resemble the following:
Repeat the steps above to add more sources if there are more tables to join; each additional source contributes one more join, as sketched below. Click OK to return to the Map Data tool.
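In terms of the earlier PySpark sketch, additional sources simply chain the joins together. The third source and its join key below are hypothetical, shown only to illustrate the pattern.

# Hypothetical third source, shown only to illustrate chained joins.
channels = spark.read.option("header", True).csv(
    "abfss://<container>@<storage>.dfs.core.windows.net/Channels.csv")

# Every Additional Source appends another join to the chain.
joined = (sales
    .join(sales_order, on="SalesOrderLineKey", how="inner")
    .join(channels, on="Channel", how="inner"))  # hypothetical join key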
Using New Mapping, set up the mapping of columns from the source to the destination. Once completed, the mapping, along with a preview of the data, should resemble the following:
Click Create Pipeline to create the pipeline.
The resulting data flow will resemble the following:
At this stage, if everything went well, publish and run the pipeline. It should execute successfully and ingest the data into the Lake Database.
Finally, running a SELECT against the table, like the one below, should return the results successfully:
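For example, the check can be run from a Synapse Spark notebook attached to the workspace; the database name SalesLakeDB below is a placeholder for the actual Lake Database name.

# Lake Database tables are visible to the workspace Spark pools by name.
# "SalesLakeDB" is a placeholder -- substitute your Lake Database name.
spark.sql("SELECT * FROM SalesLakeDB.TransactionLineItem LIMIT 10").show()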
The CSV files used in the above example:
Data: adfdataflowdocs/AdventureWorks Data.zip at master · kromerm/adfdataflowdocs (github.com)
Schema: adfdataflowdocs/AdventureWorksSchemas.xlsx at master · kromerm/adfdataflowdocs (github.com)