azure data factory
8 Topics

Watermark column in Metadata driven pipeline in ingest tool
Hey, I have a few questions about the metadata-driven pipeline in the ADF ingest data tool:
1. Can we choose a column other than the created/last_updated/modified date column as the watermark column?
2. Can we choose the primary key as the watermark column? If yes, what should the watermark value be?
940 Views, 0 likes, 0 Comments

Data encryption
Hi Team, I saw there is an option to encrypt data in transit and at rest for Azure Service Bus. Is it possible to use two different certificates/keys (my own certificates, not Azure's), one for encrypting data at rest and another for encrypting data in transit, for the following services?
Azure Service Bus
Azure Logic Apps
Azure API Management
Azure Data Factory
829 Views, 0 likes, 0 Comments

Build Mapping Data Flows using Managed Identity (formerly MSI) for Azure SQL DB and Azure Synapse
Hi All, I thought you might like to read my new article: Build Mapping Data Flows using Managed Identity (formerly MSI) for Azure SQL Database and Azure Synapse Analytics (formerly SQL DW)
https://www.linkedin.com/pulse/build-mapping-data-flows-using-managed-identity-formerly-angane
Mark Kromer, I created this article based on your Tech Community blog. Happy Learning!
1K Views, 0 likes, 0 Comments

Manual Backup Cosmos DB
Hi, I tried to export data from Cosmos DB but was not successful. According to https://docs.microsoft.com/en-us/azure/cosmos-db/storage-explorer, this tool should let me export the data inside Cosmos DB, but I see no option to export. I also tried the instructions at https://azure.microsoft.com/en-us/updates/documentdb-data-migration-tool/ and https://docs.microsoft.com/en-us/azure/cosmos-db/import-data#JSON, but I run into an error. Can you help me do this in Data Factory, or share any steps to manually back up Cosmos DB? Thank you.
2.4K Views, 0 likes, 0 Comments

Azure Data Factory - HANA ( Table Browse Issue )
Let me know if the issue below rings any bells; it concerns Azure and HANA connectivity. I have set up an Integration Runtime (along with the HANA client) on my desktop to form the connection between HANA and Azure (ADF). The connection validates fine. However, when I browse the tables from the ADF connection, it only gives me a handful of tables under the “SYS” schema from HANA. Can you guess a reason? (The user has no privilege issues.)
Solved
940 Views, 0 likes, 1 Comment

How to use values from Data Source as parameter to other Data Source
I have two linked services: 1. A SQL database 2. A REST API. I'm retrieving data from SQL to my staging area incrementally. This is no problem. However, to retrieve information from the REST API I need values from the records I just moved to the staging area. So let's say I get product data from SQL, and the REST API gives me additional info for these products from an external system. I therefore need to call the REST API multiple times, passing each of the product IDs I've just imported as a parameter. What is the most efficient way to do this in ADF, in the way ADF was intended to be used? Thanks for any tips.
667 Views, 0 likes, 0 Comments

Azure Data Factory - Complex Java based ETL to Codeless Pipeline
Azure Data Factory (ADF) is a fully managed data integration tool that helps you build, manage and orchestrate complex jobs. The new UX in ADF V2 makes creating pipelines, activities and other constructs intuitive and effortless. Anyone can pick up this tool and be fully productive in a few days. From a feature-set perspective, it has built-in connectors to 65+ data stores, works very well in a hybrid environment and has control flow elements to design even the most complex ETL workflows. More information about ADF can be found in the link.

I wanted to call out the journey one of our customers went through with ADF adoption and how it modernized their ETL workflow and, the best part, without writing a single line of code! Their legacy data integration job was coded in Java, spanned more than 1,000 lines and was scheduled as a batch job on an ETL virtual machine. The Java code had many modules (REST API calls, storing data to a database, looping/lookups, a retry mechanism and some basic logging). The team wanted faster deployment with minimal changes to their codebase. Even then I was confident ADF could easily find its own way to glory once tried.

With almost no prior experience, the customer was able to use the Copy Data wizard to call the REST API, transform the data and store it in a database. This was equivalent to hundreds of lines of Java code. Thereafter, the customer was really impressed with the richness of ADF's user experience and its powerful integration features. In the end, the entire Java codebase was translated to a single ADF pipeline with zero coding, triggered on a pre-defined schedule and on new blob creation events. As an added advantage, the ETL server hosting the Java code was decommissioned, helping the customer reduce cost.
Special credit goes to Abhishek Narain, Program Manager for Azure Data Factory, who pitched in whenever we had small issues with ADF's activities (there is no better place than Microsoft when it comes to collaborative effort). We picked up some good learnings and tricks along the way that might be helpful for other folks:

1) Mapping Event Trigger variables to Copy activity: For those who do not know, event-driven architecture (EDA) is a common data integration pattern that involves the production, detection and consumption of, and reaction to, events. I had a little difficulty connecting all the pieces together when mapping event trigger variables for blob creation. Abhishek explained what needed to be done, and we thought it would be helpful for others to know these steps too.

Below is a snippet of an ADF pipeline that gets triggered whenever a blob is uploaded to the container, which is continuously polled by ADF.

First, let us create a few parameters for the pipeline itself. As a simple analogy, I would compare these to “global variables” in your programming code: they have global scope, i.e. they are available to all functions inside it, and the functions can be thought of as ADF's activities. Among the above parameters, the most important ones are “pipeline_SourceFolderName” and “pipeline_SourceFileName”, which will be populated by ADF's event trigger (explained below).

Now let's define the input dataset. In my use case, the JSON file uploaded into a container is defined as the input dataset. The trigger polls the above container for new blob creation and subsequently triggers the ADF pipeline. Under your input dataset, you need to define 2 more parameters. These are scoped to this dataset only and are not visible outside it. They are not the same as the pipeline parameters defined above (I have named them differently to avoid confusion).
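To make the dataset-scoped parameters more concrete, here is a minimal sketch of what such a dataset definition might look like in JSON. It is an illustration under assumptions, not the customer's actual definition: the dataset name "InputJsonBlob" and the parameter names "dataset_FolderName" and "dataset_FileName" are hypothetical (the original names only appeared in a screenshot), and the "Json" dataset type with an "AzureBlobStorageLocation" is one of several ways a blob dataset can be declared. The linked service name "AzureBlobStorage" is borrowed from the example later in this post.

```json
{
  "name": "InputJsonBlob",
  "properties": {
    "description": "Hypothetical sketch: dataset and parameter names are illustrative, not taken from the original post",
    "linkedServiceName": {
      "referenceName": "AzureBlobStorage",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "dataset_FolderName": { "type": "string" },
      "dataset_FileName": { "type": "string" }
    },
    "type": "Json",
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": {
          "value": "@dataset().dataset_FolderName",
          "type": "Expression"
        },
        "fileName": {
          "value": "@dataset().dataset_FileName",
          "type": "Expression"
        }
      }
    }
  }
}
```

Note that the parameters are read with @dataset() inside the dataset, whereas pipeline-level parameters are read with @pipeline().parameters, which is why the two sets have to be mapped to each other explicitly.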
The above 2 dataset parameters are then used to specify the file path in the Connection tab of the dataset, as shown below. These expressions can be picked using the dynamic content pop-up.

Now let's go to the pipeline canvas and drag in a Copy activity. In this example I have named it “CopyDataToSQLDWH”, since the blobs (JSON files) are parsed and ingested into SQL DWH. Configuration of the source and sink is covered well in the Copy Wizard tutorial, so I am not repeating it. However, the key piece here is to link the input dataset's parameters with the pipeline's parameters under the Source tab, as shown below.

Just to recap what has been mapped:
- Defined pipeline parameters for the source folder name and file name (basically the container name and blob name)
- Defined the input dataset's parameters and mapped the file path in its Connection tab
- In the pipeline canvas, in the Copy activity's Source tab, mapped the input dataset's parameters to the pipeline's parameters

At this stage, you have mapped parameters between the pipeline and its activities. As the final stage, we have to map the pipeline parameters to the trigger variables. Click on the ‘Trigger’ button to create an Event Trigger, and specify the blob name and folder path to poll for new blob creation. In the next window, the final step is to map the pipeline's parameters to the dynamic values generated by the trigger, which are:
- @triggerBody().folderPath : Returns the container path on which the trigger is polling for new files
- @triggerBody().fileName : Returns the file name of the blob which is picked for processing

And that's it. After ADF is successfully published, whenever a .json file is uploaded (I have given .json as the file path ending), the pipeline is triggered automatically. It copies the JSON files to SQL DWH and, if the copy activity is successful, sends an email by posting to a REST API; otherwise it sends an error message.

2) Power of Parameter & Expression: Expressions can appear anywhere in a JSON string value and always result in another JSON value.
If a JSON value is an expression, the body of the expression is extracted by removing the at-sign (@). This gives you a lot of freedom to modularize an ADF pipeline: it helps you build dynamic content, minimize repetition, keep changes small at deployment time, execute activities conditionally, and the list goes on. I have put some examples below:

Creating dynamic message content with runtime details

    {
      "message": "@{activity('CopyDataToSQLDWH').output.dataWritten}",
      "dataFactoryName": "@{concat(pipeline().DataFactory,'- Success')}",
      "pipelineName": "@{pipeline().Pipeline}",
      "pipelineRunId": "@{pipeline().RunId}",
      "receiver": "@pipeline().parameters.receiver"
    }

    {
      "message": "@{activity('CopyDataToSQLDWH').error.message}",
      "dataFactoryName": "@{concat(pipeline().DataFactory,'- Error')}",
      "pipelineName": "@{pipeline().Pipeline}",
      "pipelineRunId": "@{pipeline().RunId}",
      "receiver": "@pipeline().parameters.receiver"
    }

Parameterizing Key Vault to a Blob LinkedService

    {
      "name": "ArchiveBlob",
      "properties": {
        "linkedServiceName": {
          "referenceName": "AzureBlobStorage",
          "type": "LinkedServiceReference",
          "parameters": {
            "key_vault_for_archive_blob": {
              "value": "@dataset().key_vault_for_archive_blob",
              "type": "Expression"
            }
          }
        }
      }
    }

Deducing new blob filename by concatenating Pipeline Runtime Id to blob name

    @concat(pipeline().parameters.pipeline_SourceFileName,'-',pipeline().RunId)

6K Views, 0 likes, 0 Comments

Failure during copy from blob to sql db using ADF
Hello, I get this error when using Azure Data Factory to copy from blob storage to Azure SQL DB:

Database operation failed. Error message from database execution : ExecuteNonQuery requires an open and available Connection. The connection's current state is closed.

I am able to connect to my DB using SSMS, and I have enabled the firewall setting that allows Azure services to connect. Has anyone faced this error when using ADF, or does anyone know how to fix it? Thanks in advance.
4.4K Views, 0 likes, 4 Comments