Azure ETL
ForEach Activity: Immediate Pipeline Failure on Any Child Activity Failure
Hello Azure community, I'm working with Azure Data Factory and have a pipeline set up with a ForEach activity that runs three activities in parallel:
1. Notebook A
2. Execute Pipeline B
3. Notebook C
My requirement is to ensure that if any one of these activities fails (e.g. child activity A, B or C fails two minutes after the pipeline starts), the entire pipeline fails immediately, regardless of the status of the other activities that are still running. Could you please guide me on how to achieve this behaviour? Thank you for your assistance!
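
A parallel ForEach does not cancel its other iterations when one child fails, and the pipeline only reports failure after every branch has finished. One workaround is to follow each child with a Web activity on the Failed dependency condition that cancels the whole run through the Data Factory REST API. The fragment below is a minimal sketch of that cancel step for Notebook A only; the subscription, resource group and factory names are placeholders, it assumes the child activity is literally named "Notebook A", and it assumes the factory's managed identity has been granted permission to cancel runs on the factory itself.

```json
{
  "name": "Cancel run if Notebook A fails",
  "description": "Runs only when Notebook A fails and cancels the current pipeline run (including child runs) via the ADF REST API",
  "type": "WebActivity",
  "dependsOn": [
    { "activity": "Notebook A", "dependencyConditions": [ "Failed" ] }
  ],
  "typeProperties": {
    "method": "POST",
    "url": "https://management.azure.com/subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/Microsoft.DataFactory/factories/<factoryName>/pipelineruns/@{pipeline().RunId}/cancel?isRecursive=true&api-version=2018-06-01",
    "body": { "reason": "A child activity failed" },
    "authentication": { "type": "MSI", "resource": "https://management.azure.com/" }
  }
}
```

Repeating this step for Execute Pipeline B and Notebook C (changing only the dependsOn entry) gives the fail-fast behaviour; if waiting for the remaining branches to finish is acceptable, the simpler option is to rely on the ForEach's default behaviour and let the pipeline fail once all iterations complete.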

'Cannot connect to SQL Database' error - please help

Hi, Our organisation is new to Azure Data Factory (ADF) and we're facing an intermittent error with our first Pipeline. Being intermittent adds that little bit more complexity to resolving the error. The Pipeline has two activities:
1) A Script activity which deletes the contents of the target Azure SQL Server database table located within our Azure cloud instance.
2) A Copy data activity which simply copies the entire contents of the external (outside of our domain) third-party source SQL View and loads it into our target Azure SQL Server database table.
With the source being external to our domain, we have used a Self-Hosted Integration Runtime. The Pipeline executes once per 24 hours, at 3am each morning. I have been informed that this timing shouldn't affect, or be affected by, any other Azure processes we have. For the first nine days the Pipeline completed its executions successfully. Then for the next nine days it only completed successfully four times. Now it seems to fail every other time. The same error message is received on each failure (I've replaced our sensitive internal names with Xs):
Operation on target scr__Delete stg__XXXXXXXXXX contents failed: Failed to execute script. Exception: ''Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Cannot connect to SQL Database. Please contact SQL server team for further support. Server: 'XX-azure-sql-server.database.windows.net', Database: 'XX_XXXXXXXXXX_XXXXXXXXXX', User: ''. Check the linked service configuration is correct, and make sure the SQL Database firewall allows the integration runtime to access.,Source=Microsoft.DataTransfer.Connectors.MSSQL,''Type=Microsoft.Data.SqlClient.SqlException,Message=Server provided routing information, but timeout already expired.,Source=Framework Microsoft SqlClient Data Provider,''
To me, if this Pipeline were incorrectly configured it would never have completed successfully, not even once. The fact that the failure is intermittent, but becoming more frequent, suggests it is being caused by something other than the configuration, but I could be wrong - hence requesting help from you. Please can someone advise on what is causing the error and what I can do to verify/resolve it? Thanks.
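
The inner exception ("Server provided routing information, but timeout already expired") usually points to a transient connection/redirect timeout rather than a broken configuration, which would fit the intermittent pattern described above. A common mitigation is to add a retry policy to the failing activity and to raise the Connect Timeout in the linked service connection string. The snippet below is a hedged sketch of the Script activity shell with such a policy; the linked service and table names are placeholders, and the exact typeProperties layout should be checked against the real pipeline.

```json
{
  "name": "scr__Delete stg__XXXXXXXXXX contents",
  "description": "Same delete step, but with retries so a transient connect/redirect timeout does not fail the nightly run",
  "type": "Script",
  "policy": {
    "timeout": "0.01:00:00",
    "retry": 3,
    "retryIntervalInSeconds": 120,
    "secureInput": false,
    "secureOutput": false
  },
  "linkedServiceName": { "referenceName": "AzureSqlTargetLS", "type": "LinkedServiceReference" },
  "typeProperties": {
    "scripts": [
      { "type": "NonQuery", "text": "TRUNCATE TABLE dbo.stg_TargetTable;" }
    ]
  }
}
```

It is also worth checking whether the 3am window coincides with self-hosted integration runtime patching/restarts or with an Azure SQL serverless auto-pause, either of which can produce exactly this kind of intermittent connect failure.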

Can an ADF Pipeline trigger upon source table update?

Hi, Is it possible for an Azure Data Factory Pipeline to be triggered each time the source table changes? Let's say I have a 'copy data' activity in a pipeline. The activity copies data from TableA to TableB. Can the pipeline be configured to execute whenever source TableA is updated (a record deleted, changed, a new record inserted, etc.)? Thanks.
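
Data Factory has no built-in trigger that fires on a SQL row change; its native triggers are schedule, tumbling window, and storage/custom event triggers. The closest low-effort substitute is a frequent tumbling window trigger combined with a watermark or last-modified column, so each run only copies what changed in its window. The trigger definition below is a rough sketch under that assumption; the pipeline name, parameters and interval are placeholders.

```json
{
  "name": "trg_PollTableAChanges",
  "properties": {
    "description": "Runs the copy pipeline every 15 minutes and passes the window boundaries for watermark filtering",
    "type": "TumblingWindowTrigger",
    "typeProperties": {
      "frequency": "Minute",
      "interval": 15,
      "startTime": "2024-01-01T00:00:00Z",
      "delay": "00:00:00",
      "maxConcurrency": 1,
      "retryPolicy": { "count": 2, "intervalInSeconds": 30 }
    },
    "pipeline": {
      "pipelineReference": { "referenceName": "pl_copy_TableA_to_TableB", "type": "PipelineReference" },
      "parameters": {
        "WindowStart": "@trigger().outputs.windowStartTime",
        "WindowEnd": "@trigger().outputs.windowEndTime"
      }
    }
  }
}
```

The copy activity's source query would then filter TableA on a modified-date (or SQL change tracking) column between WindowStart and WindowEnd; true event-driven triggering from the database would need something extra on the SQL side, such as CDC feeding an event the pipeline can react to.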

Workflow Orchestration Manager dependencies between multiple Data Factory accounts

Hello All, I am looking into a use case where:
1. I have a Workflow Orchestration Manager in a Data Factory in subscription A.
2. There is another Workflow Orchestration Manager in a Data Factory in subscription B.
Both data factories are in the same region, East US. I created a pipeline (Airflow) in subscription A, but it should also take a dependency on a pipeline created in B. In other words, pipeline A is dependent on B. Is this possible with Workflow Orchestration Manager in ADF, or is this use case supported? Also, if I have to accomplish the above task with data factories in different regions (one in the US and the other in East Asia), is that also possible? Thank you in advance
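
The Execute Pipeline activity cannot reference a pipeline in a different data factory, so cross-factory (and cross-subscription or cross-region) dependencies are usually wired up through the Data Factory REST API: the caller starts the remote pipeline run and, if needed, polls its status before continuing. The fragment below sketches that call as a Web activity in the subscription-A factory; the subscription, resource group, factory and pipeline names are placeholders, and it assumes factory A's managed identity has been granted a suitable role (for example Data Factory Contributor) on factory B. Within Workflow Orchestration Manager the same REST call can be made from an Airflow task instead.

```json
{
  "name": "Start dependent pipeline in factory B",
  "description": "Calls the ADF REST API to start a pipeline run in the subscription-B factory",
  "type": "WebActivity",
  "typeProperties": {
    "method": "POST",
    "url": "https://management.azure.com/subscriptions/<subscriptionB>/resourceGroups/<resourceGroupB>/providers/Microsoft.DataFactory/factories/<factoryB>/pipelines/<PipelineB>/createRun?api-version=2018-06-01",
    "body": {},
    "authentication": { "type": "MSI", "resource": "https://management.azure.com/" }
  }
}
```

Because this is a management-plane call, it works the same way across regions (East US calling East Asia), subject only to the calling identity having access to the target factory.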

Some questions on ADF and Azure SQL Server

Hi, My company is looking to implement a data integration method. The project has been assigned to me but I'm not a data engineer, so I would like your guidance on the recommendation. I need to ingest several (only twelve at present) third-party data sources into our domain so the data can be reported on. These external data sources are simple RDBMSs (most likely all MS SQL Server) and the volume of data, because the third party is creating a View for me, is only going to be around 20 columns and 20,000 rows per data source. It's all structured data. My intention is to use Azure Data Factory (ADF) as the integration tool. The reason for this is that we are entirely MS cloud-based and I see ADF as the most suitable (simple, robust, cheap) MS cloud-based integration tool available - although you may inform me otherwise. I need to decide on the storage to hold the external data. I've had very brief experience with Synapse Serverless Pool, as it was the recommended substitute for the Data Export Service (DES) (we use Dynamics 365 as our transactional system), and I found it limiting in its SQL command compatibility. Many of the SQL Views I had written on DES weren't compatible in Synapse - I guess due to Synapse being written in Spark. For this reason, I am reluctant to use Synapse as the data storage. For the same reason I am reluctant to use the ADF Storage Account, as I believe it too is written in Spark. Please can you advise on the questions below:
1) Is the ADF Storage Account written in Spark and thus prone to the same incompatibility as Synapse Serverless Pool?
2) What are the benefits of using the ADF Storage Account over Azure SQL Server, and vice versa?
3) I know this question is configuration-specific, but I'll ask anyway: which is cheaper for our basic use case - the ADF Storage Account or Azure SQL Server? I have trouble understanding the online pricing calculators.
4) I understand that to execute activities/pipelines between Azure storage sources (ADF Storage Account, Azure SQL Server, and other Azure products) an Azure integration runtime is needed. I also understand that to extract data from an on-premises SQL Server database a Self-Hosted Integration Runtime is required - is this correct, and where does this Self-Hosted Integration Runtime need to be installed (on the box that is running the on-premises SQL Server)?
I think that's all my questions for now. Thanks for your help.
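
On question 4 specifically: the self-hosted integration runtime does not have to sit on the SQL Server box itself; it is typically installed on any machine (or small VM) inside the on-premises network that can reach the SQL Server, and the linked service then points at it through connectVia. A rough sketch, with all names and the connection string as placeholders, and the password assumed to live in Key Vault:

```json
{
  "name": "OnPremSqlServerLS",
  "properties": {
    "description": "On-premises SQL Server source reached through the self-hosted integration runtime",
    "type": "SqlServer",
    "connectVia": { "referenceName": "SelfHostedIR", "type": "IntegrationRuntimeReference" },
    "typeProperties": {
      "connectionString": "Data Source=ONPREM-SQL01;Initial Catalog=SourceDb;User ID=adf_reader;",
      "password": {
        "type": "AzureKeyVaultSecret",
        "store": { "referenceName": "KeyVaultLS", "type": "LinkedServiceReference" },
        "secretName": "onprem-sql-password"
      }
    }
  }
}
```

The Azure integration runtime is only needed for sources and sinks that are reachable from Azure, such as an Azure SQL Database target.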

Run batch job on remote VM from Azure Data Factory

Hello, I am new to Azure and work in an environment that is a mix of Azure and on-prem. I do not have access to alter firewall rules. I do have managed virtual networks set up for at least one Azure item. I have an on-prem REST application. I have an Azure VM that is able to access the REST application on its vnet. I have an Azure ADF instance on a vnet that can contact the VM but not the REST application. I want to pipe ADF through the VM to obtain the data. I don't even know if this is a thing. Can anyone with experience of this outline what I should be trying to do? Thank you!
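
"Piping ADF through the VM" is a recognised pattern: install a self-hosted integration runtime on that VM and reference it from a REST linked service via connectVia, so Data Factory reaches the on-prem REST application over the VM's network path. The runtime only makes outbound connections to the Data Factory service, which often avoids new inbound firewall rules. The sketch below uses placeholder names and a placeholder URL, and assumes anonymous authentication purely for illustration:

```json
{
  "name": "OnPremRestLS",
  "properties": {
    "description": "On-prem REST application reached through a self-hosted integration runtime installed on the jump VM",
    "type": "RestService",
    "connectVia": { "referenceName": "ShirOnJumpVm", "type": "IntegrationRuntimeReference" },
    "typeProperties": {
      "url": "https://rest-app.internal.example.com/api/",
      "enableServerCertificateValidation": true,
      "authenticationType": "Anonymous"
    }
  }
}
```

A Copy activity using a REST dataset on this linked service can then pull the data and land it wherever the pipeline needs it.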

How to handle Azure Data Factory Lookup activity with more than 5000 records

Hello Experts, The Data Flow activity successfully copies data from an Azure Blob Storage .csv file to Dataverse Table Storage. However, an error occurs when performing a Lookup on the Dataverse due to excessive data. This is in line with the documentation, which states that the Lookup activity has a limit of 5,000 rows and a maximum size of 4 MB. There is also a workaround mentioned in the Microsoft documentation: design a two-level pipeline where the outer pipeline iterates over an inner pipeline, which retrieves data that doesn't exceed the maximum rows or size. How can I do this? Is there a way to define an offset (e.g. only read 1,000 rows)? Thanks, -Sri
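
The two-level workaround usually looks like this: the outer pipeline works out how many pages of data there are, iterates over the page numbers with a ForEach, and passes an offset to an inner pipeline whose Lookup reads at most one page. The fragment below is a hedged sketch of the outer ForEach with 1,000-row pages; the inner pipeline name, the PageCount parameter and the page size are placeholders, and the paging itself still has to be implemented in the inner Lookup's query (for example FetchXML paging for Dataverse, or OFFSET/FETCH for SQL sources).

```json
{
  "name": "ForEachPage",
  "description": "Outer loop: one iteration per 1,000-row page, each handled by the inner lookup pipeline",
  "type": "ForEach",
  "typeProperties": {
    "items": { "value": "@range(0, pipeline().parameters.PageCount)", "type": "Expression" },
    "isSequential": false,
    "batchCount": 10,
    "activities": [
      {
        "name": "Run inner lookup pipeline for one page",
        "type": "ExecutePipeline",
        "typeProperties": {
          "pipeline": { "referenceName": "pl_inner_lookup_page", "type": "PipelineReference" },
          "waitOnCompletion": true,
          "parameters": {
            "Offset": "@mul(item(), 1000)",
            "PageSize": "1000"
          }
        }
      }
    ]
  }
}
```

If the total row count isn't known up front, an Until loop that keeps incrementing the offset until a page comes back with fewer than 1,000 rows achieves the same thing without a PageCount parameter.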

How to load data from on-prem to Snowflake using ADF in a better way

Hi, My use case is as follows: our data source is an on-prem SQL Server, and it serves as our production database. Currently, we are building reports in Power BI and utilizing Snowflake as our data warehouse. I aim to extract 10 to 15 tables into Snowflake for my Power BI reporting, specifically wanting to construct an SCD Type 1 pipeline, without the need to retain historical data. To facilitate the data transfer from on-prem to Snowflake, we are leveraging Azure Data Factory, with Blob Storage set up in Azure. We already have a self-hosted integration runtime in place that connects to Data Factory. Currently, I've employed the ForEach loop activity to copy 15 tables from on-prem to Snowflake. The pipeline is scheduled to run daily, and each time it executes, it truncates all 15 tables before loading the data. However, this process is time-consuming due to the volume of data, especially since our on-prem SQL Server is a legacy database. My questions are as follows:
1. Is there a more efficient approach within Data Factory for this task?
2. What are the best practices recommended for this type of case study?
3. Data flows are not functioning with the self-hosted runtime; how can I activate data flows for an on-prem database?
I would greatly appreciate any advice on the correct solution for this or pointers to relevant documentation or blogs.
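
The usual way to speed this up is to stop truncating and reloading everything and instead copy only rows changed since the last run (using a watermark or change-tracking column on the source), land them in a Snowflake staging table, and MERGE them into the target for SCD Type 1. The Copy activity fragment below is a sketch of the incremental extract under those assumptions; the dataset, linked service, column and parameter names are all placeholders.

```json
{
  "name": "Copy changed rows for one table",
  "description": "Incremental extract: only rows modified since the last watermark are copied, staged via Blob Storage into a Snowflake staging table",
  "type": "Copy",
  "inputs": [ { "referenceName": "ds_onprem_sql_table", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "ds_snowflake_staging_table", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": {
      "type": "SqlServerSource",
      "sqlReaderQuery": "SELECT * FROM @{item().SchemaName}.@{item().TableName} WHERE LastModifiedDate > '@{pipeline().parameters.LastWatermark}'"
    },
    "sink": {
      "type": "SnowflakeSink",
      "importSettings": { "type": "SnowflakeImportCopyCommand" }
    },
    "enableStaging": true,
    "stagingSettings": {
      "linkedServiceName": { "referenceName": "AzureBlobStagingLS", "type": "LinkedServiceReference" },
      "path": "adf-staging"
    }
  }
}
```

A Script activity against Snowflake (or a Snowflake task) can then MERGE the staging table into the target table, which gives SCD Type 1 without history. On question 3: mapping data flows run on Spark clusters behind the Azure integration runtime and cannot execute on a self-hosted runtime, so the usual pattern is exactly this one - copy the on-prem data into the cloud first and run any data flow against the staged copy.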