Azure ETL

Auto Create SQL table from Imported CSV in ADF?

Hi all, I'm wondering if it's possible to automatically create a SQL table from an imported CSV in Data Factory. To make things a little more complicated, not all CSVs will have the same headings, or even the same number of headings. If so, can anyone point me in the direction of a how-to guide?

Case: I've been ingesting CSVs using an HTTP connector in ADF, storing the CSV data in a manually created SQL table, and then transforming and cleaning that data into a datastore SQL table that is also manually created. I know I'm a little late to the party, but I've been looking at using parameters and was wondering about pulling the CSVs into blob storage and transforming from there. Then I'd only need to create one SQL table, and if I could automate the entire process in ADF it would save future me a lot of time.

I have another issue, but I'll post that separately as it's a slightly different topic. Thanks for reading, and I hope someone can point me in the right direction. John
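
Note that the Copy activity's Azure SQL sink has an auto-create table option that builds the destination table from the mapped source schema on the first run. For CSVs whose headings vary from file to file, the underlying idea is to read only the header row and generate the DDL dynamically. The sketch below illustrates that idea outside ADF; the staging schema, file name and connection string are placeholders, not anything from the original post.

```python
# Minimal sketch: derive a CREATE TABLE statement from whatever headers a CSV has.
# Assumes pandas and pyodbc are installed and that a staging schema named "stg" exists.
import pandas as pd
import pyodbc

def create_table_from_csv(csv_path: str, table_name: str, conn_str: str) -> None:
    # Read only the header row to discover column names; land everything as text
    # so CSVs with differing layouts all fit into NVARCHAR staging columns.
    header = pd.read_csv(csv_path, nrows=0)
    columns = ", ".join(f"[{col}] NVARCHAR(MAX)" for col in header.columns)
    ddl = f"IF OBJECT_ID('stg.{table_name}') IS NULL CREATE TABLE stg.{table_name} ({columns});"

    with pyodbc.connect(conn_str) as conn:
        conn.execute(ddl)
        conn.commit()

# Example usage (all values are placeholders):
# create_table_from_csv("daily_extract.csv", "daily_extract",
#                       "DRIVER={ODBC Driver 18 for SQL Server};SERVER=...;DATABASE=...")
```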

How to handle the Azure Data Factory Lookup activity with more than 5,000 records

Hello experts, the Data Flow activity successfully copies data from an Azure Blob Storage .csv file to Dataverse table storage. However, an error occurs when performing a Lookup on the Dataverse because of the amount of data. This is in line with the documentation, which states that the Lookup activity has a limit of 5,000 rows and a maximum size of 4 MB. The Microsoft documentation also mentions a workaround: design a two-level pipeline where the outer pipeline iterates over an inner pipeline, which retrieves data that doesn't exceed the maximum rows or size. How can I do this? Is there a way to define an offset (e.g. only read 1,000 rows)? Thanks, -Sri
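
The two-level pattern in the documentation boils down to paging: the outer pipeline loops over offsets and passes each offset and page size to the inner pipeline, whose Lookup then never exceeds 5,000 rows or 4 MB. The sketch below expresses that loop in Python so the control flow is concrete; `fetch_page` is a hypothetical stand-in for whatever offset-based query you run against Dataverse (for example ordered by a key, skipping `offset` rows), not an existing API.

```python
# Conceptual sketch of the outer-pipeline / inner-pipeline pattern, with fetch_page
# standing in for the inner pipeline's bounded Lookup. fetch_page is hypothetical:
# it should return at most `page_size` rows starting at `offset`.
from typing import Callable

PAGE_SIZE = 1000  # keep each inner read well under the 5,000-row / 4 MB Lookup limits

def read_all_pages(fetch_page: Callable[[int, int], list[dict]]) -> list[dict]:
    rows: list[dict] = []
    offset = 0
    while True:
        page = fetch_page(offset, PAGE_SIZE)  # inner pipeline: one bounded Lookup
        rows.extend(page)
        if len(page) < PAGE_SIZE:             # a short page means everything has been read
            break
        offset += PAGE_SIZE
    return rows
```

In ADF itself, one common shape is to make the offset a pipeline parameter that an Until activity increments, or to generate the offsets up front with a range() expression and feed them to a ForEach that calls the inner pipeline.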

Job aborted due to stage failure: executor heartbeat timed out after 203646 ms

Hi, we sometimes get the error below, seemingly at random, during the execution of different mapping data flows in Azure Data Factory:

Job aborted due to stage failure. Task in stage failed 1 times, most recent failure. Lost task in stage. Executor heartbeat timed out after 203646 ms.

Is anyone aware of the reason for this? Any idea on how to resolve it would be appreciated.

ADF Dataflow error

Hi, while executing different mapping data flows in Azure Data Factory we have received the error below a few times. Can anyone give any idea about the root cause of the error and how to resolve it?

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage failed 1 times, most recent failure: Lost task 0.0 in stage executor 0: ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 136606 ms

How to load data from on-prem to Snowflake using ADF in a better way

Hi, my use case is as follows. Our data source is an on-prem SQL Server, and it serves as our production database. We are building reports in Power BI and using Snowflake as our data warehouse. I want to extract 10 to 15 tables into Snowflake for my Power BI reporting, specifically as an SCD Type 1 pipeline, without the need to retain historical data. To move the data from on-prem to Snowflake we are using Azure Data Factory, with blob storage set up in Azure, and we already have a self-hosted integration runtime in place that connects to Data Factory.

Currently I use a ForEach activity to copy the 15 tables from on-prem to Snowflake. The pipeline is scheduled to run daily, and each run truncates all 15 tables before loading the data. This process is time-consuming due to the volume of data, especially since our on-prem SQL Server is a legacy database.

My questions are as follows:

1. Is there a more efficient approach within Data Factory for this task?
2. What best practices are recommended for this type of case study?
3. Data flows do not work with the self-hosted integration runtime; how can I use a data flow against an on-prem database?

I would greatly appreciate any advice on the correct solution, or pointers to relevant documentation or blogs.
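
One common alternative to truncate-and-reload is to land each day's extract in a Snowflake staging table (still via the ADF copy) and then apply an SCD Type 1 upsert with MERGE, so only changed rows are written to the warehouse table. The sketch below shows the idea with the snowflake-connector-python package; the table, key and column names and the credentials are placeholders, and in ADF the same statement could run after the copy, for example from a Script activity or a Snowflake stored procedure.

```python
# Hedged sketch: stage the daily extract (e.g. STG.CUSTOMER, loaded by the ADF copy),
# then MERGE it into the warehouse table as an SCD Type 1 upsert. All names are placeholders.
import snowflake.connector

MERGE_SQL = """
MERGE INTO DW.CUSTOMER AS tgt
USING STG.CUSTOMER AS src
  ON tgt.CUSTOMER_ID = src.CUSTOMER_ID
WHEN MATCHED THEN UPDATE SET
  tgt.NAME  = src.NAME,
  tgt.EMAIL = src.EMAIL
WHEN NOT MATCHED THEN
  INSERT (CUSTOMER_ID, NAME, EMAIL)
  VALUES (src.CUSTOMER_ID, src.NAME, src.EMAIL);
"""

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",   # placeholder credentials
    warehouse="ETL_WH", database="ANALYTICS", schema="DW",
)
try:
    conn.cursor().execute(MERGE_SQL)
finally:
    conn.close()
```

Note that extracting only changed rows from the legacy SQL Server still needs some change indicator on the source side (a modified-date column, rowversion, or change tracking); without one, the staging copy stays a full load but the warehouse write becomes incremental.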

Incremental updates with overlap (delete last N days of records and load "NOW - M days" from source)

Hi, I am looking into replacing our current legacy ETL setup with an Azure dedicated SQL pool and Data Factory. The source will be PostgreSQL; the destination should be the Azure dedicated pool. One thing I need help with is whether, and how, it is possible to perform the following "upsert" operation, or in simpler terms:

1. Identify the last day of data in the destination (denote it LastDay).
2. Remove the last N days of records from the ETL destination (the Azure pool), since the last few days could be stale.
3. Query the source for the new data (i.e. newer than LastDay - N days) and update the records in the destination.

I have found this scenario regarding incremental copies: https://docs.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-multiple-tables-powershell. However, I need the overlap (removal of the last N days of possibly stale records). Thank you
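
This is possible, and the three steps map naturally onto ADF activities: a Lookup to find LastDay, a Script or stored-procedure activity to delete the trailing N days, and a Copy activity whose source query filters on LastDay - N. The Python sketch below only makes the logic concrete; table, column and connection details are placeholders, and pyodbc stands in for whatever drivers actually reach PostgreSQL and the dedicated pool.

```python
# Hedged sketch of the "delete last N days, reload from source" overlap logic.
# Names are placeholders; in practice the reload step would be an ADF Copy activity
# (bulk load into the dedicated pool) rather than row-by-row inserts.
from datetime import timedelta
import pyodbc

OVERLAP_DAYS = 3  # N: trailing days treated as possibly stale

def reload_with_overlap(dest: pyodbc.Connection, src: pyodbc.Connection) -> None:
    dest_cur, src_cur = dest.cursor(), src.cursor()

    # 1. Last loaded day in the destination (LastDay); assumes the table is not empty.
    last_day = dest_cur.execute("SELECT MAX(load_date) FROM dbo.fact_sales").fetchval()
    cutoff = last_day - timedelta(days=OVERLAP_DAYS)

    # 2. Remove the possibly stale tail from the destination.
    dest_cur.execute("DELETE FROM dbo.fact_sales WHERE load_date > ?", cutoff)

    # 3. Re-read everything newer than the cutoff from the source and insert it.
    src_cur.execute("SELECT sale_id, load_date, amount FROM sales WHERE load_date > ?", cutoff)
    dest_cur.executemany(
        "INSERT INTO dbo.fact_sales (sale_id, load_date, amount) VALUES (?, ?, ?)",
        src_cur.fetchall(),
    )
    dest.commit()
```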

Incrementally load data from Azure SQL DB to ADLS Gen2 without watermarking, with partitioning

I want to incrementally load data from Azure SQL Database to ADLS Gen2 without watermarking, i.e. I have no control over the Azure SQL Database apart from using the linked service and the table. The folder structure in ADLS is YYYY/MM, which comes from one of the columns in the Azure SQL Database. I added two sources (SQL, ADLS) to a data flow and created a derived column transformation to split the YYYYMM column from SQL into YYYY and MM. Now I want to take these YYYY and MM columns to the ADLS source and search in the respective YYYY and MM folders. How do I achieve this? A screenshot is attached. Tzvia, or anyone else?
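
One way to think about it: the derived columns only need to reproduce the same YYYY and MM strings that were used to build the folder layout, and those values can then drive a parameterised or wildcard folder path on the ADLS source (or sink). The sketch below shows the path derivation outside ADF; the storage account, container and column names are placeholders.

```python
# Hedged sketch: derive YYYY and MM from the source's YYYYMM column and build the
# matching ADLS Gen2 folder path. Account, container and column names are placeholders.
import pandas as pd

def partition_path(yyyymm: str) -> str:
    year, month = yyyymm[:4], yyyymm[4:6]
    return f"abfss://curated@mystorageaccount.dfs.core.windows.net/{year}/{month}/"

df = pd.DataFrame({"yyyymm": ["202401", "202402"], "amount": [10, 20]})
df["target_folder"] = df["yyyymm"].map(partition_path)

# One group per YYYY/MM partition; replace print with an actual ADLS read/write (e.g. via adlfs).
for folder, part in df.groupby("target_folder"):
    print(f"{len(part)} rows -> {folder}")
```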

Getting a 403 Error in Copy Activity

Hello everyone, I am doing a copy activity from SharePoint to blob storage using this doc: https://docs.microsoft.com/en-us/azure/data-factory/connector-sharepoint-online-list?tabs=data-factory#copy-file-from-sharepoint-online. Since the data in SharePoint is not a list, I am using the second flow with a Web activity and a Copy activity. However, I am getting an error in the Copy activity:

ErrorCode=HttpFileFailedToRead,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Failed to read data from http server. Check the error from http server:The remote server returned an error: (403)

Can anyone guide me on why this is happening? I followed all the steps in the doc. Thank you
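
A 403 at this point usually means the Web activity did get a token, but the registered app is not actually authorised on the SharePoint site (it can also happen when the tenant blocks ACS app-only tokens). One way to narrow it down is to reproduce the two calls outside ADF. The sketch below is a hypothetical reproduction: every ID and URL is a placeholder, and the token-request shape follows the pattern described in the linked connector doc.

```python
# Hedged sketch for isolating the 403 outside ADF: request the app-only token the Web
# activity would fetch, then attempt the same file GET the Copy activity makes.
import requests

TENANT_ID = "<tenant-guid>"                    # placeholder
CLIENT_ID = "<registered-app-client-id>"       # placeholder
CLIENT_SECRET = "<client-secret>"              # placeholder
SITE_HOST = "contoso.sharepoint.com"           # placeholder
FILE_URL = f"https://{SITE_HOST}/sites/mysite/Shared%20Documents/report.xlsx"  # placeholder

token_resp = requests.post(
    f"https://accounts.accesscontrol.windows.net/{TENANT_ID}/tokens/OAuth/2",
    data={
        "grant_type": "client_credentials",
        "client_id": f"{CLIENT_ID}@{TENANT_ID}",
        "client_secret": CLIENT_SECRET,
        "resource": f"00000003-0000-0ff1-ce00-000000000000/{SITE_HOST}@{TENANT_ID}",
    },
)
token_resp.raise_for_status()
access_token = token_resp.json()["access_token"]

# If this GET also returns 403, the problem is the app's permission on the site
# (or the tenant's app-only policy) rather than anything ADF-specific.
file_resp = requests.get(FILE_URL, headers={"Authorization": f"Bearer {access_token}"})
print(file_resp.status_code)
```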

Moving data from an SAP API to a REST API using ADF

Hi team, I have a few questions about an ADF pipeline I have created, and I need suggestions for improvement.

Requirement: use ADF to transfer records from the source (SAP ECC) to the destination (a REST API).

Approaches used:

1. I created a Copy activity with the appropriate connector and applied the proper field mapping, but I was surprised that even though the pipeline output shows records were written to the destination, there were no records there (refer to the screenshot). Any comment or suggestion on this issue?
2. In the next approach I created a new pipeline chaining the following components:
   A. A Web activity (using the SAP ECC connector) that fetches records; it returns many fields, but only a few are needed for further processing.
   B. A Set Variable activity that stores the output of A in an array variable.
   C. A ForEach activity that processes each record of B one by one through another Web activity inside the loop, although the destination API can accept many records in one request.

The problem is that the second approach takes a very long time because it iterates over each element of the array, and in my case it needs to be much faster (e.g. when 20,000 records must be processed). One constraint I must follow: I cannot stage the records anywhere, such as blob storage. Any suggestions for overcoming this problem?
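
Since the destination API already accepts many records per request, the main win is to stop calling it once per record and post the array in chunks instead. The sketch below shows the chunking in Python; the endpoint and chunk size are placeholders. In ADF a similar effect can be approximated without staging the data by slicing the array variable into batches (for example with range- and take-style expressions) and iterating over the batches rather than the individual records, or at least by allowing parallel iterations on the ForEach.

```python
# Hedged sketch: post the SAP extract to the destination API in chunks rather than
# one record at a time. The endpoint and chunk size are placeholders.
import requests

CHUNK_SIZE = 500
ENDPOINT = "https://destination.example.com/api/records"  # placeholder REST endpoint

def post_in_chunks(records: list[dict]) -> None:
    for start in range(0, len(records), CHUNK_SIZE):
        chunk = records[start:start + CHUNK_SIZE]
        resp = requests.post(ENDPOINT, json=chunk)   # one request carries up to CHUNK_SIZE records
        resp.raise_for_status()

# With this shape, 20,000 records become 40 requests instead of 20,000 ForEach iterations.
```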

DataFactory Setup To Use a Private Network

Hi. I have a pipeline that executes an SSIS package (through the SSIS IR). This package uses a C# script to call a web service, get data, and transfer it into an Azure SQL Database. I have a requirement to execute this on a private, secure network. I have researched solutions and found information about a private link IR (sorry, that may not be the exact terminology), with my SSIS IR referencing that private link IR. Is this the proper solution? Will it give me a secure, private "tunnel" for the data that is received from the API back to the Azure SQL Database?

Is there a way to set up the subscription to use only a private network for all incoming and outgoing traffic, so that even if data is transferred without using the SSIS IR it is not exposed to public addresses?

I would appreciate suggestions and recommendations, or links to KB articles about using private networks from Azure cloud resources. Thank you.