Data Flows
How do I create a flow that adapts to new columns dynamically?
Hello, I have files landing in a blob storage container that I'd like to copy to a SQL database table. The column headers of these files are date markers, so each time a new file is uploaded, a new date appears as a new column. How can I handle this in a pipeline? I think I'll need to accept the schema dynamically and then use an unpivot transformation to normalize the data structure for SQL, but I am unsure how to execute this plan. Thanks!
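Purely to illustrate the unpivot step the post proposes (outside ADF), here is a minimal pandas sketch with hypothetical column names; inside a data flow, the rough equivalent would be allowing schema drift on the source and then applying an Unpivot transformation.

```python
import pandas as pd

# Hypothetical wide layout: one key column plus one column per report date;
# each new file adds another date column on the right.
df = pd.DataFrame({
    "metric": ["sales", "returns"],
    "2024-01-01": [100, 5],
    "2024-01-02": [120, 8],
})

# melt() unpivots every non-key column into (report_date, value) rows, so a
# newly arriving date column is handled without changing the code.
long_df = df.melt(id_vars=["metric"], var_name="report_date", value_name="value")
print(long_df)
```

The long (report_date, value) shape is what loads cleanly into a fixed SQL table schema.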
Clarification on Staging Directory Usage for SAP CDC Connector in Azure Data Factory
Hi! I'm currently working on a project where we are ingesting data from SAP using the SAP CDC connector in Azure Data Factory (data flow). The source is S/4HANA CDS views. We are using a staging directory for the data flow with a checkpoint mechanism, similar to what is described here: https://learn.microsoft.com/en-us/azure/data-factory/connector-sap-change-data-capture
My question is: does the staging directory only act as a temporary storage location during ingestion from SAP? If I understand correctly, it is used for retries, but has no real use once the deltas have been ingested. After the data has been loaded to the destination (in our case, a container inside ADLS), is the staging data still needed for maintaining delta state? Can it be safely deleted from the staging container without impacting subsequent load runs? We were thinking of implementing a 7-day retention policy on the staging container so we can manage storage efficiently. Thank you in advance for any information regarding this.
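If the staging data does prove safe to remove after a successful delta load, a lifecycle/retention policy on the container is the simplest route. As a rough alternative, a scheduled cleanup with the azure-storage-blob SDK could enforce the same 7-day window; the connection string and container name below are placeholders.

```python
from datetime import datetime, timedelta, timezone

from azure.storage.blob import ContainerClient

# Hypothetical storage account connection string and staging container name.
container = ContainerClient.from_connection_string(
    conn_str="<storage-connection-string>",
    container_name="sapcdc-staging",
)

# Keep only the last 7 days of staging files.
cutoff = datetime.now(timezone.utc) - timedelta(days=7)

for blob in container.list_blobs():
    if blob.last_modified < cutoff:
        container.delete_blob(blob.name)
```

This assumes, as the question asks, that the checkpoint/delta state is not kept in the staging container itself; confirm that before enabling any automatic deletion.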
"Fill Down" is an operation common in data prep and data cleansing meant to solve the problem with data sets when you want to replace NULL values with the value from the previous non-NULL value in the sequence. Here is how to implement this in ADF and Synapse data flows.6.3KViews1like2CommentsData flow sink supports user db schema for staging in Azure Synapse and PostgreSQL connectors
Data flow sink supports user db schema for staging in Azure Synapse and PostgreSQL connectors
To achieve the fastest loading speed when moving data into a data warehouse table, load the data into a staging table first. Loading is usually a two-step process: you first load into a staging table and then insert the data into a production data warehouse table. Loading into the staging table takes longer, but the second step of inserting the rows into the production table does not incur data movement across the distributions. The data flow sink transformation supports staging. By default, a temporary table is created under the sink schema for staging. For Azure Synapse Analytics and Azure PostgreSQL, you can alternatively uncheck the Use sink schema option and instead specify a schema name under which Data Factory will create a staging table to load the upstream data and automatically clean it up upon completion. Make sure you have create table permission in the database and alter table permission on the schema. Please follow the links below for more details.
User db schema for staging in Azure Synapse Analytics
User db schema for staging in Azure PostgreSQL
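Purely to illustrate the two-step staging pattern described above (the sink transformation performs these steps for you), here is a sketch using pyodbc; the connection string, schema, and table names are hypothetical.

```python
import pyodbc

# Hypothetical ODBC connection string to the warehouse.
conn = pyodbc.connect("<odbc-connection-string>", autocommit=True)
cur = conn.cursor()

# Step 1: land the incoming rows in a staging table under a user schema.
cur.execute("CREATE TABLE staging.sales_load (id INT, amount DECIMAL(18, 2));")
cur.executemany(
    "INSERT INTO staging.sales_load (id, amount) VALUES (?, ?);",
    [(1, 10.0), (2, 20.0)],
)

# Step 2: insert from staging into the production table.
cur.execute("INSERT INTO dbo.sales SELECT id, amount FROM staging.sales_load;")

# Clean up the staging table once the load completes, as the sink does.
cur.execute("DROP TABLE staging.sales_load;")
```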
Empty File is getting created in ADF
I have an ADF pipeline which has data flows. The data flows read an Excel file and push the records to a SQL DB. The incorrect records are pushed to a Blob Storage sink as a CSV file. When all the records are correct, an empty .csv file is still created and pushed to Blob. How can I avoid the creation of this empty file?
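ADF sink settings aside, the general pattern is to check for rejected rows before writing the error file at all; a minimal pandas sketch with hypothetical names:

```python
import pandas as pd

# Hypothetical: error_rows holds the records that failed validation.
error_rows = pd.DataFrame(columns=["row_id", "error_message"])

# Only create and upload the CSV when at least one bad record exists,
# so no zero-row file lands in blob storage.
if not error_rows.empty:
    error_rows.to_csv("rejected_records.csv", index=False)
```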
Dataflow
Hi, I urgently need help: how can I read 120 GB (3.4 billion rows from a table) at lightning speed from an Azure SQL Server database into Azure Data Lake? I tried two options: a Copy activity with parallelism and the highest DIU, which gives a timeout error after running for hours, and a data flow, which takes 11 hours to read the data. Please suggest.
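One common approach is to partition the read on a numeric key so many ranges are pulled in parallel (both the copy activity and data flow sources expose partition options for this). The sketch below only illustrates the range-partitioning idea outside ADF; the connection string, table, key column, and ranges are hypothetical.

```python
import os

import pandas as pd
import sqlalchemy

# Hypothetical SQLAlchemy URL wrapping an ODBC connection string.
engine = sqlalchemy.create_engine(
    "mssql+pyodbc:///?odbc_connect=<odbc-connection-string>"
)

os.makedirs("export", exist_ok=True)

# Read one ID range at a time and write each slice as Parquet; the resulting
# files can then be landed in the data lake.
partition_size = 50_000_000
for start in range(0, 3_400_000_000, partition_size):
    query = (
        "SELECT * FROM dbo.big_table "
        f"WHERE id >= {start} AND id < {start + partition_size}"
    )
    chunk = pd.read_sql(query, engine)
    chunk.to_parquet(f"export/part_{start}.parquet", index=False)
```

In practice the ranges would run concurrently (and ideally align with an index on the key column) rather than in a single sequential loop.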