data flows
Clarification on Staging Directory Usage for SAP CDC Connector in Azure Data Factory
Hi! I'm currently working on a project where we are ingesting data from SAP using the SAP CDC connector in Azure Data Factory (data flow). The source is S/4HANA CDS views. We are using a staging directory for the data flow with a checkpoint mechanism, similar to what is described here: https://learn.microsoft.com/en-us/azure/data-factory/connector-sap-change-data-capture

My question is: does the staging directory only act as a temporary storage location during ingestion from SAP? If I understand correctly, it's used for retries, but has no real use once the deltas have been ingested. After the data has been loaded to the destination (in our case, a container inside ADLS), is the staged data still needed for maintaining delta state? Can it be safely deleted from the staging container without impacting subsequent load runs? We were thinking of implementing a 7-day retention policy on the staging container so we can manage storage efficiently. Thank you in advance for any information regarding this.
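If the staged files do turn out to be safe to remove, a Blob Storage lifecycle management rule is the usual way to enforce a 7-day retention. The sketch below shows the same idea as a small standalone cleanup job instead, assuming the azure-storage-blob package; the container name and connection string are placeholders to replace with your own:

```python
from datetime import datetime, timedelta, timezone

from azure.storage.blob import BlobServiceClient

# Hypothetical values -- replace with your storage account and container.
CONNECTION_STRING = "<storage-account-connection-string>"
STAGING_CONTAINER = "sapcdc-staging"
RETENTION = timedelta(days=7)


def purge_old_staging_blobs() -> None:
    """Delete staging blobs older than the retention window."""
    service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
    container = service.get_container_client(STAGING_CONTAINER)
    cutoff = datetime.now(timezone.utc) - RETENTION

    for blob in container.list_blobs():
        # last_modified is timezone-aware UTC, so the comparison is safe.
        if blob.last_modified < cutoff:
            container.delete_blob(blob.name)


if __name__ == "__main__":
    purge_old_staging_blobs()
```

Either way, it is worth confirming that the connector keeps its delta state in the checkpoint rather than in the staged files before enabling any automatic deletion.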
Data flow sink supports user db schema for staging in Azure Synapse and PostgreSQL connectors

To achieve the fastest loading speed when moving data into a data warehouse table, load the data into a staging table first. Loading is usually a two-step process: you first load into a staging table and then insert the rows into the production data warehouse table. Loading into the staging table takes longer, but the second step, inserting the rows into the production table, does not incur data movement across the distributions.

The data flow sink transformation supports staging. By default, a temporary table is created under the sink schema for staging. For Azure Synapse Analytics and Azure PostgreSQL, you can alternatively uncheck the Use sink schema option and instead specify a schema name under which Data Factory will create a staging table to load the upstream data, cleaning it up automatically upon completion. Make sure you have CREATE TABLE permission in the database and ALTER permission on the schema. Please follow the links below for more details.

User db schema for staging in Azure Synapse Analytics
User db schema for staging in Azure PostgreSQL
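For intuition, here is roughly what that two-step pattern looks like when done by hand against PostgreSQL. This is a hand-rolled sketch of the pattern itself, not what Data Factory runs internally; the connection settings, schema, table names, and file name are hypothetical, and it assumes the psycopg2 package:

```python
import psycopg2

# Hypothetical connection settings, schema, and table names.
conn = psycopg2.connect("host=<server> dbname=dw user=loader password=<secret>")

with conn:  # commits on success, rolls back on error
    with conn.cursor() as cur:
        # Step 1: bulk load into a staging table in a user-specified schema.
        cur.execute("CREATE TABLE staging.orders_stg (LIKE public.orders)")
        with open("orders.csv") as f:
            cur.copy_expert("COPY staging.orders_stg FROM STDIN WITH CSV", f)

        # Step 2: set-based insert from staging into the production table.
        cur.execute("INSERT INTO public.orders SELECT * FROM staging.orders_stg")

        # Clean up the staging table, as the sink does on completion.
        cur.execute("DROP TABLE staging.orders_stg")

conn.close()
```

The staging schema ("staging" here) is the piece the Use sink schema option controls: unchecking it and naming a schema tells Data Factory where it may create and drop these transient tables.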
"Fill Down" is an operation common in data prep and data cleansing meant to solve the problem with data sets when you want to replace NULL values with the value from the previous non-NULL value in the sequence. Here is how to implement this in ADF and Synapse data flows.6.7KViews1like2Comments