Latest Discussions
Different pools for workers and driver - in ADF triggered ADB jobs
Hello All, Azure Databricks allows usage of separate compute pools for drivers and workers when you create a job via the native Databricks workflows. For customers using ADF as an orchestrator for ADB jobs, is there a way to achieve the same when invoking notebooks/jobs via ADF? The linked service configuration in ADF seems to allow only one instance pool. Appreciate any pointers. Thanks!
Solved

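One possible direction (a sketch only, not the thread's accepted answer, which is not shown here) is to submit the run through the Databricks Jobs API from ADF, for example with a Web activity, since the Jobs API's new_cluster settings accept separate worker and driver pool IDs. A minimal Python sketch; the workspace URL, token, notebook path, and pool IDs are placeholders:

```python
# Hypothetical sketch: submit a one-time Databricks run with separate pools for
# the driver and the workers, instead of defining the cluster in the ADF linked
# service. Workspace URL, token, notebook path, and pool IDs are placeholders.
import requests

DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<databricks-pat-or-aad-token>"

payload = {
    "run_name": "adf-triggered-run",
    "tasks": [
        {
            "task_key": "main",
            "notebook_task": {"notebook_path": "/Shared/my_notebook"},
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",
                "num_workers": 4,
                # Worker nodes come from this pool ...
                "instance_pool_id": "<worker-pool-id>",
                # ... while the driver comes from a different pool.
                "driver_instance_pool_id": "<driver-pool-id>",
            },
        }
    ],
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/runs/submit",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # contains the run_id, which ADF can poll for completion
```

With this pattern ADF only orchestrates the REST call and polls the returned run_id; the cluster (including both pools) is described entirely in the request body rather than in the linked service.
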
How do I create a flow that adapts to new columns dynamically?
Hello, I have files landing in a blob storage container that I'd like to copy to a SQL database table. The column headers of these files are date markers, so each time a new file is uploaded, a new date will appear as a new column. How can I handle this in a pipeline? I think I'll need to dynamically accept the schema and then use an unpivot transformation to normalize the data structure for SQL, but I am unsure how to execute this plan. Thanks!
big_ozzie, Jan 09, 2025

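The unpivot the poster is planning can be prototyped outside ADF to pin down the target shape before building the data flow (where the source would typically allow schema drift). A minimal pandas sketch; the key column, date headers, and value name are made up for illustration:

```python
# Hypothetical sketch: normalise a "wide" file whose new columns are dates
# into key/date/value rows, the same shape an unpivot transformation produces.
import pandas as pd

# Example wide file: one fixed key column plus one column per date.
wide = pd.DataFrame({
    "item_id": ["A", "B"],
    "2025-01-08": [10, 20],
    "2025-01-09": [11, 21],
})

# Everything except the key column is treated as a date column, so newly
# arriving dates are picked up automatically without changing the code.
long = wide.melt(id_vars=["item_id"], var_name="as_of_date", value_name="value")
print(long)
#   item_id  as_of_date  value
# 0       A  2025-01-08     10
# 1       B  2025-01-08     20
# 2       A  2025-01-09     11
# 3       B  2025-01-09     21
```
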
Documentation Generator for Azure Data Factory
Hi, I couldn't find a documentation generator in ADF (unless I am missing something), so I have written a utility script in Python for generating documentation (a sort of quick reference) from the ARM export. I am still in the process of refining it and adding more artifacts, but you can modify and extend it as required. I hope someone will find it useful for getting a quick overview of all the objects in an ADF instance. https://github.com/sanjayganvkar/AzureDataFactory---Documentation-Generator Regards, Sanjay
SanjayGanvkar, Jan 06, 2025

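For readers wondering what such a generator involves: the core of the idea is walking the exported ARM template and grouping resources by type. A minimal sketch of that idea (this is not the linked utility; the file name is an assumption):

```python
# Hypothetical sketch: list pipelines, datasets, linked services, etc. from an
# ADF ARM template export. Not the linked utility, just the general approach.
import json
from collections import defaultdict

with open("arm_template.json", encoding="utf-8") as f:  # exported from ADF
    template = json.load(f)

by_type = defaultdict(list)
for resource in template.get("resources", []):
    # e.g. "Microsoft.DataFactory/factories/pipelines" -> "pipelines"
    kind = resource["type"].split("/")[-1]
    # Names look like "[concat(parameters('factoryName'), '/MyPipeline')]"
    by_type[kind].append(resource["name"])

for kind, names in sorted(by_type.items()):
    print(f"{kind} ({len(names)})")
    for name in names:
        print(f"  - {name}")
```
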
Database connection
Hello, I have created a Linked Service to connect to a DB, but the connection is failing due to a firewall issue. I've whitelisted all the IPs for the region in which my resource is present, but it's still failing, and the IP in the error message belongs to another region. Why is this happening?
ShivaniPadishalwar, Jan 05, 2025

Parsing error while calling ServiceNow Table API
Hi, I am trying to copy data from ServiceNow to blob storage using the ServiceNow Table API. When calling it from Postman I get JSON output with data, but when I configure the same in ADF using the ServiceNow linked service as mentioned here, I get this error: "Failed to parse the API response. Failed at: TableAPIClient FetchTableDataAsync. Error message: Unexpected character encountered while parsing value: }. Path 'result[3716]', line 1, position 17316023". How can I resolve this error?
anshi_t_k, Jan 02, 2025

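A parse failure around result[3716] at a position of roughly 17 million characters suggests a very large, possibly truncated response. One way to confirm this outside ADF is to page the Table API with sysparm_limit/sysparm_offset; a minimal Python sketch, with the instance URL, table name, and credentials as placeholders:

```python
# Hypothetical sketch: pull a ServiceNow table in pages so no single response
# is large enough to be cut off. Instance, table, and credentials are placeholders.
import requests

INSTANCE = "https://yourinstance.service-now.com"
TABLE = "incident"
AUTH = ("api_user", "api_password")

offset, page_size, rows = 0, 1000, []
while True:
    resp = requests.get(
        f"{INSTANCE}/api/now/table/{TABLE}",
        params={"sysparm_limit": page_size, "sysparm_offset": offset},
        auth=AUTH,
        headers={"Accept": "application/json"},
        timeout=60,
    )
    resp.raise_for_status()
    batch = resp.json()["result"]
    rows.extend(batch)
    if len(batch) < page_size:
        break  # last page reached
    offset += page_size

print(f"fetched {len(rows)} rows")
```

If each page parses cleanly, the problem is likely the size of the unpaged response rather than the data itself.
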
Need to remove duplicate row value of same column using Data Pipeline
Hi Guys, I am very new to the Azure Data Factory world. Here is my current dataset in the Data Flow:

S_No | Country | Status | Line_No | Score | Comment
ABC  | CA      | AOK    | 1       | 32.67 | ReviewedABCD
ABC  | CA      | AOK    | 2       | 97.12 | ReviewedXYZ
ABC  | CA      | AOK    | 4       | 23.67 | ReviewedDCBA

And I want to achieve the following:

S_No | Country | Status | Line_No | Score | Comment
ABC  | CA      | AOK    | 1       | 32.67 | ReviewedABCD
Null | Null    | Null   | 2       | 97.12 | ReviewedXYZ
Null | Null    | Null   | 4       | 23.67 | ReviewedDCBA

Basically I need to remove the duplicate values in the first 3 columns. Any suggestion on how to do that with the Data Flow is most welcome.
tanmodut, Dec 26, 2024

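In a mapping data flow this kind of blanking is usually done by comparing each row's grouping columns against the previous row (for example with a window transformation); the logic itself is easy to prototype in pandas. A minimal sketch using the column names from the post:

```python
# Hypothetical sketch: blank out the first three columns whenever they repeat
# the values of the immediately preceding row, keeping the first occurrence.
import pandas as pd

df = pd.DataFrame({
    "S_No":    ["ABC", "ABC", "ABC"],
    "Country": ["CA", "CA", "CA"],
    "Status":  ["AOK", "AOK", "AOK"],
    "Line_No": [1, 2, 4],
    "Score":   [32.67, 97.12, 23.67],
    "Comment": ["ReviewedABCD", "ReviewedXYZ", "ReviewedDCBA"],
})

group_cols = ["S_No", "Country", "Status"]
# A row is a "repeat" when all three grouping columns match the previous row.
is_repeat = (df[group_cols] == df[group_cols].shift()).all(axis=1)
df.loc[is_repeat, group_cols] = None
print(df)
#   S_No Country Status  Line_No  Score       Comment
# 0  ABC      CA    AOK        1  32.67  ReviewedABCD
# 1 None    None   None        2  97.12   ReviewedXYZ
# 2 None    None   None        4  23.67  ReviewedDCBA
```
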
Scheduled trigger is running 1 second early
A scheduled trigger configured to execute every 10 minutes exhibits an issue where one of the six runs in each hour executes 1 second earlier than the expected schedule. This inconsistency impacts the timing accuracy of the trigger. My question is: why is it running 1 second early? Because of this, a few jobs scheduled during that time are being skipped.
dileepkaranki, Dec 19, 2024

Need ADF pipeline suggestion
I have an ADF pipeline that copies files from source to destination. Both source and destination are different folders within ADLS. My pipeline design is as follows:
1.) Lookup activity - a SQL Server stored procedure that returns the source path and the destination path. This is connected to a ForEach loop.
2.) ForEach activity - has 10 as the batch count. Within this activity I have a Copy Data activity.
3.) Copy Data activity - the source and sink paths are set from the stored procedure output columns. Source and destination location is ADLS Gen2.
It works fine, but the stored procedure returns about 1 million files and it takes about 20 minutes to copy 1,000 rows/files. What settings/config can I change to make this run faster?
CzarR, Dec 18, 2024

adf upsert
Hi all, I'm trying to create a sample upsert data pipeline in ADF (insert missing records and update changed ones). I created a sample source and target Postgres table with all data types available; the number of source and target columns differs slightly.

```sql
CREATE TABLE "schema1".source_for_upsert_0001 (
    col_serial SERIAL PRIMARY KEY,
    col_smallint SMALLINT, col_integer INTEGER, col_bigint BIGINT,
    col_decimal DECIMAL(10, 2), col_numeric NUMERIC(10, 2), col_real REAL, col_double DOUBLE PRECISION,
    col_smallserial SMALLSERIAL, col_serial_alias SERIAL, col_bigserial BIGSERIAL, col_money MONEY,
    col_char CHAR(5), col_varchar VARCHAR(50), col_text TEXT, col_bytea BYTEA,
    col_timestamp TIMESTAMP, col_timestamptz TIMESTAMPTZ, col_date DATE, col_time TIME, col_timetz TIMETZ,
    col_boolean BOOLEAN, col_uuid UUID, col_json JSON, col_jsonb JSONB, col_xml XML,
    col_inet INET, col_cidr CIDR, col_macaddr MACADDR, col_bit BIT(8), col_varbit VARBIT(16),
    col_interval INTERVAL, col_point POINT, col_line LINE, col_lseg LSEG, col_box BOX,
    col_path PATH, col_polygon POLYGON, col_circle CIRCLE, col_tsquery TSQUERY, col_tsvector TSVECTOR
);

INSERT INTO "schema1".source_for_upsert_0001 (
    col_smallint, col_integer, col_bigint, col_decimal, col_numeric, col_real, col_double,
    col_smallserial, col_serial_alias, col_bigserial, col_money, col_char, col_varchar, col_text, col_bytea,
    col_timestamp, col_timestamptz, col_date, col_time, col_timetz, col_boolean, col_uuid,
    col_json, col_jsonb, col_xml, col_inet, col_cidr, col_macaddr, col_bit, col_varbit, col_interval,
    col_point, col_line, col_lseg, col_box, col_path, col_polygon, col_circle, col_tsquery, col_tsvector
) VALUES (
    1, 100, 1000, 1234.56, 1234.56, 12.34, 12345.6789,
    1, DEFAULT, DEFAULT, '$1234.56', 'A', 'Sample Text', 'This is a text field.', E'\x48656c6c6f',
    '2024-12-13 12:00:00', '2024-12-13 12:00:00+00', '2024-12-13', '12:00:00', '12:00:00+00', TRUE, '550e8400-e29b-41d4-a716-446655440000',
    '{"key": "value"}', '{"key": "value"}', '<note><to>User</to><message>Hello!</message></note>', '192.168.1.1', '192.168.0.0/24', '08:00:2b:01:02:03', B'10101010', B'1010101010101010', '1 year 2 months',
    '(1,1)', '((0,0),(1,1))', '((0,0),(1,1))', '((0,0),(2,2))', '((0,0),(1,1),(2,2),(2,0),(0,0))', '((0,0),(1,1),(2,2),(2,0),(0,0))', '<(1,1),1>', 'cat & dog', 'cat:3A dog:2A'
);

CREATE TABLE "schema1".target_for_upsert_0001 (
    col_2_serial SERIAL PRIMARY KEY,
    col_2_smallint SMALLINT, col_2_integer INTEGER, col_2a_integer INTEGER, col_2_bigint BIGINT,
    col_2_decimal DECIMAL(10, 2), col_2_numeric NUMERIC(10, 2), col_2_real REAL, col_2_double DOUBLE PRECISION,
    col_2_smallserial SMALLSERIAL, col_2_serial_alias SERIAL, col_2_bigserial BIGSERIAL, col_2_money MONEY,
    col_2_char CHAR(5), col_2_varchar VARCHAR(50), col_2_text TEXT, col_2_bytea BYTEA,
    col_2_timestamp TIMESTAMP, col_2_timestamptz TIMESTAMPTZ, col_2_date DATE, col_2_time TIME, col_2_timetz TIMETZ,
    col_2_boolean BOOLEAN, col_2_uuid UUID, col_2_json JSON, col_2_jsonb JSONB, col_2_xml XML,
    col_2_inet INET, col_2_cidr CIDR, col_2_macaddr MACADDR, col_2_bit BIT(8), col_2_varbit VARBIT(16),
    col_2_interval INTERVAL, col_2_point POINT, col_2_line LINE, col_2_lseg LSEG, col_2_box BOX,
    col_2_path PATH, col_2_polygon POLYGON, col_2_circle CIRCLE, col_2_tsquery TSQUERY, col_2_tsvector TSVECTOR
);
```

I used a "data flow":
Upsert source: (screenshot)
Upsert derived column: as we don't have an "updated at" timestamp column in the source/target, I plan to use an md5 of all present row values to compare changes.
Upsert Alter Row: Upsert if isNull(md5_columns)==false()
Upsert Sink: (screenshot)
Debugging: (screenshot)
Could someone kindly look into this and advise what could be wrong? The last screenshot was meant to clarify, but it doesn't really help in seeing the root cause.
ymarkiv, Dec 17, 2024

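The hash-all-columns idea in the post can be sanity-checked outside the data flow before wiring up the Alter Row condition. A minimal Python sketch of such a row fingerprint; the separator and null placeholder are arbitrary choices that just need to be applied identically on both source and target:

```python
# Hypothetical sketch: build an md5 "row fingerprint" over all column values,
# the same idea as the derived column in the post. The separator and the null
# placeholder are assumptions and must be identical on both sides of the compare.
import hashlib

def row_hash(row: dict, columns: list[str]) -> str:
    parts = []
    for col in columns:
        value = row.get(col)
        parts.append("<NULL>" if value is None else str(value))
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()

source_row = {"col_smallint": 1, "col_integer": 100, "col_varchar": "Sample Text"}
target_row = {"col_smallint": 1, "col_integer": 100, "col_varchar": "Sample Text!"}

cols = ["col_smallint", "col_integer", "col_varchar"]
print(row_hash(source_row, cols) == row_hash(target_row, cols))  # False -> row changed
```
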
Clarification on Staging Directory Usage for SAP CDC Connector in Azure Data Factory
Hi! I'm currently working on a project where we are ingesting data from SAP using the SAP CDC connector in Azure Data Factory (data flow). The source is S/4HANA CDS views. We are using a staging directory for the data flow with a checkpoint mechanism, similar to what is described here: https://learn.microsoft.com/en-us/azure/data-factory/connector-sap-change-data-capture My question is: does the staging directory only act as a temporary storage location during ingestion from SAP? If I understand correctly, it is used for retries, but it has no real use once the deltas have been ingested. After the data has been loaded to the destination (in our case a container inside ADLS), is that data needed for maintaining delta state? Can it be safely deleted from the staging container without impacting subsequent load runs? We were thinking of implementing a 7-day retention policy on the staging container so we can manage storage efficiently. Thank you in advance for any information regarding this.
DiskoSuperStar, Dec 02, 2024

Tags
- Azure Data Factory (144 Topics)
- Azure ETL (36 Topics)
- Copy Activity (33 Topics)
- Azure Data Integration (33 Topics)
- Mapping Data Flows (23 Topics)
- Azure Integration Runtime (21 Topics)
- Data Flows (3 Topics)
- azure data factory v2 (3 Topics)
- ADF (3 Topics)
- Azure Synapse Analytics (2 Topics)