Copy Activity
General availability of SAP CDC capabilities for Azure Data Factory and Azure Synapse Analytics
Customers use SAP systems for their business-critical operations. Today, customers want to be able to combine their SAP data with non-SAP data for their analytics needs. Azure Data Factory (ADF) is an industry-leading data integration service that enables customers to ingest data from diverse data sources (e.g., multi-cloud, SaaS, on-premises), transform data at scale, and more. ADF works seamlessly to combine data and prepare it at cloud scale. Customers use ADF to ingest data from different SAP sources (e.g., SAP ECC, SAP HANA, SAP Table, SAP BW Open Hub, SAP BW via MDX, SAP Cloud for Customer) and combine it with data from other operational stores (e.g., Cosmos DB, the Azure SQL family, and more), gaining deep insights from both SAP and non-SAP data. Today, we are excited to announce the general availability of SAP CDC support in Azure Data Factory and Azure Synapse Analytics.
Failure of Azure Data Factory integration runtime with VNet enabled

I had been using Data Factory's integration runtime with VNet successfully, but it recently stopped connecting to Cosmos DB with the MongoDB API (which is also within a VNet). After setting up a new integration runtime with VNet enabled and selecting the region as 'Auto Resolve', the pipeline ran successfully with this new runtime. Could you help me understand why the previous integration runtime, configured with VNet enabled and the region set to match that of Azure Data Factory, worked for over a month but then suddenly failed? The new integration runtime with VNet and 'Auto Resolve' region worked, but I'm uncertain whether the 'Auto Resolve' region contributed to the success or whether something else allowed it to connect.

Error: Failure happened on 'Source' side. ErrorCode=MongoDbConnectionTimeout,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=>Connection to MongoDB server is timeout.,Source=Microsoft.DataTransfer.Runtime.MongoDbAtlasConnector,''Type=System.TimeoutException,Message=A timeout occured after 30000ms selecting a server using CompositeServerSelector{ Selectors = MongoDB.Driver.MongoClient+AreSessionsSupportedServerSelector, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 } }. Client view of cluster state is { ClusterId : "1", ConnectionMode : "ReplicaSet", Type : "ReplicaSet", State : "Disconnected", Servers : [{ ServerId: "{ ClusterId : 1, EndPoint : "Unspecified/cosmontiv01u.mongo.cosmos.azure.com:10255" }", EndPoint:
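When a VNet-enabled runtime suddenly starts timing out against a private Cosmos DB for MongoDB endpoint, the cause is often name resolution or private-endpoint routing rather than the runtime region. A minimal reachability check you can run from a VM in the same VNet (a sketch using only the standard library; the account host name below is a placeholder) is:

```python
import socket

# Placeholder: replace with your Cosmos DB for MongoDB account host name.
HOST = "<account>.mongo.cosmos.azure.com"
PORT = 10255  # default port for the Cosmos DB API for MongoDB

# 1) Does the name resolve, and does it resolve to the private-endpoint IP
#    you expect (a private address) or to a public IP?
for family, _, _, _, sockaddr in socket.getaddrinfo(HOST, PORT, proto=socket.IPPROTO_TCP):
    print("resolved to:", sockaddr[0])

# 2) Can a TCP connection be opened within the same 30-second budget the
#    connector appears to use?
with socket.create_connection((HOST, PORT), timeout=30) as s:
    print("TCP connect OK:", s.getpeername())
```

If the host resolves to a public IP from inside the VNet, the private DNS zone or private endpoint association is usually the place to look.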
Incremental Load from ServiceNow kb_knowledge table

Hi, I have been trying to copy only new KB data from the kb_knowledge table in ServiceNow to blob storage. I tried to use the query builder, but it copies all of the KB data. Is there another way to do this? Thanks in advance!
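One common pattern (a sketch, not an official recipe) is to persist a watermark, typically the highest sys_updated_on value already copied, and filter the source query on it each run. The sysparm_query filter shown below against the ServiceNow Table API can also be expressed in the Copy Activity's source query; the instance name, credentials, and watermark value are placeholders:

```python
import requests

# Placeholders: your ServiceNow instance, credentials, and the watermark value
# persisted from the previous run (e.g. in a control table or a blob).
INSTANCE = "https://<instance>.service-now.com"
LAST_WATERMARK = "2024-01-01 00:00:00"

params = {
    # Only rows changed since the watermark, ordered so the new watermark
    # is simply the last row's sys_updated_on.
    "sysparm_query": f"sys_updated_on>{LAST_WATERMARK}^ORDERBYsys_updated_on",
    "sysparm_fields": "sys_id,number,short_description,sys_updated_on",
    "sysparm_limit": "1000",
}

resp = requests.get(
    f"{INSTANCE}/api/now/table/kb_knowledge",
    params=params,
    auth=("<user>", "<password>"),
    headers={"Accept": "application/json"},
    timeout=60,
)
resp.raise_for_status()
rows = resp.json()["result"]
print(f"{len(rows)} new/updated KB articles since {LAST_WATERMARK}")
```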
'Cannot connect to SQL Database' error - please help

Hi, Our organisation is new to Azure Data Factory (ADF) and we're facing an intermittent error with our first pipeline. Being intermittent adds that little bit more complexity to resolving the error. The pipeline has two activities:
1) A Script activity which deletes the contents of the target Azure SQL Server database table located within our Azure cloud instance.
2) A Copy data activity which copies the entire contents of the external (outside of our domain) third-party source SQL view and loads it into our target Azure SQL Server database table.
With the source being external to our domain, we have used a self-hosted integration runtime. The pipeline executes once per 24 hours, at 3am each morning, and I have been informed that this timing shouldn't affect, or be affected by, any other Azure processes we have. For the first nine days the pipeline completed successfully. Over the next nine days it only completed successfully four times, and now it seems to fail every other time. The same error message is received on each failure (I've replaced our sensitive internal names with Xs):

Operation on target scr__Delete stg__XXXXXXXXXX contents failed: Failed to execute script. Exception: ''Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Cannot connect to SQL Database. Please contact SQL server team for further support. Server: 'XX-azure-sql-server.database.windows.net', Database: 'XX_XXXXXXXXXX_XXXXXXXXXX', User: ''. Check the linked service configuration is correct, and make sure the SQL Database firewall allows the integration runtime to access.,Source=Microsoft.DataTransfer.Connectors.MSSQL,''Type=Microsoft.Data.SqlClient.SqlException,Message=Server provided routing information, but timeout already expired.,Source=Framework Microsoft SqlClient Data Provider,''

To me, if this pipeline was incorrectly configured it would never have completed successfully, not even once. The fact that the failure is intermittent, but becoming more frequent, suggests it's being caused by something other than the configuration, but I could be wrong - hence requesting help from you. Please can someone advise on what is causing the error and what I can do to verify or resolve it? Thanks.
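The inner exception ("Server provided routing information, but timeout already expired") is typically seen when the Azure SQL gateway redirects the connection but the client's login timeout has already been used up, which tends to surface intermittently. One way to test that theory from the self-hosted IR machine is a standalone connection attempt with a generous login timeout and a simple retry; the sketch below assumes pyodbc and ODBC Driver 18 are installed, and the server, database, and credentials are placeholders:

```python
import time
import pyodbc

# Placeholders: replace with your server, database, and credentials.
CONN_STR = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:XX-azure-sql-server.database.windows.net,1433;"
    "Database=XX_database;"
    "Uid=<user>;Pwd=<password>;"
    "Encrypt=yes;TrustServerCertificate=no;Connection Timeout=60;"
)

def connect_with_retry(attempts: int = 3, backoff_seconds: int = 10):
    """Try to connect a few times, mirroring the transient-fault retry that is
    generally recommended for Azure SQL gateway redirects and failovers."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # The timeout argument raises pyodbc's login timeout as well.
            return pyodbc.connect(CONN_STR, timeout=60)
        except pyodbc.Error as exc:
            last_error = exc
            print(f"attempt {attempt} failed: {exc}")
            time.sleep(backoff_seconds * attempt)
    raise last_error

with connect_with_retry() as conn:
    print(conn.execute("SELECT 1").fetchval())
```

If a longer login timeout and a retry make the standalone test reliable, raising the connection timeout and retry settings on the linked service is worth trying before digging into the network path.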
Flattening nested JSON values in a dataflow with varying keys.

We are using Azure DevOps REST API calls to return JSON files and storing them in blob storage, then running a data flow to transform the data. The issue is that a portion of the JSON stored in blob has varying keys. When we specify the columns to map in a Select transformation, we can only pick one specific key from the list of options, but we need to map ALL of them. We cannot specify them manually because the data source is so large, and we cannot enforce a standard name for this section of the JSON. A wildcard for { } would work ideally but is not supported. We do not care what the keys are, just their contents (id, name).

Select transformation source columns:
resources.pipelines.{src-release}.pipeline.id
resources.pipelines.{src-release}.pipeline.name
resources.pipelines.{build}.pipeline.id
resources.pipelines.{build}.pipeline.name

Mapped names: 'pipelineID', 'pipelineName'

Below is a JSON snippet which highlights the key in the source JSON, and an example of the Select mapping where each key shows as its own dropdown.
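Since the varying segment is just a dictionary key, one workaround (a sketch, assuming the JSON shape implied by the column paths above) is to flatten the file before, or instead of, the Select transformation, iterating over whatever keys happen to exist under resources.pipelines:

```python
import json

# Assumed shape, based on the column paths in the post:
# { "resources": { "pipelines": { "<any key>": { "pipeline": { "id": ..., "name": ... } } } } }
def flatten_pipelines(run: dict) -> list[dict]:
    rows = []
    pipelines = run.get("resources", {}).get("pipelines", {})
    for key, entry in pipelines.items():      # the key name itself doesn't matter
        pipeline = entry.get("pipeline", {})
        rows.append({
            "resourceKey": key,               # kept in case it turns out to be useful
            "pipelineID": pipeline.get("id"),
            "pipelineName": pipeline.get("name"),
        })
    return rows

with open("run.json") as f:                   # placeholder file name
    doc = json.load(f)

for row in flatten_pipelines(doc):
    print(row)
```

Running a small pre-processing step like this (for example in an Azure Function or notebook) produces a flat file with stable column names that the data flow can map without wildcards.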
Copy Activity from blob CSV to C4C OData services fails on CSRF token

Hi there,
1) When getting data from C4C into blob using ADF, we were able to extract data without any issues.
2) When trying to insert the downloaded file back into C4C (sap/c4c/odata/v1/c4codataapi/) using a Copy Activity in ADF, we hit an error saying the CSRF token is not supported for the OData endpoint.
Can you please advise how to resolve this? Note: the user has sufficient permissions to insert data.

Error log:
"errors": [ { "Code": 23208, "Message": "ErrorCode=ODataCsrfTokenNotSupported,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Csrf token not supported for the odata endpoint.,Source=Microsoft.DataTransfer.Runtime.ODataConnector,'", "EventType": 0, "Category": 5, "Data": {}, "MsgId": null, "ExceptionType": null, "Source": null, "StackTrace": null, "InnerEventInfos": [] }
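The error indicates the OData sink cannot perform the CSRF token handshake that SAP requires for modifying requests. One workaround worth considering (a sketch, not a confirmed ADF feature) is to do the write outside the Copy Activity, for example from an Azure Function, performing the standard SAP handshake yourself: fetch a token with a GET, then send the POST with that token and the same session cookies. The host, entity set, and payload below are placeholders:

```python
import requests

BASE = "https://<tenant>.crm.ondemand.com/sap/c4c/odata/v1/c4codataapi"  # placeholder host

session = requests.Session()
session.auth = ("<user>", "<password>")
session.headers.update({"Accept": "application/json"})

# Step 1: a GET with 'X-CSRF-Token: Fetch' returns a token in the response
# headers, and the session keeps the cookies SAP ties that token to.
fetch = session.get(f"{BASE}/", headers={"X-CSRF-Token": "Fetch"}, timeout=60)
fetch.raise_for_status()
token = fetch.headers["x-csrf-token"]

# Step 2: reuse the same session (cookies) and send the token with the write.
payload = {"Name": "Example"}  # placeholder body for a placeholder entity set
resp = session.post(
    f"{BASE}/<EntitySetCollection>",
    json=payload,
    headers={"X-CSRF-Token": token, "Content-Type": "application/json"},
    timeout=60,
)
resp.raise_for_status()
print(resp.status_code)
```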
Azure Data Factory Copy Data Activity changes data while copying Parquet data to Dedicated SQL Pool

Hi All, We have a Parquet file in an ADLS Gen2 storage container with over 7 million rows of data. We created a Copy Data activity in Azure Data Factory to move this data to a table in a dedicated SQL pool. All of the data from the Parquet file lands in the database table accurately, except for one row, where a decimal value of 78.6 in the Parquet file arrives in the SQL table as 78.5. Here's more context on the steps we have taken so far to trace the root cause:
- We changed the Parquet file name and pushed it to the table again; the data still goes into the SQL table as 78.5 (when the Parquet file has 78.6).
- We created a version 2 table in the SQL database and pushed the data into it with the Copy Data activity; the data still goes in as 78.5.
- We checked the compression type used to create the Parquet file in our Python code (GZIP) against the compression type set on the Data Factory dataset connection; it was previously Snappy, we changed it to GZIP and re-ran the Copy Data activity, and the data still goes in as 78.5.
- We checked the decimal data type precision and scale, as well as the data type mapping from source to sink; if these were off, the whole column should have issues, but it is only this one row that lands in the SQL table incorrectly.
Have any of you encountered this issue before? If so, how did you solve it? Any suggestions are welcome. Thank you!!
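One hypothesis worth checking (hedged, since the post doesn't show the Parquet schema) is that the column is physically stored as a 32-bit float rather than a decimal. 78.6 has no exact binary representation, the nearest float32 is slightly below 78.6, and a conversion to a decimal with scale 1 that truncates rather than rounds would then produce 78.5. A quick check with the standard library and pyarrow (file path is a placeholder):

```python
import struct
from decimal import Decimal, ROUND_DOWN

import pyarrow.parquet as pq

# What 78.6 actually becomes when stored as a 32-bit float:
as_float32 = struct.unpack("f", struct.pack("f", 78.6))[0]
print(as_float32)   # prints something like 78.5999984741211

# Truncating (instead of rounding) to one decimal place reproduces the symptom:
print(Decimal(str(as_float32)).quantize(Decimal("0.1"), rounding=ROUND_DOWN))  # 78.5

# Inspect the physical type of the column in the source file: if it reports
# float/double rather than decimal(p,1), the precision was lost when the file
# was written, not in the Copy Activity.
schema = pq.read_schema("your_file.parquet")
print(schema)
```

If the schema does show a float column, writing the column as a decimal (or rounding, rather than truncating, on the way into the sink) is the direction to investigate.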
How to handle Azure Data Factory Lookup activity with more than 5000 records

Hello Experts, The Data Flow activity successfully copies data from an Azure Blob Storage .csv file to a Dataverse table. However, an error occurs when performing a Lookup on the Dataverse source due to the volume of data. This is in line with the documentation, which states that the Lookup activity has a limit of 5,000 rows and a maximum size of 4 MB. The workaround mentioned in the Microsoft documentation is: design a two-level pipeline where the outer pipeline iterates over an inner pipeline, which retrieves data that doesn't exceed the maximum rows or size. How can I do this? Is there a way to define an offset (e.g. only read 1,000 rows)? Thanks, -Sri
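For a Dataverse source specifically, the paging that the two-level pipeline relies on is exposed by the Web API through the Prefer: odata.maxpagesize header and the @odata.nextLink continuation link, rather than a plain numeric offset. A sketch of that pattern (environment URL, entity set, columns, and token acquisition are placeholders):

```python
import requests

# Placeholders: your environment URL, entity set name, and an access token
# obtained via your usual Azure AD app registration flow.
ENV_URL = "https://<org>.crm.dynamics.com"
ENTITY_SET = "<entityset>"
ACCESS_TOKEN = "<token>"

headers = {
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "Accept": "application/json",
    "OData-MaxVersion": "4.0",
    "OData-Version": "4.0",
    "Prefer": "odata.maxpagesize=1000",   # page size instead of an offset
}

url = f"{ENV_URL}/api/data/v9.2/{ENTITY_SET}?$select=<column1>,<column2>"
page = 0
while url:
    resp = requests.get(url, headers=headers, timeout=60)
    resp.raise_for_status()
    body = resp.json()
    page += 1
    print(f"page {page}: {len(body['value'])} rows")
    # Dataverse returns a continuation link on every page except the last.
    url = body.get("@odata.nextLink")
```

In pipeline terms, the outer pipeline plays the role of this while loop (for example an Until activity carrying the continuation state), and each inner Lookup retrieves one bounded page so neither the 5,000-row nor the 4 MB limit is hit.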