Azure Data Integration
What Synapse Serverless SQL pool authentication type for ADF Linked Service?
Hi, I'm relatively new to Azure Data Factory and need your guidance on how to successfully create and test a Linked Service to an Azure Synapse Analytics Serverless SQL pool.

In the past I successfully created a Linked Service to a third-party (outside our domain) on-premises SQL Server by installing a self-hosted integration runtime on their box and then creating a Linked Service that uses it. The server name, database name, Windows authentication, and the username and password configured by the third party are what I entered into the Linked Service configuration boxes, and it all tested successfully. That third-party data was extracted and imported, via ADF pipelines, into an Azure SQL database within our domain.

Now I need to extract data from our own (hosted in our domain) Azure Synapse Analytics Serverless SQL pool database. My attempt is this, and it fails:
1) I create an 'Azure Synapse Analytics' Data Store Linked Service.
2) I select 'AutoResolveIntegrationRuntime' as the runtime to use - I'm thinking this is correct as the Synapse source is within our domain (we're fully MS cloud based).
3) I select 'Enter manually' under the 'Account selection method'.
4) I've got the Azure Synapse Analytics Serverless SQL endpoint, which I place into the 'Fully qualified domain name' field.
5) I enter the SQL database name found under the 'SQL database' node/section on the Data >> Workspace screen in Synapse.
6) I choose 'System-assigned managed identity' as the Authentication type - this is a guess; I was hoping it would recognise the username/account I am building the Linked Service with, as that account can also query Synapse and so has Synapse access.
7) I check the 'Trust server certificate' box. All else is default.

When I click test connection, it fails with the following message: "Cannot connect to SQL Database. Please contact SQL server team for further support. Server: 'xxxxxxxxxxxx-ondemand.sql.azuresynapse.net', Database: 'Synapse_Dynamics_data', User: ''. Check the linked service configuration is correct, and make sure the SQL Database firewall allows the integration runtime to access. Login failed for user '<token-identified principal>'."

I've reached out to our IT team (who are novices with Synapse, ADF, etc., even though they did install them in our domain) and they don't know how to help me. I'm hoping you can.
1) Is 'Azure Synapse Analytics' the correct Data Store to choose when looking to extract data from an Azure Synapse Serverless SQL pool database?
2) Is using the AutoResolveIntegrationRuntime correct if Synapse is held within our domain? I've previously confirmed this runtime works (and still does), as I had to use it when loading the third-party data into our Azure SQL database.
3) Have I populated the correct values for the 'Fully qualified domain name' and 'Database name' fields by entering the Azure Synapse Analytics Serverless SQL endpoint and the SQL database name, respectively?
4) Is choosing 'System-assigned managed identity' as the Authentication type correct? I'm guessing this could be the issue. I selected it because it was the authentication type used (and works) when loading the mentioned third-party data into the Azure SQL database within our domain, so I assumed it somehow recognises the logged-in user and, through the magic of cloud authentication, confirms that user has the correct privileges (which IT say I should have) and so allows the Linked Service to work.

Any guidance you can provide will be much appreciated. Thanks.
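The "Login failed for user '<token-identified principal>'" part usually means the data factory's system-assigned managed identity (which has the same name as the factory) has no user in the serverless database yet. A minimal sketch of the kind of grant that typically resolves it, run against that database as an Entra ID admin; the server, database, and factory names below are placeholders, not confirmed values, and views over external data may additionally need permissions on the underlying storage:

```python
import pyodbc

# Placeholders - replace with your serverless endpoint, database, and data factory name.
SERVER = "xxxxxxxxxxxx-ondemand.sql.azuresynapse.net"
DATABASE = "Synapse_Dynamics_data"
ADF_NAME = "my-data-factory"  # the ADF resource name = its system-assigned managed identity name

conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    f"Server={SERVER};Database={DATABASE};"
    "Authentication=ActiveDirectoryInteractive;Encrypt=yes;"
)

# Connect as an Entra ID admin of the serverless pool, create a user for the
# factory's managed identity, then grant it read access.
with pyodbc.connect(conn_str, autocommit=True) as conn:
    cur = conn.cursor()
    cur.execute(f"CREATE USER [{ADF_NAME}] FROM EXTERNAL PROVIDER;")
    cur.execute(f"ALTER ROLE db_datareader ADD MEMBER [{ADF_NAME}];")
```

With that user in place, the combination described in the post (serverless endpoint as the fully qualified domain name, the SQL database name, AutoResolveIntegrationRuntime, and system-assigned managed identity authentication) is the usual setup.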
'Cannot connect to SQL Database' error - please help
Hi, Our organisation is new to Azure Data Factory (ADF) and we're facing an intermittent error with our first pipeline. Being intermittent adds that little bit more complexity to resolving it. The pipeline has two activities:
1) A Script activity which deletes the contents of the target Azure SQL database table located within our Azure cloud instance.
2) A Copy data activity which simply copies the entire contents of the external (outside of our domain) third-party source SQL view and loads it into our target Azure SQL database table.
With the source being external to our domain, we have used a self-hosted integration runtime. The pipeline executes once per 24 hours, at 3am each morning. I have been informed that this timing shouldn't affect, or be affected by, any other Azure processes we have.

For the first nine days the pipeline completed successfully. Then over the next nine days it only completed successfully four times. Now it seems to fail every other run. The same error message is received on each failure - it is below (I've replaced our sensitive internal names with Xs).

Operation on target scr__Delete stg__XXXXXXXXXX contents failed: Failed to execute script. Exception: ''Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Cannot connect to SQL Database. Please contact SQL server team for further support. Server: 'XX-azure-sql-server.database.windows.net', Database: 'XX_XXXXXXXXXX_XXXXXXXXXX', User: ''. Check the linked service configuration is correct, and make sure the SQL Database firewall allows the integration runtime to access.,Source=Microsoft.DataTransfer.Connectors.MSSQL,''Type=Microsoft.Data.SqlClient.SqlException,Message=Server provided routing information, but timeout already expired.,Source=Framework Microsoft SqlClient Data Provider,''

To me, if this pipeline were incorrectly configured it would never have completed successfully, not even once. The fact that the failure is intermittent, but becoming more frequent, suggests it's being caused by something other than its configuration - but I could be wrong, hence asking for help here. Please can someone advise on what is causing the error and what I can do to verify/resolve it? Thanks.
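The inner exception ("Server provided routing information, but timeout already expired") is a transient connection failure: Azure SQL redirected the connection, but the login timed out before the redirect completed. The usual mitigations are enabling Retry on the Script and Copy activities and allowing a longer connection timeout in the linked service. Purely as an illustration of the same idea outside ADF, here is a minimal sketch of a longer login timeout plus retry with backoff (server, database, and credentials are placeholders):

```python
import time
import pyodbc

CONN_STR = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=XX-azure-sql-server.database.windows.net;"
    "Database=my_database;UID=my_user;PWD=my_password;Encrypt=yes;"
)

def connect_with_retry(attempts: int = 4, login_timeout_s: int = 60):
    """Retry transient connection failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            # The `timeout` argument sets the login timeout for the connection attempt.
            return pyodbc.connect(CONN_STR, timeout=login_timeout_s)
        except pyodbc.Error:
            if attempt == attempts:
                raise
            time.sleep(2 ** attempt)  # back off 2s, 4s, 8s ...

conn = connect_with_retry()
```

In ADF itself the equivalent knobs are the activity's Retry and Retry interval settings, and a larger connect timeout if the linked service is defined with an explicit connection string.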
Can an ADF Pipeline trigger upon source table update?
Hi, Is it possible for an Azure Data Factory pipeline to be triggered each time a source table changes? Let's say I have a Copy data activity in a pipeline that copies data from TableA to TableB. Can the pipeline be configured to execute whenever source TableA is updated (a record deleted, changed, a new record inserted, etc.)? Thanks.
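ADF has no built-in trigger that fires on changes to a SQL table; its native triggers are schedule, tumbling window, and storage/custom events. A common workaround is to poll the table (via change tracking or a watermark column) and start the pipeline run programmatically when something has changed. A rough sketch using the Python management SDK; the resource names, watermark column, and connection string are assumptions:

```python
import pyodbc
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
FACTORY_NAME = "<data-factory-name>"
PIPELINE_NAME = "<pipeline-name>"
CONN_STR = "Driver={ODBC Driver 18 for SQL Server};Server=<server>;Database=<db>;..."

def latest_watermark() -> str:
    # Assumes TableA has a ModifiedOn (or rowversion) column that moves when rows change.
    with pyodbc.connect(CONN_STR) as conn:
        row = conn.cursor().execute("SELECT MAX(ModifiedOn) FROM dbo.TableA").fetchone()
        return str(row[0])

def run_pipeline_if_changed(previous_watermark: str) -> str:
    current = latest_watermark()
    if current != previous_watermark:
        client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)
        run = client.pipelines.create_run(RESOURCE_GROUP, FACTORY_NAME, PIPELINE_NAME)
        print(f"Started pipeline run {run.run_id}")
    return current
```

A polling loop like this could live in an Azure Function on a timer; alternatively, a tumbling window trigger that passes the window start/end into a watermark query inside the pipeline achieves much the same effect without external code.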
How to save Azure Data Factory work (objects)?
Hi, I'm new to Azure Data Factory (ADF). I need to learn it in order to ingest external third-party data into our domain. I shall be using ADF pipelines to retrieve the data and then load it into an Azure SQL database. I currently develop Power BI reports and write SQL scripts to feed the Power BI reporting. Those reports and scripts are saved on a backed-up drive, so if anything disappears I can always use the backups to reinstall the work. The target SQL database scripts - the tables the ADF pipelines will load to - will be backed up following the same method.

How do I save the ADF pipelines and any other ADF objects that I may create (I don't know exactly what will be created as I'm yet to develop anything in ADF)? I've read about the CI/CD process but I don't think it's applicable to me. We are not using multiple environments (i.e. Dev, Test, UAT, Prod); I am using a Production environment only. Each data source that needs to be imported will have its own pipeline, so breaking one pipeline should not affect the others, and that's why I feel a single environment is sufficient. I am the only developer working within ADF, so I have no need to collaborate with peers on joint coding ventures.

Does ADF back up its data factories by default? If it does, can I trust that, should our instance of ADF be deleted, I can retrieve the backed-up version to reinstall or roll back? Is there a process/software which saves the ADF objects so I can reinstall them if I need to (by the way, I'm not sure how to reinstall them, so I'll have to learn that too)? Thanks.
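Even without full CI/CD, ADF objects are just JSON definitions, so they can be exported and kept alongside your SQL scripts: connect the factory to a Git repository in the ADF UI, export the ARM template, or pull the definitions with the management SDK. A minimal sketch of the SDK approach; the resource names and output folder are placeholders:

```python
import json
import pathlib
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
FACTORY_NAME = "<data-factory-name>"
BACKUP_DIR = pathlib.Path("adf_backup")

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Dump pipelines, datasets, and linked services as JSON files that can be re-deployed later.
listers = {
    "pipelines": client.pipelines.list_by_factory,
    "datasets": client.datasets.list_by_factory,
    "linked_services": client.linked_services.list_by_factory,
}
for kind, lister in listers.items():
    out_dir = BACKUP_DIR / kind
    out_dir.mkdir(parents=True, exist_ok=True)
    for item in lister(RESOURCE_GROUP, FACTORY_NAME):
        (out_dir / f"{item.name}.json").write_text(json.dumps(item.as_dict(), indent=2))
```

If a repo is available, Git integration is the simpler route, since every Save in the ADF UI then commits the same JSON for you.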
Workflow orchestration manager dependencies between multiple Data factory accounts
Hello All, I am looking into a use case where:
1. I have a Workflow Orchestration Manager instance in a Data Factory in subscription A.
2. There is another Workflow Orchestration Manager instance in a Data Factory in subscription B.
Both data factories are in the same region (East US). I created a pipeline (Airflow) in subscription A but also need to make the pipeline created in B a dependency - in other words, pipeline A is dependent on B. Is this possible with Workflow Orchestration Manager in ADF, and is this use case supported? Also, if I have to accomplish the above with data factories in different regions (one in the US and the other in East Asia), is that also possible? Thank you in advance.
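Workflow Orchestration Manager is managed Apache Airflow, and Airflow has no built-in cross-instance dependency, so the usual pattern is for the DAG in subscription A to start (or wait on) the pipeline in subscription B through the ADF SDK or REST API - which also works across regions, since the calls go to public Azure management endpoints. A rough sketch of a DAG task doing that, assuming the Airflow environment has azure-identity and azure-mgmt-datafactory installed and an identity with access to subscription B; all names are placeholders:

```python
import time
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

SUB_B = "<subscription-b-id>"
RG_B = "<resource-group-b>"
FACTORY_B = "<data-factory-b>"
PIPELINE_B = "<upstream-pipeline-in-b>"

def run_upstream_in_subscription_b():
    """Start the pipeline in factory B and block until it finishes."""
    client = DataFactoryManagementClient(DefaultAzureCredential(), SUB_B)
    run = client.pipelines.create_run(RG_B, FACTORY_B, PIPELINE_B)
    while True:
        status = client.pipeline_runs.get(RG_B, FACTORY_B, run.run_id).status
        if status in ("Succeeded", "Failed", "Cancelled"):
            break
        time.sleep(30)
    if status != "Succeeded":
        raise RuntimeError(f"Upstream pipeline run ended with status {status}")

with DAG("pipeline_a", start_date=datetime(2024, 1, 1), schedule=None, catchup=False) as dag:
    wait_for_b = PythonOperator(
        task_id="run_and_wait_for_pipeline_b",
        python_callable=run_upstream_in_subscription_b,
    )
    # downstream tasks of pipeline A would be chained after wait_for_b here
```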
Some questions on ADF and Azure SQL Server
Hi, My company is looking to implement a data integration method. The project has been assigned to me but I'm not a data engineer, so I would like your guidance on the recommendation. I need to ingest several (only twelve at present) third-party data sources into our domain so the data can be reported on. These external data sources are simple RDBMSs (most likely all MS SQL Server) and the volume of data, because the third party is creating a view for me, is only going to be around 20 columns and 20,000 rows per data source. It's all structured data.

My intention is to use Azure Data Factory (ADF) as the integration tool. The reason is that we are entirely MS cloud-based and I see ADF as the most suitable (simple, robust, cheap) MS cloud-based integration tool available - although you may inform me otherwise. I need to decide on the storage to hold the external data. I've had very brief experience with Synapse Serverless SQL pool, as it was the recommended substitute for the Data Export Service (DES) (we use Dynamics 365 as our transactional system), and I found its SQL command compatibility limiting. Many of the SQL views I had written on DES weren't compatible in Synapse - I guess due to Synapse being written in Spark. For this reason I am reluctant to use Synapse as the data storage, and for the same reason I am reluctant to use the ADF Storage Account, as I believe it too is written in Spark.

Please can you advise on the questions below:
1) Is the ADF Storage Account written in Spark and thus prone to the same incompatibility as Synapse Serverless SQL pool?
2) What are the benefits of using the ADF Storage Account over Azure SQL Server, and vice versa?
3) I know this question is configuration specific, but I'll ask anyway: which is cheaper based on our basic use case - ADF Storage Account or Azure SQL Server? I have trouble understanding the online pricing calculators.
4) I understand that to execute activities/pipelines between Azure storage sources (ADF Storage Account, Azure SQL Server, etc. - Azure products) an Azure integration runtime is needed. I also understand that to extract data from an on-premises SQL Server database a self-hosted integration runtime is required - is this correct, and where will this self-hosted integration runtime need to be installed (on the box that is running the on-premises SQL Server)?

I think that's all my questions for now. Thanks for your help.
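On question 4: yes, an on-premises (or third-party network) SQL Server source needs a self-hosted integration runtime installed on a Windows machine that can reach that SQL Server - the database server itself or another box on the same network. The runtime is first registered in ADF and then installed on that machine using the authentication key ADF issues. A rough sketch of the registration side using the Python management SDK; the names are placeholders, and the installation on the on-prem box itself is done with the downloaded installer rather than code:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeResource,
    SelfHostedIntegrationRuntime,
)

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
FACTORY_NAME = "<data-factory-name>"
IR_NAME = "onprem-sql-shir"  # hypothetical name for the self-hosted IR

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Register a self-hosted integration runtime in the factory...
client.integration_runtimes.create_or_update(
    RESOURCE_GROUP,
    FACTORY_NAME,
    IR_NAME,
    IntegrationRuntimeResource(
        properties=SelfHostedIntegrationRuntime(
            description="Reaches the third-party on-premises SQL Server",
        )
    ),
)

# ...then fetch the key entered when installing the IR software on the on-prem box.
keys = client.integration_runtimes.list_auth_keys(RESOURCE_GROUP, FACTORY_NAME, IR_NAME)
print(keys.auth_key1)
```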
How to handle azure data factory lookup activity with more than 5000 records
Hello Experts, The Data Flow activity successfully copies data from an Azure Blob Storage .csv file to a Dataverse table. However, an error occurs when performing a Lookup on Dataverse due to the amount of data. This is in line with the documentation, which states that the Lookup activity has a limit of 5,000 rows and a maximum size of 4 MB. A workaround is also mentioned (Microsoft documentation): design a two-level pipeline where the outer pipeline iterates over an inner pipeline, which retrieves data that doesn't exceed the maximum rows or size. How can I do this? Is there a way to define an offset (e.g. only read 1,000 rows)? Thanks, -Sri
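The two-level workaround is essentially manual paging: the outer pipeline loops (Until/ForEach), passing an offset or page marker to the inner pipeline, whose Lookup only ever fetches one page. For Dataverse specifically, the Web API pages naturally with the `odata.maxpagesize` preference and an `@odata.nextLink` continuation, which is the same idea the outer loop reproduces. A small illustration of that paging pattern outside ADF; the org URL, entity set name, and bearer token are placeholders:

```python
import requests

ORG_URL = "https://<your-org>.crm.dynamics.com"
TABLE = "accounts"                        # hypothetical Dataverse entity set name
TOKEN = "<bearer-token-for-dataverse>"    # token acquisition omitted

url = f"{ORG_URL}/api/data/v9.2/{TABLE}?$select=name"
headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Prefer": "odata.maxpagesize=1000",   # read at most 1,000 rows per page
}

rows = []
while url:
    payload = requests.get(url, headers=headers).json()
    rows.extend(payload["value"])
    # Dataverse returns @odata.nextLink until the last page has been served.
    url = payload.get("@odata.nextLink")

print(f"Fetched {len(rows)} rows in pages of up to 1,000")
```

In ADF terms, each iteration of the outer pipeline plays the role of one pass through this `while` loop, with the next-page marker (or an incrementing offset for a SQL source) held in a pipeline variable.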
Call unbound custom action (Dynamics) from the ADF
Hello Experts, I have a global action in Dynamics and need to initiate a call from Azure Data Factory to pass input parameters and retrieve output parameters. Are there any methods available to invoke the unbound custom action from ADF? Kindly provide recommendations. Thanks, -Sri
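An unbound custom action is exposed by the Dataverse Web API as a POST to the action's name at the API root, with input parameters in the JSON body and output parameters in the JSON response, so an ADF Web activity (or the sketch below) can call it directly. The org URL, action name, parameters, and token are placeholders:

```python
import requests

ORG_URL = "https://<your-org>.crm.dynamics.com"
ACTION = "new_MyGlobalAction"            # hypothetical unbound action name
TOKEN = "<bearer-token-for-dataverse>"   # token acquisition (e.g. service principal) omitted

response = requests.post(
    f"{ORG_URL}/api/data/v9.2/{ACTION}",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
    json={"InputParam1": "some value"},  # action input parameters go in the body
)
response.raise_for_status()

# Output parameters come back as properties of the JSON response.
print(response.json())
```

In ADF the same call maps onto a Web activity (method POST, the action URL, a JSON body built from pipeline parameters), and later activities can read the output parameters from that activity's output, e.g. `@activity('CallGlobalAction').output` with a hypothetical activity name.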