Azure Data Integration
- How to handle Azure Data Factory Lookup activity with more than 5,000 records
Hello Experts, the Data Flow activity successfully copies data from an Azure Blob Storage .csv file to a Dataverse table. However, an error occurs when performing a Lookup on the Dataverse due to the amount of data. This is in line with the documentation, which states that the Lookup activity has a limit of 5,000 rows and a maximum size of 4 MB. The Microsoft documentation also mentions a workaround: design a two-level pipeline where the outer pipeline iterates over an inner pipeline, which retrieves data that doesn't exceed the maximum rows or size. How can I do this? Is there a way to define an offset (e.g. only read 1,000 rows)? Thanks, -Sri
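Below is a minimal Python sketch (not ADF JSON) of the paging logic the documented two-level pipeline implements: the outer pipeline works out the page boundaries and invokes the inner pipeline once per page, so each inner Lookup stays under the 5,000-row / 4 MB limit. PAGE_SIZE, TOTAL_ROWS and fetch_page are hypothetical placeholders; the actual paged query depends on the source (e.g. OFFSET/FETCH for SQL, paged queries for Dataverse).

# Conceptual sketch of the two-level pipeline: the outer loop plays the
# role of the outer pipeline, fetch_page() stands in for the inner
# pipeline's Lookup, which reads at most PAGE_SIZE rows per run.
PAGE_SIZE = 1000            # rows per inner-pipeline run (assumption)
TOTAL_ROWS = 12_345         # in ADF this would come from a count Lookup in the outer pipeline

def fetch_page(offset: int, limit: int) -> list:
    """Hypothetical stand-in for the inner pipeline's paged query,
    e.g. SELECT ... ORDER BY key OFFSET {offset} ROWS FETCH NEXT {limit} ROWS ONLY."""
    return []

def run_outer_pipeline() -> None:
    pages = (TOTAL_ROWS + PAGE_SIZE - 1) // PAGE_SIZE   # ceiling division
    for page in range(pages):
        rows = fetch_page(page * PAGE_SIZE, PAGE_SIZE)  # one inner-pipeline execution
        # ... ForEach over `rows` here ...

run_outer_pipeline()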
- Connection failed - new linked service (Azure Data Factory)
Hi, I am having trouble creating a new linked service in Azure for connecting to MySQL. I am getting an error saying: "Connection failed. The value of the property '' is invalid: 'Format of the initialization string does not conform to specification starting at index 115.'. Format of the initialization string does not conform to specification starting at index 115." Is anyone aware of this error?
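One common cause of this error is a malformed key=value pair in the connection string, typically a password containing ';' or quotes that is not itself quoted, so the parser fails partway through the string (which would explain "starting at index 115"). This is a guess based on the error text rather than a confirmed diagnosis; all values below are placeholders.

# Connection strings of this style are semicolon-separated key=value pairs.
# A special character inside a value usually has to be quoted:
good = "Server=myserver.mysql.database.azure.com;Port=3306;Database=mydb;UID=myuser;PWD='p@ss;word'"
bad  = "Server=myserver.mysql.database.azure.com;Port=3306;Database=mydb;UID=myuser;PWD=p@ss;word"
# In `bad`, everything after the unescaped ';' inside the password is parsed
# as a new key with no value, which is roughly the kind of position an
# "index NNN" parse error points at.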
- ADF - data connect from Blob to Azure SQL
Hi all, I have a scenario: I have multiple Excel files (4 files) in Blob Storage and need to load them into 4 different SQL tables (4 staging tables and 4 master tables). I have created the stored procedures in SQL for those 4 files. Can anyone help me with the ADF process to load them automatically on a regular basis? Thanks
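A rough Python sketch of the per-file pattern the ADF pipeline would implement (ForEach over the files, Copy activity into the staging table, then a Stored Procedure activity to merge into the master table), with a daily schedule trigger around it. File, table and procedure names are made-up placeholders; this illustrates the flow rather than the ADF definition itself.

import pandas as pd
import sqlalchemy as sa

# file -> (staging table, merge stored procedure); placeholders only
FILES = {
    "customers.xlsx": ("stg_customers", "usp_merge_customers"),
    "orders.xlsx":    ("stg_orders",    "usp_merge_orders"),
    # ... remaining two files ...
}

engine = sa.create_engine(
    "mssql+pyodbc://user:password@server/db?driver=ODBC+Driver+18+for+SQL+Server"
)

def load_file(path: str, staging_table: str, proc: str) -> None:
    df = pd.read_excel(path)                                  # ADF: Copy activity, Excel source
    with engine.begin() as conn:
        conn.execute(sa.text(f"TRUNCATE TABLE {staging_table}"))
        df.to_sql(staging_table, conn, if_exists="append", index=False)
        conn.execute(sa.text(f"EXEC {proc}"))                 # ADF: Stored Procedure activity

for path, (stg, proc) in FILES.items():                       # ADF: ForEach over the blob listing
    load_file(path, stg, proc)

In ADF itself the equivalent is a Get Metadata/ForEach over the container, a Copy activity with the Excel dataset as source and the staging table as sink, a Stored Procedure activity as the next step, and a schedule trigger for the regular run.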
- Job aborted due to stage failure: "Executor heartbeat timed out after 203646 ms"
Hi, we are sometimes getting the error below, at random, during the execution of different Mapping Data Flows in Azure Data Factory: "Job aborted due to stage failure. Task in stage failed 1 times, most recent failure. Lost task in stage. Executor heartbeat timed out after 203646 ms." Is anyone aware of the reason for this? Any idea to resolve this would be appreciated.
- Connect Azure Data Factory to Google Ads problem
Hi all, I am connecting Azure Data Factory to our Google Ads account in order to export some data. We want to use a service account, but when I set up the ADF linked service (Google Ads type) and enter the developer token, Auth Type: Service Authentication, email, key file path and trusted cert path, it gives me an error: "Connection failed: ERROR [28000] [Microsoft][DriverOAuthSupport] (8706) Unable to find or open file from: url.json". Could someone help us? Regards!
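The error suggests the driver cannot find or open the key file at the path supplied in "Key File path" (here apparently resolving to "url.json"). Below is a hedged sanity check you could run on the machine hosting the integration runtime to confirm the path points at a readable Google service-account key; the path is a placeholder.

import json, os

KEY_PATH = r"C:\keys\ads-service-account.json"   # full local path reachable by the runtime

assert os.path.isfile(KEY_PATH), f"File not found: {KEY_PATH}"
with open(KEY_PATH) as f:
    key = json.load(f)

# Fields normally present in a Google service-account key file:
for field in ("type", "client_email", "private_key"):
    assert field in key, f"Missing field: {field}"
print("Looks like a service-account key for", key.get("client_email"))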
- How to load data from on-prem to Snowflake using ADF in a better way
Hi, my use case is as follows: our data source is an on-prem SQL Server, and it serves as our production database. Currently we build reports in Power BI and use Snowflake as our data warehouse. I aim to extract 10 to 15 tables for my Power BI reporting into Snowflake, specifically wanting to construct an SCD Type 1 pipeline, without the need to retain historical data. To facilitate the data transfer from on-prem to Snowflake, we are leveraging Azure Data Factory, with blob storage set up in Azure. We already have a self-hosted integration runtime in place that connects to Data Factory. Currently I've employed the ForEach loop activity to copy 15 tables from on-prem to Snowflake. The pipeline is scheduled to run daily, and each time it executes it truncates all 15 tables before loading the data. However, this process is time-consuming due to the volume of data, especially since our on-prem SQL Server is a legacy database. My questions are as follows: Is there a more efficient approach within Data Factory for this task? What are the best practices recommended for this type of case study? Data Flow is not functioning with the self-hosted runtime; how can I activate Data Flow for an on-prem database? I would greatly appreciate any advice on the correct solution for this, or pointers to relevant documentation or blogs.
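A hedged sketch of an alternative to the daily truncate-and-reload: have ADF copy only the rows changed since the previous run into a Snowflake staging table (this assumes the source tables have a modified-date or similar change-tracking column), then run a MERGE so the target is upserted in place (SCD Type 1). Table and column names below are assumptions; the MERGE could be executed from ADF (for example via a Script activity) or as a Snowflake task.

import snowflake.connector   # pip install snowflake-connector-python

MERGE_SQL = """
MERGE INTO dw.dim_customer AS tgt
USING stg.customer AS src
    ON tgt.customer_id = src.customer_id
WHEN MATCHED THEN UPDATE SET
    tgt.name  = src.name,
    tgt.email = src.email
WHEN NOT MATCHED THEN
    INSERT (customer_id, name, email)
    VALUES (src.customer_id, src.name, src.email);
"""

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="<warehouse>", database="<database>", schema="STG",
)
conn.cursor().execute(MERGE_SQL)
conn.close()

This keeps the daily load proportional to the number of changed rows instead of the full table size, which is usually the biggest win against a slow legacy source.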
- How to save Azure Data Factory work (objects)?
Hi, I'm new to Azure Data Factory (ADF). I need to learn it in order to ingest external third-party data into our domain. I shall be using ADF pipelines to retrieve the data and then load it into an Azure SQL Server database. I currently develop Power BI reports and write SQL scripts to feed the Power BI reporting. These reports and scripts are saved on a backed-up drive, so if anything disappears I can always use the backups to reinstall the work. The target SQL database scripts, i.e. the tables the ADF pipelines will load to, will be backed up following the same method. How do I save the ADF pipeline work and any other ADF objects that I may create (I don't know exactly what will be created, as I'm yet to develop anything in ADF)? I've read about the CI/CD process, but I don't think it's applicable to me. We are not using multiple environments (i.e. Dev, Test, UAT, Prod); I am using a Production environment only. Each data source that needs to be imported will have its own pipeline, so breaking one pipeline should not affect the others, which is why I feel a single environment is sufficient. I am the only developer working within ADF, so I have no need to collaborate with peers or promote joint coding ventures. Does ADF back up its Data Factories by default? If so, can I trust that, should our instance of ADF be deleted, I can retrieve the backed-up version to reinstall or roll back? Is there a process/software which saves the ADF objects so I can reinstall them if I need to (by the way, I'm not sure how to reinstall them, so I'll have to learn that)? Thanks.
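Even with a single environment and a single developer, connecting the factory to a Git repository is the usual way to keep every pipeline, dataset and linked service definition versioned and recoverable. For an ad-hoc backup outside Git, the JSON definitions can also be pulled through the ARM REST API; the sketch below is one way to do that. Subscription, resource group and factory names are placeholders, and the api-version shown is the one I believe is current for Data Factory (worth verifying against the docs).

import json
import requests
from azure.identity import DefaultAzureCredential

SUB, RG, FACTORY = "<subscription-id>", "<resource-group>", "<factory-name>"
BASE = (f"https://management.azure.com/subscriptions/{SUB}/resourceGroups/{RG}"
        f"/providers/Microsoft.DataFactory/factories/{FACTORY}")

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
headers = {"Authorization": f"Bearer {token}"}

# List pipeline definitions and save each one as a JSON file
pipelines = requests.get(f"{BASE}/pipelines?api-version=2018-06-01", headers=headers).json()
for p in pipelines.get("value", []):
    with open(f"{p['name']}.json", "w") as f:
        json.dump(p, f, indent=2)   # re-importable definition of one pipeline

The same pattern should work for the datasets and linked services collections; restoring is then a matter of re-creating the objects from the saved JSON definitions.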
- Truncate/delete records in a Cosmos DB container
The goal of this discussion is simply to share how to delete records in Cosmos DB; if anyone has another approach, feel free to share it. If you already build Azure data engineering solutions using Azure Data Factory as an orchestration tool together with Azure Cosmos DB, in a scenario where you may need to delete documents from a SQL container, you have probably noticed that there is no easy way to do it. Out of the box, Azure Data Factory only provides activities to move data to/from Cosmos DB; Data Factory (as of 02/04/2022) has no activity to execute Cosmos DB stored procedures or to delete documents in a SQL container. This post is for those already using Azure Data Factory, Azure Cosmos DB (SQL API) and Azure Logic Apps.

COSMOS DB

High-level workflow to do this:
1. Create a stored procedure in the Cosmos DB container that runs a select query and deletes the required data (the simple, easy-to-use code follows below).
2. Configure an Azure Logic App to execute the stored procedure when an HTTP request is received.
3. Configure a Data Factory pipeline that uses a Web activity to invoke the Logic App.

// Bulk-delete stored procedure: repeatedly queries the container and
// deletes the returned documents until none are left or the response
// is no longer accepted (execution bounds reached).
function bulkDeleteSproc(query) {
    var collection = getContext().getCollection();
    var collectionLink = collection.getSelfLink();
    var response = getContext().getResponse();
    var responseBody = { deleted: 0, continuation: true };

    if (!query) throw new Error("Consulta inválida!!!");

    tryQueryAndDelete();

    // Query for documents matching `query`, then delete them.
    function tryQueryAndDelete(continuation) {
        var requestOptions = { continuation: continuation };
        var isAccepted = collection.queryDocuments(collectionLink, query, requestOptions,
            function (err, retrievedDocs, responseOptions) {
                if (err) throw err;
                if (retrievedDocs.length > 0) {
                    tryDelete(retrievedDocs);
                } else if (responseOptions.continuation) {
                    tryQueryAndDelete(responseOptions.continuation);
                } else {
                    responseBody.continuation = false;
                    response.setBody(responseBody);
                }
            });
        if (!isAccepted) {
            response.setBody(responseBody);
        }
    }

    // Delete the retrieved documents one at a time, then query again.
    function tryDelete(documents) {
        if (documents.length > 0) {
            var isAccepted = collection.deleteDocument(documents[0]._self, {},
                function (err, responseOptions) {
                    if (err) throw err;
                    responseBody.deleted++;
                    documents.shift();
                    tryDelete(documents);
                });
            if (!isAccepted) {
                response.setBody(responseBody);
            }
        } else {
            tryQueryAndDelete();
        }
    }
}

LOGIC APPS

Open Logic Apps and select a blank template. Add the trigger "When a HTTP request is received". Add a new step and choose the "Execute stored procedure" action. Browse the current subscription and connect to Cosmos DB. Fill in 'Database ID', 'Collection ID', 'Sproc ID', 'Partition key value' and 'Parameters'. Add a "Response" action as the next step and add dynamic content in the Body; this Body returns the stored procedure's response when it runs (the responseBody object set by the sproc above). Now save the app. After saving, an HTTP POST URL is generated on the trigger; copy this URL for use in ADF.

DATA FACTORY

To run the Logic App from Azure Data Factory we need to send an HTTP request to the app, which is possible using the Web activity. The BulkDelete sproc may need to run several times (depending on whether it can delete all the documents within its execution time limit). To guarantee all documents are deleted, we run this Web activity inside an Until activity, until the condition (the sproc response must report "deleted: 0") is met. Add an Until activity to the pipeline and name it "UntilAllDocsDeleted". Add a Web activity inside the Until activity and name it "Delete". Paste the URL copied from the Logic App into the Web activity settings, use the POST method, and set the body to {}. Now add the condition on the Until activity so the Web activity runs repeatedly until it is met:

@equals(0, activity('Delete').output.deleted)

END
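For anyone who prefers to test the loop outside ADF first, here is a small Python equivalent of the Until + Web activity pattern described above: keep POSTing to the Logic App trigger URL until the sproc reports "deleted": 0. The URL is a placeholder for the value copied from the trigger.

import time
import requests

LOGIC_APP_URL = "<HTTP POST URL copied from the Logic App trigger>"

while True:
    resp = requests.post(LOGIC_APP_URL, json={})   # Web activity: POST with body {}
    body = resp.json()                             # shape: {"deleted": n, "continuation": bool}
    print("deleted this pass:", body.get("deleted"))
    if body.get("deleted", 0) == 0:                # @equals(0, activity('Delete').output.deleted)
        break
    time.sleep(1)                                  # brief pause between passes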
- Incrementally load data from Azure SQL DB to ADLS Gen2 without watermarking, with partitioning
I want to incrementally load data from Azure SQL Database to ADLS Gen2 without watermarking, i.e. I do not have any control over the Azure SQL Database apart from using the linked service and a table. The folder structure in ADLS is YYYY/MM, which comes from one of the columns of the Azure SQL Database. I added two sources (SQL, ADLS) to a data flow and created a Derived Column transformation to split the YYYYMM column from SQL into YYYY and MM. Now I want to take these YYYY and MM columns into the ADLS source and search in the respective YYYY/MM folders. How do I achieve this? Below is the screenshot. Tzvia, or anyone else?
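A hedged sketch of the partition-path logic behind that derived column: split the YYYYMM value into year and month and build the YYYY/MM folder path, so only the matching partition folders need to be read on the ADLS side (in a data flow this would typically be a parameterized or wildcard folder path fed from those derived columns). Container and column names are placeholders.

def partition_path(yyyymm: str) -> str:
    # Derived-column equivalent: year = substring(YYYYMM, 1, 4), month = substring(YYYYMM, 5, 2)
    year, month = yyyymm[:4], yyyymm[4:6]
    return f"container/dataset/{year}/{month}/"

rows = [{"YYYYMM": "202401", "id": 1}, {"YYYYMM": "202402", "id": 2}]
partitions = sorted({partition_path(r["YYYYMM"]) for r in rows})
print(partitions)
# ['container/dataset/2024/01/', 'container/dataset/2024/02/'] -> the only folders the ADLS source needs to touch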
- 'Cannot connect to SQL Database' error - please help
Hi, our organisation is new to Azure Data Factory (ADF) and we're facing an intermittent error with our first pipeline; being intermittent adds that little bit more complexity to resolving it. The pipeline has two activities: 1) a Script activity which deletes the contents of the target Azure SQL Server database table located within our Azure cloud instance, and 2) a Copy data activity which copies the entire contents of the external (outside of our domain) third-party source SQL view and loads it into our target Azure SQL Server database table. With the source being external to our domain, we have used a self-hosted integration runtime. The pipeline executes once per 24 hours, at 3am each morning; I have been informed that this timing shouldn't affect, or be affected by, any other Azure processes we have. For the first nine days the pipeline completed its executions successfully. Then for the next nine days it only completed successfully four times, and now it seems to fail every other run. The same error message is received on each failure (I've replaced our sensitive internal names with Xs):

Operation on target scr__Delete stg__XXXXXXXXXX contents failed: Failed to execute script. Exception: ''Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Cannot connect to SQL Database. Please contact SQL server team for further support. Server: 'XX-azure-sql-server.database.windows.net', Database: 'XX_XXXXXXXXXX_XXXXXXXXXX', User: ''. Check the linked service configuration is correct, and make sure the SQL Database firewall allows the integration runtime to access.,Source=Microsoft.DataTransfer.Connectors.MSSQL,''Type=Microsoft.Data.SqlClient.SqlException,Message=Server provided routing information, but timeout already expired.,Source=Framework Microsoft SqlClient Data Provider,''

To me, if this pipeline were incorrectly configured it would never have completed successfully, not even once. Being intermittent, but becoming more frequent, suggests it's being caused by something other than its configuration, but I could be wrong - hence requesting help from you. Please can someone advise on what is causing the error and what I can do to verify/resolve it? Thanks.
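The inner exception ("Server provided routing information, but timeout already expired") reads like a transient connection/redirect timeout rather than a misconfiguration, which would also fit the intermittent pattern. Two hedged mitigations to try: give the activities a retry policy, and lengthen the connection timeout / enable connection retries on the SQL linked service. The snippet below is illustrative only; the server and database names are the same placeholders used in the post, and whether the extra keywords are honoured depends on how the linked service builds its connection string.

# Connection-string keywords understood by Microsoft.Data.SqlClient:
CONNECTION_STRING = (
    "Server=tcp:XX-azure-sql-server.database.windows.net,1433;"
    "Database=XX_XXXXXXXXXX_XXXXXXXXXX;"
    "Connect Timeout=60;"        # default is 30s; redirect handshakes can exceed it
    "ConnectRetryCount=3;"       # client-side transient-fault retries
    "ConnectRetryInterval=10;"
)

# Pipeline-level equivalent: set the Script/Copy activity policy so a
# transient failure re-runs the activity instead of failing the run,
# e.g. "retry": 2, "retryIntervalInSeconds": 120 in the activity policy.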