Recent Discussions
Specific Use Case: REST API Pagination in Data Factory
Hello, I have a specific use case around ingesting data from a REST API endpoint, and I'm struggling with how to use pagination within the source instead of an Until loop. I got the Until approach to work and it cycles through my pages, but it creates a new document per page, when I want all the information consolidated into one file/blob.

For my REST API endpoint I have a base URL that doesn't change and a relative URL that takes a start page and a count. The start page is the page the call starts on and the count is the number of records it returns. I have set these up as parameters in the source with start page = 1 and count = 400. For this particular call, the Until loop produces 19 separate pages of 400 records by adding 1 to the start page on each call until a field called hasMoreResults (bool) in the response equals false. Below is the JSON response from the API endpoint, where you can see "hasMoreResults" = true and the "results" section holds the returned records:

{
    "totalResults": 7847,
    "hasMoreResults": true,
    "startIndex": 1,
    "itemsPerPage": 10,
    "results": [],
    "facets": []
}

The startIndex equals the startPage. With this, I am looking for advice on how to run this query using the pagination rules so that all 7,847 results end up in one file. I have tried many different things and feel I need two pagination rules: AbsoluteUrl needs to add 1 to the page on every request so it cycles through, and an end condition that stops when hasMoreResults = false. Any help with this would be greatly appreciated!

One other thing I did to make the Until approach work was to store the "hasMoreResults" bool in a cached variable; this is the expression in my Until condition, but I can't get the equivalent working as a pagination end condition:

"value": "@not(activity('Org Data flow').output.runStatus.output.sinkHasMoreResults.value[0].hasMoreResults)"

The pagination rules I currently have configured don't seem to work.
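A minimal sketch of pagination rules that could match this pattern, assuming the start page is passed as a query parameter named startPage (the parameter name and the exact spelling of the end-condition constant are assumptions to adapt to the actual API; if the page number lives in the URL path, an AbsoluteUrl rule would replace the QueryParameters rule):

"paginationRules": {
    "QueryParameters.startPage": "RANGE:1::1",
    "EndCondition:$.hasMoreResults": "Const:false"
}

With rules along these lines the REST source keeps requesting pages inside a single copy activity run, so all pages land in one sink file instead of one blob per Until iteration, which is where the file-per-page behaviour comes from.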
Decrease used storage in Azure
Hello, I want to reduce the storage used by an Azure SQL database. I have managed to reduce the "allocated space" to below 6 GB. Can I change the "Max storage" setting to 6 GB without impacting the database itself? I cannot find a definitive answer online. Kind regards, Bas
Issue with Auto Setting for Copy Parallelism in ADF Copy Activity
Hello everyone, I've been using Azure Data Factory (ADF) and noticed the option to set the degree of copy parallelism in a copy activity, which can significantly improve performance when copying data, such as blob content into a SQL table. However, despite setting this option to "Auto", the degree of parallelism remains fixed at 1. This occurs even when copying hundreds of millions of rows, resulting in a process that takes over 2 hours. My Azure SQL database is scaled to 24 vCores, which should theoretically support higher parallelism. Am I missing something, or is the "Auto" setting for copy parallelism not functioning as expected? Any insights or suggestions would be greatly appreciated! Thank you.
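For reference, parallel copy and data integration units can also be set explicitly on the copy activity instead of being left on Auto; a minimal sketch of the relevant fragment of the activity's typeProperties (the numbers are placeholders, not recommendations, and the source/sink sections stay as generated for the datasets):

"typeProperties": {
    "source": { "type": "DelimitedTextSource" },
    "sink": { "type": "AzureSqlSink", "writeBatchSize": 10000 },
    "parallelCopies": 8,
    "dataIntegrationUnits": 16
}

Note that for file-based sources the effective parallelism is largely bounded by the number of input files, so a single very large blob will often copy with an effective parallelism of 1 regardless of this setting.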
Migration Data Factory pipelines between tenants
Hi everybody, I need your help please. I'm trying to migrate several Data Factory pipelines between two different Fabric tenants. I'm using Azure DevOps to move all the workspaces, and I created the connections with the same names, but when I try to restore the Data Factory pipelines it returns an error saying the pipelines can't be created because the connections can't be found. I tried to update the connection IDs, but I can't find them in the JSON files. How can I migrate these data factories and reconnect them to the new connections?
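For what it's worth, Fabric data pipeline definitions usually reference connections by ID rather than by name, so the GUIDs from the old tenant have to be remapped. Below is a hedged sketch of the kind of fragment to look for in the exported activity JSON; the exact property names are an assumption, so compare against a pipeline exported fresh from the target tenant:

"externalReferences": {
    "connection": "00000000-0000-0000-0000-000000000000"
}

If the repository copy only carries the source tenant's connection GUIDs, recreating connections with matching names is not enough; the IDs need to be swapped for the target tenant's connection GUIDs as part of the deployment.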
Synapse workspace cost reduction
I have a Cosmos DB with one container that holds different document types. One is a main document that has related event documents; the two are linked by a partition key, so there is one main document and multiple event documents sharing the same partition key. The main document has fields like date, country, and categories, which the event document does not have, while the event document has fields like event type and event date/time. To work out how many events happened for a particular category on a particular day, we have to use the main document. Events can repeat within a single day.

My requirement is to create a Power BI report showing how many events happened on a particular day and for which country over the last 2 months (each event should be counted only once per category, per country, per day). I want to pull this data from Synapse and load it into Power BI for the last 2 months.

I used Synapse views and implemented incremental dataset refresh in Power BI. I created a main view that loads data for the main document, and another view that takes those partition keys from the main view and loads the data for the event documents. The main document has two dates, a created date and a change date. I cannot use the change date for incremental refresh as it creates duplicate records, so I used the created date and then used detect data changes over the last 30 days (the window in which the main document can change). This works well, but the query takes a long time to execute, which drives up the data processing cost in Synapse. Are there any suggestions to reduce the Synapse cost as well as the query execution time/dataset refresh time in Power BI?
Azure DevOps and Data Factory
I have started a new job and taken over ADF. I know how to use DevOps to integrate and deploy when everything is up and running. The problem is, it's all out of sync. I need to learn how ADO and ADF work together so I can fix this. Any recommendations on where to start? Everything on YouTube starts with a fresh environment, which I'd be fine with. I'm not new to ADO, but I've never been the setup guy before; I'm strong on using ADO, just not on administering it. Here are some of the problems I have:
- A lot of work has been done directly in the DEV branch rather than in feature branches.
- Setting up a pull request from DEV to PROD wants to pull everything, even in-progress or abandoned code changes.
- Some changes were made directly in the PROD branch, so I'll need to pull those changes back to DEV. We have valid changes in both DEV and PROD.
- I'm having trouble cherry-picking. It only lets me select one commit, then says I need to use the command line, without telling me the actual error. I don't know what tool to use for the command line.
- I've tried using Visual Studio, and I can pull in the Data Factory code, but I have all the same problems there.
I'm not looking for answers to these questions, but for how to find the answers. Is this a Data Factory thing, or should I be looking at DevOps? I'm having no trouble managing the database code or Power BI in DevOps, but I created those fresh. Thanks for any help!
Azure Data Factory Mapping Dataflow Key Pair Authentication Snowflake
Dear Microsoft, since Snowflake has announced that it will remove basic authentication (username + password) in September 2025, I wanted to change the authentication method in a mapping data flow in Azure Data Factory. I got an error message and found out that only basic authentication is supported in the mapping data flow: Copy and transform data in Snowflake V2 - Azure Data Factory & Azure Synapse | Microsoft Learn. Is this going to be fixed in ADF in the near future, or will my process be broken in September?
Linux Support for Self-Hosted Integration Runtimes (SHIR)
Hi. Azure Support asked me to request this here. We would very much like to run self-hosted integration runtimes (SHIRs) on Linux instead of Windows. Currently we run them in ACI and they take almost 10 minutes to start. They are also a bit clunky and difficult to manage on ACI; we would much rather run them in our AKS cluster alongside all our other Linux containers. Is Linux container support for SHIRs on the roadmap, and if not, can it be? Regards, Tim.
Alter Row Ignoring its Conditions
Hello. I have an ADF data flow with two sources, a blob container with JSON files and an Azure SQL table. The sink is the same SQL table as the SQL source; the idea is to conditionally insert new rows, update rows with a later modified date in the JSON source, or do nothing if the ID exists in the SQL table with the same modified date. In the data flow I join the rows on id, which is unique in both sources, and then use an Alter row transformation to insert if the id column from the SQL source is null, update if it's not null but the last-updated timestamp in the JSON source is newer, or delete if the last-updated timestamp in the JSON source is the same or older (delete is not permitted in the sink settings, so that should effectively do nothing).

The problem I'm having is that I get a primary key violation error when running the data flow, as it tries to insert rows that already exist; for example, in my run history (160806 is the minimum value for ID in the SQL database). For troubleshooting I put a filter directly after each source for that ticket ID, so when debugging I only see that single row. The Alter row transformation should insert only if the SQLTickets id column is null, yet in the data preview from that same Alter row transformation the row is marked as an insert, despite the id column from both sources clearly having a value. However, when I do a data preview in the expression builder itself, the condition correctly evaluates to false.

I'm so confused. I've used this technique in other data flows without any issues, so I really have no idea what's going on here, and I've been troubleshooting it for days without any result. I've even tried putting a filter after the Alter row transformation to explicitly filter out rows where the SQL id column is not null and the timestamps are the same. The data preview shows them filtered out, yet it still tries to insert the rows it should be ignoring or updating when I do a test run. What am I doing wrong here?
Dynamically executing a child pipeline using a single Execute Pipeline activity with a variable
Goal: Create a master pipeline that:
1. Retrieves metadata using a Lookup.
2. Calculates a value (caseValue) from the lookup result.
3. Maps the value (caseValue) to a pipeline name using a JSON string (pipelineMappingJson).
4. Sets the pipeline name (pipelineName) dynamically.
5. Runs the correct child pipeline using the pipelineName variable.
Question: Can the Execute Pipeline activity be configured to handle dynamic child pipeline names?
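The Execute Pipeline activity takes a static pipeline reference, so a common workaround is to start the child run through the Data Factory REST API from a Web activity instead. A minimal sketch is below, assuming the factory's managed identity has permission to create pipeline runs; subscription, resource group, and factory names are placeholders:

{
    "name": "RunChildPipeline",
    "type": "WebActivity",
    "typeProperties": {
        "method": "POST",
        "url": "https://management.azure.com/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.DataFactory/factories/<factory-name>/pipelines/@{variables('pipelineName')}/createRun?api-version=2018-06-01",
        "body": "{}",
        "authentication": { "type": "MSI", "resource": "https://management.azure.com/" }
    }
}

This starts the child run fire-and-forget; if the master pipeline needs to wait for the child to finish, it has to poll the run status in an Until loop.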
ADX data receiving stops after some time
I have a very strange problem. I have an IoT application where a device sends data to IoT Hub, which is then routed to Event Hub. An Azure Function is triggered from the Event Hub and inserts the data into ADX. I traced the events from the device to the Event Hub and can see all the data; I can also see the function being triggered with no errors. But in ADX the record is empty, with no data from the event, just the date field that is added explicitly. Note: after some time the data does appear in ADX, with no change to ADX and without restarting any service. Does anybody have a clue what the issue could be?
IP whitelist for Synapse Spark Pools to reach external endpoint with firewall?
I am trying to reach an external vendor SFTP site from my Synapse Spark notebook. The site is behind a firewall. I want to give the vendor the IP range for all of our Spark pools so they can whitelist them, but I'm struggling to get a clear idea of that list. The closest I've found so far is "Azure Cloud East US", which is rather broad. Any advice or ideas on how to get a more refined list or range of IPs?
ADF Google Ads Linked Service using Service Authentication as the Authentication Type
We are trying to access Google Ads data using a Google Ads service account and the ADF Google Ads linked service. We have set the linked service "Authentication type" to "Service authentication". We generated a private key for this service account in Google and used the key as the value of the "Private key" field of the linked service. We have populated the other required linked service fields (Name, Client customer ID, Developer token, Email) and also the optional "Login customer ID" field. We have also pointed the linked service at a self-hosted integration runtime instead of the AutoResolveIntegrationRuntime. When testing the connection, we receive this error message:

Test connection operation failed. Failed to open the database connection. Fail to read from Google Ads. Parameter was empty Parameter name: pkcs8PrivateKey

Does anyone in the Tech Community use this new version of the Google Ads linked service with the authentication type set to "Service authentication" instead of "User authentication"? Does anyone have any insight into the error message we are receiving?
Azure Data Factory GraphQL
I am trying to call an API that uses GraphQL from ADF. The API has 2 steps: the first is to log in and receive an authentication token, so the auth token is dynamic and needs to be obtained on each call of the API; the next step is to use that token to call the GraphQL query. I can get it to work just fine in Postman, but I can't get it to work in ADF. I have tried using:

{ "query": "query { <contents here> }" }

This does not work, and I get an error saying my JSON is not formatted correctly. Any suggestions on how to get this to work in ADF?
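For reference, a minimal sketch of a Web activity body that usually passes JSON validation: the whole GraphQL document is one JSON string, so any double quotes inside the query must be escaped as \" and literal line breaks avoided (the field names here are placeholders):

{
    "query": "query { items(first: 10) { id name } }",
    "variables": {}
}

Escaping quotes and flattening line breaks when pasting a query that worked in Postman is the most common fix for the "JSON is not formatted correctly" error.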
What is the way to use an OUTPUT parameter for an Oracle stored procedure in ADF pipelines?
I have an Oracle database package and I am trying to call a stored procedure inside that package. The procedure has an OUT parameter that we want to use in subsequent activities in the ADF pipeline, but ADF pipelines do not have a way to get the OUT parameter value and use it in the pipeline. This is a very important feature.
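One hedged workaround, until OUT parameters are surfaced directly, is to wrap the call in a query and read the value back with a Lookup activity. The sketch below assumes a hypothetical wrapper function my_pkg.get_out_value that invokes the procedure and returns the OUT value as a result set; the function, dataset, and parameter names are illustrative only:

{
    "name": "ReadOracleOutValue",
    "type": "Lookup",
    "typeProperties": {
        "source": {
            "type": "OracleSource",
            "oracleReaderQuery": "SELECT my_pkg.get_out_value('abc') AS out_value FROM dual"
        },
        "dataset": { "referenceName": "OracleDataset", "type": "DatasetReference" },
        "firstRowOnly": true
    }
}

Downstream activities can then reference @activity('ReadOracleOutValue').output.firstRow.OUT_VALUE.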
OData Connector for Dynamics Business Central
Hey guys, I'm trying to connect to the Dynamics Business Central OData API in ADF, but I'm not sure what I'm doing wrong here, because the same endpoint returns data in Postman but returns an error from the ADF linked service. https://api.businesscentral.dynamics.com/v2.0/{tenant-id}/Sandbox-UAT/ODataV4/Company('company-name')/Chart_of_Accounts
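As a point of comparison, a minimal sketch of an OData linked service against that environment, assuming basic authentication with a Business Central web service access key stored in Key Vault (authentication details vary by environment, and these property values are placeholders):

{
    "name": "BusinessCentralOData",
    "properties": {
        "type": "OData",
        "typeProperties": {
            "url": "https://api.businesscentral.dynamics.com/v2.0/{tenant-id}/Sandbox-UAT/ODataV4/Company('company-name')",
            "authenticationType": "Basic",
            "userName": "<bc-user>",
            "password": {
                "type": "AzureKeyVaultSecret",
                "store": { "referenceName": "KeyVaultLS", "type": "LinkedServiceReference" },
                "secretName": "bc-web-service-access-key"
            }
        }
    }
}

One frequent difference from Postman setups is that the linked service URL should stop at the Company('...') segment, with Chart_of_Accounts supplied as the entity path on the OData dataset instead.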
GraphQL API call in Azure Data Factory
I am trying to call a GraphQL API in Data Factory. The API requires two steps: the first is to log in and retrieve the authentication token, and the second is to use that token to call the GraphQL endpoint, passing the query in the body. Is ADF capable of doing this? I have used a Web activity and can successfully authenticate, but then I get an error saying that my JSON is not formatted correctly. I have passed it in as:

{ "query" : "{GraphQL query here}" }

From what I've seen, ADF doesn't do well with GraphQL, but are there workarounds?
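A minimal sketch of chaining the two Web activities, assuming the login response exposes the token in a field named access_token and the API expects a bearer token; the field name, header, and URL are assumptions to adjust to the actual service:

{
    "name": "CallGraphQL",
    "type": "WebActivity",
    "dependsOn": [ { "activity": "Login", "dependencyConditions": [ "Succeeded" ] } ],
    "typeProperties": {
        "method": "POST",
        "url": "https://example.com/graphql",
        "headers": {
            "Content-Type": "application/json",
            "Authorization": "Bearer @{activity('Login').output.access_token}"
        },
        "body": { "query": "query { items { id name } }" }
    }
}

The same pattern works from a Copy activity REST source if the response needs to land in storage, with the token passed through the source's additional headers.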
Need help with ADF pipeline
I am new to Azure Data Factory (ADF) and have an urgent client requirement to create a pipeline. The data source is SharePoint, which contains multiple folders and subfolders. Since there is no direct SharePoint connector in ADF, what would be the best approach to read these files?
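For context, ADF does ship a SharePoint Online List connector for list data, while files in document libraries are usually read via the HTTP connector with an Azure AD token. A hedged sketch of the list linked service, assuming a service principal has been granted access to the site (names and IDs are placeholders):

{
    "name": "SharePointListLS",
    "properties": {
        "type": "SharePointOnlineList",
        "typeProperties": {
            "siteUrl": "https://<tenant>.sharepoint.com/sites/<site-name>",
            "tenantId": "<tenant-id>",
            "servicePrincipalId": "<app-client-id>",
            "servicePrincipalKey": { "type": "SecureString", "value": "<client-secret>" }
        }
    }
}

For the folder-and-subfolder file scenario, the usual pattern is a Web activity that obtains a token and enumerates files via the SharePoint REST API, followed by a Copy activity over the HTTP connector for each file.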
External Table in ADX
Hi, I'm trying to create an external table in ADX that uses a Synapse Analytics (SA) database view (called undelivered). The undelivered view itself queries data from a Cosmos DB analytical store. So far I have:
- Created a user-defined identity.
- Added the identity to the ADX cluster, SA and Cosmos.
- Updated the ADX database:

.alter-merge cluster policy managed_identity
[
    {
        "ObjectId": "a3d7ddcd-d625-4715-be6f-c099c56e1567",
        "AllowedUsages": "ExternalTable"
    }
]

- Created the database users in SA:

-- Create a database user for the ADX Managed Identity
CREATE USER [adx-synapse-identity] FROM EXTERNAL PROVIDER;
-- Grant read permissions
ALTER ROLE db_datareader ADD MEMBER [adx-synapse-identity];
GRANT SELECT ON OBJECT::undelivered TO [adx-synapse-identity];

From within SA I can run "SELECT * FROM undelivered" and the correct information is returned. But when I come to create the external table in ADX:

.create-or-alter external table MyExternalTable
(
    Status: string
)
kind=sql
table=undelivered
(
    h@'Server=tcp:synapse-xxxxx.sql.azuresynapse.net,1433;Database="Registration";ManagedIdentityClientId=<key>;Authentication=Active Directory Managed Identity;'
)
with
(
    managed_identity = "<key>"
)

I get the error:

Managed Identity 'system' is not allowed by the managed_identity policy for usage: ExternalTable

So even though I'm specifying the managed identity I want to use, it is still trying to use the system one. How can I get the external table created with the correct managed identity? Any questions, please just ask. Thanks
Recent Blogs
- AI agents are revolutionizing how applications interact with data by combining large language models (LLMs) with external tools and databases. This blog will show you how to combine Az... (Mar 13, 2025)
- A few days ago, I was working on a case where a customer reported an unexpected behavior in their application: even after switching the connection policy from Proxy to Redirect, the connections were ... (Mar 12, 2025)