data & ai
158 Topics

Deploying DNS Private Resolvers and Private DNS Zones for Azure AI Supported Services
Private Networks:
- Private DNS Zones: Resolve domain names to private IPs within Azure virtual networks without exposing them to the internet. Private DNS Zones are global, so you don't need to create the same zone more than once; you can reuse a single zone everywhere.
- DNS Private Resolvers: A fully managed service that enables DNS resolution between Azure VNets and on-premises networks without custom DNS servers. DNS Private Resolvers are regional, which means that if you use the Azure East US and West US 2 regions, you need to create a DNS Private Resolver in each region, linked to the Private DNS Zones. You can adopt centralized or distributed DNS Private Resolvers; I will discuss both options later in this article.

Public Networks (not the focus of this part):
- Public DNS Zones: Resolve internet-facing domain names to publicly accessible IP addresses.
- Traffic Manager: A DNS-based traffic load balancer that routes client requests to the best available global endpoint.
- DNS Security Policy: Controls and protects DNS resolution behavior (e.g., filtering, forwarding, and access rules) to secure name resolution and prevent misuse.

**Note:
1. Follow Prerequisites to deploy resources.
2. A common misconception is that VNet peering enables DNS resolution. In reality, private DNS zones are only accessible to VNets that are explicitly linked to them. Peering provides connectivity, but not name resolution.
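One way to see whether a name is being answered from a Private DNS Zone is to check if it resolves to a private address from the machine in question. The following is a minimal sketch using only the Python standard library; the function names are illustrative and not part of any Azure tooling, and `is_private` relies on Python's own definition of private ranges, which is an assumption about what counts as "private" here.

```python
import ipaddress
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the distinct IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def is_private(address: str) -> bool:
    """True for RFC 1918 ranges and other non-global ranges such as loopback."""
    return ipaddress.ip_address(address).is_private
```

For example, calling `resolve_ipv4` with a hypothetical name such as `mystorageaccount.file.core.windows.net` from a VNet linked to the zone would be expected to return an address for which `is_private` is true, while the same call from an unlinked network would typically return a public address.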
In the following snapshot (Azure Portal > Network Foundations > DNS), let's explore the individual DNS services offered; later in this document, we will interconnect them.

**Credits to the Microsoft Azure Portal design team for creating the new grouped views. You can check out more of them, such as Compute infrastructure, Hybrid, and Backup.

Now, let's delve into scenario 01. I grabbed the following snapshot from Azure AI Landing Zones and removed the non-network Azure resources to focus only on the private network components.

**Credits to the AI Landing Zone team for the diagram.

Original Version:

Inbound

Zoom in view with End-to-End Flow

Hop  Summary
1    Client initiates request
2    DNS query sent to on-prem DNS
3    DNS query forwarded to Azure
4    Azure DNS Resolver processes query
5    Private DNS resolves to Private Endpoint IP
6    Traffic routed via VNet peering
7    Traffic hits Private Endpoint
8    Request served by Azure Files

*Link Private DNS to DNS resolvers in other regions: Private DNS is GLOBAL and DNS Resolvers are regional.

Example Snapshot of entire flow:

Nslookup from Client machine

Domain: DNS Conditional Forwarder configuration

Note 1: Make sure you selected "All DNS Servers in this forest" for replication; otherwise, users pointed to some other domain will be unable to resolve the name.

Verifying Connectivity with PsPing <credit to Sysinternals team PsPing>

PsPing, a tool from Sysinternals, is highly effective for verifying network connectivity from on-premises environments to Azure resources on specific ports. This is particularly useful when you need to confirm connectivity to ports such as 445, 443, 1433, 1521, or any other port required by the Azure services you intend to access from on-premises locations or other cloud environments. Testing that the necessary ports are open and reachable is crucial for troubleshooting connectivity issues between your on-premises infrastructure and Azure-hosted resources.
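Where PsPing is not available, the same kind of port check can be approximated with a single TCP connect. This is a minimal sketch in Python, not a replacement for PsPing: it makes one connection attempt and reports whether it succeeded and how long it took. The function name `tcp_check` is hypothetical.

```python
import socket
import time


def tcp_check(host: str, port: int, timeout: float = 3.0) -> tuple[bool, float]:
    """Try one TCP connect; return (reachable, elapsed milliseconds)."""
    start = time.perf_counter()
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            reachable = True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        reachable = False
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return reachable, elapsed_ms
```

For the Azure Files scenario above, a call such as `tcp_check("<storage account FQDN>", 445)` from the on-premises client exercises both the DNS path and the port, since the name has to resolve before the connection is attempted.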
Ensure your firewall is set to allow traffic.

DNS Private Resolver: inbound configuration

Private DNS configuration

Enable virtual network links to your Private DNS zone

Make sure the hub and spoke VNets are peered

Private Endpoint configuration

Storage Account configuration

"Replace the file share with any supported Azure service that uses Private Endpoints, and follow the same guidance."

2. Outbound <flow and resources colored with blue>: part 2 upcoming soon

376 Views, 0 likes, 0 Comments

Grounding LLMs
I recently gave a talk at a Microsoft-internal event on everything I learned (so far) about grounding LLMs with Retrieval Augmented Generation and other techniques to get them to generate output that is accurate, reliable, and relevant. I am sharing it here in article form (masterfully produced by GPT-4 from the transcript and slides of the talk). Hope you find some of this useful as you start building solutions with LLMs.

176K Views, 51 likes, 15 Comments

Leverage Copy Data Parallelism with Dynamic Partitions in ADF/Synapse Metadata-driven Pipelines
Follow this Azure Data Factory/Synapse Analytics pipeline pattern to take advantage of parallel Copy Data activity by partition, even when you don't have partitioned source data, all within a metadata-driven pipeline!

27K Views, 6 likes, 3 Comments

Microsoft Fabric - Multi-Tenant Architecture
Fabric Multi-Tenant Architecture (updated version - August 24)

Organizations often face challenges in managing data for multiple tenants in a secure manner while keeping costs low. Traditional solutions may prove costly for scenarios with more than 100 tenants, especially in the common ISV scenario where the volume of trial and free tenants is much larger than the volume of paying tenants. The motivation for ISVs to use Fabric is that it brings together experiences such as Data Engineering, Data Factory, Data Science, Data Warehouse, Real-Time Analytics, and Power BI onto a shared SaaS foundation. In this article, we will explore the workspace-per-tenant architecture, which is a cost-effective solution for managing data for all tenants in Microsoft Fabric, including ETL and reporting.

24K Views, 5 likes, 0 Comments

Implementing Pagination with the Copy Activity in Microsoft Fabric
Introduction:
APIs often return a large amount of data. Pagination allows you to retrieve a manageable subset of this data at a time, preventing unnecessary strain on resources and reducing the likelihood of timeouts or errors. In this example, the client starts by requesting the first page of data from the server. The server responds with both the data and metadata indicating the current page and the total number of records. The client then requests subsequent pages until it reaches the last page. This approach allows for efficient data retrieval and processing without overwhelming the client or the server.

We want to get a file in ADLS containing all the data from the API without using other activities such as Until or ForEach; the Copy activity should perform all the pagination needed to collect the data.

Prerequisites:
1. Basic knowledge of REST APIs.
2. A workspace in Microsoft Fabric.
3. An ADLS storage account.

API used: https://pokeapi.co/api/v2/pokemon

According to the PokéAPI documentation, the default limit is 20 records per request. In this tutorial, I want to retrieve up to 500 records per request, like so: https://pokeapi.co/api/v2/pokemon?limit=500&offset=500

The initial API call will be made using the following URL: pokeapi.co/api/v2/pokemon?limit=500&offset=0

Subsequently, the API calls will proceed as follows:
Second call: pokeapi.co/api/v2/pokemon?limit=500&offset=500
Third call: pokeapi.co/api/v2/pokemon?limit=500&offset=1000

In each successive request, the offset value is incremented by 500 to retrieve the next set of records.

Steps:

Step 1: Prepare your workspace.
In your Fabric workspace, navigate to the Data Factory component and add a pipeline to your workspace, then drag a Copy activity onto your canvas. Follow the steps in the MS documentation: Module 1 - Create a pipeline with Data Factory - Microsoft Fabric | Microsoft Learn

Step 2: Configure the Copy activity.

1. Source settings:
Data store type: External
Connection: add new -> click on Rest connection. Fill in the connection settings like so:
Click on Create.
Relative URL: pokemon?limit=500&offset=pageOffset
Here I'm adding a value to the offset parameter; pageOffset is a variable that gets its value from the pagination rule.
In Advanced, under Pagination Rule, set the variable 'pageOffset' to a range from 0 to 1281 with an offset of 500, so each call to the API jumps by 500 records, as mentioned above.

2. Destination:
I would like to write the data as a .csv file named outputPartitioning.csv. I added my ADLS connection to my lakehouse; follow the steps in the MS documentation: Create an Azure Data Lake Storage Gen2 shortcut - Microsoft Fabric | Microsoft Learn

3. Mapping tab:
After configuring both source and destination, we need to map our data. The data from the API comes as JSON with these attributes:

    {
      "count": 1281,
      "next": "https://pokeapi.co/api/v2/pokemon?offset=3&limit=2",
      "previous": "https://pokeapi.co/api/v2/pokemon?offset=0&limit=1",
      "results": [
        { "name": "ivysaur", "url": "https://pokeapi.co/api/v2/pokemon/2/" },
        { "name": "venusaur", "url": "https://pokeapi.co/api/v2/pokemon/3/" }
      ]
    }

We don't care about the metadata provided by the API (the count, next, and previous keys); we only want the results array. Click on Import schemas, then add the following as the collection reference: $['results']
Delete the extra results that you see below, and make sure the name and url keys are saved as String in the destination, like so:

Step 3: Run the Copy activity.
After running the Copy activity, you should see the output file in your ADLS storage account and the activity marked as succeeded in the Fabric workspace.

Output:
I downloaded the file from my ADLS storage account and opened it in Visual Studio. We can see that we got 1281 records, as promised by the API, so pagination worked.
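The offset arithmetic that the pagination rule performs can be sketched outside of Fabric as well. The Python below is only an illustration of the same idea, not what the Copy activity runs internally: it walks the offsets 0, 500, 1000 for a total of 1281 records, keeps only the `results` array from each page, and flattens the rows to CSV. All function names are hypothetical, and `fetch_json` assumes the API is reachable with a plain HTTP GET.

```python
import csv
import io
import json
from typing import Callable
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://pokeapi.co/api/v2/pokemon"


def page_offsets(total: int, limit: int) -> list[int]:
    """Offsets needed to cover `total` records, `limit` records at a time."""
    return list(range(0, total, limit))


def page_url(offset: int, limit: int) -> str:
    """Build the request URL for one page."""
    return f"{BASE_URL}?{urlencode({'limit': limit, 'offset': offset})}"


def fetch_json(url: str) -> dict:
    """Default fetcher: one HTTP GET, parsed as JSON."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def fetch_all(limit: int = 500, fetch: Callable[[str], dict] = fetch_json) -> list[dict]:
    """Read the first page to learn `count`, then request the remaining pages."""
    first = fetch(page_url(0, limit))
    rows = list(first["results"])  # equivalent of the $['results'] collection reference
    for offset in page_offsets(first["count"], limit)[1:]:
        rows.extend(fetch(page_url(offset, limit))["results"])
    return rows


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize the name and url keys as two string columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "url"], extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Passing the fetcher as a parameter keeps the paging logic testable without network access; `page_offsets(1281, 500)` yields the three offsets listed in the steps above.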
Links:
- Module 1 - Create a pipeline with Data Factory - Microsoft Fabric | Microsoft Learn
- Create an Azure Data Lake Storage Gen2 shortcut - Microsoft Fabric | Microsoft Learn
- Documentation - PokéAPI (pokeapi.co)
- How to configure REST in a copy activity - Microsoft Fabric | Microsoft Learn

Call-To-Action:
- Make sure to establish all connections in ADLS and in the Fabric workspace.
- Check the MS documentation on pagination in the Copy activity.
- Please help us improve by sharing your valuable feedback.
- Follow me on LinkedIn: Sally Dabbah | LinkedIn

9.6K Views, 1 like, 3 Comments