Real-World Cloud & Azure SQL Database Examples Using Kepner-Tregoe
The Kepner-Tregoe (KT) methodology is especially effective in modern cloud environments like Azure SQL Database, where incidents are often multi-dimensional, time-bound, and affected by asynchronous and self-healing behaviors. Below are practical examples illustrating how KT can be applied in real Azure SQL scenarios.

Example 1: Azure SQL Geo-Replication Lag Observed on Read-Only Replica

Scenario
An application team reports that changes committed on the primary Azure SQL Database are not visible on the geo-replica used for reporting for up to 30-40 minutes. The primary database performance remains healthy.

Applying KT – Problem Analysis
- What is happening? The read-only geo-replica is temporarily behind the primary.
- What is not happening? No primary outage, no data corruption, no failover.
- Where does it occur? Only on the geo-secondary, during specific time windows.
- When does it occur? Repeatedly around the same time each hour.
- What is the extent? Lag spikes, then returns to zero.

KT Insight
By separating data visibility delay from primary health, teams avoid misdiagnosing the issue as a platform outage. Public DMVs (such as sys.dm_geo_replication_link_status and sys.dm_database_replica_states) confirm this as a transient redo lag scenario, not a service availability issue.

Example 2: Error 3947 – Transaction Aborted Due to HA Replica Redo Lag

Scenario
Applications intermittently hit error 3947 ("The transaction was aborted because the secondary failed to catch up redo"), while primary latency remains stable.

Applying KT – Situation Appraisal
- What needs immediate action? Ensure application retry logic is functioning.
- What can wait? Deep analysis, since the workload resumes normally after retries.
- What should not be escalated prematurely? Platform failover or data integrity concerns.

KT Insight
KT helps distinguish protective platform behavior from defects. Error 3947 is a deliberate safeguard in synchronous HA models to maintain consistency, not an outage or bug.

Example 3: Performance Degradation During Business-Critical Reporting

Scenario
Customer reports slow reporting queries on a readable secondary during peak hours, coinciding with replication lag spikes.

Applying KT – Decision Analysis
Possible actions:
- Route reporting queries back to the primary during the spike window
- Scale up replica resources
- Move batch processing off peak hours

KT Decision Framework
- Musts: no data inconsistency, minimal user impact
- Wants: low cost, fast mitigation, minimal architecture change

Decision
Temporarily route latency-sensitive reads to the primary while continuing the investigation. This decision is defensible, documented, and reversible.

Example 4: Preventing Recurrence with Potential Problem Analysis

Scenario
Recurring redo lag spikes happen daily at the same minute past the hour.

Applying KT – Potential Problem Analysis
- What could go wrong? The hourly batch job may generate large log bursts.
- How likely is it? High (the pattern repeats daily).
- What is the impact? Temporary stale reads on replicas.

Preventive actions:
- Break batch jobs into smaller units
- Shift non-critical workloads outside reporting hours
- Monitor redo queue size proactively (see the sketch after this list)

KT Insight
Rather than responding reactively each day, teams use KT to anticipate and reduce the likelihood and impact of recurrence.
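To support the last preventive action above, a lightweight check can be scheduled against the readable secondary. This is only a minimal sketch using columns already shown by the public DMV sys.dm_database_replica_states; the 10 MB threshold and the catch-up estimate are illustrative assumptions, not product guidance.

-- Flag replicas with a noticeable redo backlog (the threshold is an example value).
SELECT database_id,
       redo_queue_size,                                  -- log pending replay (KB, per the DMV reference)
       redo_rate,                                        -- replay throughput (KB/s)
       redo_queue_size / NULLIF(redo_rate, 0) AS estimated_catchup_seconds,
       last_commit_time                                  -- UTC timestamp of the last replayed commit
FROM sys.dm_database_replica_states
WHERE redo_queue_size > 10 * 1024;                       -- roughly 10 MB of pending redo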
Example 5: Coordinated Incident Management Across Regions

Scenario
An Azure SQL issue spans EMEA, APAC, and US support teams, with intermittent symptoms and high stakeholder visibility.

Applying KT – Situation Appraisal
KT helps teams:
- Prioritize which signals are critical vs. noise
- Decide when to involve engineering vs. continue monitoring
- Communicate clearly with customers using facts, not assumptions
This prevents "analysis paralysis" and conflicting interpretations across time zones.

Why KT Works Well in Cloud and Azure SQL Environments
- Cloud platforms contain self-healing, asynchronous behaviors that can be misinterpreted
- Multiple metrics may conflict without structured reasoning
- KT brings discipline, shared language, and defensible conclusions
- It complements tooling (DMVs, metrics, alerts); it does not replace it

Closing Thought
In cloud operations, how you think is as important as what you observe. Kepner-Tregoe provides a timeless, structured way to reason about complex Azure SQL Database behaviors, helping teams respond faster, communicate better, and avoid unnecessary escalations.

Geo-Replication Redo Lag in Azure SQL Database
Monitoring and Troubleshooting Using Public DMVs

Azure SQL Database provides built-in high availability and geo-replication capabilities to ensure database resilience and business continuity. While replication is fully managed by the platform, customers may occasionally observe a delay between the primary and the geo-replicated secondary database, especially during periods of heavy write activity. This article provides a public, supported approach to monitoring and understanding geo-replication delay caused by redo lag, using official Dynamic Management Views (DMVs) and T-SQL only, without relying on internal tools.

Scenario Overview
Customers may report that changes committed on the primary database are not immediately visible on the geo-replicated secondary, sometimes for several minutes. Typical symptoms include:
- Reporting queries on the geo-secondary showing stale data
- Increased redo catch-up time on the secondary
- No performance degradation observed on the primary
- Replication eventually catches up without manual intervention
This behavior is commonly associated with redo lag, where the secondary has already received the log records but requires additional time to replay them into data pages.

Understanding Geo-Replication Redo Lag
In Azure SQL Database geo-replication:
- Transactions are sent from the primary to the secondary
- Log records are hardened on the secondary
- The secondary applies these records asynchronously to its data files (redo)
If the secondary experiences temporary pressure (for example, due to schema changes or intensive operations), redo may fall behind, causing the secondary to lag, even though data durability remains intact.

Step 1: Check Geo-Replication Status and Lag
The primary DMV for monitoring geo-replication is:
📘 sys.dm_geo_replication_link_status
Public documentation: https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-geo-replication-link-status-azure-sql-database

Sample T-SQL query:

SELECT database_id,
       partner_server,
       partner_database,
       replication_state_desc,
       replication_lag_sec,
       last_replication_time
FROM sys.dm_geo_replication_link_status;

Key columns explained:
- replication_state_desc: current replication state
- replication_lag_sec: estimated lag (in seconds)
- last_replication_time: last successful replication timestamp (UTC)
- partner_server: geo-replica logical server

Interpretation:
- Healthy replication: replication_lag_sec = 0 and the state is healthy
- Transient delay: lag increases temporarily but later returns to zero
- Sustained delay: lag remains elevated for an extended period and should be investigated further

Step 2: Monitor Local Replica Redo Health
To understand whether lag is related to redo activity on the secondary, you can query:
📘 sys.dm_database_replica_states
Public documentation: https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-database-replica-states-azure-sql-database

Sample T-SQL query:

SELECT database_id,
       role_desc,
       synchronization_state_desc,
       synchronization_health_desc,
       redo_queue_size,
       redo_rate,
       last_commit_time
FROM sys.dm_database_replica_states;

Key insights:
- redo_queue_size: indicates how much log data is pending replay on the secondary
- redo_rate: shows how quickly redo is being applied
- last_commit_time: helps estimate data freshness on the secondary (UTC)

Interpretation:
- Normal operation: redo_queue_size = 0
- Write burst: redo_queue_size increases temporarily
- Recovery in progress: redo_rate remains positive
- Healthy state: synchronization_health_desc = HEALTHY

Short-lived spikes are expected platform behavior and usually resolve automatically.
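The two checks above can also be condensed into a small monitoring script. The sketch below is only an illustration: the 300-second threshold is an arbitrary example, and the data-age figure is an approximation derived from last_commit_time.

-- Flag sustained geo-replication lag; the 300-second threshold is an example, not guidance.
SELECT partner_server,
       partner_database,
       replication_state_desc,
       replication_lag_sec,
       last_replication_time
FROM sys.dm_geo_replication_link_status
WHERE replication_lag_sec > 300;

-- Approximate data age on the local replica, derived from the last replayed commit (UTC).
SELECT database_id,
       role_desc,
       synchronization_health_desc,
       DATEDIFF(SECOND, last_commit_time, GETUTCDATE()) AS approx_data_age_seconds
FROM sys.dm_database_replica_states;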
Practical Monitoring Considerations

Reporting Workloads
If applications read from the geo-secondary for reporting:
- Expect near-real-time, not guaranteed real-time, visibility
- Design reports to tolerate small delays
- Route latency-sensitive reads to the primary if required

Transaction Patterns
Redo lag is more likely during:
- Large batch updates
- Index maintenance operations
- Schema modification commands
- Bursty write workloads
Keeping transactions short and efficient reduces replication pressure.

Best Practices
- Use UTC timestamps consistently when correlating events
- Monitor replication_lag_sec and redo_queue_size together
- Implement retry logic in applications for transient conditions
- Avoid assuming read replicas are always perfectly synchronized
- Do not take manual actions during short-lived redo spikes unless the lag persists

Summary
Redo lag in Azure SQL Database geo-replication is a normal and self-healing behavior during certain workload patterns. By using supported public DMVs and T-SQL, customers can:
- Monitor replication health safely
- Understand replica freshness
- Make informed application routing decisions
- Avoid unnecessary intervention
Azure SQL Database automatically stabilizes replication once redo pressure subsides, ensuring durability and availability without manual management.

References
- Azure SQL Database High Availability: https://learn.microsoft.com/azure/azure-sql/database/high-availability-overview
- sys.dm_geo_replication_link_status: https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-geo-replication-link-status-azure-sql-database
- sys.dm_database_replica_states: https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-database-replica-states-azure-sql-database
Lesson Learned #521: Query Performance Regression with Multiple Execution Plans in Azure SQL

Some days ago, we were working on a service request where our customer asked why a query had degraded in performance. One possible cause is that more than one execution plan is being used for a specific query, so I would like to share the steps we followed using QDS (Query Data Store) with DMVs.

First, we executed this query to identify any queries that had more than one plan_id, which is often a sign that the optimizer has compiled multiple strategies to run the same query:

SELECT q.query_id,
       qt.query_sql_text,
       q.query_hash,
       COUNT(DISTINCT p.plan_id) AS num_plans,
       STRING_AGG(CAST(p.plan_id AS VARCHAR), ', ') AS plan_ids
FROM sys.query_store_query_text qt
JOIN sys.query_store_query q ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan p ON q.query_id = p.query_id
GROUP BY q.query_id, qt.query_sql_text, q.query_hash
HAVING COUNT(DISTINCT p.plan_id) > 1
ORDER BY num_plans DESC;

We got a list of queries and, after some analysis, we found the one the customer was referring to. The query in question was a simple aggregate with a parameter:

(@N int)SELECT count(Name),name FROM Notes where ID<@n group by Name

Since the query had two plans, we executed the following T-SQL to obtain the details of the executions:

SELECT rs.execution_type_desc,
       rs.avg_duration / 1000 AS avg_duration_ms,
       rs.avg_cpu_time / 1000 AS avg_cpu_ms,
       rs.last_duration / 1000 AS last_duration_ms,
       rs.count_executions,
       rs.first_execution_time,
       rs.last_execution_time,
       p.plan_id,
       p.is_forced_plan,
       TRY_CONVERT(XML, p.query_plan) AS execution_plan_xml
FROM sys.query_store_runtime_stats rs
JOIN sys.query_store_plan p ON rs.plan_id = p.plan_id
WHERE p.query_id = 2
ORDER BY rs.last_execution_time DESC;

Looking at the results, we could see that execution plan number 2 was executed fewer times but took more time on average. Checking the execution plan XML, we identified that an automatic statistics update had been executed, causing a new execution plan. To give more insight into the possible cause, we wrote the following T-SQL, which extracts when the statistics were updated directly from the execution plan XML:

;WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT p.plan_id,
       stat.value('@Statistics', 'VARCHAR(200)') AS stats_name,
       stat.value('@LastUpdate', 'DATETIME') AS stats_last_updated,
       stat.value('@SamplingPercent', 'FLOAT') AS stats_sampling_percent
FROM sys.query_store_plan AS p
CROSS APPLY (SELECT CAST(p.query_plan AS XML) AS xml_plan) AS x
OUTER APPLY x.xml_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan/OptimizerStatsUsage/StatisticsInfo') AS t(stat)
WHERE p.query_id = 2;

With this, we found another way to query the execution plan directly and include other information from Query Data Store. Enjoy!
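If the earlier plan is confirmed to be the better performer, Query Store also lets you force it while the statistics or parameter-sensitivity issue is investigated. The sketch below assumes plan_id 1 is the previously well-performing plan for query_id 2 from this scenario; both values are illustrative.

-- Force the previously well-performing plan (plan_id 1 is an assumed value).
EXEC sp_query_store_force_plan @query_id = 2, @plan_id = 1;

-- Verify which plan is currently forced for the query.
SELECT plan_id, query_id, is_forced_plan
FROM sys.query_store_plan
WHERE query_id = 2;

-- Remove the forcing later if it is no longer needed.
EXEC sp_query_store_unforce_plan @query_id = 2, @plan_id = 1;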
Principle 'XYZ' could not be found or this principal type is not supported - Azure SQL DB and MI

In this blog article, we discuss the possible scenarios that lead to the error "Principle 'XYZ' could not be found or this principal type is not supported" while adding an AAD user or group to your Azure SQL database or Azure SQL managed instance.
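For context, the operation that typically triggers this error is creating a contained database user for an Azure AD principal. A minimal sketch is shown below; the user and group names are placeholders, and the statements must be run by an Azure AD administrator connected to the target database.

-- Run in the target user database while connected with an Azure AD admin account.
-- The principal names are placeholders.
CREATE USER [appuser@contoso.com] FROM EXTERNAL PROVIDER;   -- an Azure AD user
CREATE USER [SQL-Developers] FROM EXTERNAL PROVIDER;        -- an Azure AD group
ALTER ROLE db_datareader ADD MEMBER [appuser@contoso.com];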
Lesson Learned #517: Connection Timeouts to Azure SQL Database Using Private Endpoint with HikariCP

Recently, we have been working on a support case where our customer reported the following error message:

ERROR com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Exception during pool initialization.
com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host myserverX.database.windows.net, port 1433 has failed. Error: 'Connection timed out: no further information. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.'

I would like to share the lessons learned during the troubleshooting and resolution process.

Initially, what caught our attention were the first two messages: "Exception during pool initialization" and "Connection timed out". This indicates that Hikari was unable to create the connection pool during the initialization process (one of the first steps in connection pooling) due to a connection timeout. Therefore, we began working on connectivity options and started investigating, asking whether our customer was using the public endpoint or a private endpoint; they confirmed they were using a private endpoint.

The following factors were identified as possible causes:
- Private Endpoint misconfiguration: the Private Endpoint was not correctly associated with the Azure SQL Database.
- DNS resolution issues: the database hostname was not resolving to the Private Endpoint's private IP address.
- Network Security Group (NSG) restrictions: the NSG attached to the subnet hosting the Private Endpoint was blocking inbound traffic on port 1433.
- Firewall settings on Azure SQL Database: firewall rules were not allowing connections from the source network.
- Redirect mode configuration: additional ports required for redirect mode were blocked or misconfigured.

Our troubleshooting started with nslookup myserverX.database.windows.net to ensure that the database hostname resolves to the Private Endpoint's private IP address. If the IP returned is public instead of private, verify the DNS configuration or use Azure DNS.

Our second step was to validate that the Java application host can reach Azure SQL Database on port 1433, or on the redirect ports if the customer uses the redirect connection policy, using the command telnet myserverX.database.windows.net 1433 or the Linux command nc -vz myserverX.database.windows.net 1433, and we identified that the connection failed. Check Azure SQL Database and Azure Synapse Analytics connectivity architecture and Azure Private Link - Azure SQL Database & Azure Synapse Analytics | Microsoft Learn for more details.

Next, we verified whether the Network Security Group attached to the subnet allows traffic to port 1433, and we found that the required NSG rule did not exist for the private endpoint. Once the rule was added, the Java application was able to connect.
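Had the root cause been the fourth item in the list above, the firewall rules themselves can be reviewed with plain T-SQL. A minimal sketch follows; note that server-level rules are only visible when connected to the master database.

-- Server-level firewall rules (run while connected to the master database).
SELECT name, start_ip_address, end_ip_address
FROM sys.firewall_rules;

-- Database-level firewall rules (run in the user database).
SELECT name, start_ip_address, end_ip_address
FROM sys.database_firewall_rules;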
Additional Endpoints Required for AAD Authentication and CRL Checks for Azure SQL DB

You may have attempts to connect to Azure SQL Database with an Azure Active Directory (AAD) account that are failing with a timeout error, but SQL Authentication works as expected. Alternatively, you could be encountering generic connection errors with the inner exception being 'Revocation of the SSL certificate failed'. If the following workarounds are not sufficient, you may have to whitelist specific endpoints on your firewalls or network infrastructure:
- Disabling the revocation of the SSL certificate check: Revocation of the SSL certificate failed for AAD authentication - Microsoft Tech Community
- Allowing all SSL traffic from that server on the firewall

An example of the 'Revocation of the SSL certificate failed' error in SQL Server Management Studio (SSMS):

You can take a network trace in order to see where traffic is being blocked, or to grab the certificate that is being passed back. Having the certificate will allow you to examine the CRL endpoints listed in the certificate itself. Instead of taking a network trace, you may want to watch blocked traffic on your firewall and whitelist trusted endpoints that appear, in order to expedite troubleshooting. You can test connectivity to ports via PowerShell using, for example, "tnc login.windows.net -port 443".

If you have an ADFS setup, then to authenticate you will need to whitelist your ADFS endpoint on port 443. If you have multiple ADFS endpoints, you will need to whitelist the one that your DNS server is resolving you to.

This is a non-exhaustive list of endpoints that may be required depending on the authentication type. Additional endpoints may be required to communicate with Active Directory Federation Services (ADFS) or, depending on your networking path and whether you are using a public Certification Authority (CA), endpoints for whoever issued your certificate for ADFS. Please note that not all firewalls can be configured to accept wildcards, and while the list below has as many explicitly defined URLs as possible, you may need to see what traffic is being blocked on the firewall to get definite URLs for those not fully defined here. This is also how you can determine if something not on this list is being blocked and is required for your particular networking setup (perhaps the ADFS endpoint your machine is reaching out to is not the one you expect, etc.).

| URL | Port | Description |
| --- | --- | --- |
| mscrl.microsoft.com | HTTP/80 | Used to download CRL lists. |
| *.verisign.com | HTTP/80 | Used to download CRL lists. |
| *.entrust.net | HTTP/80 | Used to download CRL lists for MFA. |
| *.management.core.windows.net (Azure Storage), *.graph.windows.net (Azure AD Graph) | HTTPS/443 | Used for the various Azure services. |
| secure.aadcdn.microsoftonline-p.com | HTTPS/443 | Used for MFA. |
| *.microsoftonline.com | HTTPS/443 | Used to configure your Azure AD directory and import/export data. |
| login.microsoftonline.us | HTTPS/443 | Used by US Gov for AD login. |
| login.microsoftonline.com | HTTPS/443 | Used by Public cloud for AD login for MFA. |
| login.windows.net | HTTPS/443 | Used by Public cloud for AD login for Password and Integrated. |
| http://crl.microsoft.com | HTTP/80 | Used to verify certificates. |
| http://crl3.digicert.com | HTTP/80 | Used to verify certificates. |
| http://crl4.digicert.com | HTTP/80 | Used to verify certificates. |
| http://ocsp.digicert.com | HTTP/80 | Used to verify certificates. |
| http://www.d-trust.net | HTTP/80 | Used to verify certificates. |
| http://root-c3-ca2-2009.ocsp.d-trust.net | HTTP/80 | Used to verify certificates. |
| http://oneocsp.microsoft.com | HTTP/80 | Used to verify certificates. |
| http://ocsp.msocsp.com | HTTP/80 | Used to verify certificates. |
| http://www.microsoft.com/pkiops | HTTP/80 | Used to verify certificates. |
| cacerts.digicert.com | HTTP/80 | Used to verify certificates. |
| ctldl.windowsupdate.com | HTTP/80 | Used to verify certificates. |

References
- Azure Active Directory certificate authorities | Microsoft Docs
- Azure TLS Certificate Changes | Microsoft Docs
- Azure AD Connect: Troubleshoot Azure AD connectivity issues | Microsoft Docs
- Revocation of the SSL certificate failed for AAD authentication - Microsoft Tech Community
- Microsoft 365 endpoints - Microsoft 365 Enterprise | Microsoft Docs
In today's technologically advanced world, data plays a crucial role in driving decision-making processes and gaining valuable insights. One of the popular methods for capturing and analyzing data in Microsoft SQL Server is through Extended Events (XE). XE allows administrators to collect information about various events occurring within the SQL Server environment. However, managing and loading multiple extended event files can be a challenging task. In this article, we will explore an effective solution using the sys.fn_xe_file_target_read_file function to load multiple extended event files seamlessly.3.1KViews0likes1CommentAzure SQL DB Private Link / Private Endpoint - Connectivity Troubleshooting
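As a quick illustration of the approach, sys.fn_xe_file_target_read_file accepts a wildcard in the file path, so a single call can read a whole set of .xel files. The sketch below is a minimal example; the file name pattern is a placeholder (in Azure SQL Database the path would typically be the blob storage location configured as the event_file target).

-- Read all .xel files matching the pattern in one call; the path is a placeholder.
SELECT object_name AS event_name,
       CAST(event_data AS XML) AS event_data_xml,
       file_name,
       file_offset
FROM sys.fn_xe_file_target_read_file('MyXeSession*.xel', NULL, NULL, NULL);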
Azure SQL DB Private Link / Private Endpoint - Connectivity Troubleshooting

In the article Azure SQL DB Connectivity Troubleshooting we explored regular connectivity troubleshooting for Azure SQL DB; in this one we will explore connectivity troubleshooting using Private Link.

In this article we are going to explore:
1 - What is the Private Endpoint for Azure DB?
2 - Creation of Private Endpoint
3 - Azure VM > Private Link
4 - OnPrem VM > VPN (P2S) > Private Link (Hosts File)
5 - OnPrem VM > VPN (P2S) > Private Link (Custom DNS)
6 - Azure Function > VNET integration > Private Endpoint
7 - Failover Groups with Private Link

1 - What is the Private Endpoint for Azure DB?

Azure Private Link enables you to access Azure PaaS services (for example, Azure Storage and SQL Database) and Azure-hosted customer-owned/partner services over a private endpoint in your virtual network. Traffic between your virtual network and the service travels the Microsoft backbone network. Exposing your service to the public internet is no longer necessary.

Some important information:
- The Private Link service can be accessed from approved private endpoints in the same region. The private endpoint can be reached from the same virtual network, regionally peered VNets, globally peered VNets, and on-premises using private VPN or ExpressRoute connections.
- When creating a Private Link service, a network interface is created for the lifecycle of the resource. This interface is not manageable by the customer.
- The Private Link service must be deployed in the same region as the virtual network.
- A single Private Link service can be accessed from multiple private endpoints belonging to different VNets, subscriptions and/or Active Directory tenants. The connection is established through a connection workflow.

In the image below we can see how the connection using a private endpoint works. As mentioned in the article Azure SQL DB Connectivity Troubleshooting, Azure SQL DB clients by default all go to the shared public IP of the regional gateway. With the private endpoint you can close this public path so users can only connect from the private endpoint. If you have multiple Azure VNETs you will need to use VNET peering or a VNET-to-VNET VPN between two Azure VNETs, or P2S, S2S or ExpressRoute to connect your on-premises network to the Azure network.

2 - Creation of Private Endpoint

To create a private endpoint just follow the procedure documented at https://docs.microsoft.com/en-us/azure/private-link/create-private-endpoint-portal

You may also want to close all public network access to make sure all connections must flow through the private endpoint. Just go to the Azure SQL DB instance, open "Firewall and Virtual Networks" and deny public network access.

Find below the procedure I used to create a private endpoint:
- Go to "Private Endpoint Connections" and then add a private endpoint.
- Select the region, which should be the same as the VNET region, as mentioned above.
- Select the resource type "Microsoft.Sql/servers" for the Azure SQL DB instance.
- Select the Azure SQL DB instance you want to connect to.
- Select the VNET / subnet. Notice also that during the creation you can already create a private DNS zone, which will work for Azure resources that use the Azure DNS. We will talk more about that when doing the tests.

3 - Azure VM > Private Endpoint

From an Azure VM deployed to the same VNET, if we run the command below in a command prompt before creating the private endpoint:

nslookup fonsecanet-westeu.database.windows.net

you will get a result like the one below, which shows that this server is using the public gateway IP 40.68.37.158.
So it is NOT using any private IP:

Server: UnKnown
Address: 168.63.129.16
Non-authoritative answer:
Name: westeurope1-a.control.database.windows.net
Address: 40.68.37.158
Aliases: fonsecanet-westeu.database.windows.net

After you create the private endpoint, using the same command you are expected to see the results below:

Server: UnKnown
Address: 168.63.129.16
Non-authoritative answer:
Name: fonsecanet-westeu.privatelink.database.windows.net
Address: 10.0.1.4
Aliases: fonsecanet-westeu.database.windows.net

Be sure also to open outbound communication from the Azure VM VNET to the private endpoint on the local firewall, corporate firewall, or Azure NSGs. For this test I opened it to allow all communication inside the VNET. *Currently (status on 2020-04-06) redirect is not supported, so only port 1433 is needed.

You must use the FQDN to connect to Azure SQL DB, as documented at https://docs.microsoft.com/en-us/azure/sql-database/sql-database-private-endpoint-overview#check-connectivity-using-sql-server-management-studio-ssms

"Use the Fully Qualified Domain Name (FQDN) of the server in connection strings for your clients. Any login attempts made directly to the IP address shall fail. This behavior is by design, since private endpoint routes traffic to the SQL Gateway in the region and the FQDN needs to be specified for logins to succeed."

You can also check whether the connection is correct using the T-SQL below, and we will see that the client IP is the private IP assigned to the Azure VM:

SELECT client_net_address FROM sys.dm_exec_connections WHERE session_id = @@SPID

You must use the FQDN to connect to Azure SQL DB: the Azure SQL DB gateway uses the name to route your connection correctly to the SQL host, and when this information is not provided the login will fail. If you try to connect using the private endpoint IP you are going to get an error like the one below:

===================================
Cannot connect to 10.0.1.4.
===================================
A connection was successfully established with the server, but then an error occurred during the login process. (provider: SSL Provider, error: 0 - The target principal name is incorrect.) (.Net SqlClient Data Provider)
------------------------------
Server Name: 10.0.1.4
Error Number: -2146893022
Severity: 20
State: 0

You should also NOT use DB.privatelink.database.windows.net:

===================================
Cannot connect to fonsecanet-westeu.privatelink.database.windows.net.
===================================
A connection was successfully established with the server, but then an error occurred during the login process. (provider: SSL Provider, error: 0 - The target principal name is incorrect.) (.Net SqlClient Data Provider)
------------------------------
Server Name: fonsecanet-westeu.privatelink.database.windows.net
Error Number: -2146893022
Severity: 20
State: 0
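To go a bit beyond the client_net_address check above, the same DMV can also show the transport, authentication scheme and encryption negotiated for the current session. This is a minimal sketch; all columns come from sys.dm_exec_connections.

-- Broader view of the current connection.
SELECT session_id,
       client_net_address,   -- the caller's IP as seen by the server (a private IP in this scenario)
       local_net_address,    -- the server-side address the session is connected to
       local_tcp_port,
       net_transport,
       protocol_type,
       auth_scheme,
       encrypt_option
FROM sys.dm_exec_connections
WHERE session_id = @@SPID;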
4 - OnPrem VM > VPN (P2S) > Private Link (Hosts File)

You can use a template like the ones below to create the Point-to-Site VPN:
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-managed-instance-configure-p2s
https://docs.microsoft.com/en-us/azure/vpn-gateway/vpn-gateway-howto-point-to-site-resource-manager-portal

After creating the VPN, downloading and installing the VPN client and connecting to it, if I check name resolution:

nslookup fonsecanet-westeu.database.windows.net

we can see that it is still using the Microsoft internal corporate DNS where this VM is located:

Server: XXXXXXXXXXX.corp.microsoft.com
Address: 10.221.xx.xx
Non-authoritative answer:
Name: westeurope1-a.control.database.windows.net
Address: 40.68.37.158
Aliases: fonsecanet-westeu.database.windows.net
         fonsecanet-westeu.privatelink.database.windows.net

For this scenario you will need to use your corporate DNS to handle the name resolution: https://docs.microsoft.com/en-us/azure/private-link/private-endpoint-overview#dns-configuration

"When connecting to a private link resource using a fully qualified domain name (FQDN) as part of the connection string, it's important to correctly configure your DNS settings to resolve to the allocated private IP address. Existing Azure services might already have a DNS configuration to use when connecting over a public endpoint. This needs to be overridden to connect using your private endpoint.

The network interface associated with the private endpoint contains the complete set of information required to configure your DNS, including FQDN and private IP addresses allocated for a given private link resource.

You can use the following options to configure your DNS settings for private endpoints:
- Use the host file (only recommended for testing). You can use the host file on a virtual machine to override the DNS.
- Use a private DNS zone. You can use private DNS zones to override the DNS resolution for a given private endpoint. A private DNS zone can be linked to your virtual network to resolve specific domains.
- Use your custom DNS server. You can use your own DNS server to override the DNS resolution for a given private link resource. If your DNS server is hosted on a virtual network, you can create a DNS forwarding rule to use a private DNS zone to simplify the configuration for all private link resources.

Important: It's not recommended to override a zone that is actively in use to resolve public endpoints. Connections to resources won't be able to resolve correctly without DNS forwarding to the public DNS. To avoid issues, create a different domain name or follow the suggested name for each service below.

| Private Link resource type | Subresource | Zone name |
| --- | --- | --- |
| SQL DB (Microsoft.Sql/servers) | Sql Server (sqlServer) | privatelink.database.windows.net |

Azure will create a canonical name DNS record (CNAME) on the public DNS to redirect the resolution to the suggested domain names. You'll be able to override the resolution with the private IP address of your private endpoints. Your applications don't need to change the connection URL. When attempting to resolve using a public DNS, the DNS server will now resolve to your private endpoints. The process does not impact your applications."

For this test I will use the hosts file solution.

And we are able to connect fine.
As before, you will be able to check that you get the VPN IP range using the T-SQL below:

SELECT client_net_address FROM sys.dm_exec_connections WHERE session_id = @@SPID

5 - OnPrem VM > VPN (P2S) > Private Link (Custom DNS)

If you do not want to use the local hosts file as mentioned above, you can add Azure DNS (168.63.129.16) as a forwarder on your custom DNS. For more information about more complex DNS relationships, follow the official documentation: https://docs.microsoft.com/en-us/azure/private-link/private-endpoint-dns

In the Azure Virtual Network settings I replaced the default Azure DNS with a custom DNS (10.0.0.5 in my tests). You can also add 168.63.129.16 (Azure DNS) as a secondary DNS, just in case the first one is down.

If your client lives in Azure, the DNS server on the VM can use a forwarder to Azure DNS:
- Go to DNS > Domain > Properties > Forwarders
- Add Azure DNS (168.63.129.16)

Do not forget to add a virtual network link from the private zone to the VNET where your Azure VM with DNS lives.

If you have an on-premises DNS you may need a conditional forwarder:
- Go to DNS > Domain > Conditional Forwarders > New
- The DNS domain should be database.windows.net. DO NOT USE privatelink.database.windows.net.

And you can check that name resolution works fine:

nslookup fonsecanet-westeu.database.windows.net
Server: fonsecanetad.internal.cloudapp.net
Address: 10.0.0.5
Non-authoritative answer:
Name: fonsecanet-westeu.privatelink.database.windows.net
Address: 10.0.1.4
Aliases: fonsecanet-westeu.database.windows.net

If you do not want to use a forwarder, you can also create a forward lookup zone and add the host manually to match the FQDN.

6 - Azure Function > VNET integration > Private Endpoint

To make an Azure Function connect to a private endpoint you will need to use VNET integration:
https://docs.microsoft.com/en-us/azure/azure-functions/functions-create-vnet
https://docs.microsoft.com/en-us/azure/app-service/web-sites-integrate-with-vnet

*Status on 2020-04-27: there were some limitations with Web Apps + private DNS that are now gone; you just need to set some configurations as documented at https://docs.microsoft.com/en-us/azure/app-service/web-sites-integrate-with-vnet#azure-dns-private-zones

"After your app integrates with your VNet, it uses the same DNS server that your VNet is configured with. By default, your app won't work with Azure DNS Private Zones. To work with Azure DNS Private Zones you need to add the following app settings:
- WEBSITE_DNS_SERVER with value 168.63.129.16
- WEBSITE_VNET_ROUTE_ALL with value 1
These settings will send all of your outbound calls from your app into your VNet in addition to enabling your app to use Azure DNS private zones."

I have also created a sample Azure Function app that accepts the connection string as a parameter and was able to connect fine. You can check the source code at https://github.com/FonsecaSergio/AzureFunctionAppTestConnectivity

7 - Failover Groups with Private Link

For this scenario I will just reference the Product Group article: https://techcommunity.microsoft.com/t5/azure-sql-database/using-failover-groups-with-private-link-for-azure-sql-database/ba-p/1272902

Be sure to configure the private zone network link to both networks: https://docs.microsoft.com/en-us/azure/private-link/private-endpoint-dns#virtual-network-workloads-without-custom-dns-server

"This model can be extended to multiple peered virtual networks that are associated to the same private endpoint. This can be done by adding new virtual network links to the private DNS zone for all peered virtual networks."
The results for my tests:

C:\Users\FonsecaSergio>nslookup SERVER1.database.windows.net
Server: UnKnown
Address: 168.63.129.16
Non-authoritative answer:
Name: SERVER1.privatelink.database.windows.net
Address: 12.1.1.6
Aliases: SERVER1.database.windows.net

C:\Users\FonsecaSergio>nslookup SERVER2-westus.database.windows.net
Server: UnKnown
Address: 168.63.129.16
Non-authoritative answer:
Name: SERVER2-westus.privatelink.database.windows.net
Address: 11.0.1.4
Aliases: SERVER2-westus.database.windows.net

C:\Users\FonsecaSergio>nslookup FAILOVERGROUP.database.windows.net
Server: UnKnown
Address: 168.63.129.16
Non-authoritative answer:
Name: SERVER1.privatelink.database.windows.net
Address: 12.1.1.6
Aliases: FAILOVERGROUP.database.windows.net
         SERVER1.database.windows.net

For DNS in more complex environments, please follow up on the other scenarios mentioned at https://docs.microsoft.com/en-us/azure/private-link/private-endpoint-dns
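Related to the failover group scenario, once connected through the failover group listener it can also be useful to confirm, from inside the session, which logical server you landed on and whether the database is writable. This is only a minimal sketch of one possible check.

-- Confirm which logical server the listener resolved to and whether the database is writable.
SELECT @@SERVERNAME AS logical_server_name,
       DB_NAME() AS database_name,
       DATABASEPROPERTYEX(DB_NAME(), 'Updateability') AS updateability;  -- READ_WRITE on the primary, READ_ONLY on the geo-secondary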
REF and LINKS
https://docs.microsoft.com/en-us/azure/private-link/private-link-service-overview
https://docs.microsoft.com/en-us/azure/private-link/private-endpoint-overview
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-private-endpoint-overview
https://docs.microsoft.com/en-us/azure/private-link/
https://docs.microsoft.com/en-us/azure/app-service/web-sites-integrate-with-vnet#enable-vnet-integration
https://channel9.msdn.com/Shows/Data-Exposed/Private-Link-for-Azure-SQL-Database-Part-1?term=Private%20Link%20for%20Azure%20SQL%20Database&lang-en=true
https://channel9.msdn.com/Shows/Data-Exposed/Private-Link-for-Azure-SQL-Database-Part-2?term=Private%20Link%20for%20Azure%20SQL%20Database&lang-en=true

Testing connectivity against a Managed Instance using a CNAME

As documented here, we can change the SQL Managed Instance default DNS zone ".database.windows.net" to any zone of your choice; however, the instance name part of the FQDN is mandatory.

Whether you are connecting through a public or a private endpoint, the virtual cluster manages the connection and then either proxies or redirects it to the appropriate SQL MI resident within the same cluster, based on the connection type configured. The virtual cluster uses the hostname part of the FQDN in the connection string to look up the MI. If the instance name doesn't match an existing one, the connection attempt will fail. For instance, if the FQDN is sqlmi01.a1b2c3d4.database.windows.net and the CNAME is db01.contoso.com, it will not work.

A workaround is to create a 'user@sqlmi01' login on sqlmi01 and pass it in the connection string using SQL Server authentication. This explicitly tells the virtual cluster to disregard the hostname in the FQDN and look up a managed instance called sqlmi01. Another option is to use the same hostname in the CNAME record, so sqlmi01.contoso.com would work, although this won't suit multiple SQL MIs because SQL MI names are globally unique.

One thing to note when using option 2 from the section above is that all encrypted connections will fail by default, because the server-side certificate will not match the FQDN provided in the connection string. There are a few ways around this. The first one is to set the property trustServerCertificate to true, although we should mention that this is not a recommended setting, as that option is inherently insecure.

Recently, there were some new releases and announcements for the ODBC and OLEDB drivers, and I was curious enough to set up a test with a CNAME pointing to a managed instance public endpoint and perform some tests with the new settings of said drivers, specifically the hostNameInCertificate setting, which could help us with the problem described in option 2 above. As explained in the articles linked above, the new releases of ODBC (18) and OLEDB (19) have a few breaking changes that could impact connectivity for your particular scenario.
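To illustrate the first workaround mentioned above (a login whose name embeds the instance name), the sketch below creates such a login on the managed instance; the instance name sqlmi01 comes from the example above, and the password and database role are placeholders.

-- Run in master on the managed instance; the password is a placeholder.
CREATE LOGIN [user@sqlmi01] WITH PASSWORD = '<StrongPasswordHere123!>';

-- Then, in the target user database, map the login and grant the access it needs.
CREATE USER [user@sqlmi01] FOR LOGIN [user@sqlmi01];
ALTER ROLE db_datareader ADD MEMBER [user@sqlmi01];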
Below are the results of my tests using SSMS (.Net SqlClient), ODBC (17 and 18) and OLEDB (18 and 19).

| Test # | Tool | Version | Authentication | Encrypt Connection | Encrypt Value | Trust Server Certificate | Hostname in Certificate | Result |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | SSMS (.Net SqlClient) | 18.11.1 | SQL | FALSE | N/A | FALSE | N/A | Success |
| 2 | SSMS (.Net SqlClient) | 18.11.1 | SQL | TRUE | N/A | FALSE | N/A | Failure |
| 3 | SSMS (.Net SqlClient) | 18.11.1 | SQL | TRUE | N/A | TRUE | N/A | Success |
| 4 | SSMS (.Net SqlClient) | 18.11.1 | AAD | FALSE | N/A | FALSE | N/A | Failure |
| 5 | SSMS (.Net SqlClient) | 18.11.1 | AAD | TRUE | N/A | FALSE | N/A | Failure |
| 6 | SSMS (.Net SqlClient) | 18.11.1 | AAD | TRUE | N/A | TRUE | N/A | Success |
| 7 | ODBC | 17.9.1.1 | SQL | FALSE | N/A | FALSE | N/A | Success |
| 8 | ODBC | 17.9.1.1 | SQL | TRUE | N/A | FALSE | N/A | Failure |
| 9 | ODBC | 17.9.1.1 | SQL | TRUE | N/A | TRUE | N/A | Success |
| 10 | ODBC | 17.9.1.1 | AAD | FALSE | N/A | FALSE | N/A | Success |
| 11 | ODBC | 17.9.1.1 | AAD | TRUE | N/A | FALSE | N/A | Failure |
| 12 | ODBC | 17.9.1.1 | AAD | TRUE | N/A | TRUE | N/A | Success |
| 13 | OLEDB | 18.6.3 | SQL | FALSE | N/A | FALSE | N/A | Success |
| 14 | OLEDB | 18.6.3 | SQL | TRUE | N/A | FALSE | N/A | Failure |
| 15 | OLEDB | 18.6.3 | SQL | TRUE | N/A | TRUE | N/A | Success |
| 16 | OLEDB | 18.6.3 | AAD | FALSE | N/A | FALSE | N/A | Failure |
| 17 | OLEDB | 18.6.3 | AAD | TRUE | N/A | FALSE | N/A | Failure |
| 18 | OLEDB | 18.6.3 | AAD | TRUE | N/A | TRUE | N/A | Failure |
| 19 | ODBC | 18.0.1.1 | SQL | FALSE | Optional | FALSE | Blank | Failure |
| 20 | ODBC | 18.0.1.1 | SQL | FALSE | Optional | FALSE | Set | Success |
| 21 | ODBC | 18.0.1.1 | SQL | FALSE | Optional | TRUE | Blank | Success |
| 22 | ODBC | 18.0.1.1 | SQL | TRUE | Mandatory | FALSE | Blank | Failure |
| 23 | ODBC | 18.0.1.1 | SQL | TRUE | Mandatory | FALSE | Set | Success |
| 24 | ODBC | 18.0.1.1 | SQL | TRUE | Mandatory | TRUE | Blank | Success |
| 25 | ODBC | 18.0.1.1 | SQL | TRUE | Strict | N/A | Blank | Failure |
| 26 | ODBC | 18.0.1.1 | SQL | TRUE | Strict | N/A | Set | Failure |
| 27 | ODBC | 18.0.1.1 | AAD | FALSE | Optional | FALSE | Blank | Failure |
| 28 | ODBC | 18.0.1.1 | AAD | FALSE | Optional | FALSE | Set | Success |
| 29 | ODBC | 18.0.1.1 | AAD | FALSE | Optional | TRUE | Blank | Success |
| 30 | ODBC | 18.0.1.1 | AAD | TRUE | Mandatory | FALSE | Blank | Failure |
| 31 | ODBC | 18.0.1.1 | AAD | TRUE | Mandatory | FALSE | Set | Success |
| 32 | ODBC | 18.0.1.1 | AAD | TRUE | Mandatory | TRUE | Blank | Success |
| 33 | ODBC | 18.0.1.1 | AAD | TRUE | Strict | N/A | Blank | Failure |
| 34 | ODBC | 18.0.1.1 | AAD | TRUE | Strict | N/A | Set | Failure |
| 35 | OLEDB | 19.0.0 | SQL | FALSE | Optional | FALSE | Blank | Failure |
| 36 | OLEDB | 19.0.0 | SQL | FALSE | Optional | FALSE | Set | Success |
| 37 | OLEDB | 19.0.0 | SQL | FALSE | Optional | TRUE | Blank | Failure |
| 38 | OLEDB | 19.0.0 | SQL | TRUE | Mandatory | FALSE | Blank | Failure |
| 39 | OLEDB | 19.0.0 | SQL | TRUE | Mandatory | FALSE | Set | Success |
| 40 | OLEDB | 19.0.0 | SQL | TRUE | Mandatory | TRUE | Blank | Failure |
| 41 | OLEDB | 19.0.0 | SQL | TRUE | Strict | N/A | Blank | Failure |
| 42 | OLEDB | 19.0.0 | SQL | TRUE | Strict | N/A | Set | Failure |
| 43 | OLEDB | 19.0.0 | AAD | FALSE | Optional | FALSE | Blank | Failure |
| 44 | OLEDB | 19.0.0 | AAD | FALSE | Optional | FALSE | Set | Success |
| 45 | OLEDB | 19.0.0 | AAD | FALSE | Optional | TRUE | Blank | Failure |
| 46 | OLEDB | 19.0.0 | AAD | TRUE | Mandatory | FALSE | Blank | Failure |
| 47 | OLEDB | 19.0.0 | AAD | TRUE | Mandatory | FALSE | Set | Success |
| 48 | OLEDB | 19.0.0 | AAD | TRUE | Mandatory | TRUE | Blank | Failure |
| 49 | OLEDB | 19.0.0 | AAD | TRUE | Strict | N/A | Blank | Failure |
| 50 | OLEDB | 19.0.0 | AAD | TRUE | Strict | N/A | Set | Failure |

As we can see, the result can differ depending on the driver, the settings and the version used. Whenever possible, try updating and testing your drivers to ensure the stability of your connections.

Cheers!