azure sql
722 TopicsReplication lag metric for Azure SQL DB is now in public preview
Azure SQL Database offers business continuity capabilities to recover quickly from regional disasters. Features such as active geo-replication and failover groups provide continuous replication of the data in your primary database to a secondary database in a different Azure region. In the event of a regional disruption, these features allow you to perform quick disaster recovery to your secondary database to meet your business' recovery time objective (RTO) and recovery point objective (RPO). RTO for Azure SQL Database is typically less than 60 seconds, but RPO depends on the amount of data changes before the disruptive event that have not been replicated. Consequently, monitoring the replication lag between the primary and secondary databases is critical in ensuring your RPO goals are met. Until now, the main way to measure the replication lag between the primary and secondary databases was with the replication_lag_sec column of the dynamic management view (DMV), sys.dm_geo_replication_link_status from your primary database. With the introduction of the Replication lag metric, you can now monitor lag with respect to RPO in near real time in the Azure portal in addition to using the DMV. Replication lag is a new Azure monitor metric that is emitted at a one-minute frequency and stored up to 93 days. You can visualize the metric in Azure monitor and set up alerts too. The replication lag metric measures the time span in seconds from the point of transaction commit on the primary database and acknowledgement by the secondary database that the transaction log update has been persisted. The replication lag metric is applicable for a database in DTU or vCore purchasing model and in all service tiers (Basic, Standard, Premium, General Purpose, Business Critical & Hyperscale). Both singleton and elastic pool deployments are supported. You can monitor the metric by adding Replication lag (preview) from your primary database in the portal as shown below: The metric provides three dimensions, Partner Database Name, Partner Server Name, and Partner Resource ID that you can use to further filter or split the data to view specific replication links. If your database is configured to send Metrics to Log Analytics under “Diagnostic settings”, you can also query the Replication lag metric data as shown below: Next Steps Learn more about monitoring geo-replication and other commonly used metrics in Azure SQL Database. See how you can achieve your business continuity goals with Azure SQL Database using Active geo-replication and Failover groups. Prepare for disasters with the Disaster recovery checklist Frequently Asked Questions What is the Replication lag and what does it measure? The Replication lag is a new metric in Azure monitor that measures the time span in seconds between the transactions committed on the primary and hardened to the transaction log on the secondary. How do you view the metric in the portal? In the Azure portal, select your primary SQL database and under "Monitoring", select "Metrics". In the "Metrics" dropdown, choose "Replication lag (preview)". What is the granularity of the Replication lag metric? One minute. What is the latency in displaying the Replication lag data in the Metrics screen? Typically, the latency to display the replication lag is less than three minutes. Are there dimensions available for the Replication lag metric? Yes, there are three dimensions available for the metric - Partner Database Name, Partner Server Name, and Partner Resource ID. These dimensions can be used for filtering and splitting the view in the Metrics screen for easier comparison of multiple secondary geo-replicas.1.2KViews2likes6CommentsRegex support for LOB types in T-SQL—available in Azure SQL & SQL Server 2025
At a glance — Native regular expression (regex) functions in T-SQL now accept varchar(max) and nvarchar(max) inputs of up to 2 MB across all seven regex functions, including the two table-valued functions (REGEXP_MATCHES and REGEXP_SPLIT_TO_TABLE). This capability ships in SQL Server 2025 CU5 and is already available in Azure SQL Database, SQL Database in Fabric and Azure SQL Managed Instance configured with the Always-up-to-date update policy. It will reach Managed Instances on the SQL Server 2025 update policy as part of the CU5 rollout. You no longer need to split log files, HTML documents, or large JSON payloads into 8,000-byte chunks just to run a pattern match. 1. Introduction Regular expressions have long been a cornerstone of modern data processing — used for validation, parsing, transformation, and extracting structured insights from unstructured text. With SQL Server 2025 and Azure SQL, regex is now a first-class T-SQL capability, removing the historical need to rely on SQLCLR functions or application-tier processing. While the initial release made native regex broadly available, large-object (LOB) inputs were not yet supported on every function. CU5 closes that gap. Under the hood, T-SQL regex implements POSIX Extended Regular Expression (ERE) semantics, augmented by a curated set of Perl-style features, and is powered by the RE2 engine. RE2 is a linear-time, non-backtracking implementation, which means it is not susceptible to catastrophic backtracking (a class of denial-of-service issue commonly known as ReDoS). That guarantee becomes far more important when the input is a 1.8 MB log blob than when it is an 8,000-byte string. Release timeline Milestone What shipped Ignite 2025 — General Availability Regex went GA in SQL Server 2025 and Azure SQL. LOB inputs were initially supported only on REGEXP_LIKE, REGEXP_COUNT, and REGEXP_INSTR. LOB support on REGEXP_REPLACE and REGEXP_SUBSTR was deferred, and the two table-valued functions (TVFs) accepted only non-LOB string types. Azure SQL (post-GA service updates) LOB inputs enabled across all seven functions. SQL Server 2025 CU5 LOB inputs up to 2 MB enabled on all seven functions in the SQL Server. What’s new in CU5 varchar(max) and nvarchar(max) inputs are accepted on every regex function. The input string is capped at 2 MB per function call. The pattern is still capped at 8,000 bytes, which is far larger than any maintainable regular expression should ever need. Behavior is consistent between Azure SQL and SQL Server, so code you write today is fully portable. Note — The 2 MB limit applies to the input passed to a single function call, not to the column or row. A single value in a varchar(max) column can still store up to 2 GB; the constraint is that no single regex evaluation can consume more than 2 MB of that value. Prerequisites SQL Server 2025 CU5 or later, or Azure SQL Database, or SQL Database in Fabric or Azure SQL Managed Instance configured with the SQL Server 2025 / Always-up-to-date update policy. The two table-valued functions (REGEXP_MATCHES and REGEXP_SPLIT_TO_TABLE) require database compatibility level 170, unless the database-scoped configuration ALLOW_BUILTIN_TVF_IN_ALL_COMPAT_LEVELS (preview) is enabled. Note — On Azure SQL Managed Instance (Always-up-to-date), this capability is rolling out region by region. It is already live in regions where the rollout has completed and will light up in the remaining regions as the deployment finishes. Instances on the SQL Server 2025 update policy will receive it as part of the CU5 rollout — coming soon. Verify compatibility level (170 required for the TVFs) – SELECT name, compatibility_level FROM sys.databases WHERE name = DB_NAME(); -- If necessary: -- ALTER DATABASE [<your-database>] SET COMPATIBILITY_LEVEL = 170; 2. Working with LOB Data This section demonstrates the CU5 capabilities against a realistic LOB data. We build a LogEntries table whose RawPayload column holds multi-KB to multi-MB chunks of web server and application output, plus an HtmlPages table for HTML cleansing examples. 2.1 Create the sample schema and data IF OBJECT_ID('dbo.LogEntries', 'U') IS NOT NULL DROP TABLE dbo.LogEntries; IF OBJECT_ID('dbo.HtmlPages', 'U') IS NOT NULL DROP TABLE dbo.HtmlPages; CREATE TABLE dbo.LogEntries ( LogId BIGINT IDENTITY(1,1) PRIMARY KEY, Source SYSNAME NOT NULL, IngestedAt DATETIME2(3) NOT NULL DEFAULT SYSUTCDATETIME(), RawPayload VARCHAR(MAX) NOT NULL -- LOB column ); CREATE TABLE dbo.HtmlPages ( PageId INT IDENTITY(1,1) PRIMARY KEY, Url NVARCHAR(2048) NOT NULL, Body NVARCHAR(MAX) NOT NULL -- LOB column (Unicode) ); Now generate realistically large rows. The REPLICATE(CAST(... AS varchar(max)), n) pattern is required because REPLICATE returns NULL when the result would exceed 8,000 bytes unless its first argument is a max type. -- Synthetic web access-log payload (~252 KB in row 1, plus a separate ~586 KB row). DECLARE @logLine VARCHAR(500) = '127.0.0.1 - alice [21/May/2026:10:15:32 +0000] "GET /api/orders/42 HTTP/1.1" 200 1532 ' + 'user-agent="Mozilla/5.0" ip=10.0.0.7 email=alice@contoso.com card=4111-1111-1111-1234' + CHAR(10); DECLARE @bigLog VARCHAR(MAX) = REPLICATE(CAST(@logLine AS VARCHAR(MAX)), 1500) -- ~252 KB + '127.0.0.1 - mallory [21/May/2026:10:16:01 +0000] "POST /login HTTP/1.1" 500 0 ' + 'ip=203.0.113.99 ssn=123-45-6789' + CHAR(10); INSERT INTO dbo.LogEntries (Source, RawPayload) VALUES ('web-01', @bigLog), -- ~252 KB ('web-02', REPLICATE(CAST('OK ' AS VARCHAR(MAX)), 200000)); -- ~586 KB -- Synthetic HTML page (~775 KB / ~396,000 characters). DECLARE @htmlChunk NVARCHAR(MAX) = N'<div class="row"><p>Hello <b>world</b>! Contact <a href="mailto:bob@contoso.com">bob</a>.</p></div>'; INSERT INTO dbo.HtmlPages (Url, Body) VALUES (N'https://contoso.example/page-1', N'<html><head><title>Big Page</title></head><body>' + REPLICATE(@htmlChunk, 4000) + N'</body></html>'); -- Confirm payload sizes in bytes. SELECT LogId, Source, DATALENGTH(RawPayload) AS PayloadBytes FROM dbo.LogEntries; SELECT PageId, DATALENGTH(Body) AS BodyBytes, LEN(Body) AS BodyChars FROM dbo.HtmlPages; Results: LogId Source PayloadBytes 1 web-01 258,110 2 web-02 600,000 PageId BodyBytes BodyChars 1 792,124 396,062 Before CU5, feeding any of these payloads into REGEXP_REPLACE, REGEXP_SUBSTR, REGEXP_MATCHES, or REGEXP_SPLIT_TO_TABLE would have failed with a type-mismatch error or required a LEFT(RawPayload, 8000)-style truncation. The same queries now run end-to-end. 2.2 REGEXP_LIKE — Filter rows by LOB content -- Find logs that contain at least one HTTP 5xx response. SELECT LogId, Source, DATALENGTH(RawPayload) AS PayloadBytes FROM dbo.LogEntries WHERE REGEXP_LIKE(RawPayload, '"[A-Z]+\s[^"]+\sHTTP/1\.[01]"\s5[0-9]{2}\s'); REGEXP_LIKE is a Boolean predicate: it evaluates to true when the pattern matches anywhere in the input and false otherwise. Because it returns a Boolean rather than a bit, use it directly in WHERE, CASE WHEN, IIF, or CHECK constraint contexts — do not compare it with = 1 or = 0 (the parser rejects that syntax). Note — REGEXP_LIKE itself requires database compatibility level 170. The other scalar regex functions (REGEXP_COUNT, REGEXP_INSTR, REGEXP_REPLACE, REGEXP_SUBSTR) are available at all compatibility levels. Results: LogId Source PayloadBytes 1 web-01 258,110 2.3 REGEXP_COUNT — Counting at scale -- Per-row tally of GET requests, POST requests, and 5xx responses -- across the entire LOB payload. SELECT LogId, Source, REGEXP_COUNT(RawPayload, '"GET\s') AS Gets, REGEXP_COUNT(RawPayload, '"POST\s') AS Posts, REGEXP_COUNT(RawPayload, '\s5[0-9]{2}\s') AS ServerErrors FROM dbo.LogEntries; Results: LogId Source Gets Posts ServerErrors 1 web-01 1,500 1 1 2 web-02 0 0 0 2.4 REGEXP_INSTR — Locate the first error -- 1-based character position (or 0 if no match) of the FIRST 5xx response in each payload. SELECT LogId, Source, REGEXP_INSTR(RawPayload, '\s5[0-9]{2}\s', 1, 1, 0) AS FirstErrorPos FROM dbo.LogEntries; Parameter recap: REGEXP_INSTR(string, pattern, start, occurrence, return_option [, flags [, group ]]). A return_option of 0 returns the starting position of the match; 1 returns the position immediately after the last character of the match. Results: LogId Source FirstErrorPos 1 web-01 258,072 2 web-02 0 2.5 REGEXP_REPLACE — Redact sensitive data in place PII redaction over LOB payloads was one of the most-requested CU5 scenarios. Before CU5, it required a custom chunked-replace routine; it is now a single expression. -- Redact credit-card-shaped tokens, U.S. SSN-shaped tokens, and email addresses -- across the entire payload. SELECT LogId, REGEXP_REPLACE( REGEXP_REPLACE( REGEXP_REPLACE( RawPayload, '\b[0-9]{4}[- ]?[0-9]{4}[- ]?[0-9]{4}[- ]?[0-9]{4}\b', '****-****-****-****'), '\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b', '***-**-****'), '\b[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}\b', '[redacted-email]' ) AS RedactedPayload FROM dbo.LogEntries; Or strip every HTML tag from an nvarchar(max) page in a single call: SELECT PageId, LEN(Body) AS OriginalLen, LEN(REGEXP_REPLACE(Body, N'<[^>]+>', N'')) AS TextOnlyLen FROM dbo.HtmlPages; Results — the ~775 KB HTML document collapses from 396,062 to 100,008 characters of plain text in a single call: PageId OriginalLen TextOnlyLen 1 396,062 100,008 2.6 REGEXP_SUBSTR — Extract a single value -- Pull the first IPv4 address out of each log payload. SELECT LogId, REGEXP_SUBSTR(RawPayload, '\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b', 1, -- start position 1, -- occurrence 'c', -- flags: case-sensitive 0 -- group: 0 returns the whole match ) AS FirstIp FROM dbo.LogEntries; To return the contents of a specific capture group instead of the entire match, pass its 1-based group number as the final argument. Results: LogId FirstIp 1 127.0.0.1 2 NULL 2.7 REGEXP_MATCHES — Every match, set-based This is where the combination of TVF and LOB delivers the largest productivity gain: extract every structured value from a megabyte of unstructured text in a single set-based query, with no client round-trips. REGEXP_MATCHES returns one row per match with these columns: Column Type Description match_id bigint Sequence number of the match (1-based). start_position int 1-based start index of the match. end_position int 1-based end index of the match. match_value same type as string_expression The entire matched substring. substring_matches json JSON array describing each capture group, with the shape [{"value":"…","start":N,"length":N}, …]. -- Every email address in every log payload, alongside its row of origin. SELECT l.LogId, m.match_id, m.match_value AS EmailFound FROM dbo.LogEntries AS l CROSS APPLY REGEXP_MATCHES( l.RawPayload, '\b[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}\b' ) AS m ORDER BY l.LogId, m.match_id; Capture groups are even more useful — you can project the parts of every log line as columns by reading from the substring_matches JSON document: -- Parse Common-Log-Format-ish entries into ip, user, status, and bytes columns. -- The pattern has four capture groups, accessed below as $[0] through $[3]. SELECT l.LogId, m.match_id, JSON_VALUE(m.substring_matches, '$[0].value') AS Ip, JSON_VALUE(m.substring_matches, '$[1].value') AS UserName, JSON_VALUE(m.substring_matches, '$[2].value') AS Status, JSON_VALUE(m.substring_matches, '$[3].value') AS Bytes FROM dbo.LogEntries AS l CROSS APPLY REGEXP_MATCHES( l.RawPayload, '^([0-9.]+)\s-\s(\S+)\s\[[^\]]+\]\s"[^"]+"\s([0-9]{3})\s([0-9]+)', 'm' -- multi-line: ^ and $ anchor to each line, not just the whole input ) AS m ORDER BY l.LogId, m.match_id; Important — Without the 'm' flag, the ^ anchor matches only at the start of the entire 250 KB input, so you would receive exactly one match for the first line. The multi-line flag is what unlocks per-line extraction. Results (first two parsed rows): LogId match_id Ip UserName Status Bytes 1 1 127.0.0.1 alice 200 1532 1 2 127.0.0.1 alice 200 1532 2.8 REGEXP_SPLIT_TO_TABLE — Shred a LOB into rows -- Project the entire log payload as one row per non-empty line. SELECT l.LogId, s.ordinal AS [LineNo], s.value AS LineText FROM dbo.LogEntries AS l CROSS APPLY REGEXP_SPLIT_TO_TABLE(l.RawPayload, '\r?\n') AS s WHERE l.LogId = 1 AND s.value <> '' ORDER BY s.ordinal; You now have a tabular projection of a multi-megabyte text blob without leaving the engine. You can feed it into a CTE, aggregate it, join it to dimension tables, or materialize it into a staging table — all set-based. Results (first three rows): LogId ordinal LineText (first 80 chars) 1 1 127.0.0.1 - alice [21/May/2026:10:15:32 +0000] "GET /api/orders/42 HTTP/1.1" 200 1 2 127.0.0.1 - alice [21/May/2026:10:15:32 +0000] "GET /api/orders/42 HTTP/1.1" 200 1 3 127.0.0.1 - alice [21/May/2026:10:15:32 +0000] "GET /api/orders/42 HTTP/1.1" 200 Tip — composing LOB regex pipelines — CROSS APPLY (and OUTER APPLY when you need to preserve rows that produce no matches) is the primary composition primitive. You can stack REGEXP_SPLIT_TO_TABLE (lines) feeding REGEXP_MATCHES (fields per line) feeding ordinary aggregates, all within a single query plan. 2.9 The 2 MB ceiling — strategies for larger inputs The 2 MB limit applies to the input string of a single regex call. If the value passed to a regex function exceeds 2 MB, the call raises an error (error number 19311, severity 16) rather than silently truncating. That is the intended behavior — silent truncation would hide correctness bugs. In practice, 2 MB is a generous ceiling: a single log file or HTML document of that size is already unusual, and most real-world LOB data sit comfortably below it. When individual values do exceed the limit, the most reliable approach is to split them into smaller logical units before they land in the column you want to query — for example, by writing one log line, one document section, or one record per row at ingestion time. Because every regex function (including the two TVFs) shares the same 2 MB ceiling, sharding at query time is not generally feasible; doing it at the load path keeps every regex call well under the limit and avoids per-query workarounds. Bytes vs. characters — The 2 MB limit is measured in bytes, not characters, and the byte count is based on the UTF-8 encoding of the input regardless of the column’s declared type. ASCII characters take 1 byte each, so plain ASCII text can run to roughly two million characters; non-ASCII characters take 2–4 bytes in UTF-8, so fewer characters fit. Keep in mind that DATALENGTH() reports storage size in the column’s own encoding, which may differ from the UTF-8 byte count used by the limit, and LEN() (which counts characters) is best avoided as a sizing check here. To measure the UTF-8 byte length that the limit actually checks, cast the value to varchar(max) under a UTF-8 collation and take its DATALENGTH: SELECT DATALENGTH( CONVERT(varchar(max), Body COLLATE Latin1_General_100_CI_AS_SC_UTF8) ) AS Utf8Bytes FROM dbo.HtmlPages; Anything above 2 * 1024 * 1024 (2,097,152) bytes will be rejected by a regex call on that value. Have a scenario that genuinely needs more than 2 MB? If your workload requires regex evaluation on individual values larger than the current 2 MB ceiling, we would like to hear about it. Please share the details — data shape, payload size, pattern, and business need — on the Azure SQL feedback portal. Customer feedback directly informs how we prioritize future limit changes. 2.10 Cleanup DROP TABLE IF EXISTS dbo.LogEntries; DROP TABLE IF EXISTS dbo.HtmlPages; 3. Summary What changed in CU5 Before CU5 — LOB inputs were accepted on REGEXP_LIKE, REGEXP_COUNT, and REGEXP_INSTR. The remaining functions — REGEXP_REPLACE, REGEXP_SUBSTR, and the two TVFs (REGEXP_MATCHES, REGEXP_SPLIT_TO_TABLE) — required non-LOB string inputs, which often meant truncating with LEFT(..., 8000) or chunking in the application tier. After CU5 (and already in Azure SQL) — All seven functions accept varchar(max) and nvarchar(max) inputs of up to 2 MB. The pattern remains capped at 8,000 bytes. Quick reference Function Returns LOB input (CU5) Common use case REGEXP_LIKE Boolean (predicate) Yes Filter rows in WHERE / CASE / CHECK predicates REGEXP_COUNT int Yes Count occurrences of a pattern REGEXP_INSTR int Yes Position of the nth match REGEXP_REPLACE string Yes Redact, cleanse, or normalize text REGEXP_SUBSTR string Yes Extract a single value REGEXP_MATCHES (TVF) (match_id, start_position, end_position, match_value, substring_matches) Yes Extract every match plus capture groups (via JSON), set-based REGEXP_SPLIT_TO_TABLE (TVF) (value, ordinal) Yes Split a LOB into rows by a regex delimiter Further reading Official documentation: REGEXP_LIKE, REGEXP_COUNT, REGEXP_INSTR, REGEXP_REPLACE, REGEXP_SUBSTR, REGEXP_MATCHES, REGEXP_SPLIT_TO_TABLE. Regular expressions overview. SQL Server 2025 CU5 release notes. Closing thought. Native regex was already a significant quality-of-life improvement when it became generally available. CU5 completes the picture: every function, every input size up to 2 MB, every shape — scalar or table-valued. The next time you are tempted to export a column out of the database in order to grep it, try one of the seven regex functions first. Happy matching. 🧠73Views0likes0CommentsPublic Preview - Data Virtualization for Azure SQL Database
Data virtualization, now in public preview in Azure SQL Database, enables you to leverage all the power of Transact-SQL (T-SQL) and seamlessly query external data from Azure Data Lake Storage Gen2 or Azure Blob Storage, eliminating the need for data duplication, or ETL processes, allowing for faster analysis and insights. Integrate external data, such as CSV, Parquet, or Delta files, with your relational database while maintaining the original data format and avoiding unnecessary data movement. Present integrated data to applications and reports as a standard SQL object or through a normal SELECT command. Data Virtualization for Azure SQL Database supports SAS tokens, Managed Identity, and User identity for secure access. Data Virtualization for Azure SQL Database will introduce and expand support for: Database Scoped Credential. External Data Source. External File Format - with support for Parquet, CSV, and Delta. External Tables. OPENROWSET. Support metadata functions and JSON functions. For enhanced security and flexibility Data Virtualization for Azure SQL Database supports three authentication methods: Shared access signature. Managed identity (system assigned managed identity and user-assigned managed identity). User identity. Key Benefits Just like in SQL Server 2022 and Azure SQL Managed Instance the key benefits of Data Virtualization for Azure SQL Database are: Seamless Data Access: Query external CSV, Parquet, and Delta Lake tables using T-SQL as if they were native tables within Azure SQL Database. Allowing for off-loading cold data while keeping it easily accessible. Enhanced Productivity: Reduce the time and effort required to integrate and analyze data from multiple sources. Cost Efficiency: Minimize the need for data replication and storage costs associated with traditional data integration methods. Real-Time Insights: Enable real-time data querying and insights without delays caused by data movement or synchronization. Security: Leverage SQL Server security features for granular permissions, credential management, and control. Example Data Virtualization for Azure SQL Database is based on the same core principles as SQL Server’s PolyBase feature. With support for Azure Data Lake Gen 2, using prefix adls:// and Azure Blob Storage, using prefix abs://. For the following example we are going to use Azure Open Datasets, more specifically NYC yellow taxi trip records open data set which allows public access. For private data sources customers can leverage multiple authentication methods like SAS Tokens, Managed Identity and User Identity. -- Create Azure Blob Storage (ABS) data source CREATE EXTERNAL DATA SOURCE NYCTaxiExternalDataSource WITH ( LOCATION = 'abs://nyctlc@azureopendatastorage.blob.core.windows.net'); -- Using OPENROWSET to read Parquet files from the external data source SELECT TOP 10 * FROM OPENROWSET( BULK '/yellow/puYear=*/puMonth=*/*.parquet', DATA_SOURCE = 'NYCTaxiExternalDataSource', FORMAT = 'parquet' ) AS filerows; -- Or using External Tables CREATE EXTERNAL FILE FORMAT DemoFileFormat WITH (FORMAT_TYPE = PARQUET); --Create external table CREATE EXTERNAL TABLE tbl_TaxiRides ( vendorID VARCHAR(100) COLLATE Latin1_General_BIN2, tpepPickupDateTime DATETIME2, tpepDropoffDateTime DATETIME2, passengerCount INT, tripDistance FLOAT, puLocationId VARCHAR(8000), doLocationId VARCHAR(8000), startLon FLOAT, startLat FLOAT, endLon FLOAT, endLat FLOAT, rateCodeId SMALLINT, storeAndFwdFlag VARCHAR(8000), paymentType VARCHAR(8000), fareAmount FLOAT, extra FLOAT, mtaTax FLOAT, improvementSurcharge VARCHAR(8000), tipAmount FLOAT, tollsAmount FLOAT, totalAmount FLOAT ) WITH ( LOCATION = 'yellow/puYear=*/puMonth=*/*.parquet', DATA_SOURCE = NYCTaxiExternalDataSource, FILE_FORMAT = DemoFileFormat ); SELECT TOP 10 * FROM tbl_TaxiRides; You can also use these capabilities in combination with other metadata functions like sp_describe_first_result_set, filename(), and filepath(). Getting started Data Virtualization for Azure SQL Database is currently available in select regions, with broader availability coming soon across all Azure regions Data Virtualization for Azure SQL Database is based on the same core principles as SQL Server’s PolyBase feature. To know more and get started with Data Virtualization for Azure SQL Database.2.2KViews3likes1CommentAutomatic Connectivity Tests for Azure SQL Managed Instance
To further enhance connectivity monitoring and improve service reliability, we’re introducing automatic internal connectivity tests for all Azure SQL Managed Instances. These tests are fully automated and require no action from you. Beginning May 2026, the tests will be continuously performed at regular intervals on all managed instances. By proactively monitoring internal network connections, we’re able to quickly identify potential issues and maintain stable end-to-end connectivity. These tests are performed from a pair of internal IP addresses from the subnet range that hosts the managed instance, so they do not require any external inbound or outbound connectivity. Please note that additional IP addresses will be reserved for these tests and that tests may leave traces in your observability logs. Automatic tests diagnose issues in internal service and network availability. This results in accelerated issue discovery and shorter time to mitigate incidents that involve degraded connectivity of managed instances’ internal networking components. This suite of connectivity tests examines internal network connections at several levels, boosting the supportability and visibility into the service’s internal state and offering you peace of mind regarding your managed instances. Do note that your audit and security systems, if configured to track certain types of events emitted by SQL Server, may record failed login attempts. Those are normal and expected byproducts of the end-to-end connectivity test suite. If you would prefer to not have those events register in your SQL Server audit logs, SQL error logs, or captured Extended Events, we provide you with their event signatures so you can set up event filters or configure your SIEM system to ignore them: Observing failed logins caused by end-to-end tests. You can read more about the automated connectivity tests at Automatic internal connectivity tests for Azure SQL Managed Instance.222Views0likes0CommentsUnlocking More Power with Flexible Memory in Azure SQL Managed Instance
Service update - May 6th 2026. Next-gen General Purpose: locally redundant instances. Flexible memory for the Next-gen General purpose tier is generally available (GA). Business Critical: locally redundant and zone-redundant instances. Flexible memory is currently in preview for the Business Critical service tier. As data workloads grow in complexity and scale, so does the need for more adaptable and performant database infrastructure. That’s why we’re excited to introduce a new capability in Azure SQL Managed Instance: Flexible Memory, now generally available. What Is Flexible Memory? Flexible Memory allows you to customize the memory-to-vCore ratio in your SQL Managed Instance, enabling finer control over both performance and cost based on your workload requirements. This capability is part of the next-generation General Purpose and Business Critical tiers. It introduces a memory slider, which enables you to scale memory independently within supported limits - without changing the number of vCores. The memory slider is currently available only on premium-series hardware. Why It Matters Traditionally, memory allocation in SQL Managed Instance was fixed per vCore. With Flexible Memory, you can now: Increase memory beyond the default allocation Optimize for memory-intensive workloads without overprovisioning compute Pay only for what you use - additional memory is billed per GB/hour This flexibility is especially valuable for scenarios like analytics, caching, or workloads with large buffer pool requirements. How It Works Memory scales based on the number of vCores and the selected hardware tier: Hardware Tier Memory per vCore (GB) Standard-series 5.1 Premium series 7–12 Premium series (memory-optimized) Up to 13.6 You can select from predefined memory ratios (e.g., 7, 8, 10, 12 GB per vCore) depending on your configuration. For example, a 10 vCore instance can be configured with 70 GB to 120 GB of memory. One of the most powerful aspects of the Flexible Memory feature is the ability to select from a range of memory-to-vCore ratios. These “click stops” allow you to tailor memory allocation precisely to your workload’s needs - whether you’re optimizing for performance, cost, or both. The table below outlines the available configurations for Premium Series hardware, showing how memory scales across 16 vCore sizes: vCores Available Ratios Total Memory Options (GB) 4 7, 8, 10, 12 28, 32, 40, 48 6 7, 8, 10, 12 42, 48, 60, 72 8 7, 8, 10, 12 56, 64, 80, 96 10 7, 8, 10, 12 70, 80, 100, 120 12 7, 8, 10, 12 84, 96, 120, 144 16 7, 8, 10, 12 112, 128, 160, 192 20 7, 8, 10, 12 140, 160, 200, 240 24 7, 8, 10, 12 168, 192, 240, 288 32 7, 8, 10, 12 224, 256, 320, 384 40 7, 8, 10, 12 280, 320, 400, 480 48 7, 8, 10 336, 384, 480 56 7, 8 392, 448 64 7 448 80 7 560 96 5.83 560 128 4.38 560 Pricing model Flexible Memory introduces a usage-based pricing model that ensures you only pay for the memory you actually consume beyond the default allocation. This model is designed to give you the flexibility to scale memory without overcommitting on compute resources - and without paying for unused capacity. How it works: Default memory is calculated based on the minimum memory-to-vCore ratio Billable memory is the difference between your configured memory and the default allocation. Billing is per GB/hour, so you’re charged only for the additional memory used over time. Let’s take an example of SQL Managed Instance running on premium series hardware with 4 vCores and 40GB of memory. Configuration Value vCores 4 Configured Memory 40 GB Default Memory (4 × 7 GB) 28 GB Billable Memory 12 GB Billing Unit Per GB/hour Charged For 12 GB of additional memory Management Experience Changing memory behaves just like changing vCores: Seamless updates via Azure Portal, PowerShell, SDK or API Failover group guidance remains the same Upgrade secondary first Configurations between primary and secondary should match Adjusting the memory is fully online operation, with a short failover at the very end of it. The operation will go through the process of allocating the new compute with specified configuration, which takes approximately 60 minutes, with new faster management operations. API Support Flexible Memory is fully supported via API (the minimal API version that can be used is 2024-08-01) and Azure Portal. Here’s a sample API snippet to configure memory: { "properties": { "memorySizeInGB": 96 } } Portal support Note: Portal experience is in the rollout for Business Critical service tier! Use API, PowerShell or Azure CLI for configuring additional memory for your Business Critical instance. Summary The new Flexible Memory capability in Azure SQL Managed Instance empowers you to scale memory independently of compute, offering greater control over performance and cost. With customizable memory-to-vCore ratios, a transparent pricing model, and seamless integration into existing management workflows, this feature is ideal for memory-intensive workloads and dynamic scaling scenarios. Whether you're optimizing for analytics, caching, or simply want more headroom without overprovisioning vCores, Flexible Memory gives you the tools to do it - efficiently and affordably. Next Steps Review the Documentation: Explore detailed configuration options, supported tiers, and API usage. Additional memory Management operations overview Management operations duration Test Your Workloads: Use the memory slider in the Azure Portal, PowerShell, SDK or API to experiment with different configurations. Learn more What is Azure SQL Managed Instance Try Azure SQL Managed Instance for free Next-gen General Purpose – official documentation Analyzing the Economic Benefits of Microsoft Azure SQL Managed Instance How 3 customers are driving change with migration to Azure SQL Accelerate SQL Server Migration to Azure with Azure Arc1.6KViews3likes0CommentsAzure SQL is Retiring the “No Minimum TLS” (MinTLS None) Configuration
As part of the retirement of lower TLS versions 1.0 and 1.1 and the enforcement of 1.2 as the new default minimum TLS version, we will be removing the No Minimum TLS (MinTLS = “None” or "0") option and updating these configurations to TLS 1.2. No Minimum TLS allowed Azure SQL Database and Azure SQL Managed Instance resources to accept client connections using any TLS protocol version and unencrypted connections. Over the past year, Azure has retired TLS 1.0 and 1.1 for all Azure databases, due to known security vulnerabilities in these older protocols. As of August 31, 2025, creating servers configured with versions 1.0 and 1.1 was disallowed and migration to 1.2 began. With legacy TLS versions being retired, TLS 1.2 will become the secure default minimum TLS version for new Azure SQL DB and MI configurations and for all client-server connections, rendering the MinTLS = None setting obsolete. As a result, the MinTLS = None configuration option will be retired for new servers, and existing servers configured with No Minimum TLS will be upgraded to 1.2. What is changing? After July 31, 2026, we will disallow minimum TLS value "None", for the creation of new SQL DB and MI resources using PowerShell, Azure CLI, and any other REST based interface. This configuration option has already been removed from the Portal as part of the retirement of TLS versions 1.0 and 1.1. Creating new Azure SQL Database and Managed Instance servers with MinTLS = None (which was previously considered the default) will no longer be a supported configuration. If the server parameter value for the minimum TLS is left blank, it will default to minimum TLS version 1.2. Attempts to create an Azure SQL server with MinTLS = None will fail with an “Invalid operation” error and downgrades to None will be disallowed. While attempts to connect with TLS 1.0, 1.1 or unencrypted connections will fail with “Error: 47072/171 on Gateway.” Effective date (retirement milestone) MinTLS = None (0) MinTLS left blank (defaults to supported minimum) Before 8/31/25 Any + Unencrypted Any + Unencrypted After 8/31/25 1.2 + Unencrypted 1.2 After July 31, 2026 Invalid operation error (for new server creates) Downgrades will be disallowed TLS error: 47072/171 (for unencrypted connections) 1.2 In summary, after July 31, 2026, Azure SQL Database and Azure SQL Managed Instance will require all client connections to use TLS 1.2 or higher and unencrypted connections will be denied. The minimum TLS version setting will no longer accept the value "None" for new or existing servers and servers currently configured with this value will be upgraded to explicitly enforce TLS 1.2. Who is impacted? For most Azure SQL customers, there is no action required. Most clients already use TLS 1.2 or higher. After July 31, 2026, if your Azure SQL Database or Managed Instance is still configured with No Minimum TLS and using 1.0, 1.1 or unencrypted connections, it will automatically update to TLS 1.2 to reflect the current minimum protocol enforcement in client-server connectivity. We do recommend you verify your client applications – especially any older or third-party client drivers – to ensure they can communicate with TLS 1.2 or above. In some rare cases, very old applications, such as an outdated JDBC driver or older .NET framework version, may need an update or need to enable TLS 1.2. Conclusion This retirement is part of Azure’s broader security strategy to ensure encrypted connections are secure by modern encryption standards. TLS version 1.2 is more secure than older versions and is now the industry standard (required by regulations like PCI DSS and HIPAA). This change eliminates the use of unencrypted connections which ensure all database connections meet current security standards. If you’ve already migrated to TLS 1.2 (as most customers have), you will most likely not notice any change, except that the No Minimum TLS option will disappear from configurations.858Views0likes0CommentsDatabase DevOps (preview) in SSMS 22.4.1
Database DevOps tooling for Microsoft SQL brings the benefits of database-as-code to your development workflow. At its core are SQL database projects, which enable you to source control your database schema, perform reliable deployments to any environment, and integrate code quality checks into your development process.7.7KViews1like4CommentsDynamic Data Masking – What it is, What it isn’t, and How to use it effectively
In this post, we’ll explain the core purpose of Dynamic Data Masking (to ease application development), how it works, and its proper use cases – as well as its limitations. If you’re considering using Dynamic Data Masking or reviewing your data security strategy, this information will help you make informed decisions. What Dynamic Data Masking is designed for Dynamic Data Masking Dynamic Data Masking - SQL Server | Microsoft Learn is a database feature that can be used to alter how certain data elements are presented in query results for users who do not have privileged access or required permission. For example, a query on an email column may return a masked value such as jXXX@XXXX.com rather than the full address, depending on user permissions, while the original data remains unchanged in storage. Masking rules are defined within the database schema and are applied to query results for applicable users at runtime. This approach can simplify application developer’s job and reduce the need for application‑level logic that modifies how sensitive values are displayed across different application(s) or reports. DDM can help prevent accidental or casual exposure of sensitive information. How Does DDM differ from other security features? Dynamic Data Masking affects only what users see in query results—it does not protect the underlying data. Unlike encryption Always Encrypted - SQL Server | Microsoft Learn or Row‑Level security Row-Level Security - SQL Server | Microsoft Learn, DDM does not encrypt data, filter rows, or override SQL permissions. Users with elevated privileges (such as UNMASK, db_owner, or sysadmin) always see unmasked data or can modify or remove masking rules. What DDM doesn’t protect against Because Dynamic Data Masking is applied when query results are returned, there are several considerations to be aware of: Inference through queries: In some scenarios, users with database access may be able to make inferences about masked values by applying query filters or conditions that rely on underlying stored data. The database is still comparing the real values under the hood, so these queries work. It’s an expected behavior given DDM’s design. Privileged users: Users who are granted sufficient database permissions, such as the ability to alter table schemas, can directly disable or remove masking. Users with sysadmin, db_owner or CONTROL permission can view unmasked data. Thus, controlling and auditing who holds such privileges is vital. Metadata visibility: Masking rules and associated columns can be discoverable through system metadata. Data movement: Because masking is defined at the schema level in a given database instance, backups or exported datasets may contain unmasked values depending on permissions and configuration. Understanding these design characteristics is important when incorporating DDM into a broader data governance or privacy strategy. Proper use and best practices for DDM Organizations may consider using Dynamic Data Masking in scenarios where consistent display of sensitive values is needed across application(s) or reporting environments. Some implementation considerations include: Using DDM to help standardize how sensitive fields are displayed in query results and reduce developmental effort for data masking Combining DDM with other database or access‑control features as part of a layered data protection strategy Reviewing which users are granted permissions to view unmask data or alter masking configurations. Implementing auditing or monitoring database activity as part of broader governance practices Educating internal stakeholders on how masking operates at the query‑result level Testing masking configurations in non‑production environments prior to deployment Conclusion Dynamic Data Masking can be useful in scenarios where organizations want to manage how sensitive data is displayed in application outputs without modifying stored values. It is designed to operate as part of a broader data access or governance approach rather than as a standalone protection mechanism for stored data. When implemented alongside complementary database features and appropriate access controls, DDM may help support more consistent handling of sensitive values across environments.270Views0likes0CommentsStream data in near real time from SQL to Azure Event Hubs - Public preview
If near-real time integration is something you are looking to implement and you were looking for a simpler way to get the data out of SQL, keep reading. SQL is making it easier to integrate and Change Event Streaming is a feature continuing this trend. Modern applications and analytics platforms increasingly rely on event-driven architectures and real-time data pipelines. As the businesses speed up, real time decisioning is becoming especially important. Traditionally, capturing changes from a relational database requires complex ETL jobs, periodic polling, or third-party tools. These approaches often consume significant cycles of the data source, introduce operational overhead, and pose challenges with scalability, especially if you need one data source to feed into multiple destinations. In this context, we are happy to release Change Event Streaming ("CES") feature into Public Preview for Azure SQL Database. This feature enables you to stream row-level changes - inserts, updates, and deletes - from your database directly to Azure Event Hubs in near real time. Change Event Streaming addresses the above challenges by: Reducing latency: Changes are streamed (pushed by SQL) as they happen. This is in contrast with traditional CDC (change data capture) or CT (change tracking) based approaches, where an external component needs to poll SQL at regular intervals. Traditional approaches allow you to increase polling frequency, but it gets difficult to find a sweet spot between minimal latency and minimal overhead due to too frequent polls. Simplifying architecture: No need for Change Data Capture (CDC), Change Tracking, custom polling or external connectors - SQL streams directly to configured destination. This means simpler security profile (fewer authentication points), fewer failure points, easier monitoring, lower skill bar to deploy and run the service. No need to worry about cleanup jobs, etc. SQL keeps track of which changes are successfully received by the destination, handles the retry logic and releases log truncation point. Finally, with CES you have fewer components to procure and get approved for production use. Decoupling: The integration is done on the database level. This eliminates the problem of dual writes - the changes are streamed at transaction boundaries, once your source of truth (the database) has saved the changes. You do not need to modify your app workloads to get the data streamed - you tap right onto the data layer - this is useful if your apps are dated and do not possess real-time integration capabilities. In case of some 3rd party apps, you may not even have an option to do anything other than database level integration, and CES makes it simpler. Also, the publishing database does not concern itself with the final destination for the data - Stream the data once to the common message bus, and you can consume it by multiple downstream systems, irrespective of their number or capacity - the (number of) consumers does not affect publishing load on the SQL side. Serving consumers is handled by the message bus, Azure Event Hubs, which is purpose built for high throughput data transfers. onceptually visualizing data flow from SQL Server, with an arrow towards Azure Event Hubs, from where a number of arrows point to different final destinations. Key Scenarios for CES Event-driven microservices: They need to exchange data, typically thru a common message bus. With CES, you can have automated data publishing from each of the microservices. This allows you to trigger business processes immediately when data changes. Real-time analytics: Stream operational data into platforms like Fabric Real Time Intelligence or Azure Stream Analytics for quick insights. Breaking down the monoliths: Typical monolithic systems with complex schemas, sitting on top of a single database can be broken down one piece at a time: create a new component (typically a microservice), set up the streaming from the relevant tables on the monolith database and tap into the stream by the new components. You can then test run the components, validate the results against the original monolith, and cutover when you build the confidence that the new component is stable. Cache and search index updates: Keep distributed caches and search indexes in sync without custom triggers. Data lake ingestion: Capture changes continuously into storage for incremental processing. Data availability: This is not a scenario per se, but the amount of data you can tap into for business process mining or intelligence in general goes up whenever you plug another database into the message bus. E.g. You plug in your eCommerce system to the message bus to integrate with Shipping providers, and consequently, the same data stream is immediately available for any other systems to tap into. How It Works CES uses transaction log-based capture to stream changes with minimal impact on your workload. Events are published in a structured JSON format following the CloudEvents standard, including operation type, primary key, and before/after values. You can configure CES to target Azure Event Hubs via AMQP or Kafka protocols. For details on configuration, message format, and FAQs, see the official documentation: Feature Overview CES: Frequently Asked Questions Get Started Public preview CES is available today in public preview for Azure SQL Database and as a preview feature in SQL Server 2025. [update 20-mar-2026] Change Event Streaming is now in public preview for Azure SQL Managed instance. Read more here. Private preview CES is also available as a private preview for Azure SQL Managed Instance and Fabric SQL database: you can request to join the private preview by signing up here: https://aka.ms/sql-ces-signup We encourage you to try the feature out and start building real-time integrations on top of your existing data. We welcome your feedback—please share your experience through Azure Feedback portal or support channels. The comments below on this blog post will also be monitored, if you want to engage with us. Finally, CES team can be reached via email: sqlcesfeedback [at] microsoft [dot] com. Useful resources Free Azure SQL Database. Free Azure SQL Managed Instance.1.3KViews0likes0CommentsAnnouncing Public Preview: Auditing for Fabric SQL Database
We’re excited to announce the public preview of Auditing for Fabric SQL Database—a powerful feature designed to help organizations strengthen security, ensure compliance, and gain deep operational insights into their data environments. Why Auditing Matters Auditing is a cornerstone of data governance. With Fabric SQL Database auditing, you can now easily track and log database activities—answering critical questions like who accessed what data, when, and how. This supports compliance requirements (such as HIPAA and SOX), enables robust threat detection, and provides a foundation for forensic investigations. Key Highlights Flexible Configuration: Choose from default “audit everything,” preconfigured scenarios (like permission changes, login attempts, data reads/writes, schema changes), or define custom action groups and predicate filters for advanced needs. Seamless Access: Audit logs are stored in One Lake, making them easily accessible via T-SQL or One Lake Explorer. Role-Based Access Control: Configuration and log access are governed by both Fabric workspace roles and SQL-level permissions, ensuring only authorized users can view or manage audit data. Retention Settings: Customize how long audit logs are retained to meet your organization’s policy. How It Works Audit logs are written to a secure, read-only folder in One Lake and can be queried using the sys. fn_get_audit_file_v2 T-SQL function. Workspace and artifact IDs are used as identifiers, ensuring logs remain consistent even if databases move across logical servers. Access controls at both the workspace and SQL database level ensure only the right people can configure or view audit logs. Example Use Cases Compliance Monitoring: Validate a full audit trail for regulatory requirements. Security Investigations: Track specific events like permission changes or failed login attempts. Operational Insights: Focus on specific operations (e.g., DML only) or test retention policies. Role-Based Access: Verify audit visibility across different user roles. Getting Started You can configure auditing directly from the Manage SQL Auditing blade in the Fabric Portal. Choose your preferred scenario, set retention, and (optionally) define custom filters—all through a simple, intuitive interface. Learn more about auditing for Fabric SQL database here Data exposed session with demo here281Views3likes1Comment