sql server
176 Topics3 Reasons Enterprise SQL Server Migrations Slow Down - and How to Avoid Them
Summary Many of Enterprises around the globe have relied on SQL Server for over 3 decades to run their mission critical business applications. Their SQL Server estates face pressure from downtime risk, cost volatility, end of support timelines and modernization demands. As these customers get ready to modernize their data to use the latest capabilities of A.I and cloud native application trends, they want to migrate and modernize their SQL Servers to use Azure SQL with a modernization strategy built on confidence of customer success. Enterprise migrations rarely fail because of migration tools. They slow down because organizations struggle to answer three questions: How much downtime can we tolerate? What will it cost after migration? Are we choosing the right target platform? The organizations that answer these questions early move faster and with less risk. For the DB Administrators, Data architects, application architect and cloud-cost decision makers there are important technical considerations before, during and after data modernization to avoid long term costs and operational concerns. The Microsoft SQL Team has helped many customers modernize their SQL. We discuss important guidelines that can help resolve the 3 major concerns that block or slow SQL Server migration and modernization in Enterprises. This is covered in the episode of DataExposed for which this companion blog goes into the details. What are important triggers that cause customers and partners to consider SQL modernization? There are many business triggers that force Enterprises to migrate their data to public cloud. As SQL Server 2012 to SQL Server 2016 are already in the end of support stage of their lifecycle, customers need to upgrade SQL Server in place or migrate to AzureSQL. Due to cyber security threats, customers are feeling more vulnerable to attackers. Moving their data into a secure environment is essential for protecting not just their data but their business. Customers are reporting the need to free up IT dollars to invest into other parts of the business that may need it more. These may be anything from datacenter contract expirations, need for Hardware refreshes to software license renewals. As the business grows or becomes cyclical, there is surge in demand. Capacity constraints become a barrier for such expansions. These are triggers that cause them to rethink their data modernization strategy. Data modernization and moving the data to a elastic, scalable, secure and resilient data platform such as Azure SQL, becomes essential. The Three Migration Blockers However, data modernization and migration is not without any risk. Based on our customers experience, here are three key reasons that we have commonly encountered that halt or slow down SQL modernization. 1. Downtime Risk Business stakeholders often require strict service level commitments before authorizing production cutovers. Even when migrations are technically feasible, organizations may delay projects if they believe downtime windows could impact revenue, customer experience, or regulatory obligations. Most customers are still offered offline migration paths which can take hours to days, even though zero-downtime migrations are possible which take seconds to minutes. 2. Cost uncertainty Many modernization projects are approved based on expected cost savings. However, if infrastructure sizing, licensing assumptions, storage consumption, or disaster recovery requirements are not evaluated properly, the actual operational cost can exceed initial expectations. Cost uncertainty often slows executive approval processes and extends migration timelines. 3. Compatibility and Feature Fit When migrating SQL Server, Azure SQL has several deployment offerings from IaaS to PaaS. These include SQL Server on Azure VM, Azure SQL Managed Instance, Azure SQL DB Hyperscale and Azure SQL in Microsoft Fabric. Many customers maybe using SQL Server features like Cross-database queries, CLR, SSIS, SQL Agent, and linked servers. They make a safe decision to lift and shift migrate to SQL Server on Azure VMs IaaS instead of modernizing to a PaaS service like Azure SQL Managed Instance. However, in the process, they lose the opportunity to use the PaaS capabilities, manageability and AI/Fabric capabilities in Azure by making this choice. Enterprise Architects, Application Architects, Database developers and DB Administrators have to make the right choice taking both development as well as operational costs and compatibility when they make their SQL modernization decisions. Here are best practices some of the biggest and successful SQL migrations have used to make the migration and modernization journey with confidence. While we cannot disclose specific customer names, these guidelines are based on helping many large to small Enterprise customers. Azure SQL Managed Instance as the Resiliency Anchor Azure SQL Managed Instance is often the platform that helps organizations overcome all three concerns simultaneously because it combines near-full SQL Server compatibility with platform-as-a-service benefits. Azure SQL Managed Instance (Azure SQL MI) Next-gen General Purpose is now generally available, bringing a built-in performance and scale upgrade for General Purpose workloads, including up to 500 databases per instance, up to 32 TB storage, lower latency, and higher IOPS. The release also adds more flexible cost-performance tuning with independent vCore, IOPS, and memory scaling, plus faster management operations to adapt to changing workload demand. For enterprise SQL Server modernization, this positions Azure SQL MI as a stronger path for high-compatibility migrations that need better price-performance without moving to a full replatform. Let us dive deeper into how this helps address the downtime risk concerns by enables three levels of resiliency and high availability features. Local Redundancy Azure SQL Managed Instance provides first layer of Local Redundancy — built into every Azure SQL MI instance at no extra cost. Azure SQL Managed Instance uses local redundancy by default to keep workloads available during node, VM, rack, maintenance, and other local failures within a single datacenter, with Service Fabric orchestrating failover. In General Purpose (including Next-gen GP), this is implemented as stateless compute plus remote stateful storage; during failover, the engine process moves to another compute node and reattaches data, which can cause temporary performance impact due to cold cache. In Business Critical, local redundancy uses multiple synchronized replicas with local SSD storage (Always On-like architecture), enabling fast failover and read scale-out on secondaries.Next-gen General Purpose is an architectural upgrade to the existing General Purpose service tier that uses an upgraded remote storage layer that stores instance data and log files on Elastic SAN instead of page blobs and maintains it locally. Local redundancy protects against local infrastructure issues. This gives you a 99.99% SLA but not full datacenter/zone disasters, so zone redundancy (where supported) or disaster recovery (DR) options like failover groups/geo-restore are needed for broader resilience. Zone Redundancy The second layer is Zone Redundancy, which is accomplished placing data replicas across availability zones. Your Azure SQL MI resources are distributed across multiple availability zones within a region. This protects against the failure of an entire datacenter because each Azure availability zone is a separate physical location with independent power, cooling and networking. It relies on synchronous replication using zone-redundant storage for General Purpose. For Business critical, it uses Always On Availability group replicas across zones for Business Critical. Always On availability group technology replicates data changes from the primary instance to standby replicas in other availability zones. In the event of an outage, there's an automatic failover that seamlessly transitions one of the standby replicas to be prima. These replicas are always in sync — which means zero data loss. Failover typically happens in under 30 seconds, and your SLA jumps to 99.995%. Failover Groups The third layer is Failover Groups. This is your cross-region disaster recovery solution. It asynchronously replicates all user databases to a secondary Azure SQL MI instance in a different Azure region. Because it is asynchronous replication, there is potential for momentary data loss in the case of a datacenter outage. But it still protects the data against the worst case failure — a full regional outage. If the replica is a standby replica, there is no license required and it is used only for disaster recovery. Using these options, business stakeholders can get their assurance that they have Enterprise grade availability and resiliency platform of AzureSQL for running their mission critical workloads. You can read more about these HA and Resiliency options in Microsoft Learn. Cost Governance for Enterprise Buyers The total cost of data modernization and migration is not a one-time estimate but an ongoing one. In this case, Azure SQL MI provides Enterprise DB Administrators many levers through pricing model choice, right-sizing, elasticity, serverless options and dev/test free tiers. Let us explore how these can be combined for smart cost estimations. Lets also look at the best offering for the cost-conscious Enterprises - Azure SQL DB Hyperscale. With Azure SQL DB Hyperscale, you get the SQL Server engine, T-SQL compatibility, High Availability, Disaster recovery, security, backups, and management all bundled into the service price. No separate cost for SQL Server license. Hyperscale separates compute and storage that can scale independently and does not force you to overprovision. You have to only pay what you use which is ideal for seasonal workloads, Dev/Test, SaaS applications, predictable daytime trends, and up to 60% savings when you use Elastic pools. Azure Hybrid benefit (AHB)- Azure Hybrid Benefit lets you bring your existing SQL Server investments to Azure and reduce compute costs, accelerating your ROI from cloud migration while preserving all the benefits of Azure SQL Azure SQL DB Free offer – is the strongest product offering. Enterprises can use all features of Azure SQL at no cost for up to 10 Azure SQL DB free-tier. 100,000 vCore-seconds of serverless compute per month, 32GB data storage, 32 GB backup storage, serverless auto-scaling and auto-pause if you hit the limit per month. Run your POCs at no cost and evaluate before you move to Azure SQLDB, especially SMB& some enterprise Azure SQL Managed Instance also offers 1 free Azure SQL MI instance per Azure subscription giving you 720vCore hours per month, 64GB storage, up to 500 databases, automated backups and 12 months free. And if data migration is not possible due to data compliance or data proximity purposes, Azure Arc Pay-As-You-Go (PAYG) gives you cloud-style SQL licensing for servers running anywhere—on-premises, at the edge, or in other clouds. Instead of making large up-front licensing investments, you only pay for SQL Server while it's running, while still gaining access to Azure Arc management, security, monitoring, and modernization capabilities. For seasonal, variable, or growth-oriented workloads, PAYG can improve cash flow and reduce licensing complexity. Reserved instances allow Enterprise customers to commit to using Azure SQL resource for a period of one or three years to receive a significant discount. This option combined with AHB can save you even more up to 80%. We have a comprehensive licensing guide for on-premises SQL Server for your reference. Azure SQL enables a variety of cloud cost-models for a wide range of enterprise workload needs to help Enterprise cloud cost decision makers and DB Administrators make the right choice for their workloads. Target selection guidance While Azure SQL has multiple deployment options to migrate your on-premises work loads, it is critical to make the right choice long term. Customers can install SQL Server on-premises, they can use Azure SQL deployment options, and also run SQL Server in other clouds like Amazon Web Services and Google Cloud. If there is an Enterprise workload that is not ready to modernize, you have the ability to lift and shift into SQL Server in Azure VM. It is a low cost migration option, because the application does not need any modification and it gives DB Administrators full control over the SQL server and underlying Windows or Linux OS. This can be a first step to modernization for some customers who are risk-averse. For those Enterprise customers who are willing to modernize their workloads and SQL Server instances, Azure SQL DB Hyperscale is the best option. Azure SQL Database Hyperscale helps organizations modernize their most demanding database workloads with virtually unlimited growth, high performance, and cloud-scale economics. Customers can scale storage and compute independently, support large multi-terabyte databases, accelerate application performance with read-scale replicas, and eliminate the operational complexity of managing infrastructure, backups, patching, and high availability. They can build cloud-native applications or cloud-enable existing applications. However, if Enterprise customers want good compatibility with their on-premises SQL Server but continue down the modernization path - their best option is Azure SQL Managed Instance. They can modernize the instance and not impact the application as there is no application change required. Applications will continue to work and the DB Administrators do not need to worry about managing infrastructure and all the overhead that comes with managing, self-managing your SQL Server virtual machines. For SQL Server customers, PostgreSQL may look like an attractive low cost option. However, it requires re-platforming that could add significant hidden cost due to retraining all their DBAs and their developers to do performance optimization, performance best practices and operational maintenance. Lastly, our same SQL engine is also available to customers as a SaaS-ified version, Fabric SQL database as well. All these options use the exact same SQL engine which makes it easier for Database developers and DB Administrators continue to use the same expertise, tools and process. Making the right choice of Azure SQL deployment is not just on the fastest way to modernize but the right long term approach. Conclusion and Next steps Enterprise SQL Server migrations rarely stall because of migration technology. More often, they are delayed by concerns around downtime, cost predictability, and platform selection. Organizations that address these questions early can accelerate modernization while reducing operational risk. Azure SQL provides multiple modernization paths—from SQL Server on Azure Virtual Machines to Azure SQL Managed Instance and Azure SQL Database—allowing organizations to balance compatibility, operational simplicity, resiliency, and cost efficiency based on their business requirements. As modernization initiatives accelerate, the most successful projects are those that treat migration not as a one-time infrastructure event, but as a long-term platform strategy. Whether its the newest and the fastest way for us to migrate customers, we have all the comprehensive Copilot enabled AI-assisted migration tooling, technical training and support you need. Look for more blogs, whitepapers, guides and training based on best practices used real-world data modernization projects.78Views0likes0CommentsRegex support for LOB types in T-SQL—available in Azure SQL & SQL Server 2025
At a glance — Native regular expression (regex) functions in T-SQL now accept varchar(max) and nvarchar(max) inputs of up to 2 MB across all seven regex functions, including the two table-valued functions (REGEXP_MATCHES and REGEXP_SPLIT_TO_TABLE). This capability ships in SQL Server 2025 CU5 and is already available in Azure SQL Database, SQL Database in Fabric and Azure SQL Managed Instance configured with the Always-up-to-date update policy. It will reach Managed Instances on the SQL Server 2025 update policy as part of the CU5 rollout. You no longer need to split log files, HTML documents, or large JSON payloads into 8,000-byte chunks just to run a pattern match. 1. Introduction Regular expressions have long been a cornerstone of modern data processing — used for validation, parsing, transformation, and extracting structured insights from unstructured text. With SQL Server 2025 and Azure SQL, regex is now a first-class T-SQL capability, removing the historical need to rely on SQLCLR functions or application-tier processing. While the initial release made native regex broadly available, large-object (LOB) inputs were not yet supported on every function. CU5 closes that gap. Under the hood, T-SQL regex implements POSIX Extended Regular Expression (ERE) semantics, augmented by a curated set of Perl-style features, and is powered by the RE2 engine. RE2 is a linear-time, non-backtracking implementation, which means it is not susceptible to catastrophic backtracking (a class of denial-of-service issue commonly known as ReDoS). That guarantee becomes far more important when the input is a 1.8 MB log blob than when it is an 8,000-byte string. Release timeline Milestone What shipped Ignite 2025 — General Availability Regex went GA in SQL Server 2025 and Azure SQL. LOB inputs were initially supported only on REGEXP_LIKE, REGEXP_COUNT, and REGEXP_INSTR. LOB support on REGEXP_REPLACE and REGEXP_SUBSTR was deferred, and the two table-valued functions (TVFs) accepted only non-LOB string types. Azure SQL (post-GA service updates) LOB inputs enabled across all seven functions. SQL Server 2025 CU5 LOB inputs up to 2 MB enabled on all seven functions in the SQL Server. What’s new in CU5 varchar(max) and nvarchar(max) inputs are accepted on every regex function. The input string is capped at 2 MB per function call. The pattern is still capped at 8,000 bytes, which is far larger than any maintainable regular expression should ever need. Behavior is consistent between Azure SQL and SQL Server, so code you write today is fully portable. Note — The 2 MB limit applies to the input passed to a single function call, not to the column or row. A single value in a varchar(max) column can still store up to 2 GB; the constraint is that no single regex evaluation can consume more than 2 MB of that value. Prerequisites SQL Server 2025 CU5 or later, or Azure SQL Database, or SQL Database in Fabric or Azure SQL Managed Instance configured with the SQL Server 2025 / Always-up-to-date update policy. The two table-valued functions (REGEXP_MATCHES and REGEXP_SPLIT_TO_TABLE) require database compatibility level 170, unless the database-scoped configuration ALLOW_BUILTIN_TVF_IN_ALL_COMPAT_LEVELS (preview) is enabled. Note — On Azure SQL Managed Instance (Always-up-to-date), this capability is rolling out region by region. It is already live in regions where the rollout has completed and will light up in the remaining regions as the deployment finishes. Instances on the SQL Server 2025 update policy will receive it as part of the CU5 rollout — coming soon. Verify compatibility level (170 required for the TVFs) – SELECT name, compatibility_level FROM sys.databases WHERE name = DB_NAME(); -- If necessary: -- ALTER DATABASE [<your-database>] SET COMPATIBILITY_LEVEL = 170; 2. Working with LOB Data This section demonstrates the CU5 capabilities against a realistic LOB data. We build a LogEntries table whose RawPayload column holds multi-KB to multi-MB chunks of web server and application output, plus an HtmlPages table for HTML cleansing examples. 2.1 Create the sample schema and data IF OBJECT_ID('dbo.LogEntries', 'U') IS NOT NULL DROP TABLE dbo.LogEntries; IF OBJECT_ID('dbo.HtmlPages', 'U') IS NOT NULL DROP TABLE dbo.HtmlPages; CREATE TABLE dbo.LogEntries ( LogId BIGINT IDENTITY(1,1) PRIMARY KEY, Source SYSNAME NOT NULL, IngestedAt DATETIME2(3) NOT NULL DEFAULT SYSUTCDATETIME(), RawPayload VARCHAR(MAX) NOT NULL -- LOB column ); CREATE TABLE dbo.HtmlPages ( PageId INT IDENTITY(1,1) PRIMARY KEY, Url NVARCHAR(2048) NOT NULL, Body NVARCHAR(MAX) NOT NULL -- LOB column (Unicode) ); Now generate realistically large rows. The REPLICATE(CAST(... AS varchar(max)), n) pattern is required because REPLICATE returns NULL when the result would exceed 8,000 bytes unless its first argument is a max type. -- Synthetic web access-log payload (~252 KB in row 1, plus a separate ~586 KB row). DECLARE @logLine VARCHAR(500) = '127.0.0.1 - alice [21/May/2026:10:15:32 +0000] "GET /api/orders/42 HTTP/1.1" 200 1532 ' + 'user-agent="Mozilla/5.0" ip=10.0.0.7 email=alice@contoso.com card=4111-1111-1111-1234' + CHAR(10); DECLARE @bigLog VARCHAR(MAX) = REPLICATE(CAST(@logLine AS VARCHAR(MAX)), 1500) -- ~252 KB + '127.0.0.1 - mallory [21/May/2026:10:16:01 +0000] "POST /login HTTP/1.1" 500 0 ' + 'ip=203.0.113.99 ssn=123-45-6789' + CHAR(10); INSERT INTO dbo.LogEntries (Source, RawPayload) VALUES ('web-01', @bigLog), -- ~252 KB ('web-02', REPLICATE(CAST('OK ' AS VARCHAR(MAX)), 200000)); -- ~586 KB -- Synthetic HTML page (~775 KB / ~396,000 characters). DECLARE @htmlChunk NVARCHAR(MAX) = N'<div class="row"><p>Hello <b>world</b>! Contact <a href="mailto:bob@contoso.com">bob</a>.</p></div>'; INSERT INTO dbo.HtmlPages (Url, Body) VALUES (N'https://contoso.example/page-1', N'<html><head><title>Big Page</title></head><body>' + REPLICATE(@htmlChunk, 4000) + N'</body></html>'); -- Confirm payload sizes in bytes. SELECT LogId, Source, DATALENGTH(RawPayload) AS PayloadBytes FROM dbo.LogEntries; SELECT PageId, DATALENGTH(Body) AS BodyBytes, LEN(Body) AS BodyChars FROM dbo.HtmlPages; Results: LogId Source PayloadBytes 1 web-01 258,110 2 web-02 600,000 PageId BodyBytes BodyChars 1 792,124 396,062 Before CU5, feeding any of these payloads into REGEXP_REPLACE, REGEXP_SUBSTR, REGEXP_MATCHES, or REGEXP_SPLIT_TO_TABLE would have failed with a type-mismatch error or required a LEFT(RawPayload, 8000)-style truncation. The same queries now run end-to-end. 2.2 REGEXP_LIKE — Filter rows by LOB content -- Find logs that contain at least one HTTP 5xx response. SELECT LogId, Source, DATALENGTH(RawPayload) AS PayloadBytes FROM dbo.LogEntries WHERE REGEXP_LIKE(RawPayload, '"[A-Z]+\s[^"]+\sHTTP/1\.[01]"\s5[0-9]{2}\s'); REGEXP_LIKE is a Boolean predicate: it evaluates to true when the pattern matches anywhere in the input and false otherwise. Because it returns a Boolean rather than a bit, use it directly in WHERE, CASE WHEN, IIF, or CHECK constraint contexts — do not compare it with = 1 or = 0 (the parser rejects that syntax). Note — REGEXP_LIKE itself requires database compatibility level 170. The other scalar regex functions (REGEXP_COUNT, REGEXP_INSTR, REGEXP_REPLACE, REGEXP_SUBSTR) are available at all compatibility levels. Results: LogId Source PayloadBytes 1 web-01 258,110 2.3 REGEXP_COUNT — Counting at scale -- Per-row tally of GET requests, POST requests, and 5xx responses -- across the entire LOB payload. SELECT LogId, Source, REGEXP_COUNT(RawPayload, '"GET\s') AS Gets, REGEXP_COUNT(RawPayload, '"POST\s') AS Posts, REGEXP_COUNT(RawPayload, '\s5[0-9]{2}\s') AS ServerErrors FROM dbo.LogEntries; Results: LogId Source Gets Posts ServerErrors 1 web-01 1,500 1 1 2 web-02 0 0 0 2.4 REGEXP_INSTR — Locate the first error -- 1-based character position (or 0 if no match) of the FIRST 5xx response in each payload. SELECT LogId, Source, REGEXP_INSTR(RawPayload, '\s5[0-9]{2}\s', 1, 1, 0) AS FirstErrorPos FROM dbo.LogEntries; Parameter recap: REGEXP_INSTR(string, pattern, start, occurrence, return_option [, flags [, group ]]). A return_option of 0 returns the starting position of the match; 1 returns the position immediately after the last character of the match. Results: LogId Source FirstErrorPos 1 web-01 258,072 2 web-02 0 2.5 REGEXP_REPLACE — Redact sensitive data in place PII redaction over LOB payloads was one of the most-requested CU5 scenarios. Before CU5, it required a custom chunked-replace routine; it is now a single expression. -- Redact credit-card-shaped tokens, U.S. SSN-shaped tokens, and email addresses -- across the entire payload. SELECT LogId, REGEXP_REPLACE( REGEXP_REPLACE( REGEXP_REPLACE( RawPayload, '\b[0-9]{4}[- ]?[0-9]{4}[- ]?[0-9]{4}[- ]?[0-9]{4}\b', '****-****-****-****'), '\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b', '***-**-****'), '\b[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}\b', '[redacted-email]' ) AS RedactedPayload FROM dbo.LogEntries; Or strip every HTML tag from an nvarchar(max) page in a single call: SELECT PageId, LEN(Body) AS OriginalLen, LEN(REGEXP_REPLACE(Body, N'<[^>]+>', N'')) AS TextOnlyLen FROM dbo.HtmlPages; Results — the ~775 KB HTML document collapses from 396,062 to 100,008 characters of plain text in a single call: PageId OriginalLen TextOnlyLen 1 396,062 100,008 2.6 REGEXP_SUBSTR — Extract a single value -- Pull the first IPv4 address out of each log payload. SELECT LogId, REGEXP_SUBSTR(RawPayload, '\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b', 1, -- start position 1, -- occurrence 'c', -- flags: case-sensitive 0 -- group: 0 returns the whole match ) AS FirstIp FROM dbo.LogEntries; To return the contents of a specific capture group instead of the entire match, pass its 1-based group number as the final argument. Results: LogId FirstIp 1 127.0.0.1 2 NULL 2.7 REGEXP_MATCHES — Every match, set-based This is where the combination of TVF and LOB delivers the largest productivity gain: extract every structured value from a megabyte of unstructured text in a single set-based query, with no client round-trips. REGEXP_MATCHES returns one row per match with these columns: Column Type Description match_id bigint Sequence number of the match (1-based). start_position int 1-based start index of the match. end_position int 1-based end index of the match. match_value same type as string_expression The entire matched substring. substring_matches json JSON array describing each capture group, with the shape [{"value":"…","start":N,"length":N}, …]. -- Every email address in every log payload, alongside its row of origin. SELECT l.LogId, m.match_id, m.match_value AS EmailFound FROM dbo.LogEntries AS l CROSS APPLY REGEXP_MATCHES( l.RawPayload, '\b[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}\b' ) AS m ORDER BY l.LogId, m.match_id; Capture groups are even more useful — you can project the parts of every log line as columns by reading from the substring_matches JSON document: -- Parse Common-Log-Format-ish entries into ip, user, status, and bytes columns. -- The pattern has four capture groups, accessed below as $[0] through $[3]. SELECT l.LogId, m.match_id, JSON_VALUE(m.substring_matches, '$[0].value') AS Ip, JSON_VALUE(m.substring_matches, '$[1].value') AS UserName, JSON_VALUE(m.substring_matches, '$[2].value') AS Status, JSON_VALUE(m.substring_matches, '$[3].value') AS Bytes FROM dbo.LogEntries AS l CROSS APPLY REGEXP_MATCHES( l.RawPayload, '^([0-9.]+)\s-\s(\S+)\s\[[^\]]+\]\s"[^"]+"\s([0-9]{3})\s([0-9]+)', 'm' -- multi-line: ^ and $ anchor to each line, not just the whole input ) AS m ORDER BY l.LogId, m.match_id; Important — Without the 'm' flag, the ^ anchor matches only at the start of the entire 250 KB input, so you would receive exactly one match for the first line. The multi-line flag is what unlocks per-line extraction. Results (first two parsed rows): LogId match_id Ip UserName Status Bytes 1 1 127.0.0.1 alice 200 1532 1 2 127.0.0.1 alice 200 1532 2.8 REGEXP_SPLIT_TO_TABLE — Shred a LOB into rows -- Project the entire log payload as one row per non-empty line. SELECT l.LogId, s.ordinal AS [LineNo], s.value AS LineText FROM dbo.LogEntries AS l CROSS APPLY REGEXP_SPLIT_TO_TABLE(l.RawPayload, '\r?\n') AS s WHERE l.LogId = 1 AND s.value <> '' ORDER BY s.ordinal; You now have a tabular projection of a multi-megabyte text blob without leaving the engine. You can feed it into a CTE, aggregate it, join it to dimension tables, or materialize it into a staging table — all set-based. Results (first three rows): LogId ordinal LineText (first 80 chars) 1 1 127.0.0.1 - alice [21/May/2026:10:15:32 +0000] "GET /api/orders/42 HTTP/1.1" 200 1 2 127.0.0.1 - alice [21/May/2026:10:15:32 +0000] "GET /api/orders/42 HTTP/1.1" 200 1 3 127.0.0.1 - alice [21/May/2026:10:15:32 +0000] "GET /api/orders/42 HTTP/1.1" 200 Tip — composing LOB regex pipelines — CROSS APPLY (and OUTER APPLY when you need to preserve rows that produce no matches) is the primary composition primitive. You can stack REGEXP_SPLIT_TO_TABLE (lines) feeding REGEXP_MATCHES (fields per line) feeding ordinary aggregates, all within a single query plan. 2.9 The 2 MB ceiling — strategies for larger inputs The 2 MB limit applies to the input string of a single regex call. If the value passed to a regex function exceeds 2 MB, the call raises an error (error number 19311, severity 16) rather than silently truncating. That is the intended behavior — silent truncation would hide correctness bugs. In practice, 2 MB is a generous ceiling: a single log file or HTML document of that size is already unusual, and most real-world LOB data sit comfortably below it. When individual values do exceed the limit, the most reliable approach is to split them into smaller logical units before they land in the column you want to query — for example, by writing one log line, one document section, or one record per row at ingestion time. Because every regex function (including the two TVFs) shares the same 2 MB ceiling, sharding at query time is not generally feasible; doing it at the load path keeps every regex call well under the limit and avoids per-query workarounds. Bytes vs. characters — The 2 MB limit is measured in bytes, not characters, and the byte count is based on the UTF-8 encoding of the input regardless of the column’s declared type. ASCII characters take 1 byte each, so plain ASCII text can run to roughly two million characters; non-ASCII characters take 2–4 bytes in UTF-8, so fewer characters fit. Keep in mind that DATALENGTH() reports storage size in the column’s own encoding, which may differ from the UTF-8 byte count used by the limit, and LEN() (which counts characters) is best avoided as a sizing check here. To measure the UTF-8 byte length that the limit actually checks, cast the value to varchar(max) under a UTF-8 collation and take its DATALENGTH: SELECT DATALENGTH( CONVERT(varchar(max), Body COLLATE Latin1_General_100_CI_AS_SC_UTF8) ) AS Utf8Bytes FROM dbo.HtmlPages; Anything above 2 * 1024 * 1024 (2,097,152) bytes will be rejected by a regex call on that value. Have a scenario that genuinely needs more than 2 MB? If your workload requires regex evaluation on individual values larger than the current 2 MB ceiling, we would like to hear about it. Please share the details — data shape, payload size, pattern, and business need — on the Azure SQL feedback portal. Customer feedback directly informs how we prioritize future limit changes. 2.10 Cleanup DROP TABLE IF EXISTS dbo.LogEntries; DROP TABLE IF EXISTS dbo.HtmlPages; 3. Summary What changed in CU5 Before CU5 — LOB inputs were accepted on REGEXP_LIKE, REGEXP_COUNT, and REGEXP_INSTR. The remaining functions — REGEXP_REPLACE, REGEXP_SUBSTR, and the two TVFs (REGEXP_MATCHES, REGEXP_SPLIT_TO_TABLE) — required non-LOB string inputs, which often meant truncating with LEFT(..., 8000) or chunking in the application tier. After CU5 (and already in Azure SQL) — All seven functions accept varchar(max) and nvarchar(max) inputs of up to 2 MB. The pattern remains capped at 8,000 bytes. Quick reference Function Returns LOB input (CU5) Common use case REGEXP_LIKE Boolean (predicate) Yes Filter rows in WHERE / CASE / CHECK predicates REGEXP_COUNT int Yes Count occurrences of a pattern REGEXP_INSTR int Yes Position of the nth match REGEXP_REPLACE string Yes Redact, cleanse, or normalize text REGEXP_SUBSTR string Yes Extract a single value REGEXP_MATCHES (TVF) (match_id, start_position, end_position, match_value, substring_matches) Yes Extract every match plus capture groups (via JSON), set-based REGEXP_SPLIT_TO_TABLE (TVF) (value, ordinal) Yes Split a LOB into rows by a regex delimiter Further reading Official documentation: REGEXP_LIKE, REGEXP_COUNT, REGEXP_INSTR, REGEXP_REPLACE, REGEXP_SUBSTR, REGEXP_MATCHES, REGEXP_SPLIT_TO_TABLE. Regular expressions overview. SQL Server 2025 CU5 release notes. Closing thought. Native regex was already a significant quality-of-life improvement when it became generally available. CU5 completes the picture: every function, every input size up to 2 MB, every shape — scalar or table-valued. The next time you are tempted to export a column out of the database in order to grep it, try one of the seven regex functions first. Happy matching. 🧠210Views0likes0CommentsDynamic Data Masking – What it is, What it isn’t, and How to use it effectively
In this post, we’ll explain the core purpose of Dynamic Data Masking (to ease application development), how it works, and its proper use cases – as well as its limitations. If you’re considering using Dynamic Data Masking or reviewing your data security strategy, this information will help you make informed decisions. What Dynamic Data Masking is designed for Dynamic Data Masking Dynamic Data Masking - SQL Server | Microsoft Learn is a database feature that can be used to alter how certain data elements are presented in query results for users who do not have privileged access or required permission. For example, a query on an email column may return a masked value such as jXXX@XXXX.com rather than the full address, depending on user permissions, while the original data remains unchanged in storage. Masking rules are defined within the database schema and are applied to query results for applicable users at runtime. This approach can simplify application developer’s job and reduce the need for application‑level logic that modifies how sensitive values are displayed across different application(s) or reports. DDM can help prevent accidental or casual exposure of sensitive information. How Does DDM differ from other security features? Dynamic Data Masking affects only what users see in query results—it does not protect the underlying data. Unlike encryption Always Encrypted - SQL Server | Microsoft Learn or Row‑Level security Row-Level Security - SQL Server | Microsoft Learn, DDM does not encrypt data, filter rows, or override SQL permissions. Users with elevated privileges (such as UNMASK, db_owner, or sysadmin) always see unmasked data or can modify or remove masking rules. What DDM doesn’t protect against Because Dynamic Data Masking is applied when query results are returned, there are several considerations to be aware of: Inference through queries: In some scenarios, users with database access may be able to make inferences about masked values by applying query filters or conditions that rely on underlying stored data. The database is still comparing the real values under the hood, so these queries work. It’s an expected behavior given DDM’s design. Privileged users: Users who are granted sufficient database permissions, such as the ability to alter table schemas, can directly disable or remove masking. Users with sysadmin, db_owner or CONTROL permission can view unmasked data. Thus, controlling and auditing who holds such privileges is vital. Metadata visibility: Masking rules and associated columns can be discoverable through system metadata. Data movement: Because masking is defined at the schema level in a given database instance, backups or exported datasets may contain unmasked values depending on permissions and configuration. Understanding these design characteristics is important when incorporating DDM into a broader data governance or privacy strategy. Proper use and best practices for DDM Organizations may consider using Dynamic Data Masking in scenarios where consistent display of sensitive values is needed across application(s) or reporting environments. Some implementation considerations include: Using DDM to help standardize how sensitive fields are displayed in query results and reduce developmental effort for data masking Combining DDM with other database or access‑control features as part of a layered data protection strategy Reviewing which users are granted permissions to view unmask data or alter masking configurations. Implementing auditing or monitoring database activity as part of broader governance practices Educating internal stakeholders on how masking operates at the query‑result level Testing masking configurations in non‑production environments prior to deployment Conclusion Dynamic Data Masking can be useful in scenarios where organizations want to manage how sensitive data is displayed in application outputs without modifying stored values. It is designed to operate as part of a broader data access or governance approach rather than as a standalone protection mechanism for stored data. When implemented alongside complementary database features and appropriate access controls, DDM may help support more consistent handling of sensitive values across environments.324Views0likes0CommentsZero Trust for data: Make Microsoft Entra authentication for SQL your policy baseline
A policy-driven path from enabled to enforced. Why this matters now Security and compliance programs were once built on an assumption that internal networks were inherently safer. Cloud adoption, remote work, and supply-chain compromise have steadily invalidated that model. U.S. federal guidance has now formalized this shift: Executive Order 14028 calls for modernizing cybersecurity and accelerating Zero Trust adoption, and OMB Memorandum M-22-09 sets a federal Zero Trust strategy with specific objectives and timelines. Meanwhile, attacker economics are changing. Automation and AI make reconnaissance, phishing, and credential abuse cheaper and faster. That concentrates risk on identity—the control plane that sits in front of systems, applications, and data. In Zero Trust, the question is no longer “is the network trusted,” but “is this request verified, governed by policy, and least-privilege?” Why database authentication is a first‑order Zero Trust control Databases are universally treated as crown-jewel infrastructure. Yet many data estates still rely on legacy patterns: password-based SQL authentication, long-lived secrets embedded in apps, and shared administrative accounts that persist because migration feels risky. This is exactly the kind of implicit trust Zero Trust architectures aim to remove. NIST SP 800-207 defines Zero Trust as eliminating implicit trust based solely on network location or ownership and focusing controls on protecting resources. In that model, every new database connection is not “plumbing”—it is an access decision to sensitive data. If the authentication mechanism sits outside the enterprise identity plane, governance becomes fragmented and policy enforcement becomes inconsistent. What changes when SQL uses Microsoft Entra authentication Microsoft Entra authentication enables users and applications to connect to SQL using enterprise identities, instead of usernames and passwords. Across Azure SQL and SQL Server enabled by Azure Arc, Entra-based authentication helps align database access with the same identity controls organizations use elsewhere. The security and compliance outcomes that leaders care about Reduce password and secret risk: move away from static passwords and embedded credentials. Centralize governance: bring database access under the same identity policies, access reviews, and lifecycle controls used across the enterprise. Improve auditability: tie access to enterprise identities and create a consistent control surface for reporting. Enable policy enforcement at scale: move from “configured” controls to “enforced” controls through governance and tooling. This is why Entra authentication is a high-ROI modernization step: it collapses multiple security and operational objectives into one effort (identity modernization) rather than a set of ongoing compensating programs (password rotation programs, bespoke exceptions, and perpetual secret hygiene projects). Why AI makes this a high priority decision AI accelerates both reconnaissance and credential abuse, which concentrates risk on identity. As a result, policy makers increasingly treat phishing-resistant authentication and centralized identity enforcement as foundational—not optional. A practical path: from enabled to enforced Successful security programs define a clear end state, a measurable glide path, and an enforcement model. A pragmatic approach to modernizing SQL access typically includes: Discover active usage: Identify which logins and users are actively connecting and which are no longer required. Establish Entra as the identity authority: Enable Entra authentication on SQL logical servers, starting in mixed mode to reduce disruption. Recreate principals using Entra identities: Replace SQL Authentication logins/users with Entra users, groups, service principals, and managed identities. Modernize application connectivity: Update drivers and connection patterns to use Entra-based authentication and managed identities. Validate, then enforce: Confirm the absence of password‑based SQL authentication traffic, then move to Entra‑only where available and enforce via policy. By adopting this sequencing, organizations can mitigate risks at an early stage and postpone enforcement until the validation process concludes. For a comprehensive migration strategy, refer to Securing Azure SQL Database with Microsoft Entra Password-less Authentication: Migration Guide. Choosing which projects to fund — and which ones to stop When making investment decisions, priority is given to database identity projects that can demonstrate clear risk reduction and lasting security benefits. Microsoft Entra authentication as the default for new SQL workloads, with a defined migration path for the existing workloads. Managed identities for application-to-database connectivity to eliminate stored secrets. Centralized governance for privileged database access using enterprise identity controls. At the same time, organizations should explicitly de-prioritize investments that perpetuate password risk: password rotation projects that preserve SQL Authentication, bespoke scripts maintaining shared logins, and exception processes that do not scale. Security and scale are not competing goals Security is often seen as something that slows down innovation, but database identity offers unique benefits. When enterprise identity is used for access controls, bringing in new applications and users shifts from handing out credentials to overseeing policies. Compliance reporting also becomes uniform rather than customized, making it easier to grow consistently thanks to a single control framework. Modern database authentication is not solely about mitigating risk— it establishes a scalable operational framework for secure data access. A scorecard designed for leadership readiness To elevate the conversation from implementation to governance, use outcome-based metrics: Coverage: Percentage of SQL workloads with Entra authentication enabled. Enforcement: Percentage operating in Entra-only mode after validation. Secret reduction: Applications still relying on stored database passwords. Privilege hygiene: Admin access governed through enterprise identity controls. Audit evidence: Ability to produce identity-backed access reports on demand. These map directly to Zero Trust maturity expectations and provide a defensible definition of “done.” Closing Zero Trust is an operating posture, not a single control. For most organizations, the fastest way to make that posture measurable is to standardize database access on the same identity plane used everywhere else. If you are looking for a single investment that improves security, reduces audit friction, and supports responsible AI adoption, modernizing SQL access with Microsoft Entra authentication — and driving it from enabled to enforced — is one of the most durable choices you can make. References US Government sets forth Zero Trust architecture strategy and requirements (Microsoft Security Blog) Securing Azure SQL Database with Microsoft Entra Password-less Authentication: Migration Guide (Microsoft Tech Community) OMB Memorandum M-22-09: Federal Zero Trust Strategy (White House) NIST SP 800-207: Zero Trust Architecture CISA: Zero Trust Enforce Microsoft Entra-only authentication for Azure SQL Database and Azure SQL Managed Instance487Views1like0CommentsWhy Developers and DBAs love SQL’s Dynamic Data Masking (Series-Part 1)
Dynamic Data Masking (DDM) is one of those SQL features (available in SQL Server, Azure SQL DB, Azure SQL MI, SQL Database in Microsoft Fabric) that both developers and DBAs can rally behind. Why? Because it delivers a simple, built-in way to protect sensitive data—like phone numbers, emails, or IDs—without rewriting application logic or duplicating security rules across layers. With just a single line of T-SQL, you can configure masking directly at the column level, ensuring that non-privileged users see only obfuscated values while privileged users retain full access. This not only streamlines development but also supports compliance with data privacy regulations like GDPR and HIPAA, etc. by minimizing exposure to personally identifiable information (PII). In this first post of our DDM series, we’ll walk through a real-world scenario using the default masking function to show how easy it is to implement and how much development effort it can save. Scenario: Hiding customer phone numbers from support queries Imagine you have a support application where agents can look up customer profiles. They need to know if a phone number exists for the customer but shouldn’t see the actual digits for privacy. In a traditional approach, a developer might implement custom logic in the app (or a SQL view) to replace phone numbers with placeholders like “XXXX” for non-privileged users. This adds complexity and duplicate logic across the app. With DDM’s default masking, the database can handle this automatically. By applying a mask to the phone number column, any query by a non-privileged user will return a generic masked value (e.g. “XXXX”) instead of the real number. The support agent gets the information they need (that a number is on file) without revealing the actual phone number, and the developer writes zero masking code in the app. This not only simplifies the application codebase but also ensures consistent data protection across all query access paths. As Microsoft’s documentation puts it, DDM lets you control how much sensitive data to reveal “with minimal effect on the application layer” – exactly what our scenario achieves. Using the ‘Default’ Mask in T-SQL : The ‘Default’ masking function is the simplest mask: it fully replaces the actual value with a fixed default based on data type. For text data, that default is XXXX. Let’s apply this to our phone number example. The following T-SQL snippet works in Azure SQL Database, Azure SQL MI and SQL Server: SQL -- Step 1: Create the table with a default mask on the Phone column CREATE TABLE SupportCustomers ( CustomerID INT PRIMARY KEY, Name NVARCHAR(100), Phone NVARCHAR(15) MASKED WITH (FUNCTION = 'default()') -- Apply default masking ); GO -- Step 2: Insert sample data INSERT INTO SupportCustomers (CustomerID, Name, Phone) VALUES (1, 'Alice Johnson', '222-555-1234'); GO -- Step 3: Create a non-privileged user (no login for simplicity) CREATE USER SupportAgent WITHOUT LOGIN; GO -- Step 4: Grant SELECT permission on the table to the user GRANT SELECT ON SupportCustomers TO SupportAgent; GO -- Step 5: Execute a SELECT as the non-privileged user EXECUTE AS USER = 'SupportAgent'; SELECT Name, Phone FROM SupportCustomers WHERE CustomerID = 1 Alternatively, you can use Azure Portal to configure masking as shown in the following screenshot: Expected result: The query above would return Alice’s name and a masked phone number. Instead of seeing 222-555-1234, the Phone column would show XXXX. Alice’s actual number remains safely stored in the database, but it’s dynamically obscured for the support agent’s query. Meanwhile, privileged users such as administrator or db_owner which has CONTROL permission on the database or user with proper UNMASK permission would see the real phone number when running the same query. How this helps Developers : By pushing the masking logic down to the database, developers and DBAs avoid writing repetitive masking code in every app or report that touches this data. In our scenario, without DDM you might implement a check in the application like: If user_role == “Support”, then show “XXXX” for phone number, else show full phone. With DDM, such conditional code isn’t needed – the database takes care of it. This means: Less application code to write and maintain for masking Consistent masking everywhere (whether data is accessed via app, report, or ad-hoc query). Quick changes to masking rules in one place if requirements change, without hunting through application code. From a security standpoint, DDM reduces the risk of accidental data exposure and helps in compliance scenarios where personal data must be protected in lower environments or by certain roles, while reducing the developer effort drastically. In the next posts of this series, we’ll explore other masking functions (like Email, Partial, and Random etc) with different scenarios. By the end, you’ll see how each built-in mask can be applied to make data security and compliance more developer-friendly! Reference Links : Dynamic Data Masking - SQL Server | Microsoft Learn Dynamic Data Masking - Azure SQL Database & Azure SQL Managed Instance & Azure Synapse Analytics | Microsoft Learn420Views1like0Comments2025 Year in Review: What’s new across SQL Server, Azure SQL and SQL database in Fabric
What a year 2025 has been for SQL! ICYMI and are looking for some hype, might I recommend you start with this blog from Priya Sathy, the product leader for all of SQL at Microsoft: One consistent SQL: The launchpad from legacy to innovation. In this blog post, Priya explains how we have developed and continue to develop one consistent SQL which “unifies your data estate, bringing platform consistency, performance at scale, advanced security, and AI-ready tools together in one seamless experience and creates one home for your SQL workloads in the era of AI.” For the FIFTH(!!) year in a row (my heart is warm with the number, I love SQL and #SQLfamily, and time is flying), I am sharing my annual Year in Review blog with all the SQL Server, Azure SQL and SQL database in Fabric news this year. Of course, you can catch weekly episodes related to what’s new and diving deeper on the Azure SQL YouTube channel at aka.ms/AzureSQLYT. This year, in addition to Data Exposed (52 new episodes and over 70K views!). We saw many new series related to areas like GitHub Copilot, SSMS, VS Code, and Azure SQL Managed Instance land in the channel, in addition to Data Exposed. Microsoft Ignite announcements Of course, if you’re looking for the latest announcements from Microsoft Ignite, Bob Ward and I compiled this slide of highlights. Comprehensive list of 2025 updates You can read this blog (or use AI to reference it later) to get all the updates and references from the year (so much happened at Ignite but before it too!). Here’s all the updates from the year: SQL Server, Arc-enabled SQL Server, and SQL Server on Azure VMs Generally Available SQL Server 2025 is Now Generally Available Backup/Restore capabilities in SQL Server 2025 SQL Server 2025: Deeply Integrated and Feature-rich on Linux Resource Governor for Standard Edition Reimagining Data Excellence: SQL Server 2025 Accelerated by Pure Storage Security Update for SQL Server 2022 RTM CU21 Cumulative Update #22 for SQL Server 2022 RTM Backup/Restore enhancements in SQL Server 2025 Unified configuration and governance Expanding Azure Arc for Hybrid and Multicloud Management US Government Virginia region support I/O Analysis for SQL Server on Azure VMs NVIDIA Nemotron RAG Integration Preview Azure Arc resource discovery in Azure Migrate Multicloud connector support for Google Cloud Migrations Generally Available SQL Server migration in Azure Arc Azure Database Migration Service Hub Experience SQL Server Migration Assistant (SSMA) v10.3, including Db2 SKU recommendation (preview) Database Migration Service: PowerShell, Azure CLI, and Python SDK SQL Server Migration Assistant (SSMA) v10.4, including SQL Server 2025 support, Oracle conversion Copilot Schema migration support in Azure Database Migration Service Preview Azure Arc resource discovery in Azure Migrate Azure SQL Managed Instance Generally Available Next-gen General Purpose Service Tier Improved connectivity types in Azure SQL Managed Instance Improved resiliency with zone redundancy for general purpose, improved log rate for business critical Apply reservation discount for zone redundant Business Critical databases Free offer Windows principals use to simplify migrations Data exfiltration improvements Preview Windows Authentication for Cloud-Native Identities New update policy for Azure SQL Managed Instance Azure SQL Database Generally Available LTR Backup Immutability Free Azure SQL Database Offer updates Move to Hyperscale while preserving existing geo-replication or failover group settings Improve redirect connection type to require only port 1433 and promote to default Bigint support in DATEADD for extended range calculations Restart your database from the Azure portal Replication lag metric Enhanced server audit and server audit action groups Read-access geo-zone redundant storage (RA-GZRS) as a backup storage type for non-Hyperscale Improved cutover experience to Hyperscale SLA-compliant availability metric Use database shrink to reduced allocated space for Hyperscale databases Identify causes of auto-resuming serverless workloads Preview Multiple geo-replicas for Azure SQL Hyperscale Backup immutability for Azure SQL Database LTR backups Updates across SQL Server, Azure SQL and Fabric SQL database Generally Available Regex Support and fuzzy-string matching Geo-replication and Transparent Data Encryption key management Optimized locking v2 Azure SQL hub in the Azure portal UNISTR intrinsic function and ANSI SQL concatenation operator (||) New vector data type JSON index JSON data type and aggregates Preview Stream data to Azure Event Hubs with Change Event Streaming (Azure SQL DB Public Preview/Fabric SQL Private Preview) DiskANN vector indexing SQL database in Microsoft Fabric and Mirroring Generally Available Fabric Databases SQL database in Fabric Unlocking Enterprise ready SQL database in Microsoft Fabric: ALM improvements, Backup customizations and retention, Copilot enhancements & more update details Mirroring for SQL Server Mirroring for Azure SQL Managed Instance in Microsoft Fabric Connect to your SQL database in Fabric using Python Notebook Updates to database development tools for SQL database in Fabric Using Fast Copy for data ingestion Copilot for SQL analytics endpoint Any updates across Microsoft Fabric that apply to the SQL analytics endpoint are generally supported in mirrored databases and Fabric SQL databases via the SQL analytics endpoint. This includes many exciting areas, like Data Agents. See the Fabric blog to get inspired Preview Data virtualization support Workspace level Private Link support (Private Preview) Customer-managed keys in Fabric SQL Database Auditing for Fabric SQL Database Fabric CLI: Create a SQL database in Fabric SQL database workload in Fabric with Terraform Spark Connector for SQL databases Tools and developer Blog to Read: How the Microsoft SQL team is investing in SQL tools and experiences SQL Server Management Studio (SSMS) 22.1 GitHub Copilot Walkthrough (Preview): Guided onboarding from the Copilot badge. Copilot right-click actions (Preview): Document, Explain, Fix, and Optimize. Bring your own model (BYOM) support in Copilot (Preview). Copilot performance: improved response time after the first prompt in a thread. Fixes: addressed Copilot “Run ValidateGeneratedTSQL” loop and other stability issues. SQL Server Management Studio (SSMS) 22 Support for SQL Server 2025 Modern connection dialog as default + Fabric browsing on the Browse tab. Windows Arm64 support (initial) for core scenarios (connect + query). GitHub Copilot in SSMS (Preview) is available via the AI Assistance workload in the VS Installer. T-SQL/UX improvements: open execution plan in new tab, JSON viewer, results grid zooms. New index support: create JSON and Vector indexes from Object Explorer SQL Server Management Studio (SSMS) 21 Installation and automatic updates via Visual Studio Installer. Workloads/components model: smaller footprint + customizable install. Git integration is available via the Code tools workload. Modern connection dialog experience (Preview). New customization options (e.g., vertical tabs, tab coloring, results in grid NULL styling). Always Encrypted Assessment in the Always Encrypted Wizard. Migration assistance via the Hybrid and Migration workload. mssql-python Driver ODBC: Microsoft ODBC Driver 18.5.2.1 for SQL Server OLE DB: Microsoft OLE DB Driver 19.4.1 for SQL Server JDBC (latest train): Microsoft JDBC Driver for SQL Server 13.2.1 Also updated in 2025: supported JDBC branches received multiple servicing updates (including Oct 13, 2025, security fixes). See the same JDBC release notes for the full list. .NET: Microsoft.Data.SqlClient 6.0.2 Related - some notes on drivers released/updated in 2025 (recap): MSSQL extension for VS Code 1.37.0 GitHub Copilot integration : Ask/Agent modes, slash commands, onboarding. Edit Data : interactive grid for editing table data (requires mssql.enableExperimentalFeatures: true). Data-tier Application dialog : deploy/extract .dacpac and import/export .bacpac (requires mssql.enableExperimentalFeatures: true). Publish SQL Project dialog : deploy .sqlproj to an existing DB or a local SQL dev container. Added “What’s New” panel + improved query results grid stability/accessibility. MSSQL extension for VS Code 1.36.0 Fabric connectivity : browse Fabric workspaces and connect to SQL DBs / SQL analytics endpoints. SQL database in Fabric provisioning : create Fabric SQL databases from Deployments. GitHub Copilot slash commands : connection, schema exploration, query tasks. Schema Compare extensibility: new run command for external extensions/SQL Projects (incl. Update Project from Database support). Query results in performance/reliability improvements (incremental streaming, fewer freezes, better settings handling). SqlPackage 170.0.94 release notes (April 2025) Vector: support for vector data type in Azure SQL Database target platform (import/export/extract/deploy/build). SQL projects: default compatibility level for Azure SQL Database and SQL database in Fabric set to 170. Parquet: expanded supported types (including json, xml, and vector) + bcp fallback for unsupported types. Extract: unpack a .dacpac to a folder via /Action:Extract. Platform: Remove .NET 6 support; .NET Framework build updated to 4.7.2. SqlPackage 170.1.61 release notes (July 2025) Data virtualization (Azure SQL DB): added support for data virtualization objects in import/export/extract/publish. Deployment: new publishing properties /p:IgnorePreDeployScript and /p:IgnorePostDeployScript. Permissions: support for ALTER ANY EXTERNAL MIRROR (Azure SQL DB + SQL database in Fabric) for exporting mirrored tables. SQL Server 2025 permissions: support for CREATE ANY EXTERNAL MODEL, ALTER ANY EXTERNAL MODEL, and ALTER ANY INFORMATION PROTECTION. Fixes: improved Fabric compatibility (e.g., avoid deploying unsupported server objects; fixes for Fabric extraction scripting). SqlPackage 170.2.70 release notes (October 2025) External models: support for external models in Azure SQL Database and SQL Server 2025. AI functions: support for AI_GENERATE_CHUNKS and AI_GENERATE_EMBEDDINGS. JSON: support for JSON indexes + functions JSON_ARRAYAGG, JSON_OBJECTAGG, JSON_QUERY. Vector: vector indexes + VECTOR_SEARCH and expanded vector support for SQL Server 2025. Regex: support for REGEXP_LIKE. Microsoft.Build.Sql 1.0.0 (SQL database projects SDK) Breaking: .NET 8 SDK required for dotnet build (Visual Studio build unchanged). Globalization support. Improved SDK/Templates docs (more detailed README + release notes links). Code analyzer template defaults DevelopmentDependency. Build validation: check for duplicate build items. Microsoft.Build.Sql 2.0.0 (SQL database projects SDK) Added SQL Server 2025 target platform (Sql170DatabaseSchemaProvider). Updated DacFx version to 170.2.70. .NET SDK targets imported by default (includes newer .NET build features/fixes; avoids full rebuilds with no changes Azure Data Studio retirement announcement (retirement February 28, 2026) Anna’s Pick of the Month Year It’s hard to pick a highlight representative of the whole year, so I’ll take the cheesy way out: people. I get to work with great people working on a great set of products for great people (like you) solving real world problems for people. So, thank YOU and you’re my pick of the year 🧀 Until next time… That’s it for now! We release new episodes on Thursdays and new #MVPTuesday episodes on the last Tuesday of every month at aka.ms/azuresqlyt. The team has been producing a lot more video content outside of Data Exposed, which you can find at that link too! Having trouble keeping up? Be sure to follow us on twitter to get the latest updates on everything, @AzureSQL. And if you lose this blog, just remember aka.ms/newsupdate2025 We hope to see you next YEAR, on Data Exposed! --Anna and Marisa1.5KViews1like1CommentWhats new in the Backup/Restore area in SQL Server 2025
Over the past several months, we’ve heard from countless customers who are eager for more robust options to protect, compress, and safeguard their SQL Server data. Since introducing these features in public preview, organizations of all sizes have validated their value in real-world workloads and provided invaluable feedback. Following are three “hidden gems” (as one of our customers called them), in the Backup/Restore area we announced with SQL Server 2025. Backups on Secondary for Always On Availability Groups, ZSTD Compression, and Immutable Backups for Ransomware Protection. These advancements are now ready for production use, built around the needs and requests of the SQL Server customers. 1. Backups on Secondary for SQL Server Always On Availability Groups Previously in preview, this feature now reaches GA, allowing you to offload backup operations to secondary replicas in Always On Availability Groups. This enhancement optimizes resource utilization and minimizes overhead on primary replicas, ensuring better performance for mission-critical workloads. What’s New in GA: Improved reliability and support for production environments. Key Benefits: Comprehensive support: Full, differential, and transaction log backups are now fully supported on secondary replicas—not just COPY_ONLY and transaction logs. Reduced impact on primary replica performance. Simplified high-availability strategies. Learn more from the original announcement: Introducing Backups on Secondary for SQL Server Always On Availability Groups. 2. ZSTD Compression in SQL Server 2025 The GA of ZSTD compression brings modern, efficient data compression to SQL Server. ZSTD compression introduces industry-leading performance and efficiency, letting you save storage and speed up workloads. What’s New in GA: Full production support for ZSTD across key workloads. Key Benefits: Faster compression and decompression compared to legacy algorithms. Lower storage footprint without sacrificing performance. Choose your algorithm: ZSTD is now a standard option right alongside MS_XPRESS for row, page, and backup compression. Tunable compression levels: Administrators can select from LOW, MEDIUM, or HIGH to balance resource use and savings. Ideal for large-scale data environments. Explore the preview details: ZSTD Compression in SQL Server 2025. 3. Backups to immutable storage: A Powerful Shield Against Ransomware Your backups are now safer than ever. Thanks to native support for immutability with Azure Blob Storage, backup files can be rendered tamper-proof—protecting them from ransomware or even accidental deletion. Key Benefits: Strong defense against ransomware and malicious tampering. Compliance with regulatory requirements for data integrity. Peace of mind for critical backup strategies. Read the detailed use-case and how-to: Immutability: A Powerful Shield Against Ransomware in SQL Environments. These features collectively empower organizations to: Optimize performance and resource utilization. Reduce operational costs through efficient compression. Strengthen security posture against evolving threats. Get Started Today These features are available to all SQL Server 2025 customers. Ready to elevate your data protection, efficiency, and compliance posture? Access the official SQL Server 2025 documentation for step-by-step guides via visit Microsoft Learn. Review upgrade guidance and best practices. Explore real-world configurations and FAQs. Upgrade now and unlock the next level of resilience, efficiency, and security for your SQL Server workloads!1.4KViews1like0CommentsGeneral Availability announcement of Backup/Restore capabilities in SQL Server 2025
Over the past several months, we’ve heard from countless customers who are eager for more robust options to protect, compress, and safeguard their SQL Server data. Since introducing these features in public preview, organizations of all sizes have validated their value in real-world workloads and provided invaluable feedback. Today, we are thrilled to announce the General Availability (GA) of three powerful SQL Server 2025 features: Backups on Secondary for Always On Availability Groups, ZSTD Compression, and Immutable Backups for Ransomware Protection. These advancements are now ready for production use, built around the needs and requests of the SQL Server customers. Backups on Secondary for SQL Server Always On Availability Groups Previously in preview, this feature now reaches GA, allowing you to offload backup operations to secondary replicas in Always On Availability Groups. This enhancement optimizes resource utilization and minimizes overhead on primary replicas, ensuring better performance for mission-critical workloads. What’s New in GA: Improved reliability and support for production environments. Key Benefits: Comprehensive support: Full, differential, and transaction log backups are now fully supported on secondary replicas—not just COPY_ONLY and transaction logs. Reduced impact on primary replica performance. Simplified high-availability strategies. Learn more from the original announcement: Introducing Backups on Secondary for SQL Server Always On Availability Groups. ZSTD Compression in SQL Server 2025 The GA of ZSTD compression brings modern, efficient data compression to SQL Server. ZSTD compression introduces industry-leading performance and efficiency, letting you save storage and speed up workloads. What’s New in GA: Full production support for ZSTD across key workloads. Key Benefits: Faster compression and decompression compared to legacy algorithms. Lower storage footprint without sacrificing performance. Choose your algorithm: ZSTD is now a standard option right alongside MS_XPRESS for row, page, and backup compression. Tunable compression levels: Administrators can select from LOW, MEDIUM, or HIGH to balance resource use and savings. Ideal for large-scale data environments. Explore the preview details: ZSTD Compression in SQL Server 2025. Backups to immutable storage: A Powerful Shield Against Ransomware Your backups are now safer than ever. Thanks to native support for immutability with Azure Blob Storage, backup files can be rendered tamper-proof—protecting them from ransomware or even accidental deletion. Key Benefits: Strong defense against ransomware and malicious tampering. Compliance with regulatory requirements for data integrity. Peace of mind for critical backup strategies. Read the detailed use-case and how-to: Immutability: A Powerful Shield Against Ransomware in SQL Environments. These features collectively empower organizations to: Optimize performance and resource utilization. Reduce operational costs through efficient compression. Strengthen security posture against evolving threats. Get Started Today These features are available to all SQL Server 2025 customers. Ready to elevate your data protection, efficiency, and compliance posture? Access the official SQL Server 2025 documentation for step-by-step guides via visit Microsoft Learn. Review upgrade guidance and best practices. Explore real-world configurations and FAQs. Upgrade now and unlock the next level of resilience, efficiency, and security for your SQL Server workloads!1.2KViews1like0CommentsStream data in near real time from SQL to Azure Event Hubs - Public preview
If near-real time integration is something you are looking to implement and you were looking for a simpler way to get the data out of SQL, keep reading. SQL is making it easier to integrate and Change Event Streaming is a feature continuing this trend. Modern applications and analytics platforms increasingly rely on event-driven architectures and real-time data pipelines. As the businesses speed up, real time decisioning is becoming especially important. Traditionally, capturing changes from a relational database requires complex ETL jobs, periodic polling, or third-party tools. These approaches often consume significant cycles of the data source, introduce operational overhead, and pose challenges with scalability, especially if you need one data source to feed into multiple destinations. In this context, we are happy to release Change Event Streaming ("CES") feature into Public Preview for Azure SQL Database. This feature enables you to stream row-level changes - inserts, updates, and deletes - from your database directly to Azure Event Hubs in near real time. Change Event Streaming addresses the above challenges by: Reducing latency: Changes are streamed (pushed by SQL) as they happen. This is in contrast with traditional CDC (change data capture) or CT (change tracking) based approaches, where an external component needs to poll SQL at regular intervals. Traditional approaches allow you to increase polling frequency, but it gets difficult to find a sweet spot between minimal latency and minimal overhead due to too frequent polls. Simplifying architecture: No need for Change Data Capture (CDC), Change Tracking, custom polling or external connectors - SQL streams directly to configured destination. This means simpler security profile (fewer authentication points), fewer failure points, easier monitoring, lower skill bar to deploy and run the service. No need to worry about cleanup jobs, etc. SQL keeps track of which changes are successfully received by the destination, handles the retry logic and releases log truncation point. Finally, with CES you have fewer components to procure and get approved for production use. Decoupling: The integration is done on the database level. This eliminates the problem of dual writes - the changes are streamed at transaction boundaries, once your source of truth (the database) has saved the changes. You do not need to modify your app workloads to get the data streamed - you tap right onto the data layer - this is useful if your apps are dated and do not possess real-time integration capabilities. In case of some 3rd party apps, you may not even have an option to do anything other than database level integration, and CES makes it simpler. Also, the publishing database does not concern itself with the final destination for the data - Stream the data once to the common message bus, and you can consume it by multiple downstream systems, irrespective of their number or capacity - the (number of) consumers does not affect publishing load on the SQL side. Serving consumers is handled by the message bus, Azure Event Hubs, which is purpose built for high throughput data transfers. onceptually visualizing data flow from SQL Server, with an arrow towards Azure Event Hubs, from where a number of arrows point to different final destinations. Key Scenarios for CES Event-driven microservices: They need to exchange data, typically thru a common message bus. With CES, you can have automated data publishing from each of the microservices. This allows you to trigger business processes immediately when data changes. Real-time analytics: Stream operational data into platforms like Fabric Real Time Intelligence or Azure Stream Analytics for quick insights. Breaking down the monoliths: Typical monolithic systems with complex schemas, sitting on top of a single database can be broken down one piece at a time: create a new component (typically a microservice), set up the streaming from the relevant tables on the monolith database and tap into the stream by the new components. You can then test run the components, validate the results against the original monolith, and cutover when you build the confidence that the new component is stable. Cache and search index updates: Keep distributed caches and search indexes in sync without custom triggers. Data lake ingestion: Capture changes continuously into storage for incremental processing. Data availability: This is not a scenario per se, but the amount of data you can tap into for business process mining or intelligence in general goes up whenever you plug another database into the message bus. E.g. You plug in your eCommerce system to the message bus to integrate with Shipping providers, and consequently, the same data stream is immediately available for any other systems to tap into. How It Works CES uses transaction log-based capture to stream changes with minimal impact on your workload. Events are published in a structured JSON format following the CloudEvents standard, including operation type, primary key, and before/after values. You can configure CES to target Azure Event Hubs via AMQP or Kafka protocols. For details on configuration, message format, and FAQs, see the official documentation: Feature Overview CES: Frequently Asked Questions Get Started Public preview CES is available today in public preview for Azure SQL Database and as a preview feature in SQL Server 2025. [update 20-mar-2026] Change Event Streaming is now in public preview for Azure SQL Managed instance. Read more here. Private preview CES is also available as a private preview for Azure SQL Managed Instance and Fabric SQL database: you can request to join the private preview by signing up here: https://aka.ms/sql-ces-signup We encourage you to try the feature out and start building real-time integrations on top of your existing data. We welcome your feedback—please share your experience through Azure Feedback portal or support channels. The comments below on this blog post will also be monitored, if you want to engage with us. Finally, CES team can be reached via email: sqlcesfeedback [at] microsoft [dot] com. Useful resources Free Azure SQL Database. Free Azure SQL Managed Instance.1.4KViews0likes0CommentsGeneral Availability Announcement: Regex Support in SQL Server 2025 & Azure SQL
We’re excited to announce the General Availability (GA) of native Regex support in SQL Server 2025 and Azure SQL — a long-awaited capability that brings powerful pattern matching directly into T-SQL. This release marks a significant milestone in modernizing string operations and enabling advanced text processing scenarios natively within the database engine. What is Regex? The other day, while building LEGO with my 3-year-old — an activity that’s equal parts joy and chaos — I spent minutes digging for one tiny piece and thought, “If only Regex worked on LEGO.” That moment of playful frustration turned into a perfect metaphor. Think of your LEGO box as a pile of data — a colorful jumble of tiny pieces. Now imagine trying to find every little brick from a specific LEGO set your kid mixed into the pile. That’s tricky — you’d have to sift through each piece one by one. But what if you had a smart filter that instantly found exactly those pieces? That’s what Regex (short for Regular Expressions) does for your data. It’s a powerful pattern-matching tool that helps you search, extract, and transform text with precision. With Regex now natively supported in SQL Server 2025 and Azure SQL, this capability is built directly into T-SQL — no external languages or workarounds required. What can Regex help you do? Regex can help you tackle a wide range of data challenges, including: Enhancing data quality and accuracy by validating and correcting formats like phone numbers, email addresses, zip codes, and more. Extracting valuable insights by identifying and grouping specific text patterns such as keywords, hashtags, or mentions. Transforming and standardizing data by replacing, splitting, or joining text patterns — useful for handling abbreviations, acronyms, or synonyms. Cleaning and optimizing data by removing unwanted patterns like extra whitespace, punctuation, or duplicates. Meet the new Regex functions in T-SQL SQL Server 2025 introduces seven new T-SQL Regex functions, grouped into two categories: scalar functions (return a value per row) and table-valued functions (TVFs) (return a set of rows). Here’s a quick overview: Function Type Description REGEXP_LIKE Scalar Returns TRUE if the input string matches the Regex pattern REGEXP_COUNT Scalar Counts the number of times a pattern occurs in a string REGEXP_INSTR Scalar Returns the position of a pattern match within a string REGEXP_REPLACE Scalar Replaces substrings that match a pattern with a replacement string REGEXP_SUBSTR Scalar Extracts a substring that matches a pattern REGEXP_MATCHES TVF Returns a table of all matches including substrings and their positions REGEXP_SPLIT_TO_TABLE TVF Splits a string into rows using a Regex delimiter These functions follow the POSIX standard and support most of the PCRE/PCRE2 flavor of regular expression syntax, making them compatible with most modern Regex engines and tools. They support common features like: Character classes (\d, \w, etc.) Quantifiers (+, *, {n}) Alternation (|) Capture groups ((...)) You can also use Regex flags to modify behavior: 'i' – Case-insensitive matching 'm' – Multi-line mode (^ and $ match line boundaries) 's' – Dot matches newline 'c' – Case-sensitive matching (default) Examples: Regex in Action Let’s explore how these functions solve tricky real-world data tasks that were hard to do in earlier SQL versions. REGEXP_LIKE: Data Validation — Keeping data in shape Validating formats like email addresses or phone numbers used to require multiple functions or external tools. With REGEXP_LIKE, it’s now a concise query. For example, you can check whether an email contains valid characters before and after the @, followed by a domain with at least two letters like .com, .org, or .co.in. SELECT [Name], Email, CASE WHEN REGEXP_LIKE (Email, '^[A-Za-z0-9._+]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$') THEN 'Valid Email' ELSE 'Invalid Email' END AS IsValidEmail FROM (VALUES ('John Doe', 'john@contoso.com'), ('Alice Smith', 'alice@fabrikam.com'), ('Bob Johnson', 'bob@fabrikam.net'), ('Charlie Brown', 'charlie@contoso.co.in'), ('Eve Jones', 'eve@@contoso.com')) AS e(Name, Email); We can further use REGEXP_LIKE in CHECK constraints to enforce these rules at the column level (so no invalid format ever gets into the table). For instance: CREATE TABLE Employees ( ..., Email VARCHAR (320) CHECK (REGEXP_LIKE (Email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$')), Phone VARCHAR (20) CHECK (REGEXP_LIKE (Phone, '^(\d{3})-(\d{3})-(\d{4})$')) ); This level of enforcement significantly enhances data integrity by ensuring that only correctly formatted values are accepted into the database. REGEXP_COUNT: Count JSON object keys Count how many top-level keys exist in a JSON string — no JSON parser needed! SELECT JsonData, REGEXP_COUNT(JsonData, '"[^"]+"\s*:', 1, 'i') AS NumKeys FROM (VALUES ('{"name":"Abhiman","role":"PM","location":"Bengaluru"}'), ('{"skills":["SQL","T-SQL","Regex"],"level":"Advanced"}'), ('{"project":{"name":"Regex GA","status":"Live"},"team":["Tejas","UC"]}'), ('{"empty":{}}'), ('{}')) AS t(JsonData); REGEXP_INSTR: Locate patterns in logs Find the position of the first error code (ERR-XXXX) in log messages — even when the pattern appears multiple times or in varying locations. SELECT LogMessage, REGEXP_INSTR(LogMessage, 'ERR-\d{4}', 1, 1, 0, 'i') AS ErrorCodePosition FROM (VALUES ('System initialized. ERR-1001 occurred during startup.'), ('Warning: Disk space low. ERR-2048. Retry failed. ERR-2049.'), ('No errors found.'), ('ERR-0001: Critical failure. ERR-0002: Recovery started.'), ('Startup complete. Monitoring active.')) AS t(LogMessage); REGEXP_REPLACE: Redact sensitive data Mask SSNs and credit card numbers in logs or exports — all with a single, secure query. SELECT sensitive_info, REGEXP_REPLACE(sensitive_info, '(\d{3}-\d{2}-\d{4}|\d{4}-\d{4}-\d{4}-\d{4})', '***-**-****') AS redacted_info FROM (VALUES ('John Doe SSN: 123-45-6789'), ('Credit Card: 9876-5432-1098-7654'), ('SSN: 000-00-0000 and Card: 1111-2222-3333-4444'), ('No sensitive info here'), ('Multiple SSNs: 111-22-3333, 222-33-4444'), ('Card: 1234-5678-9012-3456, SSN: 999-88-7777')) AS t(sensitive_info); REGEXP_SUBSTR: Extract and count email domains Extract domains from email addresses and group users by domain. SELECT REGEXP_SUBSTR(Email, '@(.+)$', 1, 1, 'i', 1) AS Domain, COUNT(*) AS NumUsers FROM (VALUES ('Alice', 'alice@contoso.com'), ('Bob', 'bob@fabrikam.co.in'), ('Charlie', 'charlie@example.com'), ('Diana', 'diana@college.edu'), ('Eve', 'eve@contoso.com'), ('Frank', 'frank@fabrikam.co.in'), ('Grace', 'grace@example.net')) AS e(Name, Email) WHERE REGEXP_LIKE (Email, '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$') GROUP BY REGEXP_SUBSTR(Email, '@(.+)$', 1, 1, 'i', 1); REGEXP_MATCHES: Extract multiple emails from text Extract all email addresses from free-form text like comments or logs — returning each match as a separate row for easy parsing or analysis. SELECT * FROM REGEXP_MATCHES ('Contact us at support@example.com or sales@example.com', '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'); This query identifies and returns both email addresses found in the string — no need for loops, manual parsing, or external scripting. REGEXP_SPLIT_TO_TABLE: Break down structured text Split a string into rows using a Regex delimiter — ideal for parsing logs, config entries, or form data. SELECT * FROM REGEXP_SPLIT_TO_TABLE ('Name: John Doe; Email: john.doe@example.com; Phone: 123-456-7890', '; '); This query breaks the input string into rows for each field, making it easier to parse and process the data — especially when dealing with inconsistent or custom delimiters. To explore more examples, syntax options, and usage details, head over to the https://learn.microsoft.com/en-us/sql/t-sql/functions/regular-expressions-functions-transact-sql?view=sql-server-ver17. Conclusion The addition of Regex functionality in SQL Server 2025 and Azure SQL is a major leap forward for developers and DBAs. It eliminates the need for external libraries, CLR integration, or complex workarounds for text processing. With Regex now built into T-SQL, you can: Validate and enforce data formats Sanitize and transform sensitive data Search logs for complex patterns Extract and split structured content And this is just the beginning. Regex opens the door to a whole new level of data quality, text analytics, and developer productivity — all within the database engine. So go ahead and Regex away! Your feedback and partnership continue to drive innovation in Azure SQL and SQL Server — thank you for being part of it.316Views0likes0Comments