SQL Server Regular Expressions Library Sample
Published Feb 03 2022 11:00 AM 3,018 Views

Introduction

 

When migrating from one database platform to another, often you run into roadblocks because one supports a feature not available on the other. Such is the case with Regular Expressions (regex). Today, there is no native regex function support in SQL Server, although it is on the roadmap. Fortunately, to unblock migrations to SQL Server, it supports writing functions for the Common Language Runtime (CLR) in a .NET language (i.e. C#).

 

There are several websites that have examples of regex support using SQL CLR, and this is another, but in this blog we take it the extra step to add regex support to one of our managed SQL Server offerings in Azure - Azure SQL Managed Instance (MI). Since there is no CLR support in Azure SQL DB, it is not possible to use CLR assemblies on that platform. The sample code in this blog is also open source, so you can add additional functions or modify it as you please.

 

 

Regular Expressions

 

Regular Expressions provides a mechanism to find or replace parts of a string using a sequence of characters referred to as a pattern. A full discussion of regex and its use is outside the scope of this article, but fortunately there are many references available on the internet.

 

Although writing a regex library might be fun, there is no sense reinventing the wheel, so we cheat and use the library included in the System.Text.RegularExpressions C# class library. We then need to decide how to expose the various classes available (i.e. Match, Capture, RegexOptions, etc.) and provide the plumbing required for SQL Server to call the appropriate class methods.

 

Many of the regex functions allow you to specify RegexOptions to control the operation of the methods. The most useful option is whether the search is case sensitive or not (by default it is case sensitive). The RegexOptions is an enumeration that provides a bit mask of options, which would be a bit of a challenge to do in Transact SQL (TSQL), so we provide a method that allows the caller to set the appropriate bits in a returned integer mask, which can then be provided as a parameter to versions of each CLR function that allow RegexOptions to be specified. There are therefore two versions of each function, one that uses the default options and another that allows the user to pass the RegexOptions mask (methods ending in _opt).

 

To extend the usefulness of the regex functions somewhat, for one set of methods, we added the ability to insert the strings matches into an output string using position markers. Again, instead of creating something new, we leverage the Format method of the standard String C# class to expose this functionality. The found/matches strings are inserted in the output string where {0}, {1}, {2} etc. markers are present with the option of doing additional formatting that the Format method allows.

For our purposes, SQL CLR functions can return a value or a table. In some methods it made sense to return a single string or a Boolean status, in other cases, a table was more useful.  The following table describes the regex functions that are part of the sample code.

 

RegEx Library Functions in the sample

 

Function Name

Parameters

Output

Description

RegEx_IsMatch

@input nvarchar(max)

@pattern nvarchar(max)

Bit (Boolean)

Returns true if the pattern is found in the input string.

RegEx_IsMatch_Opt

@input nvarchar(max)

@pattern nvarchar(max)

@regexopt int

Bit (Boolean)

Returns true if the pattern is found in the input string.

RegEx_Match_Count

@input nvarchar(max)

@pattern nvarchar(max)

int

Returns the count of the matches found.

RegEx_Match_Count_Opt

@input nvarchar(max)

@pattern nvarchar(max)

@regexopt int

int

Returns the count of the matches found.

RegEx_Match

@input nvarchar(max)

@pattern nvarchar(max)

nvarchar(max)

Returns the first string found.

RegEx_Match_Opt

@input nvarchar(max)

@pattern nvarchar(max)

@regexopt int

nvarchar(max)

Returns the first string found.

RegEx_Replace

@input nvarchar(max)

@pattern nvarchar(max)

@replacement nvarchar(max)

nvarchar(max)

Replaces matching strings with the replacement string and returns the resulting string.

RegEx_Replace_Opt

@input nvarchar(max)

@pattern nvarchar(max)

@replacement nvarchar(max)

@regexopt int

nvarchar(max)

Replaces matching strings with the replacement string and returns the resulting string.

RegEx_Format

@input nvarchar(max)

@pattern nvarchar(max)

@output nvarchar(max)

nvarchar(max)

Inserts the matching strings into the output string using provided parameter markers ({0}, {1} etc.) and returns the resulting string.

RegEx_Format_Opt

@input nvarchar(max)

@pattern nvarchar(max)

@output nvarchar(max)

@regexopt int

nvarchar(max)

Inserts the matching strings into the output string using provided parameter markers ({0}, {1} etc.) and returns the resulting string.

RegEx_Options

@IgnoreCase bit

@MultiLine bit

@ExplicitCapture bit

@compiled bit

@SingleLine bit

@IgnorePatternWhitespace bit

@RightToLeft bit

@ECMAScript bit

@CultureInvariant bit

int

Allows creation of a regexopt bitmask to pass to _Opt version methods

RegEx_Match_All

@input nvarchar(max)

@pattern nvarchar(max)

Table

index int

length int

match nvarchar(max)

Returns a table of matches with the index of the match, the length of the matched string and the actual string value of the match.

RegEx_Match_All_Opt

@input nvarchar(max)

@pattern nvarchar(max)

@regexopt int

Table

index int

length int

match nvarchar(max)

Returns a table of matches with the index of the match, the length of the matched string and the actual string value of the match.

 

 

Creating a CLR Function

 

Attributes are used to flag methods as callable by SQL Server.  Below shows the definition of the ReEx_Match function – the first line is the required attribute and the rest of the code is the function implementation;

 

[SqlFunction(Name = "RegEx_Match", DataAccess = DataAccessKind.None, IsDeterministic = false)]

public static SqlString RegEx_Match(SqlString input, SqlString pattern)

{

  if (input.IsNull || pattern.IsNull || input.ToString().Length == 0 || pattern.ToString().Length == 0)

return SqlString.Null;

  return Regex.Match(input.ToString(), pattern.ToString()).Value;

}

 

 

Table value functions are slightly different. You need to provide the FillRowMethodName as an attribute to the main function that is called with the parameters you need for the implementation. The return value from the function is an enumerable type (implements IEnumerable) which is automatically provided by most C# collection classes. The code sample below shows the implementation of the RegEx_Match_All function that returns a table;

 

[SqlFunction(FillRowMethodName = "FillRow")]

public static IEnumerable RegEx_Match_All(SqlString input, SqlString pattern)

{

  if (input.IsNull || pattern.IsNull || input.ToString().Length == 0 || pattern.ToString().Length == 0)

return null;

  return Regex.Matches(input.ToString(), pattern.ToString());

}

public static void FillRow(Object obj, out SqlInt32 index, out SqlInt32 length, out SqlChars match)

{

  Match m = (Match)obj;

  index = m.Index;

  length = m.Length;

  match = new SqlChars(m.Value);

}

Registering the CLR Functions

 

To register the CLR functions in SQL Server, the first thing you need to do is to register the .NET assembly. As of SQL Server 2017, a new security option was added: CLR strict security, that makes the process of registering an assembly more secure, but also more complicated. The assembly needs to be signed with a certificate (or asymmetric key) and a login needs to be created using the certificate/key that has the UNSAFE_ASSEMBLY permission in the master database.

 

In our case, we create a certificate that we use to sign the assembly and use the same certificate to create a login from the certificate. Below is the partial command – the full command is in the sample download.

 

CREATE CERTIFICATE [SQLCLR-Cert]

FROM BINARY = 0x3082036A30820256A00302010202106DFC260EB312E79845D7F0B3CC0E5088300906052B0E03021D\

…

5F7CBBE386DE0F7E610A305777FB35BD1B6C4F55CCD11A5E9F04C5558DA24A62BEDBE98A7EE3;

GO

CREATE LOGIN [SQLCLR-Login]

FROM CERTIFICATE [SQLCLR-Cert];

GO

GRANT UNSAFE ASSEMBLY TO [SQLCLR-Login];

GO

The commands to create the certificate are included in the sample and we recommend you generate your own certificate instead of using the sample one.

Once the login is created and the DLL is signed, you can then register it with SQL Server using;

 

CREATE ASSEMBLY RegEx FROM 'C:\SQLRegex\SQLRegex.dll' WITH PERMISSION_SET = UNSAFE;

 

If this is successful, you can then register the individual functions – i.e.;

 

CREATE FUNCTION RegEx_Match(@input nvarchar(max), @pattern nvarchar(max))

RETURNS nvarchar(max)

AS 

EXTERNAL NAME RegEx.RegExUDF.RegEx_Match; 

The full script to register the assembly and all of the functions is in the UseAssembly.sql file.

 

Before you can call the function, CLR needs to be enabled in SQL Server (by default it is not). Before enabling this feature, you need to read the documentation and understand the security implications. To enable CLR do the following;

 

exec sp_configure 'clr enabled', 1

reconfigure

 

At this point you should be able to use the functions.

 

 

Using RegEx on Azure SQL Managed Instance

 

Since you don’t have access to local storage on a Managed Instance, the only way to load a CLR assembly is from a binary string. The easiest way of generating this string is by loading the assembly in a  SQL Server and scripting the assembly.

 

MitchvanHuuksloot_0-1643754800946.png

 

You should end up with a file that looks like;

 

MitchvanHuuksloot_1-1643754800954.png

 

You can now run this on your MI server to load the assembly (after creating the certificate, login from the certificate, and granting permissions) and you then need to register the RegEx functions.

 

There is a UseAssembly-SQLMI.sql sample file, but we recommend that you build your own binary and sign it with your own certificate and replace the binary string with your version.

 

 

Testing the Assembly

 

At this point you should be able to test the functions. You can run the regex function using a table column as input or more simply just pass a static string and as initial test (included in TestAssembly.sql);

 

select dbo.RegEx_Match('green CAR red car blue car', '(\w+)\s+(car)')

Assuming the function works – you should see a result like;

 

MitchvanHuuksloot_2-1643754800956.png

 

Note that since by default the Match method is case sensitive, the “green CAR” string is not a match. As noted above, you can override this behavior using the optional RegexOptions parameter.

 

 

Download the sample code

 

The sample code can be found in the Microsoft Download Center.

 

 

Feedback and suggestions

 

If you have feedback or suggestions for improving this data migration asset, please contact the Data Platform Engineering Team. Thanks for your support!

 

Version history
Last update:
‎Feb 03 2022 10:56 AM
Updated by: