We are excited to announce that the UNISTR intrinsic function and ANSI SQL concatenation operator (||) are now available in public preview in Azure SQL Database. The UNISTR function allows you to escape Unicode characters, making it easier to work with international text. The ANSI SQL concatenation operator (||) provides a simple and intuitive way to combine characters or binary strings. These new features will enhance your ability to manipulate and work with text data.
What is UNISTR function?
The UNISTR function takes a text literal or an expression of characters and Unicode values, that resolves to character data and returns it as a UTF-8 or UTF-16 encoded string. This function allows you to use Unicode codepoint escape sequences with other characters in the string. The escape sequence for a Unicode character can be specified in the form of \xxxx or \+xxxxxx, where xxxx is a valid UTF-16 codepoint value, and xxxxxx is a valid Unicode codepoint value. This is especially useful for inserting data into NCHAR columns.
The syntax of the UNISTR function is as follows:
UNISTR ( 'character_expression' [ , 'unicode_escape_character' ] )
- The data type of character_expression could be char, nchar, varchar, or nvarchar. For char and varchar data types, the collation should be a valid UTF-8 collation only.
- A single character representing a user-defined Unicode escape sequence. If not supplied, the default value is \.
Examples
Example #1:
For example, the following query returns the Unicode character for the specified value:
-- All the queries below will produce the same output.
SELECT UNISTR(N'Hello \D83D\DE00');
SELECT UNISTR(N'\0048ello \+01F603');
SELECT UNISTR(N'\0048\0065\006C\006C\006F \+01F603');
Results:
——————————-
Hello
Example #2:
In this example, the UNISTR function is used with a user-defined escape character ($) and a VARCHAR data type with UTF-8 collation.
SELECT UNISTR ('I $2665 Azure SQL.' COLLATE Latin1_General_100_CI_AS_KS_SC_UTF8, '$');
Results:
——————————-
I ♥ Azure SQL.
The legacy collations with code page can be identified using the query below:
SELECT DISTINCT p.language, p.codepage
FROM sys.fn_helpcollations() AS c CROSS APPLY (VALUES(LEFT(c.name, CHARINDEX('_', c.name)-1), COLLATIONPROPERTY(c.name, 'codepage'))) AS p(language, codepage)
WHERE p.codepage NOT IN (0 /* Unicode Only collation */, 65001 /* UTF-8 code page */);
What is ANSI SQL concatenation operator (||)?
The ANSI SQL concatenation operator (||) concatenates two or more characters or binary strings, columns, or a combination of strings and column names into one expression. The || operator does not honor the SET CONCAT_NULL_YIELDS_NULL option and always behaves as if the ANSI SQL behavior is enabled. This operator will work with character strings or binary data of any supported SQL Server collation. The || operator supports compound assignment ||= similar to +=. If the operands are of incompatible collation, then an error will be thrown. The collation behavior is identical to the CONCAT function of character string data.
The syntax of the string concatenation operator is as follows:
expression || expression
- The expression is a character or binary expression. Both expressions must be of the same data type, or one expression must be able to be implicitly converted to the data type of the other expression. If one operand is of binary type, then an unsupported operand type error will be thrown.
Examples
Example #1:
For example, the following query concatenates two strings and returns the result:
SELECT 'Hello ' || ' World!';
Results:
——————————-
Hello World!
Example #2:
In this example, multiple character strings are concatenated. If at least one input is a character string, non-character strings will be implicitly converted to character strings.
SELECT 'Josè' || ' Doe' AS full_name,
'Order-' || CAST(1001 AS VARCHAR) || '~TS~' || current_timestamp || '~' || NEWID() AS order_details,
'Item-' || NULL AS item_desc;
Results:
——————————-
full_name order_details item_desc
Josè Doe Order-1001~TS~Jun 1 2024 6:25AM~442A4706-0002-48EC-84FC-8AF27XXXX NULL
Example #3:
In the example below, concatenating two or more binary strings and also compounding with T-SQL assignment operator.
DECLARE @v1 VARBINARY(10) = 0x1a;
SET @v1 ||= 0x2b;
SELECT @v1 as V1, 0x|| 0x4E as B1, CAST(NEWID() AS VARBINARY) || 0xa5 as B2
Results:
——————————-
V1 B1 B2
0x1A2B 0x4E 0xAE8C602E951AC245ADE767A23C834704A5
Example #4:
As shown in the example below, using the || operator with only non-character types or combining binary data with other types is not supported.
SELECT 1|| 2;
SELECT 1|| 'a' || 0x4e;
Above queries will fail with error messages as below –
The data types int and int are incompatible in the concat operator.
The data types varchar and varbinary are incompatible in the concat operator.
Conclusion
In this blog post, we have introduced the UNISTR function and ANSI SQL concatenation operator (||) in Azure SQL Database. The UNISTR function allows you to escape Unicode characters, making it easier to work with international text. ANSI SQL concatenation operator (||) provides a simple and intuitive way to combine characters or binary data. These new features will enhance your ability to manipulate and work with text data efficiently.
We hope you will explore these enhancements, apply them in your projects, and share your feedback with us to help us continue improving. Thank you!