Recently we had an interesting customer question about a seemingly strange (and perhaps not widely known) behavior of implicit conversions to Unicode.
Imagine you declare a non-unicode string variable, and when concatenating strings that seem to fit the variable declaration, you get a result that is trimmed, although the sum of all string sizes does not exceed the variable's data type limit. Maybe it’s best to use an example with two strings: a short non-unicode constant, ‘String’, and a 4,000-character unicode constant, as sketched below.
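A minimal sketch of the scenario (REPLICATE stands in here for a literal 4,000-character N'...' constant, which would be unwieldy to print):

```sql
DECLARE @v VARCHAR(8000);

-- 'String' (6 characters, non-unicode) concatenated with a 4,000-character
-- unicode expression; REPLICATE(N'A', 4000) is typed as NVARCHAR(4000):
SET @v = 'String' + REPLICATE(N'A', 4000);

SELECT LEN(@v) AS ResultLength;   -- returns 4000, not the expected 4006
```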
So we have a limit of 8,000 characters in the variable, but after concatenating a 6-character non-unicode string with a 4,000-character unicode expression, the output was trimmed to 4,000 characters, and not the expected 4,006.
However, the observation is that if the unicode constant had more than 4,000 characters (still keeping the N prefix), then we would get the expected number of characters in the concatenated string.
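A sketch of that case (the CONVERT(NVARCHAR(MAX), ...) inside REPLICATE stands in for a literal constant longer than 4,000 characters, since REPLICATE itself truncates non-max input at 8,000 bytes):

```sql
DECLARE @v VARCHAR(8000);

-- A unicode expression longer than 4,000 characters is typed as NVARCHAR(MAX),
-- so the concatenation is no longer capped at 4,000 characters:
SET @v = 'String' + REPLICATE(CONVERT(NVARCHAR(MAX), N'A'), 4001);

SELECT LEN(@v) AS ResultLength;   -- returns 4007: nothing was trimmed
```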
The reason behind this behavior is that when prefixing a string constant with the letter N, the implicit conversion will result in a unicode string, NVARCHAR(4000), if the constant to convert does not exceed the max length for the unicode string data type. Otherwise, the implicit conversion will result in a unicode large-value, NVARCHAR(MAX).
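The non-max side of that rule can be observed directly, because sql_variant preserves an expression's base type (a sketch; nvarchar(max) cannot be converted to sql_variant, so the large-value case can't be probed this way):

```sql
-- An N-prefixed constant of 4,000 characters or fewer is typed as nvarchar:
SELECT SQL_VARIANT_PROPERTY(N'String', 'BaseType') AS ConstantType;   -- nvarchar

-- An unprefixed constant stays non-unicode:
SELECT SQL_VARIANT_PROPERTY('String', 'BaseType') AS ConstantType;    -- varchar
```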
In other words, what happens in the first case is:
1. The right-hand side expression is implicitly converted to a unicode string, NVARCHAR(4000).
2. Concatenation follows the rules of data type precedence, so the entire concatenation is bound by the unicode string data limit (and is therefore trimmed to 4,000 characters).
3. The expression is assigned and converted to the variable data type, VARCHAR(8000).
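The precedence step can also be checked directly: the non-unicode operand is the one promoted, so the whole concatenation resolves to nvarchar (a sketch using short constants):

```sql
-- By data type precedence, varchar + nvarchar resolves to nvarchar,
-- so 'String' is converted before the concatenation happens:
SELECT SQL_VARIANT_PROPERTY('String' + N'abc', 'BaseType') AS ConcatType;  -- nvarchar
```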
But in the second example, when concatenating ‘String’ with a unicode string longer than 4,000 characters, the implicit conversion for the unicode constant is to NVARCHAR(MAX), and so the concatenation includes all expected characters.
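One way to sidestep the trim, assuming a non-unicode result is actually what's wanted, is to convert the unicode operand explicitly before concatenating, so the whole expression stays bound by the VARCHAR(8000) limit rather than NVARCHAR(4000):

```sql
DECLARE @v VARCHAR(8000);

-- Converting the unicode operand up front keeps the entire concatenation
-- non-unicode, so the 8,000-character varchar limit applies instead:
SET @v = 'String' + CONVERT(VARCHAR(8000), REPLICATE(N'A', 4000));

SELECT LEN(@v) AS ResultLength;   -- returns 4006, as originally expected
```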