First published on MSDN on Oct 19, 2016
Authored by Sanjay Mishra
Sanjay Mishra
Reviewed By: Dimitri Furman, Murshed Zaman, Kun Cheng
If you have tried to use BULK INSERT or bcp utilities to load UTF-8 data into a table in SQL Server 2014 or in an earlier release (SQL Server 2008 or later), you have likely received the following error message:
Msg 2775, Level 16, State 13, Line 14
The code page 65001 is not supported by the server.
The requirement to support UTF-8 data for these utilities has been extensively discussed on various forums, most notably on
Connect
.
This requirement has been addressed in SQL Server 2016 (and
backported
to SQL Server 2014 SP2). To test this, I obtained a UTF-8 dataset from
http://www.columbia.edu/~fdc/utf8/
. The dataset is translation of the sentence “
I can eat glass and it doesn’t hurt me
” in several languages. A few lines of sample data are shown here:
(As an aside, it is entirely possible to load Unicode text such as above into SQL Server even without this improvement, as long as the source text file uses a Unicode encoding other than UTF-8.)
-- SQL Server 2014 SP1 or earlier
CREATE DATABASE DemoUTF8_2014
GO
USE DemoUTF8_2014
GO
CREATE TABLE Newdata
(
lang VARCHAR(200),
txt NVARCHAR(1000)
)
GO
BULK INSERT Newdata
FROM 'C:\UTF8_Test\i_can_eat_glass.txt'
WITH (DATAFILETYPE = 'char', FIELDTERMINATOR='\t', CODEPAGE='65001')
GO
Msg 2775, Level 16, State 13, Line 14
The code page 65001 is not supported by the server.
-- SQL Server 2016 RTM or SQL Server 2014 SP2 or later
CREATE DATABASE DemoUTF8_2016
GO
USE DemoUTF8_2016
GO
CREATE TABLE Newdata
(
lang VARCHAR(200),
txt NVARCHAR(1000)
)
GO
BULK INSERT Newdata
FROM 'C:\UTF8_Test\i_can_eat_glass.txt'
WITH (DATAFILETYPE = 'char', FIELDTERMINATOR='\t', CODEPAGE='65001')
GO
(150 row(s) affected)
SELECT * FROM Newdata
GO
You can
now use
CODEPAGE='65001' with BULK INSERT, bcp and OPENROWSET utilities.
Note that this improvement is only scoped to input processing by bulk load utilities. Internally, SQL Server still uses the UCS-2 encoding when storing Unicode strings.