Extra characters appear during bulk insert

I am trying to bulk insert the first row of a CSV file into a single-column table, but I get extra characters ("n ++") at the beginning:

n++First Column;Second Column;Third Column;Fourth Column;Fifth Columnm;Sixth Column 

The contents of the CSV file are as follows:

 First Column;Second Column;Third Column;Fourth Column;Fifth Columnm;Sixth Column 

You can find the test.csv file here

This is the code I use to load the first row into the table:

 declare @importSQL nvarchar(2000)
 declare @tempstr varchar(max)
 declare @path varchar(100)
 SET @path = 'D:\test.csv'
 CREATE TABLE #tbl (line VARCHAR(max))
 -- Both terminators are '\n', so the whole first line lands in the single column
 SET @importSQL = 'BULK INSERT #tbl FROM ''' + @path + ''' WITH ( LASTROW = 1, FIELDTERMINATOR = ''\n'', ROWTERMINATOR = ''\n'' )'
 EXEC sp_executesql @stmt = @importSQL
 -- Replace tabs with semicolons and trim trailing spaces
 SET @tempstr = (SELECT TOP 1 RTRIM(REPLACE(Line, CHAR(9), ';')) FROM #tbl)
 print @tempstr
 drop table #tbl

Any idea where this extra "n ++" comes from?

5 answers

It seems that SQL Server 2005 and 2008 do not support UTF-8 files; support will only be available in version 11.

https://connect.microsoft.com/SQLServer/feedback/details/370419/bulk-insert-and-bcp-does-not-recognize-codepage-65001


The extra characters are caused by the file's encoding. You can use Notepad to change the encoding from UTF-8 to Unicode. This removed the "n ++" from the first line.
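If you would rather script the conversion than open Notepad, here is a minimal sketch in Python. It assumes that Notepad's "Unicode" option means UTF-16 little-endian with a leading FF FE byte order mark; the names `utf8_to_utf16le` and `convert_file` are made up for illustration.

```python
import codecs
from pathlib import Path


def utf8_to_utf16le(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes (with or without a BOM) as UTF-16 LE with a BOM."""
    # "utf-8-sig" drops a leading EF BB BF if present, otherwise it is plain UTF-8.
    text = data.decode("utf-8-sig")
    # Assumption: Notepad's "Unicode" is UTF-16 little-endian preceded by FF FE.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def convert_file(src: str, dst: str) -> None:
    """Read src as UTF-8 and write dst as UTF-16 LE (paths are placeholders)."""
    Path(dst).write_bytes(utf8_to_utf16le(Path(src).read_bytes()))


if __name__ == "__main__":
    sample = codecs.BOM_UTF8 + b"First Column;Second Column\r\n"
    converted = utf8_to_utf16le(sample)
    # Show the first bytes so the new byte order mark is visible.
    print(converted[:6])
```

Working on raw bytes keeps the line endings exactly as they were in the source file. Depending on your setup, a UTF-16 file may then need DATAFILETYPE = 'widechar' in the BULK INSERT statement.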


It may be the Unicode byte order mark (BOM).

I suggest you try setting the DATAFILETYPE parameter as part of your statement. See the MSDN Documentation for more details: http://msdn.microsoft.com/en-us/library/aa173832%28SQL.80%29.aspx
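To check whether the file really starts with a BOM, you can look at its first three bytes. The sketch below is in Python; `has_utf8_bom` and `strip_utf8_bom` are illustrative names, not part of any SQL Server tooling. It also prints how the three UTF-8 BOM bytes look when decoded with two single-byte code pages. My guess, which is an assumption and not something confirmed in this thread, is that the "n ++" is what remains after the BOM is read through an OEM code page such as 437 and then mapped to the closest characters in the server's code page.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if there is one."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


if __name__ == "__main__":
    first_line = codecs.BOM_UTF8 + b"First Column;Second Column"
    print(has_utf8_bom(first_line))
    print(strip_utf8_bom(first_line))
    # The same three bytes seen through two single-byte code pages.
    print(ascii(codecs.BOM_UTF8.decode("cp1252")))
    print(ascii(codecs.BOM_UTF8.decode("cp437")))
```

For a real file, pass the result of `open(path, "rb").read(3)` to `has_utf8_bom`.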


Unfortunately, older versions of SQL Server do not support UTF-8. Add the CODEPAGE parameter to the BULK INSERT statement. For the code in your question, modify it as follows:

 SET @importSQL = 'BULK INSERT #tbl FROM ''' + @path + ''' WITH ( LASTROW = 1, FIELDTERMINATOR = ''\n'', ROWTERMINATOR = ''\n'' , CODEPAGE=''65001'')' 

Please note that your file must be in UTF-8 format. The catch is that if you upgrade your server from 2005 to 2008, code page 65001 (UTF-8) is not supported, and you get the message "code page is not supported".


In later versions of SQL Server, you can add "-C 65001" to the command (-C is the code page option of the bcp utility) to tell it to use UTF-8 encoding. This removes the "n ++" from the first line. Note that it is a capital letter C, and do not include the quotation marks when you type the command.


Source: https://habr.com/ru/post/1332064/

