S3 → Redshift cannot handle UTF-8

We have a file in S3 that is loaded into Redshift using the COPY command. The import fails because a VARCHAR(20) value contains an Ä, which is translated into .. during the COPY command and is then too long for the 20 characters.

I checked the data in S3 and it is correct, but COPY does not understand the UTF-8 characters during import. Has anyone found a solution for this?

+9

5 answers

TL;DR

The byte length for your varchar column just needs to be larger.

Detail

Multi-byte characters (UTF-8) are supported in the varchar data type; however, the length that you provide is in bytes, NOT characters.

The AWS documentation for Multibyte Character Load Errors states the following:

VARCHAR columns accept multibyte UTF-8 characters, to a maximum of four bytes.

Therefore, if you want the character Ä to be allowed, you need to allow 2 bytes for this character instead of 1 byte.
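You can check this byte accounting directly in Redshift; a quick sketch using the built-in LEN (length in characters) and OCTET_LENGTH (length in bytes) functions:

```sql
-- Ä is one character but two bytes in UTF-8 (0xC3 0x84)
SELECT LEN('Ä')          AS num_characters,  -- 1
       OCTET_LENGTH('Ä') AS num_bytes;       -- 2
```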

The AWS documentation for VARCHAR or CHARACTER VARYING states the following:

... so a VARCHAR(120) column consists of a maximum of 120 single-byte characters, 60 two-byte characters, 40 three-byte characters, or 30 four-byte characters.

For a list of UTF-8 characters and their byte lengths, see the Complete Character List for UTF-8.

"LATIN CAPITAL LETTER A WITH DIAERESIS" (U + 00C4) .

+15

"ACCEPTINVCHARS ESCAPE" .

+1

In my case, the rows containing Ä came from a mysqldump export that was loaded into Redshift. The dump was made in latin1, which did not match how the data was stored in mysql, so the characters were already broken before the COPY command even ran. Re-exporting the dump as UTF-8 (for example with mysqldump --default-character-set=utf8) solved it.

0

You need to increase the size of the varchar column. Check the stl_load_errors table to see the actual length of the field value for the erroneous rows, and increase the size accordingly. UPDATE: I just realized that this is a very old post; anyway, maybe it will still help someone.
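A sketch of that check against the system table (stl_load_errors and these columns are standard in Redshift):

```sql
-- Most recent load errors: offending column, its declared byte length,
-- the raw field value, and the reason the row was rejected
SELECT starttime, filename, colname, col_length,
       raw_field_value, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;
```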

0

Source: https://habr.com/ru/post/1568684/

