Loading data into a hive table with multiple encodings

Question

I run into problems when I have several files with different encodings; for example, one file uses a Chinese encoding and the others use a French encoding. How can I load them into one Hive table? I searched the Internet and found this:

ALTER TABLE mytable SET SERDEPROPERTIES ('serialization.encoding' = 'SJIS');

With this, I can handle the encoding of only one of the files, either Chinese or French. Is there a way to process both encodings at once?

[UPDATE]

I use RegexSerDe for a fixed-width file whose encoding is ISO 8859-1. It seems that RegexSerDe does not take this encoding into account and splits characters as if the data were in the default UTF-8 encoding. Is there a way to make RegexSerDe take the encoding into account?

character-encoding hive hdfs

Paritosh ahuja Jan 26 '17 at 14:47

1 answer

hlagos · Answer 1 · 2017-01-26T15:03:59+0000

I'm not sure this is possible; I think it is not, based on https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/AbstractEncodingAwareSerDe.java. A workaround would be to create two tables with different encoding settings and create a view on top of them.
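
A minimal sketch of that workaround, under some assumptions: the files are delimited text read with LazySimpleSerDe, the Chinese files are GBK and the French files are ISO-8859-1, and the files have been split into separate HDFS directories by encoding. The table names, columns, delimiter, encodings and locations are all hypothetical and need to be adapted to the actual data.

```sql
-- Hypothetical names, columns and paths; only 'serialization.encoding'
-- differs between the two tables.
CREATE EXTERNAL TABLE mytable_zh (id STRING, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim' = ',',
  'serialization.encoding' = 'GBK'          -- assumed encoding of the Chinese files
)
STORED AS TEXTFILE
LOCATION '/data/mytable/zh';

CREATE EXTERNAL TABLE mytable_fr (id STRING, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim' = ',',
  'serialization.encoding' = 'ISO-8859-1'   -- assumed encoding of the French files
)
STORED AS TEXTFILE
LOCATION '/data/mytable/fr';

-- One view that exposes both tables as a single logical table.
CREATE VIEW mytable_all AS
SELECT u.id, u.name
FROM (
  SELECT id, name FROM mytable_zh
  UNION ALL
  SELECT id, name FROM mytable_fr
) u;
```

Queries then go against mytable_all, and each underlying table decodes its own files with its own encoding. The UNION ALL is wrapped in a subquery so that the view definition should also be accepted by older Hive versions.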
