I run into problems when I have several files with different encodings: for example, one file is in a Chinese encoding and the others are in a French encoding. How can I load them all into one Hive table? I searched the Internet and found this:
ALTER TABLE mytable SET SERDEPROPERTIES ('serialization.encoding' = 'SJIS');
With this, I can handle the encoding of either the Chinese file or the French files, but not both. Is there a way to process both encodings at once?
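One possible workaround, sketched here under assumptions rather than offered as a definitive recipe: `serialization.encoding` is a per-table SerDe property, so instead of forcing one table to understand two encodings, keep the files in separate directories, give each directory its own staging table with its own encoding, and copy both into a single UTF-8 table. The table names, the single `line STRING` column, the HDFS paths and the choice of `GBK` for the Chinese files are all hypothetical placeholders. The sketch relies on `LazySimpleSerDe` honouring `serialization.encoding`, which depends on your Hive version.

```sql
-- Sketch only: table names, column layout, paths and charsets are hypothetical.
-- One external staging table per source encoding, each pointing at a
-- directory that contains only the files in that encoding.
CREATE EXTERNAL TABLE staging_zh (line STRING)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES ('serialization.encoding' = 'GBK')
  STORED AS TEXTFILE
  LOCATION '/data/input/zh';

CREATE EXTERNAL TABLE staging_fr (line STRING)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES ('serialization.encoding' = 'ISO-8859-1')
  STORED AS TEXTFILE
  LOCATION '/data/input/fr';

-- The target table keeps the default UTF-8 encoding, so rows decoded
-- from either staging table end up stored in one encoding.
CREATE TABLE IF NOT EXISTS mytable (line STRING) STORED AS TEXTFILE;

INSERT INTO TABLE mytable SELECT line FROM staging_zh;
INSERT INTO TABLE mytable SELECT line FROM staging_fr;
```

The design choice is to decode each file exactly once, at the staging boundary, so that everything downstream only ever sees UTF-8.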
[UPDATE]
Well, I use RegexSerDe for a fixed-width file encoded in ISO-8859-1. It seems that RegexSerDe does not take this encoding into account and splits the characters as if the file were in the default UTF-8 encoding. Is there a way to make RegexSerDe respect the file's encoding?
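A hedged sketch of one workaround, assuming that `RegexSerDe` really does ignore `serialization.encoding`, which matches the behaviour described above. Let `LazySimpleSerDe` decode each record with the declared charset into a single string column, then cut the fixed-width fields by character position with `substr` instead of a regex. The table and view names, the path, and the field names and widths (10, 20 and 8 characters) are hypothetical. The approach also assumes that no record contains `\001`, the default field delimiter, so each whole line lands in the one column.

```sql
-- Sketch only: names, path and field widths are hypothetical.
-- Read each fixed-width record as one STRING column; LazySimpleSerDe decodes
-- the bytes as ISO-8859-1 instead of assuming UTF-8.
CREATE EXTERNAL TABLE fixed_raw (line STRING)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES ('serialization.encoding' = 'ISO-8859-1')
  STORED AS TEXTFILE
  LOCATION '/data/input/fixed';

-- Split the record by character position:
-- code = chars 1-10, name = chars 11-30, amount = chars 31-38.
CREATE VIEW fixed_parsed AS
SELECT
  trim(substr(line, 1, 10))  AS code,
  trim(substr(line, 11, 20)) AS name,
  trim(substr(line, 31, 8))  AS amount
FROM fixed_raw;
```

Because ISO-8859-1 is a single-byte encoding, each character position in the decoded string corresponds to one byte in the original file, so the widths can be taken directly from the file layout.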