Hive solution to select / handle null string as NULL

I have an external Hive table with csv data. Some of the string fields are null. Now I want to select the data and paste it into another table in ORC format with a query like 'select * from first insert into second'. I want to replace the string "null" with the actual value NULL.

One solution could be to replace β€œnull” with empty and design my table to handle empty as null. That might work. But, if there are any empty values ​​in the data, they will also be treated as NULL.

It occurs to me that a table has a large number of columns with such rows. Therefore, if a solution requires you to select a column and perform some operation; I need to write a very long request. But if there is no other option, this can be done.

Please suggest a solution.

+4
source share
2 answers

Later versions of Hive support a standard feature NULLIF(). If you are using insertthen you need to specify the columns:

insert into second(col1, col2, col3, . . .)
    select col1, nullif(col2, 'null'), col3, . . .
    from first;
+2
source

All you have to do is modify the external table so that it treats the row nullas NULL

alter table my_external_table set tblproperties('serialization.null.format'='null');
+2
source

Source: https://habr.com/ru/post/1681412/


All Articles