Getting NULL values in a Hive CREATE and LOAD query with REGEX

I have a log file whose data I need to load into a Hive table using a regex. I ran the queries below, but every loaded value is NULL. I checked the regex on http://www.regexr.com/ and it works fine for my data.

CREATE EXTERNAL TABLE IF NOT EXISTS avl(imei STRING,packet STRING)                        
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (                                             
"input.regex" = "(IMEI\\s\\d{15} (\\b(\\d{15})([A-Z0-9]+)) )",          
"output.format.string" = "%1$s %2$s"                              
)
STORED AS TEXTFILE;

LOAD DATA INPATH 'hdfs:/user/user1/data' OVERWRITE INTO TABLE avl;

Please correct me here.

Log Example:

[INFO_|01/31 07:19:29]  IMEI 356307043180842 
[INFO_|01/31 07:19:33]  PacketLength = 372
[INFO_|01/31 07:19:33]  Recv HEXString : 0000000000000168080700000143E5FC86B6002F20BC400C93C6F000FF000E0600280007020101F001040914B34238DD180028CD6B7801C7000000690000000143E5FC633E002F20B3000C93A3B00105000D06002C0007020101F001040915E64238E618002CCD6B7801C7000000640000000143E5FC43FE002F20AA800C9381700109000F06002D0007020101F001040915BF4238D318002DCD6B7801C70000006C0000000143E5FC20D6002F20A1400C935BF00111000D0600270007020101F001040916394238B6180027CD6B7801C70000006D0000000143E5FBF5DE002F2098400C9336500118000B0600260007020101F0010409174D42384D180026CD6B7801C70000006E0000000143E5FBD2B6002F208F400C931140011C000D06002B0007020101F001040915624238C018002BCD6B7801C70000006F0000000143E5FBAF8E002F2085800C92EB10011E000D06002B0007020101F0010409154C4238A318002BCD6B7801C700000067000700005873

Thanks.

1 answer

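The problem is that Hive does not hand your whole file to the SerDe. With STORED AS TEXTFILE, the input is split into records at line breaks (\r, \n, or \r\n), and each line is passed to the SerDe as a separate record.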
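A minimal Python sketch of that splitting step. It only approximates the line reader behind TextInputFormat; the helper name `split_like_textinputformat` is hypothetical, and the hex payload is shortened for readability.

```python
import re
from typing import List

# Sample log from the question; the hex payload is shortened for readability.
SAMPLE_LOG = (
    "[INFO_|01/31 07:19:29]  IMEI 356307043180842 \n"
    "[INFO_|01/31 07:19:33]  PacketLength = 372\n"
    "[INFO_|01/31 07:19:33]  Recv HEXString : 0000000000000168080700000143E5FC\n"
)


def split_like_textinputformat(data: str) -> List[str]:
    """Approximate the default record splitting: records end at CRLF, CR or LF,
    and the terminator itself is not part of the record."""
    # Try \r\n first so that a Windows line ending counts as one terminator.
    records = re.split(r"\r\n|\r|\n", data)
    # A terminator at the very end of the file does not create an empty record.
    if records and records[-1] == "":
        records.pop()
    return records


for number, record in enumerate(split_like_textinputformat(SAMPLE_LOG), start=1):
    # Each of these strings is what the SerDe sees as one row.
    print(number, repr(record))
```

The IMEI and the packet end up in different records, so no single record ever contains both columns.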

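RegexSerDe requires the regex to match the entire record (a full match, not the substring search that regexr.com performs); if a record does not match, the whole row comes back as NULL. Your table needs the IMEI from one line and the packet from another, so no single line can match your regex when the table is STORED AS TEXTFILE. That is where the NULLs come from: every line fails to match.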
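A rough Python simulation of the matching step, assuming the contrib RegexSerDe behavior described above. Python's `re.fullmatch` stands in for Java's `Matcher.matches()`, and `deserialize` is a hypothetical helper, not Hive code. It runs the question's regex against each line of the sample and contrasts a substring search with a full match:

```python
import re
from typing import List, Optional

# The question's regex. Hive unescapes the string literal, so "IMEI\\s..." in
# the DDL reaches the regex engine as IMEI\s..., which is what this raw string holds.
QUESTION_REGEX = r"(IMEI\s\d{15} (\b(\d{15})([A-Z0-9]+)) )"

# The three log lines from the question (hex payload shortened), i.e. the
# three separate records a TEXTFILE table produces.
LINES = [
    "[INFO_|01/31 07:19:29]  IMEI 356307043180842 ",
    "[INFO_|01/31 07:19:33]  PacketLength = 372",
    "[INFO_|01/31 07:19:33]  Recv HEXString : 0000000000000168080700000143E5FC",
]


def deserialize(record: str, pattern: str, num_columns: int) -> Optional[List[str]]:
    """Hypothetical stand-in for RegexSerDe: the pattern must match the whole
    record (re.fullmatch here, Matcher.matches() in Java). None = a NULL row."""
    m = re.fullmatch(pattern, record)
    if m is None:
        return None
    return [m.group(c + 1) for c in range(num_columns)]


# No line starts with "IMEI", so every full match fails and every row is NULL.
print([deserialize(line, QUESTION_REGEX, 2) for line in LINES])

# A substring search, which is what an online tester typically highlights, differs.
probe = r"IMEI\s(\d{15})"
print(re.search(probe, LINES[0]) is not None)     # True: the substring is there
print(re.fullmatch(probe, LINES[0]) is not None)  # False: the whole line is not
# Letting the pattern cover the rest of the line makes the full match succeed.
print(deserialize(LINES[0], r".*IMEI\s(\d{15})\s*", 1))
```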

So to make this work, you have to change how Hive splits the file into records.

In Hive, STORED AS TEXTFILE uses this input format to read the data:

STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'

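TextInputFormat honors the textinputformat.record.delimiter configuration property, which lets you replace the line break with another record separator. Older versions of Hadoop and Hive may ignore this property for TextInputFormat, so check the versions you run.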

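So if you can append a marker such as EOR to the end of each record in the log and set the delimiter to EOR, each record can span several lines. RegexSerDe then receives the whole multi-line record at once, and your regex can match it, as long as the regex allows for the newlines inside the record.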

Then something like this (untested) should work:

-- Session setting: it must be in effect when the table is read
SET textinputformat.record.delimiter=EOR;

CREATE EXTERNAL TABLE ...
...
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
   "input.regex" = ...,
   "output.format.string" = ...
)
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
          OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION ...;
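To see what RegexSerDe would receive with the EOR delimiter, here is a Python simulation under stated assumptions: the log has been preprocessed so that each record is followed by a line containing only EOR, the second record is invented, the hex payloads are shortened, and `RECORD_REGEX` is one hypothetical regex for a record, not a tested Hive configuration. The `(?s)` flag lets `.` match newlines, and Java's regex engine accepts the same inline flag. In the HiveQL `input.regex` string each backslash would have to be doubled, as in the question.

```python
import re
from typing import List, Optional

# Hypothetical preprocessed log: each record is followed by a line holding only
# the marker EOR. The second record (IMEI, timestamps, payload) is invented.
LOG_WITH_EOR = (
    "[INFO_|01/31 07:19:29]  IMEI 356307043180842 \n"
    "[INFO_|01/31 07:19:33]  PacketLength = 372\n"
    "[INFO_|01/31 07:19:33]  Recv HEXString : 0000000000000168080700000143E5FC\n"
    "EOR\n"
    "[INFO_|01/31 07:20:02]  IMEI 356307043180843 \n"
    "[INFO_|01/31 07:20:05]  PacketLength = 372\n"
    "[INFO_|01/31 07:20:05]  Recv HEXString : 0000000000000168080700000143E5FD\n"
    "EOR"
)

# Hypothetical regex for one multi-line record: (?s) lets "." cross newlines,
# and the leading/trailing \s* absorb the newlines left around each EOR marker.
RECORD_REGEX = r"(?s)\s*.*?IMEI\s(\d{15})\s.*?Recv HEXString : ([0-9A-F]+)\s*"


def split_on_delimiter(data: str, delimiter: str) -> List[str]:
    """Approximate textinputformat.record.delimiter: the delimiter separates
    records and is dropped; newlines stay inside the records."""
    records = data.split(delimiter)
    if records and records[-1] == "":
        records.pop()
    return records


def deserialize(record: str, pattern: str, num_columns: int) -> Optional[List[str]]:
    """Whole-record match, as RegexSerDe does; None stands for a NULL row."""
    m = re.fullmatch(pattern, record)
    if m is None:
        return None
    return [m.group(c + 1) for c in range(num_columns)]


for record in split_on_delimiter(LOG_WITH_EOR, "EOR"):
    print(deserialize(record, RECORD_REGEX, 2))
```

The newline that follows each EOR stays at the start of the next record, which is why the regex begins with `\s*`.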

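Alternatively, you could set textinputformat.record.delimiter to a string that never appears in the data, such as EOF; then the whole file is read as one record and passed to RegexSerDe in one piece.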

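(This only helps if each file contains a single record: with more than one record per file you still get just one row per file.) If your files hold several records, the EOR marker approach above is the better choice.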
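A Python sketch of the single-record alternative, using a hypothetical regex and an invented two-record file that contains no delimiter string. The whole file becomes one record, and with this particular regex the single resulting row silently pairs the first IMEI with the last packet, which is easy to miss:

```python
import re
from typing import List, Optional

# Hypothetical regex for one record; (?s) lets "." match newlines.
RECORD_REGEX = r"(?s)\s*.*?IMEI\s(\d{15})\s.*?Recv HEXString : ([0-9A-F]+)\s*"

# Invented file with two records (payloads shortened) and no "EOF" anywhere.
TWO_RECORDS = (
    "[INFO_|01/31 07:19:29]  IMEI 356307043180842 \n"
    "[INFO_|01/31 07:19:33]  PacketLength = 372\n"
    "[INFO_|01/31 07:19:33]  Recv HEXString : 0000000000000168080700000143E5FC\n"
    "[INFO_|01/31 07:20:02]  IMEI 356307043180843 \n"
    "[INFO_|01/31 07:20:05]  PacketLength = 372\n"
    "[INFO_|01/31 07:20:05]  Recv HEXString : 0000000000000168080700000143E5FD\n"
)


def split_on_delimiter(data: str, delimiter: str) -> List[str]:
    """Approximate a custom record delimiter; drops a trailing empty record."""
    records = data.split(delimiter)
    if records and records[-1] == "":
        records.pop()
    return records


def deserialize(record: str, pattern: str, num_columns: int) -> Optional[List[str]]:
    """Whole-record match, as RegexSerDe does; None stands for a NULL row."""
    m = re.fullmatch(pattern, record)
    if m is None:
        return None
    return [m.group(c + 1) for c in range(num_columns)]


records = split_on_delimiter(TWO_RECORDS, "EOF")  # delimiter never occurs
print(len(records))  # the whole file is a single record
# One row: the first IMEI combined with the second record's packet.
print(deserialize(records[0], RECORD_REGEX, 2))
```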


Source: https://habr.com/ru/post/1540626/

