Runtime Error: from org.apache.hadoop.hive.ql.exec.DDLTask

Question

I am using the following regular expression, with tab-separated groups, to parse the data shown below, which is also tab-delimited.

The statement I use to create the table is:

 create table akmlogreg(logdate string, time string, clientip string, method string, uri string, status string, bytes string, TimeTakenMS string, referer string, useragent string, cs_Cookie string)
 ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
 WITH SERDEPROPERTIES (
 "input.regex" ="([0-9-]+) ([^\t]*) ([^\t]*) ([^\t]*) ([^\t]*) ([^\t]*) ([^\t]*) ([^\t]*) (\".*\"|[^ ]*) (\".*\"|[^ ]*) ([^\r\n]+)",
 "output.format.string"="%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s %10$s %11$s");

With this regex, I want comment lines (lines starting with #) to be skipped and each remaining line to be parsed as one record. However, the statement fails when I try to create the table in Hive. I separated the regex groups with tabs because my log data is also tab-delimited. Can someone suggest a better way to parse tab-delimited data like this with a regular expression?

The exception:

 FAILED: Error in metadata: java.util.regex.PatternSyntaxException: Unmatched closing ')' near index 10
 ([0-9-]+)]+)
           ^
 FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

Data:

 #Version: 1.0
 #Fields: date time cs-ip cs-method cs-uri sc-status sc-bytes time-taken cs(Referer) cs(User-Agent) cs(Cookie)
 2013-07-02 00:00:00 242.242.242.242 GET /9699/14916.jpg 200 6783 0 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.23 Safari/534.10" "-"
 2013-07-02 00:00:00 242.242.242.242 GET /169875/2006-2010-679336-640x428.JPG 200 78221 355 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.52 Safari/537.36" "-"
 2013-07-02 00:00:00 242.242.242.242 GET /169875/2006-2010-679339-640x428.JPG 200 86791 238 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.52 Safari/537.36" "-"

regex hive

Naresh Jul 08 '13 at 9:16

1 answer

Stephan · Accepted Answer · 2013-07-08T09:21:07+0000

Try the following:

 ^([0-9-]+)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t(\".*?\"|[^ ]*)\t(\".*?\"|[^ ]*)\t([^\r\n]+)$
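If you want to sanity-check the pattern before putting it into the table definition, a quick test outside Hive may help. The sketch below uses Python's re module rather than java.util.regex (which Hive uses), so treat it only as an approximation; the constructs in this pattern should behave the same in both. It assumes the fields in the real log are separated by tab characters, as the question states. The names LOG_PATTERN, COLUMNS and parse_line are made up for this illustration.

```python
import re

# The pattern from the answer above, written as a Python raw string.
LOG_PATTERN = re.compile(
    r'^([0-9-]+)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)'
    r'\t([^\t]*)\t([^\t]*)\t(\".*?\"|[^ ]*)\t(\".*?\"|[^ ]*)\t([^\r\n]+)$'
)

# Column names copied from the CREATE TABLE statement in the question.
COLUMNS = [
    "logdate", "time", "clientip", "method", "uri", "status",
    "bytes", "TimeTakenMS", "referer", "useragent", "cs_Cookie",
]


def parse_line(line):
    """Return a dict of column -> value, or None when the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return dict(zip(COLUMNS, match.groups()))


# Assumption: the real log separates fields with tabs, so the first record
# from the question is rebuilt here with "\t" between the fields.
sample = "\t".join([
    "2013-07-02", "00:00:00", "242.242.242.242", "GET", "/9699/14916.jpg",
    "200", "6783", "0", '"-"',
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.10 '
    '(KHTML, like Gecko) Chrome/8.0.552.23 Safari/534.10"',
    '"-"',
])
header = "#Version: 1.0"

row = parse_line(sample)
print(row["uri"], row["status"])   # fields captured from the data line
print(parse_line(header))          # comment lines do not match the pattern
```

Because the pattern is anchored on a leading date, lines starting with # do not match at all. As far as I know, RegexSerDe does not drop such rows but returns NULL in every column for them, so you would still need to filter them out in the query, for example with a condition on logdate.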
