I upload a log file to S3 in Hive running on EMR, but I return all NULL when viewing the data ...
I created the table as follows:
create external table coglogs (
HostID string,
ProcessID string,
Time string,
TimeZoneOffset string,
SessionID string,
RequestID string,
SubRequestID string,
StepID string,
Thread string,
Component string,
BuildNumber string,
Level string,
Logger string,
Operation string,
ObjectType string,
ObjectPath string,
Status string,
Message string,
LogData string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([\d+]\S+[\d+])\t(\d+)\t([\d+]\S+[\d+] [\d+]\S+[\d+])\t(-[\d+])\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z_\S]*)\t([0-9]*)\t([0-9]*)\t([a-zA-Z_\S]*)\t([a-zA-Z_\S]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)",
"output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s %10$s %11$s %12$s %13$s %14$s %15$s %16$s %17$s %18$s %19$s"
)
location 's3n://infinilog/cogserver_hive.log';
and then upload the data:
load data inpath 's3n://infinilog/cogserver.log' into table coglogs;
I checked regex rubular.com and it seems correct, but why am I not getting any data in the hive table?
Here is an example line in a log file:
101.196.242.160:9300 11329 2016-01-27 18:35:14.132 -5 http-9300-297 caf 2047 2 Audit.dispatcher.caf Request Warning secure error - found userCapabilities
101.196.242.160:9300 11329 2016-01-27 18:35:14.195 -5 F3820773ADD59754DEAFAFDFA0D2C3F2CDDF715483446C99B16BFA8040D0DDA4 9ss8Csyswh28w8sGC4hhvvMy2hw9d48Cd42y92y8 http-9300-299 CM 6102 3 Audit.Other.cms.CM QUERY REPORT /Public Folders/1A Reporting/PT/Reports/Summary Dashboard Success
EDIT:
so there is data in the table (I see a lot of rows with all the column values ββas βNoβ), but when I run something like this:
select count(*)
from coglogs
where status = 'Failure'
or any request, I will return this:
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:42)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:153)
... 22 more
Caused by: java.lang.NoSuchFieldError: stringTypeInfo
at org.apache.hadoop.hive.contrib.serde2.RegexSerDe.initialize(RegexSerDe.java:115)
at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:521)
at org.apache.hadoop.hive.ql.plan.PartitionDesc.getDeserializer(PartitionDesc.java:138)
at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:297)
at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:333)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:122)
... 22 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec