I run several scripts and keep getting the same error. All of them join several tables on the same join condition.
Data is stored as parquet.
Hive version 1.2.1, execution engine MR.
SELECT count(*)
FROM xxx.tmp_usr_1 m
INNER JOIN xxx.tmp_usr n
  ON m.date_id = n.date_id AND m.end_user_id = n.end_user_id
LEFT JOIN xxx.usr_2 p
  ON m.date_id = p.date_id AND m.end_user_id = p.end_user_id;
Here is the error message:
2017-01-22 16:47:55,208 Stage-1 map = 54%, reduce = 0%, Cumulative CPU 560.81 sec
2017-01-22 16:47:56,248 Stage-1 map = 58%, reduce = 0%, Cumulative CPU 577.74 sec
2017-01-22 16:47:57,290 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 446.32 sec
MapReduce Total cumulative CPU time: 7 minutes 26 seconds 320 msec
Ended Job = job_1484710871657_6350 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1484710871657_6350_m_000061 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000069 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000053 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000011 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000063 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000049 (and more) from job job_1484710871657_6350
Examining task ID: task_1484710871657_6350_m_000052 (and more) from job job_1484710871657_6350
Task with the most failures(4):
-----
Task ID: task_1484710871657_6350_m_000071
URL: http://xxxxxxxxxx//taskdetails.jsp...
-----
Diagnostic Messages for this Task:
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:266)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:213)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:333)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:719)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:252)
    ... 11 more
Caused by: java.lang.IllegalStateException: Invalid schema data type, found: PRIMITIVE, expected: STRUCT
    at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getProjectedGroupFields(DataWritableReadSupport.java:118)
    at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getSchemaByName(DataWritableReadSupport.java:156)
    at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:222)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:256)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:99)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:85)
    at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:67)
    ... 16 more
Container killed by ApplicationMaster. Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
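The "found: PRIMITIVE, expected: STRUCT" line in the last cause suggests the schema Hive expects disagrees with the schema stored in some of the Parquet files. As a first check (a sketch; the table names are the ones from the failing query), I compared the declared column types of the three tables side by side:

```sql
-- Compare the declared Hive schema of each table in the join;
-- a column that is a primitive type in one table but a struct
-- (or differently typed) in another would match this error.
DESCRIBE FORMATTED xxx.tmp_usr_1;
DESCRIBE FORMATTED xxx.tmp_usr;
DESCRIBE FORMATTED xxx.usr_2;
```

The file-level schema written in the Parquet footers can also be inspected directly (for example with the `parquet-tools schema` command against a sample data file) to see whether it matches what `DESCRIBE` reports.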
My data consists of approximately 20M records. Even when I join the tables on a single column (end_user_id), I get the same error.
The join columns all have the same data type. Running join B in a subquery first, and then applying join C to its result, works around the problem.
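The workaround just described can be sketched like this (a rough version using the table and column names from the failing query, with the inner join materialized in a subquery before the left join):

```sql
-- Workaround sketch: perform the INNER JOIN in a subquery first,
-- then apply the LEFT JOIN to its result instead of chaining all
-- three tables in one FROM clause.
SELECT count(*)
FROM (
  SELECT m.date_id, m.end_user_id
  FROM xxx.tmp_usr_1 m
  INNER JOIN xxx.tmp_usr n
    ON m.date_id = n.date_id AND m.end_user_id = n.end_user_id
) b
LEFT JOIN xxx.usr_2 p
  ON b.date_id = p.date_id AND b.end_user_id = p.end_user_id;
```

This produces the same count, but I would rather understand why the original multi-join form fails.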
We have many SQL queries that join multiple tables on the same condition, yet only a few of the scripts hit this error.