I am new to Hadoop and Python and have run into a problem. I would appreciate your help.
I have a file of 150 records (just a sample) with 10 columns, which I loaded into a Hive table (table1). The tenth column (let's call it col10) is encoded in UTF-8, so to decode it I wrote a small Python script (pyfile.py) that looks like this:
Python script:
import sys
import urllib
for line in sys.stdin:
    line = line.strip()
    col10 = urllib.unquote(line).decode('utf8')
    print ''.join(col10.replace("+",' '))
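For reference, the decoding step can also be written as a small function that is easy to test on its own. This is only a sketch in Python 3 (the script above is Python 2), and the name `decode_col10` is made up for illustration. It mirrors the script by percent-decoding first and then turning `+` into a space.

```python
from urllib.parse import unquote


def decode_col10(line):
    """Percent-decode a UTF-8 value and turn '+' into spaces (hypothetical helper)."""
    # Strip surrounding whitespace, including the trailing newline, as the script does.
    value = line.strip()
    # In Python 3, unquote() decodes %XX escapes as UTF-8 by default.
    decoded = unquote(value, encoding="utf-8", errors="replace")
    # Same order as the original script: '+' is replaced after decoding,
    # so an escaped plus sign (%2B) also ends up as a space.
    return decoded.replace("+", " ")


# Quick check on a sample value instead of real table data.
print(decode_col10("hello%20big+world\n"))
```

If a literal plus sign must survive, `urllib.parse.unquote_plus` handles the `+` before decoding the escapes, which may be closer to what is wanted.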
I added the file to the distributed cache using the following command:
add FILE folder1/pyfile.py;
Now I call this Python script on col10 of my Hive table using TRANSFORM as follows:
Select Transform(col10)
USING 'python pyfile.py'
AS (col10)
From table1;
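As background, TRANSFORM passes each input row to the script's standard input as one line (columns separated by tabs) and reads the result rows back from its standard output. Below is a minimal Python 3 sketch of that loop, assuming one column in and one column out. `run_transform` and its `decode` parameter are hypothetical names, not part of Hive. The sketch catches per-row failures and reports them on stderr, which may help narrow down which rows a script cannot handle.

```python
import io
from urllib.parse import unquote


def run_transform(stdin, stdout, stderr, decode):
    """Read rows line by line and write one decoded value per row (illustrative only)."""
    failed = 0
    for row_number, line in enumerate(stdin, start=1):
        value = line.rstrip("\n")
        try:
            result = decode(value)
        except Exception as exc:
            # Keep going so that one bad row does not end the whole script.
            failed += 1
            stderr.write("row %d failed: %r\n" % (row_number, exc))
            result = value  # pass the raw value through unchanged
        stdout.write(result + "\n")
    return failed


# Demo with in-memory streams instead of the real stdin/stdout.
rows = io.StringIO("a%20b\nc+d\n")
out, err = io.StringIO(), io.StringIO()
failures = run_transform(rows, out, err, lambda v: unquote(v).replace("+", " "))
print(out.getvalue(), end="")
```

In a real script the call would presumably be `run_transform(sys.stdin, sys.stdout, sys.stderr, ...)`. Passing the streams in as arguments is just a design choice that lets the loop be exercised without Hive.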
Problem:
The query works for the first 100 records, but for records 101-150 it fails with the following error:
2015-10-30 00:58:20,320 INFO [IPC Server handler 0 on 33716] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1445826741287_0032_m_000000_0: Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:217)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20003]: An error occurred when trying to close the Operator running your custom script.
at org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:557)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199)
... 8 more
Records 101-150 [...] Python script [...]