I am trying to run a very simple Python script through Hive and Hadoop.
This is my script:
```python
#!/usr/bin/env python
import sys

for line in sys.stdin:
    line = line.strip()
    nums = line.split()
    i = nums[0]
    print i
```
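Two things in this script can make it exit with an error on a task node. It raises `IndexError` on a blank line, and `print i` is a syntax error if the nodes' `python` happens to be Python 3. Here is a minimal sketch of the same per-line logic that avoids both, so it can be checked without Hive. It assumes Hive's default TRANSFORM row format, where columns arrive tab-separated, one row per line. The helper name `first_columns` is hypothetical.

```python
def first_columns(lines):
    """Yield the first whitespace-separated field of each non-empty line.

    Same logic as the script above (strip, split, keep field 0),
    except that blank lines are skipped instead of raising IndexError.
    """
    for line in lines:
        fields = line.strip().split()
        if fields:  # fields[0] on an empty list would raise IndexError
            yield fields[0]


if __name__ == "__main__":
    # Feed it the rows of the `test` table as tab-separated lines
    sample = ["1\t3\n", "2\t2\n", "\n", "3\t1\n"]
    for value in first_columns(sample):
        # print() with one argument behaves the same on Python 2 and 3
        print(value)
```

In the actual script, the loop would become `for value in first_columns(sys.stdin): print(value)`.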
I want to run it on the following table:
```
hive> select * from test;
OK
1       3
2       2
3       1
Time taken: 0.071 seconds
hive> desc test;
OK
col1    int
col2    string
Time taken: 0.215 seconds
```
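To see exactly what the script receives for this table, the sketch below serializes rows the way Hive's default TRANSFORM format does. Fields are joined with tabs, each row ends with a newline, and NULLs are written as `\N`, which is the default null marker. The function `to_transform_input` is hypothetical and only illustrates the format. It is not part of Hive.

```python
# Default NULL marker used in Hive's text serialization (a backslash followed by N)
HIVE_NULL = "\\N"


def to_transform_input(rows):
    """Serialize rows like Hive's default TRANSFORM row format (a sketch):
    fields joined by tabs, None written as \\N, one row per line."""
    lines = []
    for row in rows:
        fields = [HIVE_NULL if value is None else str(value) for value in row]
        lines.append("\t".join(fields) + "\n")
    return "".join(lines)


if __name__ == "__main__":
    # The three rows returned by `select * from test` (col1 int, col2 string)
    table = [(1, "3"), (2, "2"), (3, "1")]
    print(repr(to_transform_input(table)))
```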
I run:
```
hive> select transform (col1, col2) using './proba.py' from test;
```
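TRANSFORM runs the script as an external process and pipes the rows to its stdin. If that process exits with a non-zero status, the map task would typically fail. A local smoke test along those lines can rule the script itself in or out before looking at the Hive/Hadoop setup. This is a sketch under assumptions. `SCRIPT` is a hypothetical Python 3 stand-in for `proba.py`, and the child is started with `sys.executable` rather than through the shebang. That means it does not check whether `./proba.py` is executable or present on the task nodes.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for proba.py (print() instead of the print statement)
SCRIPT = """\
import sys
for line in sys.stdin:
    nums = line.strip().split()
    print(nums[0])
"""

# Rows of the `test` table as TRANSFORM would stream them: tab-separated, one per line
ROWS = "1\t3\n2\t2\n3\t1\n"


def run_transform(script_source, stdin_text):
    """Run script_source in a child Python process fed stdin_text.

    Returns (exit code, captured stdout).
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(script_source)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path],
            input=stdin_text,
            capture_output=True,  # requires Python 3.7+
            text=True,
        )
    finally:
        os.remove(path)
    return result.returncode, result.stdout


if __name__ == "__main__":
    code, out = run_transform(SCRIPT, ROWS)
    # A non-zero exit code here means the script itself is failing
    print("exit code:", code)
    print(out, end="")
```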
But I always get something like:
```
...
2011-11-18 12:23:32,646 Stage-1 map = 0%, reduce = 0%
2011-11-18 12:23:58,792 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201110270917_20215 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
```
I have tried many different modifications to this procedure, but it always fails. :(
Am I doing something wrong, or is there a problem with my Hive/Hadoop installation?
twowo