Using the hive or beeline client, I have no problem executing this statement:
hive -e "LOAD DATA LOCAL INPATH '/tmp/tmpBKe_Mc' INTO TABLE unit_test_hs2"
The data from the file is successfully loaded into Hive.
However, when I use pyhs2 from the same machine, the file is not found:
import pyhs2

conn_str = {'authMechanism': 'NOSASL', 'host': 'azus'}
conn = pyhs2.connect(**conn_str)  # connect() takes keyword arguments
with conn.cursor() as cur:
    cur.execute("LOAD DATA LOCAL INPATH '/tmp/tmpBKe_Mc' INTO TABLE unit_test_hs2")
Throws an exception:
Traceback (most recent call last):
  File "data_access/hs2.py", line 38, in write
    cur.execute("LOAD DATA LOCAL INPATH '%s' INTO TABLE %s" % (csv_file.name, table_name))
  File "/edge/1/anaconda/lib/python2.7/site-packages/pyhs2/cursor.py", line 63, in execute
    raise Pyhs2Exception(res.status.errorCode, res.status.errorMessage)
pyhs2.error.Pyhs2Exception: "Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''/tmp/tmpBKe_Mc'': No files matching path file:/tmp/tmpBKe_Mc"
I have seen similar questions posted about this problem, and the usual answer is that the query is executed on a different server, one that does not have the local file '/tmp/tmpBKe_Mc' on it. But if that is the case, why does running the command directly from the CLI work, while running it through pyhs2 does not?
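(For context, the workaround those answers usually suggest is to stage the file in HDFS first and drop the LOCAL keyword, so the path is resolved in HDFS rather than on whichever machine HiveServer2 happens to run on. A minimal sketch of that approach, assuming the hdfs CLI is on the PATH and reusing my table name; the staging path is arbitrary:

import subprocess
import pyhs2

local_file = '/tmp/tmpBKe_Mc'
hdfs_path = '/tmp/tmpBKe_Mc'  # arbitrary staging location in HDFS

# Copy the file into HDFS so it is visible no matter where HiveServer2 runs.
subprocess.check_call(['hdfs', 'dfs', '-put', '-f', local_file, hdfs_path])

conn = pyhs2.connect(authMechanism='NOSASL', host='azus')
with conn.cursor() as cur:
    # Without LOCAL, the path is resolved in HDFS and the file is moved
    # into the table's warehouse directory.
    cur.execute("LOAD DATA INPATH '%s' INTO TABLE unit_test_hs2" % hdfs_path)

That would sidestep the problem rather than explain it, though.)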
(Secondary question: how can I tell which server is trying to process the query? I tried cur.execute("set"), which returns all the configuration parameters, but grepping for "host" in the returned parameters doesn't seem to turn up the real host name.)
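(One thing I have not tried yet: "set -v" is supposed to dump the full Hadoop and Hive configuration rather than only the overridden values, and a parameter such as hive.server2.thrift.bind.host might name the serving machine. A sketch, assuming "set -v" behaves the same through HiveServer2 as it does in the CLI, and that each result row is a single "key=value" string:

conn = pyhs2.connect(authMechanism='NOSASL', host='azus')
with conn.cursor() as cur:
    cur.execute('set -v')       # full config dump, not just overrides
    for row in cur.fetch():     # each row assumed to be one "key=value" string
        if 'host' in row[0].lower():
            print(row[0])
)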
Thanks!