pyhs2 / Hive: "No files matching path" even though the file exists

Using the hive or beeline client, I have no problem executing this statement:

hive -e "LOAD DATA LOCAL INPATH '/tmp/tmpBKe_Mc' INTO TABLE unit_test_hs2" 

The data from the file is loaded into the Hive table successfully.

However, when I use pyhs2 from the same machine, the file is not found:

    import pyhs2

    conn_str = {'authMechanism': 'NOSASL', 'host': 'azus'}
    conn = pyhs2.connect(**conn_str)
    with conn.cursor() as cur:
        cur.execute("LOAD DATA LOCAL INPATH '/tmp/tmpBKe_Mc' INTO TABLE unit_test_hs2")

Throws an exception:

    Traceback (most recent call last):
      File "data_access/hs2.py", line 38, in write
        cur.execute("LOAD DATA LOCAL INPATH '%s' INTO TABLE %s" % (csv_file.name, table_name))
      File "/edge/1/anaconda/lib/python2.7/site-packages/pyhs2/cursor.py", line 63, in execute
        raise Pyhs2Exception(res.status.errorCode, res.status.errorMessage)
    pyhs2.error.Pyhs2Exception: "Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''/tmp/tmpBKe_Mc'': No files matching path file:/tmp/tmpBKe_Mc"

I have seen similar questions about this problem, and the usual answer is that the query is being executed on another server that does not have the local file '/tmp/tmpBKe_Mc' stored on it. If that is the case, though, why does running the command directly from the CLI work, while running it through pyhs2 does not?
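
For what it's worth, the file definitely exists on the client machine when the statement is submitted; a sanity check along these lines passes, which is what makes the "wrong server" explanation plausible:

    import os.path

    # The file is present on the client at the time of the call...
    assert os.path.exists('/tmp/tmpBKe_Mc')
    # ...yet Hive reports "No files matching path file:/tmp/tmpBKe_Mc",
    # which suggests the LOCAL path is being checked on some other machine.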

(Secondary question: how can I tell which server is trying to process the query? I tried cur.execute("set"), which returns all the configuration parameters, but when grepping the output for "host", the returned parameters don't seem to contain the real host name.)
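
A sketch of one way to narrow this down: the host passed to pyhs2.connect ('azus' above) is the machine HiveServer2 runs on, and that is where a LOCAL path gets resolved. You can also ask Hive for a single property instead of grepping the whole "set" dump; note that hive.server2.thrift.bind.host is a standard HiveServer2 property but is often left empty, in which case this shows nothing useful:

    import pyhs2

    conn = pyhs2.connect(host='azus', authMechanism='NOSASL')
    with conn.cursor() as cur:
        # Querying one key returns a single "key=value" row instead of
        # the full configuration dump that plain "set" produces.
        cur.execute("set hive.server2.thrift.bind.host")
        for row in cur.fetch():
            print(row)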

Thanks!

1 answer

This is because pyhs2 talks to HiveServer2, and a LOAD DATA LOCAL INPATH statement submitted through HiveServer2 resolves the "local" path on the machine where HiveServer2 is running, so pyhs2 is effectively trying to find the file in the cluster. The hive CLI, by contrast, runs the Hive driver in-process on your own machine, which is why the same statement works there.

The solution is to store your source file in an appropriate HDFS location instead of the local /tmp, and load it with LOAD DATA INPATH (without the LOCAL keyword).
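
A minimal sketch of that approach (the /user/me/staging path below is just a placeholder for whatever HDFS directory you can write to):

    # First copy the file from the client into HDFS, e.g.:
    #   hdfs dfs -put /tmp/tmpBKe_Mc /user/me/staging/tmpBKe_Mc
    import pyhs2

    conn = pyhs2.connect(host='azus', authMechanism='NOSASL')
    with conn.cursor() as cur:
        # No LOCAL keyword: the path is now resolved in HDFS, which every
        # node can see, so it no longer matters where HiveServer2 runs.
        cur.execute("LOAD DATA INPATH '/user/me/staging/tmpBKe_Mc' "
                    "INTO TABLE unit_test_hs2")

Note that LOAD DATA INPATH moves the file into the table's directory rather than copying it, so the staging copy is gone after the load succeeds.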


Source: https://habr.com/ru/post/1208599/

