Hive JDBC performance improvement

Does anyone know how to improve performance for JDBC connections to Hive?

Detailed problem:

When I run a query from the Hive CLI, I get a response within 7 seconds, but the same query over a JDBC connection to Hive takes 14 seconds. Are there any configuration changes that would improve the performance of queries run through a JDBC connection?

Thanks in advance.

+4
4 answers

JVBC . , , , , .

, , - , , .

+1

.

  • Set hive.auto.convert.join to true.
  • Tune the Java heap size and garbage collection.
  • Use Tez as the execution engine: set hive.execution.engine = tez.
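Since the question is about JDBC, here is a minimal sketch of one way to pass such settings from the client: the HiveServer2 JDBC URL accepts a list of Hive configuration values after a `?`, separated by semicolons. The class name `HiveJdbcUrl`, the host `localhost`, the port `10000` and the database `default` are assumptions for illustration, and whether the server accepts a given property depends on how it is configured.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the Hive JDBC driver.
final class HiveJdbcUrl {

    // Builds a URL of the form jdbc:hive2://host:port/db?key=value;key=value
    static String build(String host, int port, String database, Map<String, String> hiveConf) {
        String base = "jdbc:hive2://" + host + ":" + port + "/" + database;
        if (hiveConf.isEmpty()) {
            return base;
        }
        // Hive configuration entries go after '?' and are separated by ';'.
        StringJoiner conf = new StringJoiner(";");
        for (Map.Entry<String, String> entry : hiveConf.entrySet()) {
            conf.add(entry.getKey() + "=" + entry.getValue());
        }
        return base + "?" + conf;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the settings in insertion order.
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("hive.auto.convert.join", "true");
        conf.put("hive.execution.engine", "tez");
        System.out.println(build("localhost", 10000, "default", conf));
    }
}
```

The resulting string can be handed to `DriverManager.getConnection`, so every session opened with it starts with those values instead of setting them query by query.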

Hive

, .

0

jdbc jdbc - , ( jdbc 3.0). hive cli .

-- enable cost based optimizer
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;

-- collect column statistics
analyze table <TABLENAME> compute statistics for columns;

-- enable vectorized query execution
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled=true;
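The settings above are written for the CLI. From a JDBC client they can be issued as ordinary statements on the same connection before the real query runs. This is only a sketch: the class name `HiveSessionSettings` is made up for illustration, and the trailing semicolons are left out on the assumption that the JDBC driver treats them as part of the statement text rather than as a terminator.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the Hive JDBC driver.
final class HiveSessionSettings {

    // The same properties as in the CLI snippet, without trailing semicolons.
    static final List<String> SETTINGS = Collections.unmodifiableList(Arrays.asList(
            "set hive.cbo.enable=true",
            "set hive.compute.query.using.stats=true",
            "set hive.stats.fetch.column.stats=true",
            "set hive.stats.fetch.partition.stats=true",
            "set hive.vectorized.execution.enabled=true",
            "set hive.vectorized.execution.reduce.enabled=true"));

    // Runs each "set" command on the given connection; the values last for that session.
    static void apply(Connection connection) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            for (String setting : SETTINGS) {
                statement.execute(setting);
            }
        }
    }
}
```

Call `HiveSessionSettings.apply(connection)` once right after opening the connection and run the queries on that same connection; a new connection starts a new session and needs the settings again. The `analyze table` step is left out here because it only has to run when the data changes, not per session.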

,

0

If your database is Oracle, you can try Oracle Table Access for Hadoop and Spark (OTA4H), which can also be used from Hive QL. OTA4H optimizes the JDBC queries that retrieve data from Oracle, using splitters to get maximum performance. You can join Hive tables with external tables inside Oracle directly in your Hive queries.

0

Source: https://habr.com/ru/post/1679629/

