INSERT OVERWRITE is much slower in spark-sql than in hive-client

Spark version: 2.0.0

Hive version: 2.0.1

I find that executing an INSERT OVERWRITE statement in spark-sql or spark-shell takes much more time than in hive-client (which I run as apache-hive-2.0.1-bin/bin/hive). Spark takes about ten minutes, while the Hive client takes less than 20 seconds.

These are the steps that I have taken.

Test SQL:

INSERT OVERWRITE TABLE login4game PARTITION (pt='mix_en', dt='2016-10-21')
SELECT DISTINCT account_name,
                role_id,
                server,
                '1476979200' AS recdate,
                'mix' AS platform,
                'mix' AS pid,
                'mix' AS dev
FROM tbllog_login
WHERE pt = 'mix_en'
  AND dt = '2016-10-21';

There are 257,128 rows in tbllog_login in partition (pt='mix_en', dt='2016-10-21').

PS:

I suspect that the INSERT OVERWRITE itself is what takes so long in Spark; perhaps overwriting spends a lot of time on I/O or something else.

I also compared the run time of INSERT OVERWRITE with that of INSERT INTO.

1. spark-sql: 10 / 30

2. hive-client: 30 / 20

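To reproduce the INSERT OVERWRITE vs INSERT INTO comparison in one place, a minimal timing sketch might look like the following. It assumes you run the statements through some callable that executes one SQL string (for example `spark.sql` from a PySpark `SparkSession` built with `enableHiveSupport()`); the names `QUERY_TEMPLATE`, `build_statement`, `time_statement` and `compare_modes` are hypothetical helpers, not part of Spark or Hive.

```python
import time
from typing import Callable, Dict

# Hypothetical template: the test statement from the question, with the
# insert mode (OVERWRITE or INTO) left as a parameter.
QUERY_TEMPLATE = """INSERT {mode} TABLE login4game PARTITION (pt='mix_en', dt='2016-10-21')
SELECT DISTINCT account_name,
                role_id,
                server,
                '1476979200' AS recdate,
                'mix' AS platform,
                'mix' AS pid,
                'mix' AS dev
FROM tbllog_login
WHERE pt = 'mix_en'
  AND dt = '2016-10-21'"""


def build_statement(mode: str) -> str:
    """Return the test statement using INSERT OVERWRITE or INSERT INTO."""
    mode = mode.upper()
    if mode not in ("OVERWRITE", "INTO"):
        raise ValueError(f"unsupported insert mode: {mode!r}")
    return QUERY_TEMPLATE.format(mode=mode)


def time_statement(run_sql: Callable[[str], object], sql: str) -> float:
    """Run one statement through run_sql and return wall-clock seconds."""
    start = time.perf_counter()
    run_sql(sql)
    return time.perf_counter() - start


def compare_modes(run_sql: Callable[[str], object]) -> Dict[str, float]:
    """Time INSERT INTO, then INSERT OVERWRITE, with the same SELECT.

    INTO runs first so that the final OVERWRITE leaves the partition
    without the duplicate rows that the append would otherwise add.
    """
    return {mode: time_statement(run_sql, build_statement(mode))
            for mode in ("INTO", "OVERWRITE")}
```

For the same comparison in hive-client, the generated strings from `build_statement` can be pasted into the Hive CLI, whose own per-statement "Time taken" line gives the equivalent figure. Wall-clock timing around the call is only a rough measure; the Spark UI shows where the time is actually spent.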

Source: https://habr.com/ru/post/1658881/