Spark version: 2.0.0
Hive version: 2.0.1
I find that an INSERT OVERWRITE statement takes much longer when run in spark-sql or spark-shell than in the Hive client (I run it from apache-hive-2.0.1-bin/bin/hive): Spark takes about ten minutes, while the Hive client takes less than 20 seconds.
These are the steps I have taken.
Test SQL:
INSERT OVERWRITE TABLE login4game PARTITION (pt='mix_en', dt='2016-10-21')
SELECT DISTINCT account_name,
       role_id,
       server,
       '1476979200' AS recdate,
       'mix' AS platform,
       'mix' AS pid,
       'mix' AS dev
FROM tbllog_login
WHERE pt='mix_en'
  AND dt='2016-10-21';
There are 257128 rows in tbllog_login for the partition (pt='mix_en', dt='2016-10-21').
PS:
I am sure it is the INSERT OVERWRITE that takes so long in Spark; maybe the overwrite step spends a lot of time on I/O or something else.
I also compared the run time of the INSERT OVERWRITE statement with that of the INSERT INTO statement.
1. :
10
30
2. hive-client:
30
hive-client 20
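
For anyone who wants to repeat the comparison, here is a minimal sketch of how the two statements could be timed from PySpark. It is not the procedure used above (those runs were done in spark-sql / spark-shell and the Hive client). The helper name `time_statements` and the `run_sql` parameter are made up for illustration, and the INSERT INTO variant is assumed to be the test query with OVERWRITE replaced by INTO.

```python
import time

# The SELECT part and target partition are copied from the test SQL above.
SELECT_PART = """
SELECT DISTINCT account_name, role_id, server,
       '1476979200' AS recdate, 'mix' AS platform, 'mix' AS pid, 'mix' AS dev
FROM tbllog_login
WHERE pt='mix_en' AND dt='2016-10-21'
"""
TARGET = "TABLE login4game PARTITION (pt='mix_en', dt='2016-10-21')"

# Assumption: the INSERT INTO variant differs only in the leading keyword.
STATEMENTS = {
    "insert overwrite": "INSERT OVERWRITE " + TARGET + SELECT_PART,
    "insert into": "INSERT INTO " + TARGET + SELECT_PART,
}


def time_statements(run_sql, statements=STATEMENTS, clock=time.perf_counter):
    """Run each statement through run_sql and return {label: elapsed seconds}.

    run_sql is any callable that takes a SQL string and executes it,
    for example the sql method of a Hive-enabled SparkSession.
    """
    timings = {}
    for label, sql in statements.items():
        start = clock()
        run_sql(sql)
        timings[label] = clock() - start
    return timings
```

With a Hive-enabled session (for example `spark = SparkSession.builder.enableHiveSupport().getOrCreate()`), calling `time_statements(spark.sql)` would give wall-clock times for both statements in the same session. Note that running both against the same partition changes its contents, so a scratch table is safer for repeated runs.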