Get the number of rows from all tables in Hive

How can I get the number of rows for every table using Hive? I'm interested in the database name, table name, and row count.

+6
8 answers

You will need to do

select count(*) from table

for all tables.

To automate this, you can use a few shell commands and a small bash script. First, run:

$ hive -e 'show tables' | tee tables.txt

This saves the names of all tables in the database to the text file tables.txt.

Create a bash file (count_tables.sh) with the following contents.

#!/bin/bash
# Read table names from stdin, one per line; print each name, then its row count.
while read -r line
do
 echo "$line "
 hive -e "select count(*) from $line"
done

Now run the following commands.

$ chmod +x count_tables.sh
$ ./count_tables.sh < tables.txt > counts.txt

This creates a text file (counts.txt) containing the row count of every table in the database.
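Since the script writes each table name on one line and its count on the next, counts.txt can be turned into name/count pairs afterwards. The following is a minimal Python sketch of that step, assuming the file contains only those alternating lines; the function name `parse_counts` and the table `teachers` are hypothetical, used for illustration only.

```python
def parse_counts(text):
    """Pair each table name with the numeric line that follows it."""
    counts = {}
    pending = None  # most recent table name still waiting for its count
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.isdigit() and pending is not None:
            counts[pending] = int(line)
            pending = None
        else:
            # A non-numeric line is treated as a table name. If the previous
            # name never got a count (for example, its query failed), it is dropped.
            pending = line
    return counts


# Sample text in the shape count_tables.sh produces (table names are examples).
sample = "stud \n5\nteachers \n12\n"
print(parse_counts(sample))  # {'stud': 5, 'teachers': 12}
```

To use it on the real file, read counts.txt into a string and pass it to `parse_counts`.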

+22

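The row count also appears in the query plan that EXPLAIN prints for a select on the table, for example:

TableScan [TS_0] (rows=224910 width=78)

This figure comes from the table statistics, so it is only as accurate as those statistics.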
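If you capture the plan text, the estimate can be pulled out with a regular expression. This is a rough Python sketch that handles only the `TableScan ... (rows=N ...)` format shown above; other Hive versions may print the plan differently, and the function name `estimated_rows` is hypothetical.

```python
import re

# Matches the rows= value on a TableScan line such as
# "TableScan [TS_0] (rows=224910 width=78)".
TABLESCAN_ROWS = re.compile(r"TableScan\b.*?\(rows=(\d+)")


def estimated_rows(plan_text):
    """Return the rows= value of the first TableScan line, or None if absent."""
    match = TABLESCAN_ROWS.search(plan_text)
    return int(match.group(1)) if match else None


print(estimated_rows("TableScan [TS_0] (rows=224910 width=78)"))  # 224910
```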

+3

select count(*) from table


+1

Select the database first, then list its tables:

hive -e 'use myDatabase;show tables'
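Once you have the table list for a database, one option is to generate a single statement that returns the database name, table name, and row count together, so Hive is started once rather than once per table. The sketch below only builds the query text; it assumes your Hive version accepts a top-level UNION ALL and that the names come straight from `show tables`. The function name, the column aliases, and the table names `stud` and `teachers` are hypothetical.

```python
def build_count_query(database, tables):
    """Build one HiveQL statement yielding (db_name, table_name, row_count) per table."""
    parts = [
        "SELECT '{db}' AS db_name, '{t}' AS table_name, COUNT(*) AS row_count "
        "FROM `{db}`.`{t}`".format(db=database, t=table)
        for table in tables
    ]
    # One SELECT per table, glued together with UNION ALL.
    return "\nUNION ALL\n".join(parts)


print(build_count_query("myDatabase", ["stud", "teachers"]))
```

The printed text can then be passed to `hive -e` or saved to a file and run with `hive -f`.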
0

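To automate the check, put the commands in a file and run it with bash filename.sh:

hive -e "select count(*) from table1 where extracttimestamp < '2018-04-26'" > sample.out

hive -e "select count(*) from table2 where … = '26'" >> sample.out

lc=$(uniq sample.out | wc -l); if [ "$lc" -eq 1 ]; then echo "PASS"; else echo "FAIL"; fi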

0

You can use the Hive ANALYZE command. It computes the table statistics, which include the row count.

In Hive:

hive> ANALYZE TABLE stud COMPUTE STATISTICS;
 Query ID = impadmin_20171115185549_a73662c3-5332-42c9-bb42-d8ccf21b7221
 Total jobs = 1
 Launching Job 1 out of 1
 
 Table training_db.stud stats: [numFiles=5, numRows=5, totalSize=50, rawDataSize=45]
 OK
 Time taken: 8.202 seconds

Source: http://dwgeek.com/apache-hive-explain-command-example.html/
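The stats line in that output already carries the three values the question asks for. As a minimal Python sketch, assuming the line has exactly the `Table db.table stats: [key=value, ...]` shape shown above, it can be parsed like this (the function name `parse_stats_line` is hypothetical):

```python
import re

# Captures the database, the table, and the bracketed key=value list.
STATS_LINE = re.compile(r"Table (\w+)\.(\w+) stats: \[(.*?)\]")


def parse_stats_line(line):
    """Return (database, table, numRows) from a stats line, or None if it does not match."""
    match = STATS_LINE.search(line)
    if match is None:
        return None
    database, table, body = match.groups()
    # Turn "numFiles=5, numRows=5, ..." into a dict of strings.
    stats = dict(item.split("=", 1) for item in body.split(", "))
    return database, table, int(stats["numRows"])


line = " Table training_db.stud stats: [numFiles=5, numRows=5, totalSize=50, rawDataSize=45]"
print(parse_stats_line(line))  # ('training_db', 'stud', 5)
```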

0

Alternatively, you can do the same thing in Python:

import os

# Maps each table name to the row count Hive printed for it.
dictTabCnt = {}

print("=====Finding Tables=====")
# Replace [YOUR DB HERE] with the name of your database.
tableList = os.popen("hive --outputformat=dsv --showHeader=false -e \"use [YOUR DB HERE]; show tables;\"").read().split('\n')

print("=====Finding Table Counts=====")
for i in tableList:
    if i != '':  # skip blank lines in the Hive output
        strTemp = os.popen("hive --outputformat=dsv --showHeader=false -e \"use [YOUR DB HERE]; SELECT COUNT(*) FROM {}\"".format(i)).read()
        dictTabCnt[i] = strTemp.strip()  # drop the trailing newline

print("=====Table Counts=====")
for table, cnt in dictTabCnt.items():
    print("{}: {}".format(table, cnt))
0

Source: https://habr.com/ru/post/1527836/

