Where does Hive store its tables?

I am new to Hadoop and have just started working with Hive. I understand that it provides a query language for processing data in HDFS. With HiveQL we can create tables and load data from HDFS into them.

So my question is: where are these tables stored? In particular, if we have a 100 GB file in HDFS and we want to create a Hive table from this data, what is the size of that table and where is it stored?

If my understanding of this concept is wrong, please correct me.

+6
2 answers

If the data is 100 GB, you should consider an external Hive table (as opposed to a "managed table"; see this for the difference).

With an external table, the data itself stays on HDFS in the path you specify (note that you can point to a directory of files, provided they all have the same structure), and Hive only creates a mapping to it in the metastore. A managed table, by contrast, stores its data "in Hive", that is, in a directory that Hive itself manages.
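As a minimal sketch of the external-table case, the statement below maps a table onto files that already exist in HDFS. The table name `logs`, its columns, the tab delimiter, and the path `/data/logs` are all hypothetical and chosen only for illustration.

```sql
-- Hypothetical example: table name, columns, delimiter and path are assumptions.
-- Hive records only the schema and the location in the metastore;
-- the files under /data/logs stay where they are and are not copied.
CREATE EXTERNAL TABLE logs (
  ts      STRING,
  level   STRING,
  message STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/logs';

-- Reports the table details, including its type and its location on HDFS.
DESCRIBE FORMATTED logs;
```

Because the table is only a mapping, it takes up no extra space in HDFS beyond the original files.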

When you drop a managed table, Hive deletes the underlying data as well. Dropping an external table only removes the metadata in the metastore that references the data; the data itself is left in place.
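The difference on drop can be sketched as follows, assuming a hypothetical managed table `managed_logs` and a hypothetical external table `logs`:

```sql
-- Hypothetical table names, for illustration only.

-- Managed table: removes the metastore entry AND the table's files.
DROP TABLE managed_logs;

-- External table: removes only the metastore entry;
-- the files at the table's LOCATION remain on HDFS.
DROP TABLE logs;
```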

Either way, you use only 100 GB from the user's point of view, and you benefit from the robustness that HDFS provides by replicating the data.

+4

Hive will create a directory on HDFS. If you do not specify a location, it creates the directory under /user/hive/warehouse on HDFS. After the LOAD command, the files are moved to /user/hive/warehouse/tablename. You can also point the table at an existing HDFS directory, including one that contains partitions (if the files were partitioned), or use the external table concept.
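A rough sketch of the managed-table case described above. The table name, columns, delimiter, and source path are hypothetical, and the warehouse path assumes the default value of `hive.metastore.warehouse.dir`.

```sql
-- Hypothetical managed table; names, columns and paths are assumptions.
-- No LOCATION is given, so Hive creates the table directory under the
-- warehouse directory (with default settings: /user/hive/warehouse/managed_logs).
CREATE TABLE managed_logs (
  ts      STRING,
  level   STRING,
  message STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t';

-- Without the LOCAL keyword the source path refers to HDFS, and the file
-- is moved (not copied) into the table's directory.
LOAD DATA INPATH '/data/incoming/logs.tsv' INTO TABLE managed_logs;
```

Since the load is a move within HDFS, the data is not duplicated: it simply ends up in the table's directory.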

+1

Source: https://habr.com/ru/post/984266/

