How can I partition a table in Hive?

I have been playing with Hive for a few days, but I am still having a hard time with partitions.

I have been writing Apache logs (combined format) to Hadoop for several months. They are stored as plain text files, partitioned by date (via Flume): /logs/yyyy/mm/dd/hh/*

Example:

/logs/2012/02/10/00/Part01xx (02/10/2012 12:00 am)
/logs/2012/02/10/00/Part02xx
/logs/2012/02/10/13/Part0xxx (02/10/2012 01:00 pm)

The date in the combined log file uses this format: [10/Feb/2012:00:00:00 -0800]

How can I create an external table with partitions in Hive that uses my physical partitioning? I cannot find good documentation on Hive partitioning. I found a related question, for example:

If I load my logs into an external table with Hive, I cannot partition by the timestamp, because it is not in a usable format (Feb <=> 02). Even if it were in a usable format, how would I convert a string such as "10/02/2012:00:00:00 -0800" into the directory hierarchy "/2012/02/10/00"?

As a last resort, I could use a Pig script to convert my raw logs into Hive tables, but at that point I might as well use Pig instead of Hive for my reporting.

1 answer

If I understand correctly, your files sit in folders 4 levels deep under the logs directory. In that case, define your table as external with its location set to "logs", and partition it by 4 virtual fields: year, month, day_of_month, hour_of_day.

The partitioning is essentially done for you by Flume.
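
To illustrate the mapping the question asks about (Feb <=> 02), here is a minimal Python sketch that turns a combined-log timestamp into the matching directory. The function name partition_dir and the default root "/logs" are made up for the example, and it assumes Flume bucketed each file under the same date and hour that appear in the log line, with no timezone shift; check that against how your Flume sink is configured.

```python
import re

# English month abbreviations as they appear in Apache combined logs.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}

# Matches "10/Feb/2012:00:00:00 -0800", with or without the square brackets.
TIMESTAMP = re.compile(
    r"\[?(\d{2})/([A-Za-z]{3})/(\d{4}):(\d{2}):\d{2}:\d{2} [+-]\d{4}\]?"
)


def partition_dir(timestamp, root="/logs"):
    """Return the root/yyyy/mm/dd/hh directory for a combined-log timestamp.

    Hypothetical helper: assumes the directory hour equals the hour written
    in the log line (the UTC offset is ignored).
    """
    match = TIMESTAMP.fullmatch(timestamp.strip())
    if match is None:
        raise ValueError(f"not a combined-log timestamp: {timestamp!r}")
    day, month_name, year, hour = match.groups()
    return f"{root}/{year}/{MONTHS[month_name]:02d}/{day}/{hour}"


print(partition_dir("[10/Feb/2012:00:00:00 -0800]"))
```

With an external partitioned table you do not need this conversion inside Hive at all, since the partition values come from the directory, not from the log line. The sketch is only useful for checking which directory a given line should live in.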

EDIT 3/9: A lot of the details depend on how Flume writes the files. But in general terms, your DDL should look something like this:

 CREATE EXTERNAL TABLE table_name (fields...)
 PARTITIONED BY (log_year STRING, log_month STRING, log_day_of_month STRING, log_hour_of_day STRING)
 format description
 STORED AS TEXTFILE
 LOCATION '/your user path/logs';

EDIT 3/15: At zzarbi's request, I am adding a note that after creating the table, Hive must be told about the partitions that exist. This has to be repeated for as long as Flume or another process keeps creating new partitions. See my answer to the question "Create external with partition".
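
As a rough sketch of that registration step, the Python snippet below generates one ALTER TABLE ... ADD PARTITION statement per hourly directory. The partition column names come from the DDL above; the function name add_partition_statements, the table name and the "/logs" root are placeholders I made up, so adjust them to your setup. It only builds the statements as strings; running them (from the Hive CLI, a scheduled job, etc.) is up to you.

```python
from datetime import datetime, timedelta


def add_partition_statements(table, root, start, hours):
    """Build HiveQL statements registering `hours` hourly partitions.

    Hypothetical helper: assumes directories are laid out as
    root/yyyy/mm/dd/hh and that the table is partitioned by
    log_year, log_month, log_day_of_month and log_hour_of_day.
    """
    statements = []
    for offset in range(hours):
        t = start + timedelta(hours=offset)
        spec = (
            f"log_year='{t:%Y}', log_month='{t:%m}', "
            f"log_day_of_month='{t:%d}', log_hour_of_day='{t:%H}'"
        )
        location = f"{root}/{t:%Y/%m/%d/%H}"
        # IF NOT EXISTS makes the statement safe to run more than once.
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION ({spec}) LOCATION '{location}';"
        )
    return statements


# Example: the last hour of 10 Feb 2012 and the first hour of the next day.
for statement in add_partition_statements(
    "table_name", "/logs", datetime(2012, 2, 10, 23), 2
):
    print(statement)
```

Because each statement uses IF NOT EXISTS, the same script can be rerun on a schedule to pick up the directories that were written since the previous run.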


Source: https://habr.com/ru/post/910293/

