Creating an Impala External Table from a Partitioned File Structure

Given a partitioned file system structure like the following:

    logs
    └── log_type
        └── 2013
            ├── 07
            │   ├── 28
            │   │   ├── host1
            │   │   │   └── log_file_1.csv
            │   │   └── host2
            │   │       ├── log_file_1.csv
            │   │       └── log_file_2.csv
            │   └── 29
            │       ├── host1
            │       │   └── log_file_1.csv
            │       └── host2
            │           └── log_file_1.csv
            └── 08

I am trying to create an external table in Impala:

    create external table log_type (
        field1 string,
        field2 string,
        ...
    )
    row format delimited fields terminated by '|'
    location '/logs/log_type/2013/08';

I want Impala to recurse into the subdirectories and pick up all the CSV files, but no luck. No errors occur, but no data is loaded into the table.

Various globs, such as /logs/log_type/2013/08/*/* or /logs/log_type/2013/08/*/*/*, did not work either.

Is there any way to do this? Or do I need to restructure the file system? Any advice would be appreciated.

2 answers

If you are still looking for an answer: you need to register each individual partition manually.

For more information, see Register an external table.

Your table schema needs to be adjusted to declare the partition columns:

    create external table log_type (
        field1 string,
        field2 string,
        ...
    )
    partitioned by (year int, month int, day int, host string)
    row format delimited fields terminated by '|';

After changing the schema to include year, month, day and host as partition columns, you must add each partition (each year/month/day/host directory) to the table individually.

Something like this:

    ALTER TABLE log_type
        ADD PARTITION (year=2013, month=07, day=28, host='host1')
        LOCATION '/logs/log_type/2013/07/28/host1';
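With many hosts and days, writing these statements by hand gets tedious. Below is a minimal sketch that generates one ALTER TABLE statement per leaf directory. It assumes the layout from the question (<root>/YYYY/MM/DD/<host>) and that you already have the list of leaf directory paths, for example gathered from `hdfs dfs -ls -R /logs/log_type` and filtered to directories. The names `LEAF_RE` and `partition_statements` are hypothetical, and host names are restricted to simple characters so no quote escaping is needed.

```python
import re
from typing import Iterable, List

# Assumed (hypothetical) layout: <root>/<YYYY>/<MM>/<DD>/<host>
# e.g. /logs/log_type/2013/07/28/host1
LEAF_RE = re.compile(
    r"^(?P<root>.+)/(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})"
    r"/(?P<host>[A-Za-z0-9._-]+)/?$"
)


def partition_statements(table: str, leaf_dirs: Iterable[str]) -> List[str]:
    """Build one ALTER TABLE ... ADD PARTITION statement per leaf directory."""
    statements = []
    for path in leaf_dirs:
        m = LEAF_RE.match(path)
        if m is None:
            # Not a year/month/day/host leaf (e.g. the empty 2013/08 dir): skip.
            continue
        # Partition columns are declared as int, so "07" becomes 7.
        year, month, day = int(m["year"]), int(m["month"]), int(m["day"])
        location = path.rstrip("/")
        statements.append(
            f"ALTER TABLE {table} ADD PARTITION "
            f"(year={year}, month={month}, day={day}, host='{m['host']}') "
            f"LOCATION '{location}';"
        )
    return statements


if __name__ == "__main__":
    dirs = [
        "/logs/log_type/2013/07/28/host1",
        "/logs/log_type/2013/07/28/host2",
        "/logs/log_type/2013/07/29/host1",
        "/logs/log_type/2013/07/29/host2",
    ]
    for stmt in partition_statements("log_type", dirs):
        print(stmt)
```

The printed statements can be pasted into impala-shell. month=7 and month=07 are the same integer literal, so the output is equivalent to the hand-written example.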

Then refresh the table's metadata in Impala:

    INVALIDATE METADATA log_type;
    REFRESH log_type;

Another way to do this is to use Impala's LOAD DATA statement. If your data is in SequenceFile or another format that is less convenient for Impala (see the list of Impala file formats), you can create the external table as Joey does above, but instead of ALTER TABLE you can do something like:

    LOAD DATA INPATH '/logs/log_type/2013/07/28/host1/log_file_1.csv'
        INTO TABLE log_type
        PARTITION (year=2013, month=07, day=28, host='host1');
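The same idea works for LOAD DATA. Here is a minimal sketch that emits one statement per CSV file, deriving the partition values from the file's path. It assumes the <root>/YYYY/MM/DD/<host>/<file>.csv layout from the question and a list of file paths you have already collected. `FILE_RE` and `load_statements` are hypothetical names. Keep in mind that LOAD DATA moves the files into the table's data directory rather than copying them, so the original tree will be emptied.

```python
import re
from typing import Iterable, List

# Assumed (hypothetical) layout: <root>/<YYYY>/<MM>/<DD>/<host>/<file>.csv
FILE_RE = re.compile(
    r"^.+/(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})"
    r"/(?P<host>[A-Za-z0-9._-]+)/[^/]+\.csv$"
)


def load_statements(table: str, csv_files: Iterable[str]) -> List[str]:
    """Build one LOAD DATA statement per CSV file, targeting its partition."""
    statements = []
    for path in csv_files:
        m = FILE_RE.match(path)
        if m is None:
            continue  # not in the assumed layout; skip it
        statements.append(
            f"LOAD DATA INPATH '{path}' INTO TABLE {table} "
            f"PARTITION (year={int(m['year'])}, month={int(m['month'])}, "
            f"day={int(m['day'])}, host='{m['host']}');"
        )
    return statements


if __name__ == "__main__":
    files = [
        "/logs/log_type/2013/07/28/host1/log_file_1.csv",
        "/logs/log_type/2013/07/28/host2/log_file_1.csv",
        "/logs/log_type/2013/07/28/host2/log_file_2.csv",
    ]
    for stmt in load_statements("log_type", files):
        print(stmt)
```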

Source: https://habr.com/ru/post/1495784/