Can Hive recursively descend into subdirectories without partitions or editing hive-site.xml?

I have some web server logs that I would like to query using Hive. The directory structure in HDFS is as follows:

 /data/access/web1/2014/09
 /data/access/web1/2014/09/access-20140901.log
 [... etc ...]
 /data/access/web1/2014/10
 /data/access/web1/2014/10/access-20141001.log
 [... etc ...]
 /data/access/web2/2014/09
 /data/access/web2/2014/09/access-20140901.log
 [... etc ...]
 /data/access/web2/2014/10
 /data/access/web2/2014/10/access-20141001.log
 [... etc ...]

I can create an external table:

 CREATE EXTERNAL TABLE access(
   host STRING,
   identity STRING,
   user STRING,
   time STRING,
   request STRING,
   status STRING,
   size STRING,
   referer STRING,
   agent STRING)
 ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
 WITH SERDEPROPERTIES (
   "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",
   "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s")
 LOCATION '/data/access/'

... although Hive doesn't descend into the subdirectories unless I run the following commands before running the Hive query:

 set hive.input.dir.recursive=true;
 set hive.mapred.supports.subdirectories=true;
 set hive.supports.subdirectories=true;
 set mapred.input.dir.recursive=true;

I have seen other posts that set these properties at the table level (for example, "Problem creating an external Hive table using tblproperties"):

 TBLPROPERTIES (
   "hive.input.dir.recursive" = "TRUE",
   "hive.mapred.supports.subdirectories" = "TRUE",
   "hive.supports.subdirectories" = "TRUE",
   "mapred.input.dir.recursive" = "TRUE");

Unfortunately, this did not work for me: the table returns no records when queried. I understand that these properties can be set in hive-site.xml, but I would prefer not to make changes that could affect other users unless I have to.

Q) Is there a way to create a table that descends into subdirectories without using partitions, making site-wide changes, or executing these 4 commands each time?

+6
4 answers

Using Hive in HDInsight, I set the following properties before creating my external table in a Hive query, and this works for me.

 SET hive.mapred.supports.subdirectories=TRUE;
 SET mapred.input.dir.recursive=TRUE;
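To avoid retyping these SET commands in every session without touching hive-site.xml, one option is a per-user initialization file. The sketch below assumes the Hive CLI, whose `-i` option runs a file of statements at startup. The file name `recursive-dirs.hql` is hypothetical, and `access` is the table from the question. The script only writes the file and prints the command to run; it does not invoke Hive itself.

```shell
#!/bin/sh
# Hypothetical file name; any path readable by the current user should work.
init_file="./recursive-dirs.hql"

# Store the two session-level settings from the answer above in the init file.
cat > "$init_file" <<'EOF'
SET hive.mapred.supports.subdirectories=TRUE;
SET mapred.input.dir.recursive=TRUE;
EOF

# Print (rather than run) the Hive CLI call that would load the file first.
# Assumes the Hive CLI's -i (init file) and -e (inline query) options.
echo "hive -i $init_file -e 'SELECT COUNT(*) FROM access;'"
```

The Hive CLI also reads `$HOME/.hiverc` at startup, so placing the same two lines there should have a similar per-user effect without affecting anyone else.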
+12

These are not table properties, so setting them like this has no effect:

 TBLPROPERTIES (
   "hive.input.dir.recursive" = "TRUE",
   "hive.mapred.supports.subdirectories" = "TRUE",
   "hive.supports.subdirectories" = "TRUE",
   "mapred.input.dir.recursive" = "TRUE");

A) Add the following to hive-site.xml:

 <property>
   <name>mapred.input.dir.recursive</name>
   <value>true</value>
 </property>
 <property>
   <name>hive.mapred.supports.subdirectories</name>
   <value>true</value>
 </property>
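If editing hive-site.xml is not an option, the same two properties can, as far as I know, be supplied per invocation instead. This is only a sketch: it assumes the Hive CLI and its `--hiveconf property=value` option, and it prints the resulting command rather than running it. The query against `access` is just an illustration.

```shell
#!/bin/sh
# Build one --hiveconf flag for each property named in the XML above.
hiveconf_flags=""
for prop in mapred.input.dir.recursive hive.mapred.supports.subdirectories; do
    hiveconf_flags="$hiveconf_flags --hiveconf $prop=true"
done

# Drop the leading space left over from the first concatenation.
hiveconf_flags="${hiveconf_flags# }"

# Print the command; the overrides would apply to this invocation only.
echo "hive $hiveconf_flags -e 'SELECT COUNT(*) FROM access;'"
```

Unlike the hive-site.xml change, these overrides last only for the single invocation, so other users are not affected.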

+2

If you are using Ambari, add the following properties to the Hive advanced configuration, under custom hive-site.xml:

 SET hive.input.dir.recursive = TRUE
 SET hive.mapred.supports.subdirectories = TRUE
 SET hive.supports.subdirectories = TRUE
 SET mapred.input.dir.recursive = TRUE

And then restart the related services. This will recursively read all the data.

0

The settings from ozw1z5rd's post worked for me on Hortonworks:

 alter table .... set tblproperties (
   "hive.input.dir.recursive" = "TRUE",
   "hive.mapred.supports.subdirectories" = "TRUE",
   "hive.supports.subdirectories" = "TRUE",
   "mapred.input.dir.recursive" = "TRUE");
0

Source: https://habr.com/ru/post/1206292/

