Spatial index in Hive

Question


I am creating Hive tables for a spatial database. I know there are special indexes for spatial data, such as the R-tree, but as far as I can tell it is not possible to create a spatial index in Hive. I was thinking of building an index on x and y (longitude, latitude), which are continuous decimal values, but I do not think that would be very effective.

We use the Esri libraries for spatial operations, but in some cases query performance is very poor.

Esri GIS Tools for Hadoop

I thought...

Is it better to create an index on these two variables, or to partition the table on derived variables such as xi = int(x / 0.2) * 0.2 and yi = int(y / 0.2) * 0.2?
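The binning above could be sketched in HiveQL roughly as follows. The table `points` and its columns `id`, `x`, `y` are hypothetical names, not from an existing schema. The sketch uses FLOOR rather than truncation toward zero, because truncation would merge the two cells on either side of 0 into one double-width cell (which matters for negative longitudes). It also keeps the integer cell number instead of multiplying back by 0.2, so the key is a discrete value with no decimal places.

```sql
-- Hypothetical source table: points(id BIGINT, x DOUBLE, y DOUBLE)
-- x = longitude, y = latitude; 0.2 is the cell size from the question.
SELECT id,
       x,
       y,
       CAST(FLOOR(x / 0.2) AS INT) AS cell_x,  -- e.g. x = -3.9 falls in cell -20
       CAST(FLOOR(y / 0.2) AS INT) AS cell_y   -- e.g. y = 40.3 falls in cell 201
FROM points;
```

The lower-left corner of a cell can be recovered when needed as cell_x * 0.2 and cell_y * 0.2.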

I think partitioning the table is more efficient, but it makes the queries more complex to write. I also think partitions do not support numeric values with decimal places.

The most typical queries are bounding box queries over a set of spatial data. Does anyone know an efficient way to structure data in Hive for such queries?

Is there a way to create and embed an R-tree in Hive? Can you partition a table on continuous decimal values? (I have seen a lot of examples and it seems not.)
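For illustration, a bounding box query against a grid-partitioned table might look like the sketch below. It assumes a hypothetical table `points_grid` partitioned by integer columns `cell_x` and `cell_y` with a cell size of 0.2. The cell ranges are worked out by the client before the query is issued, so the partition filter uses plain constants. The exact filter on x and y is still needed, because cells on the edge of the box are only partly inside it.

```sql
-- Bounding box: x in [-3.9, -3.1], y in [40.3, 40.9], cell size 0.2.
-- Cell ranges computed beforehand:
--   floor(-3.9 / 0.2) = -20, floor(-3.1 / 0.2) = -16
--   floor(40.3 / 0.2) = 201, floor(40.9 / 0.2) = 204
SELECT id, x, y
FROM points_grid
WHERE cell_x BETWEEN -20 AND -16   -- coarse filter on the partition columns
  AND cell_y BETWEEN 201 AND 204
  AND x BETWEEN -3.9 AND -3.1      -- exact filter inside the candidate cells
  AND y BETWEEN 40.3 AND 40.9;
```

Whether this is fast enough depends on the cell size: cells that are too small create a very large number of partitions, and cells that are too large leave a lot of rows for the exact filter.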


hadoop spatial-index hive gis spatial

jmbluengo Jun 17 '13 at 20:40


2 answers

Ivan Klass · Answer 1 · 2013-06-18T10:49:43+0000

There is also the k-d tree (k-dimensional tree) for spatial data, which I find much easier to work with.

jmbluengo · Answer 2 · 2013-06-19T22:00:09+0000

I have only seen examples of partitions by date, e.g. 2012, 2013, and so on. Those are truly discrete values. I don't know whether you can define a partition with ranges, for example one partition for y in [40.1, 42.4) and x in [-4, 0), another for y in [42.4, 43) and x in [-4, 0), and so on. The partitions would not have fixed-size ranges, because some spatial areas contain very little data. This is essentially a way of building a quadtree (http://en.wikipedia.org/wiki/Quadtree), but tied to partitions rather than to an index. I think this would act as a spatial index, would work in Hive, and might even be elegant.

That is my idea. I hope someone finds a way to define partitions like this, where, most importantly, x and y determine which partitions need to be checked, or an elegant alternative.
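One possible way to get range-like partitions out of discrete partition values is to give every quadtree leaf an integer id and keep the leaf bounds in a small lookup table. The sketch below is only an assumption about how that could be laid out: `quad_leaves` and its columns are hypothetical, and the leaves themselves would have to be computed outside Hive.

```sql
-- Hypothetical lookup table: one row per quadtree leaf.
CREATE TABLE quad_leaves (
  leaf_id INT,
  min_x DOUBLE, max_x DOUBLE,   -- half-open range [min_x, max_x)
  min_y DOUBLE, max_y DOUBLE    -- half-open range [min_y, max_y)
);

-- Find the leaves that overlap the box x in [-3.9, -3.1], y in [40.3, 40.9].
-- The resulting ids are discrete values, so they can be used as a filter
-- on a partition column, e.g. WHERE leaf_id IN (...).
SELECT leaf_id
FROM quad_leaves
WHERE min_x <= -3.1 AND max_x > -3.9
  AND min_y <= 40.9 AND max_y > 40.3;
```

With this layout the data table would be partitioned by `leaf_id` alone, and dense areas simply get more, smaller leaves.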

This is an example of creating partitions:

CREATE TABLE sales (
    sales_order_id BIGINT,
    order_amount   FLOAT,
    order_date     STRING,
    due_date       STRING,
    customer_id    BIGINT
)
PARTITIONED BY (country STRING, year INT, month INT, day INT);
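Applied to spatial data, the same mechanism might look like the following sketch, which partitions by integer grid cell instead of by date. The tables `points` and `points_grid` and the 0.2 cell size are assumptions for illustration. Dynamic partitioning is switched on so that Hive creates one partition per (cell_x, cell_y) pair found in the data.

```sql
-- Hypothetical target table, partitioned by integer grid cell.
CREATE TABLE points_grid (id BIGINT, x DOUBLE, y DOUBLE)
PARTITIONED BY (cell_x INT, cell_y INT);

-- Let Hive derive the partitions from the data.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- The partition columns must come last in the SELECT list.
INSERT OVERWRITE TABLE points_grid PARTITION (cell_x, cell_y)
SELECT id,
       x,
       y,
       CAST(FLOOR(x / 0.2) AS INT) AS cell_x,
       CAST(FLOOR(y / 0.2) AS INT) AS cell_y
FROM points;
```

A fine grid over a large area can produce more partitions than Hive allows by default (see hive.exec.max.dynamic.partitions), so the cell size probably needs tuning to the data.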
