Index is not used with order().by() in Titan

Titan's documentation states:

Mixed indexes support ordering natively and efficiently. However, the property key used in the order().by() method must have been previously added to the mixed index for native result ordering support. This is important in cases where the order().by() key is different from the query keys. If the property key is not part of the index, then sorting requires loading all results into memory.

So I created a mixed index on the prop1 property. The mixed index on prop1 works well when a value is specified.

 gremlin> g.V().has('prop1', gt(1)) /* this query uses the mixed index */
 ==>v[6017120]
 ==>v[4907104]
 ==>v[8667232]
 ==>v[3854400]
 ...

But when I use order().by() on prop1, the mixed index is not used.

 gremlin> g.V().order().by('prop1', incr) /* doesn't use the mixed index */
 17:46:00 WARN com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx - Query requires iterating over all vertices [()]. For better performance, use indexes
 Could not execute query since pre-sorting requires fetching more than 1000000 elements. Consider rewriting the query to exploit sort orders

Also, count() takes a very long time.

 gremlin> g.V().has('prop1').count()
 17:44:47 WARN com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx - Query requires iterating over all vertices [()]. For better performance, use indexes

I would appreciate knowing what I am doing wrong. Here is my Titan setup:

  • Titan version: 1.0.0-hadoop1
  • Storage: Cassandra 2.1.1
  • Index Backend: ElasticSearch 1.7
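
For context, a mixed index like the one described above is normally defined through Titan's management API. The following is only a minimal sketch for a fresh graph, not the exact commands used here: the index name 'prop1Mixed', the Integer data type, and the backend name 'search' (the name under which the ElasticSearch backend would be configured, as in index.search.backend) are assumptions.

```groovy
// Sketch only: the index name, data type, and backend name are assumed.
mgmt = graph.openManagement()

// Define the property key; the comparison gt(1) suggests a numeric type.
prop1 = mgmt.makePropertyKey('prop1').dataType(Integer.class).make()

// Build a mixed index over vertices, backed by the 'search' index backend.
mgmt.buildIndex('prop1Mixed', Vertex.class).addKey(prop1).buildMixedIndex('search')

mgmt.commit()
```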

Thanks.

1 answer

You must specify a value to filter on for the index to be used. Here:

 g.V().order().by('prop1', incr) 

you do not provide any filter, so Titan has to iterate over all of V() and then apply the sort.
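
A sketch of the same ordering with a filter on the indexed key added, so that Titan has a condition it can hand to the index backend. The predicate gt(1) is taken from the question and the limit(10) is only illustrative. Whether the ordering itself is pushed down to the index can depend on the Titan version, so treat this as something to try rather than a guarantee:

```groovy
// Filter on the indexed key first, then order by the same key.
// gt(1) comes from the question; limit(10) is an illustrative page size.
g.V().has('prop1', gt(1)).order().by('prop1', incr).limit(10)
```

If the warning about iterating over all vertices no longer appears, the query is being answered from the index.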

Here:

 g.V().has('prop1').count() 

you supply an indexed key but do not specify a value to filter on, so it still iterates over all of V(). You can do:

 g.V().has("prop1", textRegex(".*")).count() 

In this case you trick Titan a bit into using the index, but the query can still be slow if it returns a lot of results to iterate over.
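
Note that textRegex is a text predicate, so it applies to String properties. If prop1 is numeric, as the gt(1) comparison in the question suggests, a range predicate that matches every value could play the same role. This is a sketch under the assumption that prop1 is an Integer:

```groovy
// Assumes prop1 is an Integer property; gte(Integer.MIN_VALUE) matches
// every stored value while still giving Titan a condition on the key.
g.V().has('prop1', gte(Integer.MIN_VALUE)).count()
```

As with the regex variant, count() still has to iterate over every matching vertex, so it can remain slow on large result sets.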


Source: https://habr.com/ru/post/1237158/

