Cypher - query optimization

Question

My question is: why is the WHERE clause not as fast as expected? I have 7 nodes labeled Consumer. Here is some sample data:

 MERGE (c:Consumer {mobileNumber: "000000000000"})
 MERGE (:Consumer {mobileNumber: "111111111111"})
 MERGE (:Consumer {mobileNumber: "222222222222"})
 MERGE (:Consumer {mobileNumber: "333333333333"})
 MERGE (:Consumer {mobileNumber: "444444444444"})
 MERGE (:Consumer {mobileNumber: "555555555555"})
 MERGE (:Consumer {mobileNumber: "666666666666"})
 WITH c
 MATCH (c1:Consumer)
 WHERE c1.mobileNumber <> "000000000000"
 MERGE (c)-[:HAS_CONTACT]->(c1)

The node (:Consumer {mobileNumber: "000000000000"}) has a HAS_CONTACT relationship to each of the other 6 nodes. There is also a uniqueness constraint (and therefore an index) on the mobileNumber property. When I run the query below:

 PROFILE
 MATCH (n:Consumer {mobileNumber: "000000000000"}),
       (m:Consumer {mobileNumber: "111111111111"})
 WITH n, m
 MATCH path = shortestPath((n)-[contacts:HAS_CONTACT]-(m))
 RETURN contacts;

It works as expected: both nodes are found through the unique index. The following is its result:

Now I modify the query above to use a WHERE clause:

 PROFILE
 MATCH (n:Consumer {mobileNumber: "000000000000"}), (m:Consumer)
 WHERE m.mobileNumber IN (["111111111111"])
 WITH n, m
 MATCH path = shortestPath((n)-[contacts:HAS_CONTACT]-(m))
 RETURN contacts;

Query result:

This query also works and returns the same result as the first one. However, for the end node m, where I used WHERE, it does not use any index. It first scans all nodes with the label and then filters them with the WHERE predicate, which can be too expensive if there are hundreds of thousands of nodes with the same label.

So my questions are:

Why doesn't it use the index when I use WHERE?
What is the best way to match multiple nodes with fewer db hits?
Can I use the IN operator and still get an index lookup?
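
For anyone reproducing the setup described in the question, the uniqueness constraint on mobileNumber could be created roughly as sketched below. This assumes the Neo4j 3.x constraint syntax (the answers mention version 3.2.2); later major versions use a different form of the statement.

```cypher
// Neo4j 3.x syntax (an assumption based on the versions mentioned in the answers).
// A uniqueness constraint also creates an index on :Consumer(mobileNumber).
CREATE CONSTRAINT ON (c:Consumer) ASSERT c.mobileNumber IS UNIQUE
```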

+5

spring-data-neo4j neo4j cypher

Afridi Jul 24 '17 at 7:12

3 answers

cybersam · Answer 1 · 2017-07-24T19:46:21+0000

As @DaveBennett said, this problem does not seem to exist in version 3.2.2.

If you are using an earlier version, try giving the planner hints to use the index:

 PROFILE
 MATCH (n:Consumer {mobileNumber: "000000000000"}), (m:Consumer)
 USING INDEX n:Consumer(mobileNumber)
 USING INDEX m:Consumer(mobileNumber)
 WHERE m.mobileNumber IN (["111111111111"])
 MATCH path = shortestPath((n)-[contacts:HAS_CONTACT]-(m))
 RETURN contacts;

This may also work, since the planner seems to use the index automatically for only the first node in the MATCH:

 PROFILE
 MATCH (n:Consumer {mobileNumber: "000000000000"}), (m:Consumer)
 USING INDEX m:Consumer(mobileNumber)
 WHERE m.mobileNumber IN (["111111111111"])
 MATCH path = shortestPath((n)-[contacts:HAS_CONTACT]-(m))
 RETURN contacts;

Dave Bennett · Answer 2 · 2017-07-24T13:37:05+0000

What version are you running? I am using Community Edition 3.2.2, and your second query produced the result you were looking for on my local instance, with a small set of test data.

However, would the query planner change its approach with something like this in your case?

 PROFILE
 MATCH (n:Consumer {mobileNumber: "000000000000"})
 WITH n, ["111111111111", "222222222222", "333333333333", "444444444444", "555555555555", "666666666666"] AS number_list
 UNWIND number_list AS number
 MATCH (m:Consumer {mobileNumber: number})
 MATCH path = shortestPath((n)-[contacts:HAS_CONTACT]-(m))
 RETURN contacts;
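
The same idea can also be written with query parameters, so that the list of numbers is supplied by the client instead of being inlined. This is only a sketch: the parameter names source and numbers are hypothetical, and the $name syntax assumes a Neo4j version that accepts it (older releases use {name} instead). The shortestPath pattern is copied unchanged from the question.

```cypher
// Hypothetical parameters: $source is one mobile number, $numbers is a list of them.
PROFILE
MATCH (n:Consumer {mobileNumber: $source})
UNWIND $numbers AS number
// One exact-match lookup per list element; check the PROFILE output
// to confirm that each lookup is an index seek in your version.
MATCH (m:Consumer {mobileNumber: number})
MATCH path = shortestPath((n)-[contacts:HAS_CONTACT]-(m))
RETURN contacts;
```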

Fabio lamana · Answer 3 · 2017-07-24T08:03:39+0000

In this example, the index is actually used with WHERE:

 PROFILE
 MATCH (n:Consumer {mobileNumber: "000000000000"}), (m:Consumer)
 WHERE m.mobileNumber = "111111111111"
 WITH n, m
 MATCH path = shortestPath((n)-[contacts:HAS_CONTACT]-(m))
 RETURN contacts

It uses the index just as your first query does. To match multiple nodes, you can combine predicates with boolean operators, for example:

 PROFILE
 MATCH (n:Consumer {mobileNumber: "000000000000"}), (m:Consumer)
 WHERE m.mobileNumber = "111111111111" OR m.mobileNumber = "222222222222"
 WITH n, m
 MATCH path = shortestPath((n)-[contacts:HAS_CONTACT]-(m))
 RETURN contacts

or use AND instead of OR.

I think the current version of Neo4j does not support using indexes when looking up values in a list with an IN clause.
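
Whether an IN predicate is served from the index appears to depend on the Neo4j version (another answer reports that the problem does not occur in 3.2.2), so it may be worth checking the plan directly. A minimal sketch, reusing the label and property from the question: run the lookup on its own under EXPLAIN and inspect the leaf operator of the plan.

```cypher
// EXPLAIN shows the planned operators without executing the query.
EXPLAIN
MATCH (m:Consumer)
WHERE m.mobileNumber IN ["111111111111", "222222222222"]
RETURN m;
```

An index-backed plan typically starts from an operator such as NodeUniqueIndexSeek, whereas NodeByLabelScan followed by Filter means every Consumer node is read and then filtered.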
