Neo4j internally maintains a label scan store. It is basically a lookup for quickly getting all nodes that carry a specific label A.
When executing a query like
MATCH (n:A:B) return count(n)
the label scan store is used to find all nodes labeled A, and these are then filtered on whether they also carry label B. If n(A) >> n(B), it is far more efficient to write MATCH (n:B:A) instead, since you only look up a few B nodes and filter those for A.
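To make the cost difference concrete, here is a minimal Python sketch of "scan one label, then filter on the other". It is only a toy model, not Neo4j's actual implementation: the ToyLabelStore class, its methods and the node counts are all made up for illustration.

```python
from collections import defaultdict


class ToyLabelStore:
    """Hypothetical stand-in for a label scan store: label -> node ids."""

    def __init__(self):
        self.by_label = defaultdict(set)   # label -> ids of nodes carrying it
        self.labels_of = defaultdict(set)  # node id -> labels on that node

    def add_node(self, node_id, labels):
        for label in labels:
            self.by_label[label].add(node_id)
            self.labels_of[node_id].add(label)

    def count_scan_then_filter(self, scan_label, filter_label):
        """Scan every node with scan_label, keep those that also carry
        filter_label. Returns (matches, nodes_examined)."""
        examined = 0
        matches = 0
        for node_id in self.by_label[scan_label]:
            examined += 1
            if filter_label in self.labels_of[node_id]:
                matches += 1
        return matches, examined


store = ToyLabelStore()
store.add_node(0, ["A", "B"])        # one node with both labels
for i in range(1, 1001):             # 1000 nodes with only A
    store.add_node(i, ["A"])
for i in range(1001, 1011):          # 10 nodes with only B
    store.add_node(i, ["B"])

a_first = store.count_scan_then_filter("A", "B")  # like MATCH (n:A:B)
b_first = store.count_scan_then_filter("B", "A")  # like MATCH (n:B:A)
print(a_first, b_first)
```

Both orders find the same single match, but scanning A first touches every A node while scanning B first touches only the handful of B nodes.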
You can use PROFILE MATCH (n:A:B) return count(n) to view the query plan. For Neo4j <= 2.1.x, you will see a different query plan depending on the order in which you specify the labels.
Starting with Neo4j 2.2 (milestone M03 is available at the time of writing), there is a cost-based Cypher optimizer. Cypher is now aware of node statistics and uses them to optimize the query.
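The idea behind such a cost-based choice can be sketched in a few lines of Python. This is an assumption-laden illustration, not how the Neo4j planner is actually written: plan_label_match and the label counts are hypothetical, and a real planner weighs far more than label cardinality.

```python
def plan_label_match(labels, label_counts):
    """Pick the label with the fewest nodes as the scan label and
    treat the remaining labels as filters (hypothetical helper)."""
    scan = min(labels, key=lambda label: label_counts.get(label, 0))
    filters = [label for label in labels if label != scan]
    return scan, filters


# Illustrative per-label node counts; a real optimizer keeps its own statistics.
counts = {"A": 1_000_000, "B": 100}

plan_ab = plan_label_match(["A", "B"], counts)  # query written as (n:A:B)
plan_ba = plan_label_match(["B", "A"], counts)  # query written as (n:B:A)
print(plan_ab, plan_ba)
```

With statistics available, the order in which the labels are written no longer matters: both calls choose to scan B and filter for A.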
As an example, I used the following statements to create some test data:
create (:A:B);
with 1 as a foreach (x in range(0,1000000) | create (:A));
with 1 as a foreach (x in range(0,100) | create (:B));
Now we have roughly 100 B nodes, roughly 1M A nodes and one node carrying both A and B. In 2.2, both statements:
MATCH (n:B:A) return count(n)
MATCH (n:A:B) return count(n)
lead to exactly the same query plan (and therefore to the same execution speed):
+------------------+---------------+------+--------+-------------+---------------+
| Operator         | EstimatedRows | Rows | DbHits | Identifiers | Other         |
+------------------+---------------+------+--------+-------------+---------------+
| EagerAggregation | 3             | 1    | 0      | count(n)    |               |
| Filter           | 12            | 1    | 12     | n           | hasLabel(n:A) |
| NodeByLabelScan  | 12            | 12   | 13     | n           | :B            |
+------------------+---------------+------+--------+-------------+---------------+
Since there are only a few nodes labeled B, it is cheaper to scan B and filter for A. Smart Cypher, right? :-)