Neo4j: MERGE creates duplicate nodes

My database model has users and MAC addresses. A user can have multiple MAC addresses, but a MAC can belong to only one user. If a user claims a MAC that is already connected to another user, the existing relationship is deleted and a new relationship is created between the new owner and that MAC. In other words, the MAC moves between users.

This is a specific instance of the Cypher query I use to assign MAC addresses:

    MATCH (new:User { Id: 2 })
    MERGE (mac:MacAddress { Value: "D857EFEF1CF6" })
    WITH new, mac
    OPTIONAL MATCH ()-[oldr:MAC_ADDRESS]->(mac)
    DELETE oldr
    MERGE (new)-[:MAC_ADDRESS]->(mac)

The query runs perfectly in my tests, but in production, for some strange reason, it sometimes creates duplicate MacAddress nodes (and a new relationship between the user and each of these nodes). That is, a particular user can end up with multiple MacAddress nodes with the same Value.

I can tell that they are different nodes, because they have different node IDs. I am also sure that Value is exactly the same, because I can run collect(distinct mac.Value) on them, and the result is a collection with one element. The query above is the only one in the code that creates MacAddress nodes.
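For reference, a check along these lines can list the duplicated values together with the node IDs behind them. It is only a sketch: it assumes nothing beyond the MacAddress label and Value property from the query above, and the aliases (value, copies, nodeIds) are made up for illustration.

```cypher
// Group MacAddress nodes by Value and keep only the values
// that are held by more than one node.
MATCH (mac:MacAddress)
WITH mac.Value AS value, count(mac) AS copies, collect(id(mac)) AS nodeIds
WHERE copies > 1
RETURN value, copies, nodeIds
```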

I am using Neo4j 2.1.2. What's going on here?

Thanks, Jan

2 answers

This is the answer I received from Neo4j support (emphasis mine):

I have already had some feedback from our team, and it is currently known that this can happen in the absence of a constraint. MERGE is effectively MATCH or CREATE - and these two steps are run independently within the transaction. Given concurrent execution and the read-committed isolation level, there is a race condition between the two.

The team has had some discussion on how to provide a stronger guarantee in the face of concurrency, and has it noted as a feature request for consideration.

Meanwhile, they have assured me that using a constraint will provide the uniqueness you are looking for.
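As a sketch of what that could look like for the model in the question: the statement below uses the constraint syntax of the Neo4j 2.x line (the question mentions 2.1.2); recent Neo4j releases spell it differently, so check the manual for your version. The label and property names are taken from the question.

```cypher
// Require Value to be unique across all MacAddress nodes.
// Creating the constraint fails if duplicates already exist,
// so existing duplicate nodes need to be cleaned up first.
CREATE CONSTRAINT ON (mac:MacAddress) ASSERT mac.Value IS UNIQUE
```

With the constraint in place, the MERGE on MacAddress in the original query should either find the existing node or create exactly one, even when several transactions run it at the same time.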


Are you sure these are all the queries you use? MERGE has a very common pitfall: it merges everything that you give it as a single pattern. So here is what people expect:

    neo4j-sh (?)$ MERGE (mac:MacAddress { Value: "D857EFEF1CF6" });
    +-------------------+
    | No data returned. |
    +-------------------+
    Nodes created: 1
    Properties set: 1
    Labels added: 1
    1650 ms

    neo4j-sh (?)$ MERGE (mac:MacAddress { Value: "D857EFEF1CF6" });
    +--------------------------------------------+
    | No data returned, and nothing was changed. |
    +--------------------------------------------+
    17 ms

    neo4j-sh (?)$ match (mac:MacAddress { Value: "D857EFEF1CF6" }) return count(mac);
    +------------+
    | count(mac) |
    +------------+
    | 1          |
    +------------+
    1 row
    200 ms

So far so good. This is what we expect. Now see this:

    neo4j-sh (?)$ MERGE (mac:MacAddress { Value: "D857EFEF1CF6" })-[r:foo]->(b:SomeNode {label: "Foo!"});
    +-------------------+
    | No data returned. |
    +-------------------+
    Nodes created: 2
    Relationships created: 1
    Properties set: 2
    Labels added: 2
    178 ms

    neo4j-sh (?)$ match (mac:MacAddress { Value: "D857EFEF1CF6" }) return count(mac);
    +------------+
    | count(mac) |
    +------------+
    | 2          |
    +------------+
    1 row
    2 ms

Wait, what happened here? We specified only the same MAC address again, so why was a duplicate created?

The MERGE documentation states that "MERGE will not partially use existing patterns; it's all or nothing. If partial matches are needed, this can be accomplished by splitting a pattern up into multiple MERGE clauses." So when we run MERGE on this path, the whole path does not exist yet, and MERGE creates everything in it, including a duplicate MAC address node.

Questions about duplicate nodes created by MERGE come up often, and 99 times out of 100 this is what is going on.
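Splitting the pattern from the example above as the documentation suggests could look roughly like this. It is a sketch that reuses the labels and properties from the shell session, not a drop-in fix for any particular data model.

```cypher
// Each MERGE matches or creates its own part of the pattern,
// so an existing MacAddress node is reused instead of duplicated.
MERGE (mac:MacAddress { Value: "D857EFEF1CF6" })
MERGE (b:SomeNode { label: "Foo!" })
MERGE (mac)-[r:foo]->(b)
```

Note that the second MERGE will also reuse any existing SomeNode with that label property, which may or may not be what you want. Splitting the pattern addresses only the partial-match pitfall; it does not by itself protect against concurrent transactions.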


Source: https://habr.com/ru/post/1203429/