Neo4J: How to Find Unique Nodes from a Set of Paths

Question

I use Neo4j to solve a real-time normalization problem. Let's say I have 3 places from 2 different sources. Source 45 gives me 2 places that are in fact duplicates of each other, and source 55 gives me 1 place with the correct identifier. However, for any place ID (duplicate or not), I want to find the closest set of places that are unique by feed ID. My data looks like this:

CREATE (a: Place {feedId:45, placeId: 123, name:"Empire State", address: "350 5th Ave", city: "New York", state: "NY", zip: "10118" })
CREATE (b: Place {feedId:45, placeId: 456, name:"Empire State Building", address: "350 5th Ave", city: "New York", state: "NY"})
CREATE (c: Place {feedId:55, placeId: 789, name:"Empire State", address: "350 5th Ave", city: "New York", state: "NY", zip: "10118"})

I connected these Place nodes through Matching nodes so that I can do some normalization of the data. For instance:

MERGE (m1: Matching:NameAndCity { attr: "EmpireStateBuildingNewYork", cost: 5.0 })
MERGE (a)-[:MATCHES]-(m1)
MERGE (b)-[:MATCHES]-(m1)
MERGE (c)-[:MATCHES]-(m1)
MERGE (m2: Matching:CityAndZip { attr: "NewYork10118", cost: 7.0 })
MERGE (a)-[:MATCHES]-(m2)
MERGE (c)-[:MATCHES]-(m2)

When I want to find the closest matches from a starting place identifier, I can match all the paths from the starting node, ranked by cost, i.e.:

MATCH p=(a:Place {placeId:789, feedId:55})-[*..4]-(d:Place)
WHERE NONE (n IN nodes(p)
        WHERE size(filter(x IN nodes(p)
                          WHERE n = x))> 1)
WITH    p,
        reduce(costAccum = 0, n in filter(n in nodes(p) where has(n.cost)) | costAccum+n.cost) AS costAccum
        order by costAccum
RETURN p, costAccum

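However, this returns one row per path, so the same node can appear more than once. How can I get only the closest unique node for each feed (i.e., for 45 and 55)?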

neo4j cypher

eurobrew 25 . '15 11:34

Tezra · Answer 1 · 2017-05-19T13:06:16+0000

Aggregate on d (collect its paths, ordered by cost) and keep the first path:

MATCH p=(a:Place {placeId:789, feedId:55})-[*..4]-(d:Place)
WITH d, p,
        reduce(costAccum = 0, n in filter(n in nodes(p) where has(n.cost)) | costAccum+n.cost) AS costAccum
        order by costAccum
WITH d, collect(p) as paths, min(costAccum) AS costAccum
RETURN head(paths) as p, costAccum
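
On newer Neo4j releases, where filter() and has() are no longer available, the same idea can be sketched with a list comprehension and IS NOT NULL. This is only a sketch under a few assumptions: it groups by d.feedId rather than by d, on the reading that the question wants one closest place per feed; it keeps the question's "no repeated nodes" filter; and it relies on collect() keeping the row order produced by ORDER BY, which is a commonly used idiom rather than something stated in this thread. The names cost, candidates and best are illustrative only.

```cypher
// Sketch only: the same pattern as in the question, written without filter()/has().
MATCH p = (a:Place {placeId: 789, feedId: 55})-[*..4]-(d:Place)
// Keep only paths in which no node occurs more than once (as in the question).
WHERE NONE(n IN nodes(p) WHERE size([x IN nodes(p) WHERE x = n]) > 1)
// Compute the cost per path BEFORE aggregating, so it does not become a grouping key.
WITH d, p,
     reduce(total = 0.0, n IN [x IN nodes(p) WHERE x.cost IS NOT NULL] | total + n.cost) AS cost
ORDER BY cost
// Rows arrive cheapest first, so the head of each list is the cheapest candidate per feed.
WITH d.feedId AS feedId, collect({place: d, path: p, cost: cost}) AS candidates
WITH feedId, head(candidates) AS best
RETURN feedId, best.place AS place, best.path AS p, best.cost AS cost
ORDER BY cost
```

With the sample data from the question, places 123 and 456 are both reachable from 789 through m1 at the same cost, so which of the two is returned for feed 45 is arbitrary. To get one row per place instead of one per feed, group by d rather than by d.feedId. Because the pattern needs at least one relationship and repeated nodes are filtered out, the starting place itself is never returned.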
