Recursive query with subgraph aggregation (arbitrary depth)

I asked a question earlier about combining values ​​along a graph. The two answers provided worked well, but now I'm trying to extend the Cypher request to a variable depth graph.

To summarize, we start with a handful of leaf stores that are all associated with a particular vendor, which is a property on the Store node. Then the inventory is transferred to other stores, and the proportion from each supplier corresponds to their contribution to the original store.

So, for node B02 , S2 contributed 750/1250 = 60% and S3 contributed 40% . Then we will move 600 units from our B02 , of which 60% belongs to S2 and 40% to S3 and so on.

enter image description here

What we want to know is what percentage of the final 700 units in D01 belongs to each supplier. If suppliers with the same name are the same supplier. So, for the above chart, we expect:

S1, 38.09
S2, 27.61 - S3, 34.28

I prepared a graph using this Cypher script:

 CREATE (A01:Store {Name: 'A01', Supplier: 'S1'}) CREATE (A02:Store {Name: 'A02', Supplier: 'S1'}) CREATE (A03:Store {Name: 'A03', Supplier: 'S2'}) CREATE (A04:Store {Name: 'A04', Supplier: 'S3'}) CREATE (A05:Store {Name: 'A05', Supplier: 'S1'}) CREATE (A06:Store {Name: 'A06', Supplier: 'S1'}) CREATE (A07:Store {Name: 'A07', Supplier: 'S2'}) CREATE (A08:Store {Name: 'A08', Supplier: 'S3'}) CREATE (B01:Store {Name: 'B01'}) CREATE (B02:Store {Name: 'B02'}) CREATE (B03:Store {Name: 'B03'}) CREATE (B04:Store {Name: 'B04'}) CREATE (C01:Store {Name: 'C01'}) CREATE (C02:Store {Name: 'C02'}) CREATE (D01:Store {Name: 'D01'}) CREATE (A01)-[:MOVE_TO {Quantity: 750}]->(B01) CREATE (A02)-[:MOVE_TO {Quantity: 500}]->(B01) CREATE (A03)-[:MOVE_TO {Quantity: 750}]->(B02) CREATE (A04)-[:MOVE_TO {Quantity: 500}]->(B02) CREATE (A05)-[:MOVE_TO {Quantity: 100}]->(B03) CREATE (A06)-[:MOVE_TO {Quantity: 200}]->(B03) CREATE (A07)-[:MOVE_TO {Quantity: 50}]->(B04) CREATE (A08)-[:MOVE_TO {Quantity: 450}]->(B04) CREATE (B01)-[:MOVE_TO {Quantity: 400}]->(C01) CREATE (B02)-[:MOVE_TO {Quantity: 600}]->(C01) CREATE (B03)-[:MOVE_TO {Quantity: 100}]->(C02) CREATE (B04)-[:MOVE_TO {Quantity: 200}]->(C02) CREATE (C01)-[:MOVE_TO {Quantity: 500}]->(D01) CREATE (C02)-[:MOVE_TO {Quantity: 200}]->(D01) 

The current request is as follows:

 MATCH (s:Store { Name:'D01' }) MATCH (s)<-[t:MOVE_TO]-()<-[r:MOVE_TO]-(supp) WITH t.Quantity as total, collect(r) as movements WITH total, movements, reduce(totalSupplier = 0, r IN movements | totalSupplier + r.Quantity) as supCount UNWIND movements as movement RETURN startNode(movement).Supplier as Supplier, round(100.0*movement.Quantity/supCount) as pct 

I am trying to use a recursive relationship, something like this:

 MATCH (s)<-[t:MOVE_TO]-()<-[r:MOVE_TO*]-(supp) 

however, which gives several paths to the end of the node, and I need to aggregate the inventory in each node, I think.

+6
source share
3 answers

This query generates the correct results for any arbitrary graph that matches the model described in the question. (When Store x moves goods to Store y, it is assumed that the Supplier interest rates for the moved goods are the same as Store x.)

However, this solution does not only consist of a single Cypher request (as this may not be possible). Instead, it includes several queries, one of which must be repeated until the calculations are cascaded across the entire graph of the Store nodes. This repeat request will clearly tell you when to stop the iteration. Other Cypher requests are necessary to: prepare the schedule for iteration, report the Supplier’s interest for the “end” node and clear the schedule (so that it is restored as it was before step 1, below).

Perhaps these queries will be optimized.

Here are the required steps:

  • Prepare a graph for an iterative query (initializes a temporary pcts array for all starting Store nodes). This includes creating a singleton Suppliers node, which has an array with all the names of the providers. This is used to determine the order of the elements of the pcts temporary arrays and to match these elements with the correct provider name.

     MATCH (store:Store) WHERE HAS (store.Supplier) WITH COLLECT(store) AS stores, COLLECT(DISTINCT store.Supplier) AS csup CREATE (sups:Suppliers { names: csup }) WITH stores, sups UNWIND stores AS store SET store.pcts = EXTRACT(i IN RANGE(0,LENGTH(sups.names)-1,1) | CASE WHEN store.Supplier = sups.names[i] THEN 1.0 ELSE 0.0 END) RETURN store.Name, store.Supplier, store.pcts; 

    Here is the result with the question data:

     +---------------------------------------------+ | store.Name | store.Supplier | store.pcts | +---------------------------------------------+ | "A01" | "S1" | [1.0,0.0,0.0] | | "A02" | "S1" | [1.0,0.0,0.0] | | "A03" | "S2" | [0.0,1.0,0.0] | | "A04" | "S3" | [0.0,0.0,1.0] | | "A05" | "S1" | [1.0,0.0,0.0] | | "A06" | "S1" | [1.0,0.0,0.0] | | "A07" | "S2" | [0.0,1.0,0.0] | | "A08" | "S3" | [0.0,0.0,1.0] | +---------------------------------------------+ 8 rows 83 ms Nodes created: 1 Properties set: 9 
  • Iterative query (repeated until 0 rows are returned)

     MATCH p=(s1:Store)-[m:MOVE_TO]->(s2:Store) WHERE HAS(s1.pcts) AND NOT HAS(s2.pcts) SET s2.pcts = EXTRACT(i IN RANGE(1,LENGTH(s1.pcts),1) | 0) WITH s2, COLLECT(p) AS ps WITH s2, ps, REDUCE(s=0, p IN ps | s + HEAD(RELATIONSHIPS(p)).Quantity) AS total FOREACH(p IN ps | SET HEAD(RELATIONSHIPS(p)).pcts = EXTRACT(parentPct IN HEAD(NODES(p)).pcts | parentPct * HEAD(RELATIONSHIPS(p)).Quantity / total) ) FOREACH(p IN ps | SET s2.pcts = EXTRACT(i IN RANGE(0,LENGTH(s2.pcts)-1,1) | s2.pcts[i] + HEAD(RELATIONSHIPS(p)).pcts[i]) ) RETURN s2.Name, s2.pcts, total, EXTRACT(p IN ps | HEAD(RELATIONSHIPS(p)).pcts) AS rel_pcts; 

    Result Iteration 1:

     +-----------------------------------------------------------------------------------------------+ | s2.Name | s2.pcts | total | rel_pcts | +-----------------------------------------------------------------------------------------------+ | "B04" | [0.0,0.1,0.9] | 500 | [[0.0,0.1,0.0],[0.0,0.0,0.9]] | | "B01" | [1.0,0.0,0.0] | 1250 | [[0.6,0.0,0.0],[0.4,0.0,0.0]] | | "B03" | [1.0,0.0,0.0] | 300 | [[0.3333333333333333,0.0,0.0],[0.6666666666666666,0.0,0.0]] | | "B02" | [0.0,0.6,0.4] | 1250 | [[0.0,0.6,0.0],[0.0,0.0,0.4]] | +-----------------------------------------------------------------------------------------------+ 4 rows 288 ms Properties set: 24 

    Result of iteration 2:

     +-------------------------------------------------------------------------------------------------------------------------------+ | s2.Name | s2.pcts | total | rel_pcts | +-------------------------------------------------------------------------------------------------------------------------------+ | "C02" | [0.3333333333333333,0.06666666666666667,0.6] | 300 | [[0.3333333333333333,0.0,0.0],[0.0,0.06666666666666667,0.6]] | | "C01" | [0.4,0.36,0.24] | 1000 | [[0.4,0.0,0.0],[0.0,0.36,0.24]] | +-------------------------------------------------------------------------------------------------------------------------------+ 2 rows 193 ms Properties set: 12 

    Result of iteration 3:

     +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | s2.Name | s2.pcts | total | rel_pcts | +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | "D01" | [0.38095238095238093,0.27619047619047615,0.34285714285714286] | 700 | [[0.2857142857142857,0.2571428571428571,0.17142857142857143],[0.09523809523809522,0.01904761904761905,0.17142857142857143]] | +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row 40 ms Properties set: 6 

    Result of iteration 4:

     +--------------------------------------+ | s2.Name | s2.pcts | total | rel_pcts | +--------------------------------------+ +--------------------------------------+ 0 rows 69 ms 
  • List non-zero Supplier percentages to end with Store node (s).

     MATCH (store:Store), (sups:Suppliers) WHERE NOT (store:Store)-[:MOVE_TO]->(:Store) AND HAS(store.pcts) RETURN store.Name, [i IN RANGE(0,LENGTH(sups.names)-1,1) WHERE store.pcts[i] > 0 | {supplier: sups.names[i], pct: store.pcts[i] * 100}] AS pcts; 

    Result:

     +----------------------------------------------------------------------------------------------------------------------------------+ | store.Name | pcts | +----------------------------------------------------------------------------------------------------------------------------------+ | "D01" | [{supplier=S1, pct=38.095238095238095},{supplier=S2, pct=27.619047619047617},{supplier=S3, pct=34.285714285714285}] | +----------------------------------------------------------------------------------------------------------------------------------+ 1 row 293 ms 
  • Clean (remove all temporary details pcts and Suppliers node).

     MATCH (s:Store), (sups:Suppliers) OPTIONAL MATCH (s)-[m:MOVE_TO]-() REMOVE m.pcts, s.pcts DELETE sups; 

    Result:

     0 rows 203 ms +-------------------+ | No data returned. | +-------------------+ Properties set: 29 Nodes deleted: 1 
+2
source

As I said, I liked this question. I know that you already accepted the answer, but I decided to post my last answer, as it also returns percentile without the efforts of the client (this means that you can also SET on the nodes to update the value in db when you need to) and of course if for some other reason, as one, I can return to :) here is a link to a console example

it returns a string with the name of the store, the amount transferred to it from all suppliers and the percentile of each supplier

 MATCH p =s<-[:MOVE_TO*]-sup WHERE HAS (sup.Supplier) AND NOT HAS (s.Supplier) WITH s,sup,reduce(totalSupplier = 0, r IN relationships(p)| totalSupplier + r.Quantity) AS TotalAmountMoved WITH sum(TotalAmountMoved) AS sumMoved, collect(DISTINCT ([sup.Supplier, TotalAmountMoved])) AS MyDataPart1,s WITH reduce(b=[], c IN MyDataPart1| b +[{ Supplier: c[0], Quantity: c[1], Percentile: ((c[1]*1.00))/(sumMoved*1.00)*100.00 }]) AS MyData, s, sumMoved RETURN s.Name, sumMoved, MyData 
+3
source

I cannot imagine my way through a solution in a pure cipher, because I do not think you can do recursion like this in cypher. You can use cypher to return all the data in the tree in a simple way so you can compute it in your favorite programming language. Something like that:

 MATCH path=(source:Store)-[move:MOVE_TO*]->(target:Store {Name: 'D01'}) WHERE source.Supplier IS NOT NULL RETURN source.Supplier, reduce(a=[], move IN relationships(path)| a + [{id: ID(move), Quantity: move.Quantity}]) 

This will return you an identifier and quantity for each of the relationships along each path. Then you can process this client side (maybe convert it to a nested data structure first?)

+2
source

Source: https://habr.com/ru/post/981328/


All Articles