Creating a metabolic pathway in Neo4j

Question


I am trying to create the glycolytic path shown in the image at the bottom of this question in Neo4j using this data:

glycolysis_bioentities.csv

name
α-D-glucose
glucose 6-phosphate
fructose 6-phosphate
"fructose 1,6-bisphosphate"
dihydroxyacetone phosphate
D-glyceraldehyde 3-phosphate
"1,3-bisphosphoglycerate"
3-phosphoglycerate
2-phosphoglycerate
phosphoenolpyruvate
pyruvate
hexokinase
glucose-6-phosphatase
phosphoglucose isomerase
phosphofructokinase
"fructose-bisphosphate aldolase, class I"
triosephosphate isomerase (TIM)
glyceraldehyde-3-phosphate dehydrogenase
phosphoglycerate kinase
phosphoglycerate mutase
enolase
pyruvate kinase

glycolysis_relations.csv

source,relation,target
α-D-glucose,substrate_of,hexokinase
hexokinase,yields,glucose 6-phosphate
glucose 6-phosphate,substrate_of,glucose-6-phosphatase
glucose-6-phosphatase,yields,α-D-glucose
glucose 6-phosphate,substrate_of,phosphoglucose isomerase
phosphoglucose isomerase,yields,fructose 6-phosphate
fructose 6-phosphate,substrate_of,phosphofructokinase
phosphofructokinase,yields,"fructose 1,6-bisphosphate"
"fructose 1,6-bisphosphate",substrate_of,"fructose-bisphosphate aldolase, class I"
"fructose-bisphosphate aldolase, class I",yields,D-glyceraldehyde 3-phosphate
D-glyceraldehyde 3-phosphate,substrate_of,glyceraldehyde-3-phosphate dehydrogenase
D-glyceraldehyde 3-phosphate,substrate_of,triosephosphate isomerase (TIM)
triosephosphate isomerase (TIM),yields,dihydroxyacetone phosphate
glyceraldehyde-3-phosphate dehydrogenase,yields,"1,3-bisphosphoglycerate"
"1,3-bisphosphoglycerate",substrate_of,phosphoglycerate kinase
phosphoglycerate kinase,yields,3-phosphoglycerate
3-phosphoglycerate,substrate_of,phosphoglycerate mutase
phosphoglycerate mutase,yields,2-phosphoglycerate
2-phosphoglycerate,substrate_of,enolase
enolase,yields,phosphoenolpyruvate
phosphoenolpyruvate,substrate_of,pyruvate kinase
pyruvate kinase,yields,pyruvate

This is what I have so far

... using this Cypher code (passed to Cycli or cypher-shell):

    LOAD CSV WITH HEADERS FROM "file:/glycolysis_relations.csv" AS row
    MERGE (s:Glycolysis {source: row.source})
    MERGE (r:Glycolysis {relation: row.relation})
    MERGE (t:Glycolysis {target: row.target})
    FOREACH (x IN CASE row.relation WHEN "substrate_of" THEN [1] ELSE [] END |
      MERGE (s)-[r:substrate_of]->(t)
    )
    FOREACH (x IN CASE row.relation WHEN "yields" THEN [1] ELSE [] END |
      MERGE (s)-[r:yields]->(t)
    );

I would like to create a fully connected path with labels on all nodes. Suggestions?

+5

neo4j cypher

Victoria stuart Apr 05 '18 at 22:13


2 answers

@cybersam's answer is excellent and provides the most elegant solution (again: thanks!). Please upvote the accepted answer.

Since this question and its answers are likely to be of interest to others, I would like to share my own code (based on this SO thread, "How to indicate the relationship type in CSV?", and modified according to the suggestions provided by @cybersam) and show the result:

    LOAD CSV WITH HEADERS FROM "file:/glycolysis_relations.csv" AS row
    MERGE (s:Glycolysis {name: row.source})
    MERGE (t:Glycolysis {name: row.target})
    FOREACH (x IN CASE row.relation WHEN "substrate_of" THEN [1] ELSE [] END |
      MERGE (s)-[r:substrate_of]->(t)
    )
    FOREACH (x IN CASE row.relation WHEN "yields" THEN [1] ELSE [] END |
      MERGE (s)-[r:yields]->(t)
    );

Both solutions generate the identical graph below. :-D

+3

Victoria stuart Apr 6 '18 at 2:44


cybersam · Accepted Answer · 2018-04-06T00:34:36+0000

[UPDATED]

There are several problems and possible improvements:

The second MERGE must be removed, as it creates orphaned nodes. A relationship type should not be stored as a property of a Glycolysis node, and such nodes will never be connected to any other node.
The 1st and 3rd MERGE clauses must use the same property name (for example, name) for the source and target nodes. Otherwise the same chemical can end up with 2 nodes (with different property keys). That's why you ended up with nodes that did not have all the expected connections.
The APOC procedure apoc.cypher.doIt can be used to simplify the MERGE of relationships with dynamic types.
For this use case, glycolysis_bioentities.csv is not required.
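
The second point can be seen in isolation with a small sketch. It assumes an empty scratch database and uses 'hexokinase' only as an illustrative value. MERGE matches on the whole pattern, including the property key, so the first statement should report false (two separate nodes) and the second should report true (one shared node):

```cypher
// Sketch only: run against an empty scratch database.
// Different property keys: the second MERGE cannot match the node
// created by the first, so it creates another node.
MERGE (a:Glycolysis {source: 'hexokinase'})
MERGE (b:Glycolysis {target: 'hexokinase'})
RETURN id(a) = id(b) AS same_node;

// Shared property key: the second MERGE matches the existing node.
MERGE (a:Glycolysis {name: 'hexokinase'})
MERGE (b:Glycolysis {name: 'hexokinase'})
RETURN id(a) = id(b) AS same_node;
```

Delete the scratch nodes afterwards, for example with MATCH (n:Glycolysis) DETACH DELETE n, before loading the real data.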

With the above changes, you get something like this, which generates a connected graph that matches your input:

    LOAD CSV WITH HEADERS FROM "file:/glycolysis_relations.csv" AS row
    MERGE (s:Glycolysis {name: row.source})
    MERGE (t:Glycolysis {name: row.target})
    WITH s, t, row
    CALL apoc.cypher.doIt(
      'MERGE (s)-[r:' + row.relation + ']->(t)',
      {s: s, t: t}) YIELD value
    RETURN 1;
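
One way to check the loaded graph is sketched below. It assumes the nodes carry the name property used above and that the relationship types are exactly substrate_of and yields, as in glycolysis_relations.csv. The queries are not part of the original answer, only a sanity check:

```cypher
// Nodes with no relationships at all; an empty result means no orphans.
MATCH (n:Glycolysis)
WHERE NOT (n)--()
RETURN n.name AS orphan;

// One directed path from glucose to pyruvate that follows the
// alternating substrate_of / yields relationships.
MATCH p = (:Glycolysis {name: 'α-D-glucose'})-[:substrate_of|yields*]->(:Glycolysis {name: 'pyruvate'})
RETURN p
LIMIT 1;
```

If the first query lists any names, or the second returns no path, the property keys used in the MERGE clauses are the first thing to recheck.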
