Creating a metabolic pathway in Neo4j

I am trying to create the glycolytic pathway shown in the image at the bottom of this question in Neo4j, using this data:

glycolysis_bioentities.csv

    name
    α-D-glucose
    glucose 6-phosphate
    fructose 6-phosphate
    "fructose 1,6-bisphosphate"
    dihydroxyacetone phosphate
    D-glyceraldehyde 3-phosphate
    "1,3-bisphosphoglycerate"
    3-phosphoglycerate
    2-phosphoglycerate
    phosphoenolpyruvate
    pyruvate
    hexokinase
    glucose-6-phosphatase
    phosphoglucose isomerase
    phosphofructokinase
    "fructose-bisphosphate aldolase, class I"
    triosephosphate isomerase (TIM)
    glyceraldehyde-3-phosphate dehydrogenase
    phosphoglycerate kinase
    phosphoglycerate mutase
    enolase
    pyruvate kinase

glycolysis_relations.csv

    source,relation,target
    α-D-glucose,substrate_of,hexokinase
    hexokinase,yields,glucose 6-phosphate
    glucose 6-phosphate,substrate_of,glucose-6-phosphatase
    glucose-6-phosphatase,yields,α-D-glucose
    glucose 6-phosphate,substrate_of,phosphoglucose isomerase
    phosphoglucose isomerase,yields,fructose 6-phosphate
    fructose 6-phosphate,substrate_of,phosphofructokinase
    phosphofructokinase,yields,"fructose 1,6-bisphosphate"
    "fructose 1,6-bisphosphate",substrate_of,"fructose-bisphosphate aldolase, class I"
    "fructose-bisphosphate aldolase, class I",yields,D-glyceraldehyde 3-phosphate
    D-glyceraldehyde 3-phosphate,substrate_of,glyceraldehyde-3-phosphate dehydrogenase
    D-glyceraldehyde 3-phosphate,substrate_of,triosephosphate isomerase (TIM)
    triosephosphate isomerase (TIM),yields,dihydroxyacetone phosphate
    glyceraldehyde-3-phosphate dehydrogenase,yields,"1,3-bisphosphoglycerate"
    "1,3-bisphosphoglycerate",substrate_of,phosphoglycerate kinase
    phosphoglycerate kinase,yields,3-phosphoglycerate
    3-phosphoglycerate,substrate_of,phosphoglycerate mutase
    phosphoglycerate mutase,yields,2-phosphoglycerate
    2-phosphoglycerate,substrate_of,enolase
    enolase,yields,phosphoenolpyruvate
    phosphoenolpyruvate,substrate_of,pyruvate kinase
    pyruvate kinase,yields,pyruvate

This is what I have so far:

[Image: the graph produced so far]

... using this Cypher code (passed to Cycli or cypher-shell):

    LOAD CSV WITH HEADERS FROM "file:/glycolysis_relations.csv" AS row
    MERGE (s:Glycolysis {source: row.source})
    MERGE (r:Glycolysis {relation: row.relation})
    MERGE (t:Glycolysis {target: row.target})
    FOREACH (x in case row.relation when "substrate_of" then [1] else [] end |
      MERGE (s)-[r:substrate_of]->(t)
    )
    FOREACH (x in case row.relation when "yields" then [1] else [] end |
      MERGE (s)-[r:yields]->(t)
    );

I would like to create a fully connected path with labels on all nodes. Suggestions?

[Image: the target glycolytic pathway]

2 answers

[UPDATED]

There are several problems and possible improvements:

  • The second MERGE must be removed, because it creates orphan nodes. The relationship type should not be stored as a property of a Glycolysis node, and such nodes are never connected to any other node.
  • The 1st and 3rd MERGE clauses must use the same property name (for example, name) for the source and target nodes; otherwise the same chemical can end up as 2 nodes (with different property keys). That is why you ended up with nodes that did not have all the expected connections.
  • The APOC procedure apoc.cypher.doIt can be used to simplify MERGE-ing relationships whose types come from the data (dynamic relationship types).
  • In this case, glycolysis_bioentities.csv is not needed at all.
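The node-duplication problem from the second point can be illustrated with a toy, pure-Python model of node MERGE. This is an assumption made for illustration, not how Neo4j is implemented: a node pattern reuses an existing node only when the label and the given property key/value agree, otherwise it creates a new one. Using the first three rows of the CSV, keying sources by `source` and targets by `target` splits the graph into pieces, while a shared `name` key connects it.

```python
# First three rows of glycolysis_relations.csv.
ROWS = [
    ("α-D-glucose", "substrate_of", "hexokinase"),
    ("hexokinase", "yields", "glucose 6-phosphate"),
    ("glucose 6-phosphate", "substrate_of", "phosphoglucose isomerase"),
]


def merge_node(nodes, label, props):
    """Toy MERGE: return the id of a node with this label and these properties,
    creating it if none exists. Simplification: real MERGE matches any node that
    has *at least* these properties; here every node has a single property, so
    exact and subset matching give the same result."""
    key = (label, tuple(sorted(props.items())))
    if key not in nodes:
        nodes[key] = len(nodes)
    return nodes[key]


def load(rows, source_key, target_key):
    """Mimic the per-row MERGE (s) / MERGE (t) / MERGE (s)-[:type]->(t) pattern."""
    nodes, rels = {}, set()
    for source, relation, target in rows:
        s = merge_node(nodes, "Glycolysis", {source_key: source})
        t = merge_node(nodes, "Glycolysis", {target_key: target})
        rels.add((s, relation, t))  # a set mimics relationship MERGE (no duplicates)
    return nodes, rels


def count_components(n_nodes, rels):
    """Number of weakly connected components, via union-find."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, _, t in rels:
        parent[find(s)] = find(t)
    return len({find(i) for i in range(n_nodes)})


if __name__ == "__main__":
    nodes, rels = load(ROWS, "source", "target")  # keys as in the question
    print(len(nodes), count_components(len(nodes), rels))  # 6 3
    nodes, rels = load(ROWS, "name", "name")  # shared key as in the answer
    print(len(nodes), count_components(len(nodes), rels))  # 4 1
```

In the first run "hexokinase" exists twice (once as {target: ...}, once as {source: ...}), so each row forms its own island; in the second run all four chemicals and enzymes form one chain.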

With the above changes you get something like this, which generates a connected graph that matches your input:

    LOAD CSV WITH HEADERS FROM "file:/glycolysis_relations.csv" AS row
    MERGE (s:Glycolysis {name: row.source})
    MERGE (t:Glycolysis {name: row.target})
    WITH s, t, row
    CALL apoc.cypher.doIt(
      'MERGE (s)-[r:' + row.relation + ']->(t)',
      {s: s, t: t}) YIELD value
    RETURN 1;
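The string concatenation inside apoc.cypher.doIt is needed because Cypher, at least in the versions this thread was written for, does not accept a relationship type as a query parameter. If APOC is not available, one alternative (not part of this answer) is to generate one static-type statement per distinct relation value and run each separately. The Python sketch below generates such statements and refuses any type that is not a plain identifier, so a malformed CSV value cannot inject extra Cypher. The function and template names are hypothetical; the file path is copied from the question.

```python
import re

# Only plain identifiers are accepted as relationship types.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# One statement per type: filter rows by type, then MERGE with a static type.
# Doubled braces are literal braces for str.format.
TEMPLATE = (
    'LOAD CSV WITH HEADERS FROM "file:/glycolysis_relations.csv" AS row\n'
    "WITH row WHERE row.relation = '{rel}'\n"
    "MERGE (s:Glycolysis {{name: row.source}})\n"
    "MERGE (t:Glycolysis {{name: row.target}})\n"
    "MERGE (s)-[:{rel}]->(t);"
)


def statements_per_type(relation_types):
    """Return one Cypher statement per distinct relationship type (sorted),
    raising ValueError for any type that is not a plain identifier."""
    out = []
    for rel in sorted(set(relation_types)):
        if not IDENTIFIER.fullmatch(rel):
            raise ValueError(f"unsafe relationship type: {rel!r}")
        out.append(TEMPLATE.format(rel=rel))
    return out


if __name__ == "__main__":
    for stmt in statements_per_type(["substrate_of", "yields"]):
        print(stmt, end="\n\n")
```

The trade-off is one pass over the CSV per relationship type instead of one dynamic statement per row; with only two types here, that is two short queries.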

@cybersam's answer is excellent and provides the most elegant solution (again: thanks!); please upvote the accepted answer.

Since this question is likely to be of interest to others, I would like to share my own code (based on the SO thread "How to indicate the relationship type in CSV?" and modified according to @cybersam's suggestions), together with the result:

    LOAD CSV WITH HEADERS FROM "file:/glycolysis_relations.csv" AS row
    MERGE (s:Glycolysis {name: row.source})
    MERGE (t:Glycolysis {name: row.target})
    FOREACH (x in case row.relation when "substrate_of" then [1] else [] end |
      MERGE (s)-[r:substrate_of]->(t)
    )
    FOREACH (x in case row.relation when "yields" then [1] else [] end |
      MERGE (s)-[r:yields]->(t)
    );

Both solutions generate the identical graph below :-D

[Image: resulting glycolytic pathway graph in Neo4j]
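Why the two versions end up with the same relationships can be sketched with a small Python simulation (not Neo4j itself). The apoc.cypher.doIt version takes the type straight from the row. In the FOREACH version, each CASE evaluates to [1] or [], so the FOREACH body runs once or not at all, which is a common Cypher workaround for conditional writes. The sample rows come from glycolysis_relations.csv; the extra "inhibits" row is hypothetical and is not in the question's data.

```python
# A few rows from glycolysis_relations.csv.
ROWS = [
    ("α-D-glucose", "substrate_of", "hexokinase"),
    ("hexokinase", "yields", "glucose 6-phosphate"),
    ("glucose 6-phosphate", "substrate_of", "glucose-6-phosphatase"),
    ("glucose-6-phosphatase", "yields", "α-D-glucose"),
]


def load_dynamic(rows):
    """Mimics the apoc.cypher.doIt answer: the relationship type is row.relation."""
    return {(s, rel, t) for s, rel, t in rows}


def load_foreach(rows):
    """Mimics the FOREACH/CASE answer: each loop iterates over [1] (body runs
    once) or [] (body skipped), so only the listed types are ever created."""
    rels = set()
    for s, rel, t in rows:
        for _ in ([1] if rel == "substrate_of" else []):
            rels.add((s, "substrate_of", t))
        for _ in ([1] if rel == "yields" else []):
            rels.add((s, "yields", t))
    return rels


if __name__ == "__main__":
    print(load_dynamic(ROWS) == load_foreach(ROWS))  # True
    # Hypothetical row with a type the FOREACH version does not list.
    extra = ROWS + [("glucose 6-phosphate", "inhibits", "hexokinase")]
    print(load_dynamic(extra) == load_foreach(extra))  # False
```

The sketch also makes one design difference visible: the FOREACH version silently ignores any relation value it does not list, whereas the dynamic version creates a relationship of whatever type the CSV contains.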


Source: https://habr.com/ru/post/1276274/

