Is it good to use MySQL and Neo4j together?

I am building an application with a lot of similar items (millions), and I would like to store them in a MySQL database, because I want to run a lot of statistics and search on specific values in specific columns.

At the same time, I need to store the relationships between all the items, which are connected in many related binary-tree structures (transitive closure). Relational databases are not well suited to such structures, so I would like to store all the relationships in Neo4j, which performs well on this kind of data.

My plan is to keep all the data except the relationships in the MySQL database, and all the relationships, keyed by item_id, in the Neo4j database. When I want to look up a tree, I first query Neo4j for all the item_ids in the tree, then I query the MySQL database for those items with a query that looks like this:

SELECT * FROM items WHERE item_id = 45 OR item_id = 345435 OR item_id = 343 OR item_id = 78 OR item_id = 4522 OR item_id = 676 OR item_id = 443 OR item_id = 4255 OR item_id = 4345

Is this a good idea, or am I completely wrong? I have not used graph databases before. Are there better approaches to my problem? How will the MySQL query perform in this case?

+45
mysql architecture neo4j graph-databases hierarchical-data
Mar 29 '10 at 23:16
4 answers

A few thoughts on this:

I would try modeling your Neo4j domain model so that it includes the attributes of each node in the graph. By splitting the data across two different data stores, you may limit some of the operations you might want to do.

I guess it comes down to what you will do with your graph. If, for example, you want to find all nodes connected to a specific node whose attributes (i.e. name, age, whatever) have specific values, you would first have to find the correct node ID in the MySQL database and then go to Neo4j. That seems slow and overly complex when you could do it all in Neo4j. So the question is: do you need the node attributes while traversing the graph?

Will your data change, or is it static? Having two separate data stores will complicate things.

While generating statistics using a MySQL database may be simpler than doing everything in Neo4j, the code needed to traverse the graph and find all the nodes that meet certain criteria is not too complicated. What these statistics are should drive your decision.

I cannot comment on the performance of the MySQL query that selects the node ids. I guess it comes down to how many nodes you need to select and your indexing strategy. I do agree with you about performance when it comes to traversing the graph.

This is a good article on exactly this: MySQL vs. Neo4j on a Large-Scale Graph Traversal, and in this case, when they say large, they only mean a million vertices/nodes and four million edges. So it was not even a particularly dense graph.

+25
Mar 29 '10 at 23:58

Relational databases can handle graph structures. Some of them can even handle them moderately elegantly (as elegantly as a relational database ever does anything!).

The key to general graph processing in relational databases is the recursive common table expression (RCTE), which basically lets you iteratively (not recursively, despite the name) expand a query over a set of rows, by combining a query that selects the root set of rows with a query that defines the neighbours of the rows selected so far. The syntax is a bit clumsy, but it is general and powerful.

RCTEs are supported in PostgreSQL, Firebird, SQL Server, and apparently DB2. Oracle has a different but equivalent construct (CONNECT BY); I have read that recent versions support proper RCTEs. MySQL, before version 8.0, does not support RCTEs. If you are not tied to MySQL, I would strongly recommend PostgreSQL, which is basically a much better database.
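
To make the root-query-plus-neighbour-query shape concrete, here is a minimal sketch of a recursive CTE that collects one subtree. It assumes a hypothetical items table with a parent_id column (neither is given in the question) and uses SQLite through Python's sqlite3 module purely as a convenient stand-in; PostgreSQL accepts the same WITH RECURSIVE syntax.

```python
import sqlite3

# Hypothetical schema: each item stores the id of its parent (NULL for a root).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    INSERT INTO items VALUES
        (1, NULL, 'root'),
        (2, 1,    'left'),
        (3, 1,    'right'),
        (4, 2,    'left-left'),
        (5, NULL, 'other root');
""")

def subtree_ids(conn, root_id):
    """Return the ids of root_id and all of its descendants."""
    rows = conn.execute("""
        WITH RECURSIVE subtree(item_id) AS (
            -- query that selects the root set of rows
            SELECT item_id FROM items WHERE item_id = ?
            UNION ALL
            -- query that adds the children of the rows selected so far
            SELECT i.item_id
            FROM items AS i
            JOIN subtree AS s ON i.parent_id = s.item_id
        )
        SELECT item_id FROM subtree ORDER BY item_id
    """, (root_id,)).fetchall()
    return [row[0] for row in rows]

print(subtree_ids(conn, 1))
```

The database repeats the second SELECT until it produces no new rows, which is the iterative expansion described above.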

However, it sounds like you do not need to support general graphs, just trees. In that case, more specific options are available to you.

One of them is the classic, but rather mind-bending, nested sets model.
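
As a rough illustration of nested sets, the sketch below gives every node a left and a right number from a depth-first walk, so that a node's descendants are exactly the rows whose interval lies inside its own. The nodes table and the lft/rgt column names are assumptions made for this example, and SQLite via Python's sqlite3 is used only so the snippet runs as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: lft/rgt are numbered by a depth-first walk of the tree
#   1 (1..8)
#   +-- 2 (2..5)
#   |   +-- 4 (3..4)
#   +-- 3 (6..7)
conn.executescript("""
    CREATE TABLE nodes (item_id INTEGER PRIMARY KEY, lft INTEGER, rgt INTEGER);
    INSERT INTO nodes VALUES (1, 1, 8), (2, 2, 5), (4, 3, 4), (3, 6, 7);
""")

def descendants(conn, item_id):
    """Return the ids of all descendants of item_id, in depth-first order."""
    rows = conn.execute("""
        SELECT d.item_id
        FROM nodes AS a
        JOIN nodes AS d ON d.lft > a.lft AND d.rgt < a.rgt  -- interval containment
        WHERE a.item_id = ?
        ORDER BY d.lft
    """, (item_id,)).fetchall()
    return [row[0] for row in rows]

print(descendants(conn, 1))
```

Reads need no recursion at all, but inserting or moving a node means renumbering part of the table, which is where the mind-bending part comes in.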

The simplest way is to store a path with each row: a string that represents the row's position in the tree and has the property that the path of a node is a prefix of the path of any of its descendants, which allows very efficient execution of various ancestry queries ("is node A a descendant of node B?", "what is the lowest common ancestor of node A and node B?", etc.). For example, you can build the path for a row by walking the tree from the root and joining the ids of the rows encountered along the way with slashes. This is easy to construct, but takes care to maintain if you rearrange the tree. With a path column, you can restrict a query to a given tree simply by adding and path like '23/%', where 23 is the id of the root.
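
A small sketch of the path approach follows. The items table, the path column, and the convention of ending every path with a slash are assumptions made for the example (the trailing slash keeps root 23 from matching root 230); SQLite via Python's sqlite3 stands in for whichever database you use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: path lists the ids from the root down to the row itself.
conn.executescript("""
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, path TEXT NOT NULL);
    INSERT INTO items VALUES
        (23,  '23/'),
        (45,  '23/45/'),
        (78,  '23/45/78/'),
        (343, '23/343/'),
        (230, '230/');   -- root of a different tree
""")

def tree_ids(conn, root_id):
    """Return the ids of every row in the tree rooted at root_id."""
    rows = conn.execute(
        "SELECT item_id FROM items WHERE path LIKE ? ORDER BY item_id",
        (f"{root_id}/%",),
    ).fetchall()
    return [row[0] for row in rows]

def is_descendant(conn, a, b):
    """True if row a lies strictly below row b, using the prefix property."""
    (path_a,) = conn.execute("SELECT path FROM items WHERE item_id = ?", (a,)).fetchone()
    (path_b,) = conn.execute("SELECT path FROM items WHERE item_id = ?", (b,)).fetchone()
    return a != b and path_a.startswith(path_b)

print(tree_ids(conn, 23))
print(is_descendant(conn, 78, 23))
```

Because the pattern has no leading wildcard, a database can usually answer the LIKE from an index on the path column.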

So, although a graph database is probably the best way to store and query graph data, it is not the only option, and I would suggest that you weigh the benefits of using one against the benefits of having all your data in one database.

+8
Aug 08 2018-12-12T00:

I mostly agree with Binary Nerd, but I want to add another option. You can keep the live data in Neo4j, and then extract the data needed for statistics/reporting and put it in MySQL. For search, I would go with the Neo4j-Lucene integration, if that suits your needs.

+5
Mar 30 '10 at 8:30

You can improve the query using IN:

 SELECT * FROM items WHERE item_id IN (45, 345435, 343, 78, 4522, 676, 443, 4255, 4345) 
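
If the id list comes from another system (for instance from the graph lookup in the question), the IN list can be built with one placeholder per id instead of pasting values into the SQL string. This is only a sketch: the items table layout and the fetch_items helper are hypothetical, and SQLite via Python's sqlite3 is used so that it runs as-is. MySQL drivers for Python typically use %s rather than ? as the placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table contents, for illustration only.
conn.executescript("""
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO items VALUES (45, 'a'), (343, 'b'), (78, 'c'), (9999, 'not in tree');
""")

def fetch_items(conn, item_ids):
    """Fetch the rows for the given ids with a parameterized IN list."""
    item_ids = list(item_ids)
    if not item_ids:
        return []  # "IN ()" is not valid SQL, so skip the query entirely
    placeholders = ", ".join("?" for _ in item_ids)
    sql = (
        "SELECT item_id, name FROM items "
        f"WHERE item_id IN ({placeholders}) ORDER BY item_id"
    )
    return conn.execute(sql, item_ids).fetchall()

ids_from_graph = [45, 345435, 343, 78]  # e.g. the ids returned for one tree
print(fetch_items(conn, ids_from_graph))
```

Ids with no matching row (345435 here) are simply absent from the result, so the caller should not assume one row per id.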

It is also not entirely true that relational databases cannot handle tree structures well. Of course, MySQL lacks some functionality that would make this easier, but most other databases support it well. Oracle has CONNECT BY. Most major RDBMSs have some form of recursive query; MySQL (before 8.0) is a notable exception. Perhaps you could take a look at PostgreSQL and see if it suits your needs?

+4
Mar 29 '10 at 23:29


