In Neo4j, what level of specificity should be used when the level of detail can be unlimited?

The hardest thing to wrap my head around when using a graph database is choosing the level of detail. Suppose I have a schedule of things that happen on certain days of the week: garbage day, Taco Tuesday, BYOB Friday, etc.

  • I can make a node for each day (Mon, Tue, Wed, ...), so queries for a specific day are fast.
  • I can make a node labelled "Day" with a name property holding the day of the week. That makes it easy to query all the days for display on a chart.
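To make the trade-off concrete, here is a toy, in-memory Python sketch (not Neo4j code) of both options. It assumes that "a node for each day" means the day is encoded in the node's label; the `Node`/`Graph` classes, the `HAPPENS_ON` relationship type and the sample schedule are hypothetical.

```python
from dataclasses import dataclass, field

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
# Hypothetical sample schedule based on the examples in the question.
SCHEDULE = {"Garbage day": "Mon", "Taco Tuesday": "Tue", "BYOB Friday": "Fri"}


@dataclass
class Node:
    labels: frozenset
    props: dict = field(default_factory=dict)


@dataclass
class Graph:
    nodes: list = field(default_factory=list)
    rels: list = field(default_factory=list)  # (start_id, type, end_id) tuples

    def add_node(self, labels, **props):
        self.nodes.append(Node(frozenset(labels), props))
        return len(self.nodes) - 1  # the node id is its list index

    def add_rel(self, start, rel_type, end):
        self.rels.append((start, rel_type, end))


def build(option):
    """Build the schedule graph using option 1 or option 2 from the question."""
    g = Graph()
    day_ids = {}
    for day in DAYS:
        if option == 1:
            # Option 1 (assumed): the day is encoded in the label, e.g. (:Tue).
            day_ids[day] = g.add_node({day})
        else:
            # Option 2: one generic :Day label; the day is a `name` property.
            day_ids[day] = g.add_node({"Day"}, name=day)
    for event, day in SCHEDULE.items():
        event_id = g.add_node({"Event"}, name=event)
        g.add_rel(event_id, "HAPPENS_ON", day_ids[day])
    return g


def events_on(g, option, day):
    """Events on one day: option 1 matches a label, option 2 a property."""
    def is_day(n):
        if option == 1:
            return day in n.labels
        return "Day" in n.labels and n.props.get("name") == day
    return sorted(g.nodes[s].props["name"]
                  for s, t, e in g.rels
                  if t == "HAPPENS_ON" and is_day(g.nodes[e]))


def all_days(g, option):
    """All days, e.g. for drawing a chart."""
    if option == 1:
        # Every day label must be known up front to collect the day nodes.
        return [d for d in DAYS for n in g.nodes if d in n.labels]
    # A single label match finds every day node.
    return [n.props["name"] for n in g.nodes if "Day" in n.labels]
```

Both models answer the same two questions; the difference is whether the day lives in the schema (labels) or in the data (a property), which changes which query is the natural, cheap one.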

Making nodes very specific worries me because there is no limit to the granularity: Saturday morning, evening and night, or, even worse, a new node for every hour of every day. I could also make the part of the day a drill-down component, for example by connecting the Saturday node through an "evening" edge to the garbage-day node.

I run into similar problems elsewhere. For example, should I create a node named after a person's fully qualified name, or a node labelled "Person" with the name as a property? I end up making nodes specific in some places and general in others based on convenience, but I feel there may be a best practice or higher-level principle that I am missing. I don't know how best to judge which way is better.

2 answers

Your query requirements should determine the level of detail of your data model, not the other way around. When modeling a database, ask yourself: "What queries will I run against my data?" The answers give you a good starting point for a model with an appropriate level of detail.

In the book Learning Neo4j by Rik Van Bruggen (you can download it from this link), the author discusses designing graph databases around your queries:

Like with any database management system, but perhaps even more so for a graph database management system such as Neo4j, your queries will drive your model. What we mean by this is that, as with any database you may have used in the past or still use today, you will need to make specific design decisions based on specific trade-offs. It follows that there is no one perfect way to model a graph database such as Neo4j. It will all depend on the questions you want to ask of the data, and this will drive your design and your model.

So the answer to your question, "What level of specificity should be used when the level of detail can be unlimited?", is: it depends on your queries. Think about the queries you will run first, and then design the data model.

My suggestion: keep your model as simple as possible at the beginning and, if necessary, make gradual changes.


First, I recommend thinking about what you want to do with your data. You don't use a graph database just to store data; you also want to do something with it, so you probably have a specific use case, such as pathfinding. In that case you don't have many options, although there are still different ways to model the data. I would look at the algorithms already available and check whether they can handle what I want to do. Say I want to use apoc.algo.aStar because it does what I need. At that point I am constrained: aStar only reads weights from relationship properties, and it expects coordinates (latitude and longitude properties) on the nodes. This is probably the first schema you would have thought of anyway, but you get the idea. If no existing algorithm fits your problem, you will have to write one yourself. Look at the options you have, and you will often find that they constrain you to a specific way of modeling your data.
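To illustrate how an algorithm dictates the model, here is a plain-Python A* sketch. It is not APOC's implementation; it only shows that A* works when the data is shaped the way aStar expects: a weight property on each relationship and latitude/longitude properties on each node. The node names, the `distance` property and the planar straight-line heuristic are illustrative assumptions.

```python
import heapq
import math

# Hypothetical map data: coordinates live on the nodes,
# weights live on the relationships.
NODES = {
    "A": {"lat": 0.0, "lon": 0.0},
    "B": {"lat": 0.0, "lon": 1.0},
    "C": {"lat": 1.0, "lon": 1.0},
    "D": {"lat": 1.0, "lon": 2.0},
}
ROADS = [  # (start, end, relationship properties)
    ("A", "B", {"distance": 1.0}),
    ("B", "C", {"distance": 1.0}),
    ("A", "C", {"distance": 3.0}),
    ("C", "D", {"distance": 1.0}),
]


def a_star(nodes, roads, start, goal, weight="distance"):
    """Return (cost, path) from start to goal, or (math.inf, []) if unreachable."""
    adjacency = {}
    for a, b, props in roads:
        # Outgoing relationships only; the weight is read from the relationship.
        adjacency.setdefault(a, []).append((b, props[weight]))

    def h(n):
        # Straight-line distance between node coordinates: the heuristic
        # only works because every node carries lat/lon properties.
        return math.hypot(nodes[n]["lat"] - nodes[goal]["lat"],
                          nodes[n]["lon"] - nodes[goal]["lon"])

    best = {start: 0.0}
    came_from = {}
    frontier = [(h(start), start)]
    while frontier:
        _, current = heapq.heappop(frontier)
        if current == goal:
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return best[goal], path[::-1]
        for nxt, w in adjacency.get(current, []):
            cost = best[current] + w
            if cost < best.get(nxt, math.inf):
                best[nxt] = cost
                came_from[nxt] = current
                heapq.heappush(frontier, (cost + h(nxt), nxt))
    return math.inf, []
```

For example, `a_star(NODES, ROADS, "A", "D")` returns `(3.0, ['A', 'B', 'C', 'D'])`: the detour through B beats the direct but heavier A→C road. Drop the coordinates from the nodes and the heuristic, and with it this modeling choice, no longer works.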

As already mentioned, the way you model your data also affects how quickly you can query certain things. For example, suppose you are modeling a map with points A and B, and you want to travel both from A to B and from B to A. Neo4j has no bidirectional relationships, so you might consider adding two relationships, one from A to B and one from B to A. Don't do this! Performance will take a big hit. Every relationship has a stored direction, but it can be traversed either way and a query can simply ignore the direction, so a single relationship is enough.
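A minimal Python sketch of this point, using hypothetical in-memory data rather than Neo4j: each road is stored once with an arbitrary direction, and a traversal that ignores direction (analogous to matching `(a)-[:ROAD]-(b)` in Cypher, where `ROAD` is a made-up relationship type) can still go both ways. A second, reversed copy only doubles the stored relationships and, for direction-agnostic traversals, makes every neighbour appear twice.

```python
from collections import defaultdict, deque


def build_adjacency(roads):
    """Index each stored road under both endpoints, remembering its direction."""
    adj = defaultdict(list)
    for start, end in roads:
        adj[start].append((end, "out"))
        adj[end].append((start, "in"))
    return adj


def neighbours(adj, node, direction="both"):
    """'out'/'in' respect the stored direction; 'both' ignores it."""
    return [n for n, d in adj[node] if direction in ("both", d)]


def reachable(adj, start, goal):
    """Breadth-first search that follows every road in either direction."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in neighbours(adj, node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical network: A-B and B-C, each stored exactly once.
single = build_adjacency([("A", "B"), ("B", "C")])
# The same network with a reversed copy of every road.
doubled = build_adjacency([("A", "B"), ("B", "A"), ("B", "C"), ("C", "B")])
```

With `single`, `reachable(single, "A", "C")` and `reachable(single, "C", "A")` are both true, and `neighbours(single, "B")` is `['A', 'C']`. With `doubled`, the same call yields `['A', 'A', 'C', 'C']`: twice the data, and every hop is examined twice.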

Regarding the two options in your question:

  • A node for each day (Mon, Tue, Wed, ...), so queries for a specific day are fast.
  • A node labelled "Day" with the day of the week as a name property, so querying all the days for display on a chart is easy.

Ask yourself why you have this database and what you want to do with it, and don't forget about indexing: with the second option, an index on the day name gives you back much of the lookup speed you had with the first. Also avoid adding redundant data, for example a "Day" node connected to every weekday node; everyone knows that Friday is a day. Only add such structure if you actually benefit from it. After modeling several graphs and writing some of your own graph algorithms, you will get a feel for it, and at some point you will know how best to model specific cases. Experience is the key to graph modeling, along with knowing the limitations of the algorithms you can already use and what you can implement yourself. Hope this helps.
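Conceptually, a property index maps property values to the nodes holding them, so finding one day no longer means inspecting every Day node. The toy sketch below models that idea with a Python dict. It is a rough analogy, not how Neo4j stores its indexes, and the `DAY_NODES` data and helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical Day nodes, modelled as in the question's second option:
# one generic kind of node with the day name stored as a property.
DAY_NODES = [{"id": i, "name": name}
             for i, name in enumerate(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"])]


def find_by_scan(nodes, name):
    """Without an index: inspect every node's property (linear in node count)."""
    return [n for n in nodes if n["name"] == name]


def build_index(nodes, key):
    """A toy property index: maps each property value to the nodes holding it."""
    index = defaultdict(list)
    for n in nodes:
        index[n[key]].append(n)
    return index


def find_by_index(index, name):
    """With an index: a single dictionary lookup instead of a full scan."""
    return index.get(name, [])
```

Both lookups return the same nodes; the index trades a little extra storage and write-time work for cheap reads, which is the same trade-off a real database index makes.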


Source: https://habr.com/ru/post/1274981/

