Modeling metadata about mathematical computation in Neo4j

I am new to the forum and just getting started with Neo4j. I apologize for the long question and background information, but I think it helps explain what I'm trying to understand.

I often work on Business Intelligence and Data Warehouse projects for companies. When we gather requirements for business analytics, we usually need to create a list of the business indicators of interest (such as Sales Revenue, Profit Ratio, Total Expenses) and document how these indicators are calculated from data attributes in our source systems. Usually we document most of this work in Excel as tables of data requirements: a list of business indicators, followed by a set of columns holding a description, the source data attributes, the calculation, etc. What I'm trying to do (as a personal project) is to develop an application that we can use to document this type of metadata. I have read several Neo4j books and online articles, and I think Neo4j is a good fit for this use case, so now I'm trying to document a basic data model to help me get started.

At first I came up with something pretty straightforward, as shown on the left of the image below, starting from the premise that:

Sales Revenue = Unit_Price * Count_Units_Sold

First attempt to model metrics and attributes

However, I quickly realized that the calculation itself is very important to me and that, at a later stage, I may want to capture more information about it, for example by adding different versions of the calculation or notes for further description. I therefore modified the model to make the calculation a separate node, as shown in the image above.

However, when I start looking at more complex metrics, I'm still not sure how best to represent the details of the calculation. Taking the example below, I would model it as follows.

Salary = Salary_Amount + Overtime_Amount - Tax_Amount

More complex example

Now this clearly reflects the data attributes (all three of them) that are used in the calculation, but I do not know how to represent the calculation itself, for instance to state that it is done by adding Salary_Amount to Overtime_Amount and then subtracting Tax_Amount. With a more complicated calculation, including division and multiplication that must be performed in a certain order, it becomes even harder. Essentially, I want to be able to deduce from the model that the calculation is as follows:

Salary = Salary_Amount + Overtime_Amount - Tax_Amount

As opposed to:

Salary = Salary_Amount * Tax_Amount / Overtime_Amount

Or:

Salary = Tax_Amount * Overtime_Amount - Earnings

I am looking for a way to define a calculation node in which I can impose an ordering on how the data attributes are used. Maybe I just need to store the calculation as a text string in a property of the calculation node, but I can't help thinking that this would cause me pain down the road and limit my ability to get useful information out of the graph when several data attributes are used across different calculations.

Note: I saw this question on the forum, which covers a similar topic but did not receive many answers. Although my question is similar, I hope that the additional information I provide leads to further insight.

Thanks a lot, Michael


I am editing this question after considering the answers from @ChristopheWillemsen and @stdob--.

Firstly, many thanks to both of you. The answers and reference materials were really helpful, and both approaches met my original requirements. Initially I was inclined to use Reverse Polish Notation, following the answer from @stdob--, because it offers a neat way to handle grouped operations (for example, parentheses in my mathematical formulas). However, after trying to model my data both ways, I found that I have additional requirements that I did not consider in my first post, namely the need to capture logical expressions such as IF, WHERE, and HAVING. Basically, I want to be able to capture ETL-type transformation rules that go beyond purely mathematical expressions, and I think the solution from @ChristopheWillemsen will support this.

Here's how I modeled the basic formulas using this approach:

Base Calc after Method 1

However, I also have more complex logic that I want to model. These are ETL-type rules, which are usually captured as pseudo-code or in SQL form when defining business requirements for a data warehouse or BI project. The following is an example where I define the logic an ETL process might use to calculate the number of new claims for an insurance company.

New claim count

This is how I modeled it, extending the solution that @ChristopheWillemsen provided in the first answer below.

Modeling New Claims

Could you take a look and tell me whether this is a suitable way to model it? In terms of requirements, I want to be able to:

  • Reconstruct the logic so I can present it to end users.
  • Answer questions such as which metrics a given attribute is needed for.
  • Perform what-if analysis (for example, if the value of an attribute changes, which metrics that use this attribute are affected).

Does this sound like a suitable approach for modeling this type of information? Any suggestions or improvements would be welcome.
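
To make the second and third requirements above concrete, here is a minimal Python sketch of the lookup they imply, using plain dictionaries rather than a graph database. The `USES` mapping and the function name `impacted_by` are illustrative assumptions, and the "Profit Ratio" dependency is invented purely to show a metric that is built on another metric.

```python
from collections import defaultdict, deque

# Metric -> inputs it uses. The first two entries follow the formulas in
# the question; the third is a hypothetical dependency for illustration.
USES = {
    "Sales Revenue": ["Unit_Price", "Count_Units_Sold"],
    "Salary": ["Salary_Amount", "Overtime_Amount", "Tax_Amount"],
    "Profit Ratio": ["Sales Revenue", "Total Expenses"],
}


def impacted_by(attribute, uses=USES):
    """Return every metric that depends, directly or indirectly, on attribute."""
    # Invert the mapping: input name -> metrics that use it.
    used_by = defaultdict(set)
    for metric, inputs in uses.items():
        for name in inputs:
            used_by[name].add(metric)

    # Breadth-first walk so that metrics built on other metrics are found too.
    seen, queue = set(), deque([attribute])
    while queue:
        for metric in used_by[queue.popleft()]:
            if metric not in seen:
                seen.add(metric)
                queue.append(metric)
    return sorted(seen)


print(impacted_by("Unit_Price"))
print(impacted_by("Tax_Amount"))
```

In a graph model the same question becomes a variable-length traversal from the attribute node along the "uses" relationships in reverse.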

2 answers

This is a very interesting use case, and to me it comes close to what we call "rules engines".

I described such a use case on the Neo4j blog: https://neo4j.com/blog/uncommon-use-cases-graph-databases/

Of course, there are several ways to achieve what you want, and I will share one of my ideas.

I would treat the calculation as an ordered list of Operations, each of which declares its nature. For example, you would have an Operation node with an additional label Addition, and the next operation might be an Operation node with the label Subtraction.

A simple model can be represented as follows:

[Image: simple model of an ordered list of Operation nodes]

Your Operation nodes then refer to the input values they use.

In a more complex situation, you may want to represent a group of operations that defines the mathematical grouping expressed by parentheses. Again, the model can be extended as follows:
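
As a rough illustration of the ordered-list idea, the following Python sketch walks a list of operations and applies each one to a running result. It is not the graph model itself: the `Operation` class, the `Load` label for the first step, and the sample values are assumptions made only for this example.

```python
from dataclasses import dataclass
import operator

# Label of an operation -> (function to apply, symbol used when rendering).
# "Load" is a hypothetical label for the first step, which just takes its input.
OPERATORS = {
    "Load": (lambda acc, x: x, ""),
    "Addition": (operator.add, "+"),
    "Subtraction": (operator.sub, "-"),
    "Multiplication": (operator.mul, "*"),
    "Division": (operator.truediv, "/"),
}


@dataclass
class Operation:
    label: str      # nature of the operation, like the extra node label
    attribute: str  # name of the input attribute the operation refers to


def evaluate(operations, values):
    """Apply the operations strictly in list order to a running result."""
    result = None
    for op in operations:
        func, _ = OPERATORS[op.label]
        result = func(result, values[op.attribute])
    return result


def describe(operations):
    """Rebuild a readable formula from the ordered operations."""
    parts = []
    for op in operations:
        _, symbol = OPERATORS[op.label]
        parts.append(f"{symbol} {op.attribute}".strip())
    return " ".join(parts)


salary = [
    Operation("Load", "Salary_Amount"),
    Operation("Addition", "Overtime_Amount"),
    Operation("Subtraction", "Tax_Amount"),
]
sample = {"Salary_Amount": 3000, "Overtime_Amount": 500, "Tax_Amount": 700}
print(describe(salary), "=", evaluate(salary, sample))
```

Because the list is applied strictly left to right, a flat list like this cannot express operator precedence on its own.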

[Image: model with a group of operations representing a parenthesized sub-expression]

The possibilities are almost endless.
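
One way to picture groups of operations is as nested lists, where a group can itself be an operand. The Python sketch below assumes a group is a list of `(symbol, child)` pairs whose first symbol is `None`; this encoding and the sample formula are illustrative assumptions, not part of the model shown above.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def evaluate(node, values):
    """Evaluate an attribute name or a group of (symbol, child) pairs."""
    if isinstance(node, str):
        return values[node]
    result = None
    for symbol, child in node:
        operand = evaluate(child, values)  # a child may itself be a group
        result = operand if symbol is None else OPS[symbol](result, operand)
    return result


def render(node, top=True):
    """Rebuild the formula text, adding parentheses around nested groups."""
    if isinstance(node, str):
        return node
    parts = []
    for symbol, child in node:
        text = render(child, top=False)
        parts.append(text if symbol is None else f"{symbol} {text}")
    joined = " ".join(parts)
    return joined if top else f"({joined})"


# Arbitrary formula chosen only to show grouping.
formula = [
    (None, [(None, "Salary_Amount"), ("+", "Overtime_Amount")]),
    ("*", "Tax_Amount"),
]
sample = {"Salary_Amount": 1000, "Overtime_Amount": 200, "Tax_Amount": 2}
print(render(formula), "=", evaluate(formula, sample))
```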

Please note that in computer science this approach is also known as the Specification pattern: https://www.martinfowler.com/apsupp/spec.pdf


The first option is to write the expression in Reverse Polish Notation and store it as an ordered tree:

 Salary_Amount * Tax_Amount / Overtime_Amount => Salary_Amount Tax_Amount * Overtime_Amount / 

[Image: ordered tree for the Reverse Polish Notation expression]
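
For readers unfamiliar with the conversion, here is a small Python sketch of the standard shunting-yard algorithm producing Reverse Polish Notation, plus a stack-based evaluator. It assumes whitespace-separated tokens and only the four left-associative binary operators; the function names and sample values are made up for the example.

```python
import operator

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
APPLY = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def to_rpn(tokens):
    """Shunting-yard conversion for left-associative binary operators."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators that bind at least as tightly as the new one.
            while stack and stack[-1] != "(" and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]:
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the matching "("
        else:
            output.append(tok)  # an attribute name
    while stack:
        output.append(stack.pop())
    return output


def eval_rpn(rpn, values):
    """Evaluate an RPN token list with an operand stack."""
    stack = []
    for tok in rpn:
        if tok in APPLY:
            right = stack.pop()
            left = stack.pop()
            stack.append(APPLY[tok](left, right))
        else:
            stack.append(values[tok])
    return stack.pop()


rpn = to_rpn("Salary_Amount * Tax_Amount / Overtime_Amount".split())
print(" ".join(rpn))
print(eval_rpn(rpn, {"Salary_Amount": 1000, "Tax_Amount": 0.49, "Overtime_Amount": 100}))
```

The RPN token list is already ordered, so it could be stored as a chain or ordered tree of nodes and replayed with the same stack logic.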


The second option that comes to mind is to store the formula as text and pass the formula and the parameter values to a scripting language for evaluation, for example JavaScript's eval.
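
A plain eval runs arbitrary code, so a stored formula is usually evaluated with something more restrictive. As a sketch only, and in Python rather than JavaScript, the example below parses the formula with the standard `ast` module and allows nothing but names, numbers, and the four arithmetic operators. The function names are assumptions for this example.

```python
import ast
import operator

BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_formula(formula, values):
    """Evaluate a stored formula string against a dict of attribute values."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY:
            return BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            return values[node.id]
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        # Function calls, attribute access, etc. are rejected.
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return walk(ast.parse(formula, mode="eval"))


def attributes_used(formula):
    """List the attribute names a formula refers to, sorted alphabetically."""
    tree = ast.parse(formula, mode="eval")
    return sorted({n.id for n in ast.walk(tree) if isinstance(n, ast.Name)})


text = "Salary_Amount + Overtime_Amount - Tax_Amount"
print(attributes_used(text))
print(evaluate_formula(text, {"Salary_Amount": 3000, "Overtime_Amount": 500, "Tax_Amount": 700}))
```

Extracting the names with `attributes_used` also shows how a text-only formula could still be linked back to attribute nodes in the graph.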


I also recommend reading this article: Tables don't match the graphs either


Update: an idea for using Cypher and the APOC library to calculate formulas:

 // Neo4j 4.0+ requires the $name parameter form; the original post used the older {name} form.
 WITH "$Salary_Amount * $Tax_Amount / $Overtime_Amount" AS Formula
 CALL apoc.cypher.run("RETURN " + Formula + " AS value", {
   Salary_Amount: 1000,
   Tax_Amount: 0.49,
   Overtime_Amount: 100
 }) YIELD value AS result
 RETURN result.value

Source: https://habr.com/ru/post/1014676/

