How can I model the accuracy / reliability of data in a database?

Say I have a timestamp stored in a database. For each timestamp attribute, I could add an accuracy attribute specifying a confidence interval, so that the stored information could be, for example, “July 1, 2012 12:13, +/- 3 months”.
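
For concreteness, here is a minimal sketch of what I mean, using SQLite from Python (the table and column names are only illustrative):

```python
import sqlite3

# Hypothetical schema: each timestamp carries an explicit accuracy column.
# "recorded_at" and "accuracy_days" are invented names for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE event (
        id            INTEGER PRIMARY KEY,
        recorded_at   TEXT NOT NULL,   -- ISO-8601 timestamp
        accuracy_days INTEGER          -- half-width of the confidence interval, in days
    )
""")
# "July 1, 2012 12:13, +/- 3 months", with 3 months approximated as 91 days
conn.execute(
    "INSERT INTO event (recorded_at, accuracy_days) VALUES (?, ?)",
    ("2012-07-01T12:13:00", 91),
)
conn.commit()
```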

But in general, the accuracy and reliability of a record are not so simple to capture. A genealogy database, for example, may need to record that one person is only possibly the father of another.

Are there any general principles or guidelines for storing information with different levels of accuracy / confidence?

1 answer

Your fatherhood example is actually the easy case: it is impossible to be more than 100% sure that someone is the father of someone else; in fact, it is impossible to be more than 100% sure of anything! This in turn suggests that, for everything, you could simply store a percentage confidence level alongside any data attribute.
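
One way to picture that, sketched here with invented table and column names, is an assertion table where every fact carries its own confidence value rather than being stored as a hard truth:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical design: paternity is stored as an assertion with a
# confidence in [0.0, 1.0] instead of as an unconditional fact.
conn.execute("""
    CREATE TABLE paternity_assertion (
        child_id   INTEGER NOT NULL,
        father_id  INTEGER NOT NULL,
        confidence REAL NOT NULL CHECK (confidence BETWEEN 0.0 AND 1.0),
        PRIMARY KEY (child_id, father_id)
    )
""")
# We are 85% confident that person 42 is the father of person 7.
conn.execute("INSERT INTO paternity_assertion VALUES (?, ?, ?)", (7, 42, 0.85))
conn.commit()
```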

However, you may not want to store the confidence level as a percentage; the right representation depends on the data attribute itself and on what its value means.

For example, if you want to record how closely one particular string matches another, you might store the Levenshtein distance. In your timestamp example, I would personally store the minimum and maximum values, although you could also store the number of months to add or subtract; pick whichever representation makes calculations faster when selecting from the database.
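
To make both suggestions concrete, here is a short sketch: a plain dynamic-programming Levenshtein distance, and min/max bounds derived from a timestamp and its tolerance (the 3-month figure comes from your example; approximating it as 91 days is my simplification):

```python
from datetime import datetime, timedelta

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

# Store the precomputed bounds instead of (value, tolerance):
recorded = datetime(2012, 7, 1, 12, 13)
tolerance = timedelta(days=91)  # "3 months", approximated
earliest, latest = recorded - tolerance, recorded + tolerance
# Persisting `earliest` and `latest` lets a BETWEEN query use them directly,
# with no per-row date arithmetic at SELECT time.
```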

The point I am perhaps struggling to express is that the answer to your question depends not on the database, but on the data in it and on the needs of your users, your business, and so on. Since it depends on the data, each individual attribute or column needs an individual solution; there can be no “general” solution.


Source: https://habr.com/ru/post/919423/

