One table or many for many different but interacting events?

I am creating an application whose main function is to track various data over time (blood glucose levels, insulin doses, food intake, etc.), and I'm trying to decide how best to organize this information in a database.

At the most basic level, everything under this umbrella is an event, so I have thought about having one event table with fields for all the properties an event might have. That could be cumbersome, though, since the vast majority of fields would end up empty on most rows; but I'm not sure whether that is really a problem. The advantage of this approach is that it would be easy to pull up and display all events together. But since many of the events have little more than a timestamp, I doubt they all belong in the same table.

I'm not sure it makes sense to have a table for each kind of event either, because, taken separately, most events have only one property besides the timestamp, and they often need to be combined (many of the data types are often, but not always, recorded as a group).

Some types of events have a duration. Some are relatively rare. One class of events is essentially a rate that continues unchanged indefinitely unless it is changed or temporarily overridden (these are the ones I'm most worried about). Some are simple binary flags, for which I had planned a link table, but to make that easy I would need/prefer a common event_id to associate them with.

My inclination is that it is best to have several tables grouping closely related types of information, rather than one table with everything and a lot of empty space. But I'm not quite sure how to proceed.

I would like some advice on strategies to determine the best approach in such a situation.

edit: Here's a summary of the data types I'm dealing with, in case this makes things clearer.

events:

- blood glucose: timestamp, value; tagged with: from pump / manually entered [pre-meal, post-meal (breakfast, lunch, dinner), before bed, fasting, hypo, high, hyper - which will be either manually entered or inferred from settings or other user entries], before/after exercise, etc. I imagine these are better off dynamically generated with queries as necessary, though the same paradigm could apply to the meals?
- sensor glucose: timestamp, amount (must be separate because it is not as reliable, so it will differ from a regular BG test; also unlikely to be used by the majority of users)
- bolus: timestamp, bolus total, food total, correction total, active insulin**, bolus type - normal [vast majority], square wave, or dual wave
- food: timestamp, carb amount, carb type (by weight or exchanges) <- this could probably go in a user settings table, food description, carb-estimated (binary), meal? - or a separate table (accompanying bolus id? though that seems too finicky)
- meals: timestamp, meal name (breakfast, lunch, supper) (or a mealnames table? seems excessive?)
- basal: timestamp, rate per hour; the rate changes throughout the day on a regular pattern, so either fill it in automatically from the 'last activated pattern' (in the form midnight: 0.7/hr, 7am: 0.9/hr, 12pm: 0.8/hr, etc.) or create a new pattern whenever one is used
- temp basal (the regular basal pattern can be overridden with a temporary basal): temp basal start, ?temp basal end and/or temp basal duration, temp basal amount, temp basal type -> either a % or a specific rate
- exercise: start time, end time, intensity, ?description (unless 'notes' is universal for any event)
- pump rewind (every 3 days or so): time
- pump prime: amount, type (fixed or manual)
- pump suspended: start time, end time
- ketones: time, result
- starred event
- flagged event
- notes: timestamp (the user can attach a note to any event to provide details or comments, but might want a note where there is no data as well)

(I want a way for users to flag specific events to indicate they are the result of error or otherwise suspect, and to star events as noteworthy, either to discuss with a doctor or to look at later.)

** The only place I get active insulin from is when a bolus is entered, but it could be useful at other times as a constantly tracked variable, which could be calculated by looking at boluses delivered up to X time ago, where X is the Active Insulin Time.

other infrequent events (likely 2-10 per year):

- HbA1c: time, value
- weight: time, value, units
- cholesterol: time, value
- blood pressure: time, value
- pump settings (will need to track settings changes, but should be able to do that with queries): timestamp, bg target, active insulin time, carb ratios (change throughout the day, like basal), sensitivity

Questions: 1) Should there be a comprehensive "events" table with a type column, so I can quickly return all events over a specific period of time without having to query each table? (The drawback: how do I handle events with a duration - do I add an extra end-time column to the event table?)

2) This is a local database that will, as a rule, have a single user, and there will never be a need to compare or interact with other users' records even if it is synchronized online. So I was thinking of simply keeping one copy of the database per user, although a user ID could be added when loading it.

3) Many of the events are often combined for ease of interpretation and analysis (blood sugar, food, bolus, and notes, for example). I am planning to do that grouping after the fact with queries, rather than hard-coding it, to maintain integrity.

Some information about what the database will be used for: a visual representation of all the data types over the course of a day; summaries such as averages of the test results and the percentage of insulin that goes to food, correction, and basal; and specific extended queries such as "list up to 20 instances of the difference between evening and morning glucose where there was no food and no exercise within 2 hours, since the settings were last changed", etc. The program will also automatically assign tags based on parameters: e.g. if > 20 carbohydrates are eaten during the designated "lunch" window, it will tag that food as lunch; if there are two meals within 30 minutes (or within the "meal length" preference), it will group them as one meal. Not completely sure how this will work right now.
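
For example, I imagine the automatic lunch tag would end up as a query roughly like this (table and column names here are just placeholders - the schema is exactly what I haven't settled on):

    -- Placeholder schema: food_event(event_id, event_dtm, carb_amount, meal_name).
    -- Tag an untagged food entry as 'lunch' when more than 20 g of carbs were
    -- eaten inside the designated lunch window (hard-coded here; it would really
    -- come from a user-settings table).
    UPDATE food_event
    SET    meal_name = 'lunch'
    WHERE  meal_name IS NULL
      AND  carb_amount > 20
      AND  CAST(event_dtm AS TIME) BETWEEN TIME '11:00:00' AND TIME '14:00:00';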

+4
4 answers

V1.0

Relational databases, and SQL (which was designed for them), work much better when the data is organized and Normalized. A single large table is unnormalized, and crippled in terms of both performance and relational power.

Your requirement calls for an ordinary Supertype-Subtype cluster of tables. Unfortunately, ordinary relational structures such as this are not "common".

  • The standard notation for a Subtype is a semicircle.

    • The cardinality of Supertype::Subtype is always 1::0-to-1.

    • The primary key of the Subtype is the Supertype's primary key; it is also the foreign key to the Supertype.

  • There are two types:

    • Exclusive, where there is exactly one Subtype row for each Supertype row, denoted by an X through the semicircle.

    • Non-exclusive, where there can be more than one Subtype row per Supertype row.

  • Yours is Exclusive. A discriminator column is required to identify which Subtype is active for a given Supertype row. If the number of Subtypes is small, indicator columns can be used; otherwise a classification (lookup) table is required.

  • Note that all of this - the structures, and the rules and constraints required to support it and to ensure data integrity - is available in ordinary IEC/ISO/ANSI SQL. (The non-SQL products do not comply with the SQL standard.) A minimal DDL sketch of such a cluster follows.
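
The sketch below is illustrative only (the linked data model, not this sketch, is authoritative; all names here are assumptions): a Reading Supertype carrying the discriminator, and two Subtypes that share its primary key.

    -- Illustrative names only.
    CREATE TABLE Patient (
        PatientId   INT          NOT NULL,
        Name        VARCHAR(60)  NOT NULL,
        CONSTRAINT Patient_PK PRIMARY KEY (PatientId)
    );

    -- Supertype: one row per reading, with a discriminator.
    CREATE TABLE Reading (
        PatientId        INT        NOT NULL,
        ReadingDtm       TIMESTAMP  NOT NULL,
        ReadingTypeCode  CHAR(1)    NOT NULL,  -- discriminator, e.g. 'G' glucose, 'N' note
        CONSTRAINT Reading_PK PRIMARY KEY (PatientId, ReadingDtm),
        CONSTRAINT Reading_Patient_FK
            FOREIGN KEY (PatientId) REFERENCES Patient (PatientId),
        CONSTRAINT Reading_TypeCode_CK CHECK (ReadingTypeCode IN ('G', 'N'))
    );

    -- Subtype: its PK is the Supertype's PK, and also the FK to the Supertype.
    CREATE TABLE ReadingGlucose (
        PatientId   INT            NOT NULL,
        ReadingDtm  TIMESTAMP      NOT NULL,
        Value       DECIMAL(5, 1)  NOT NULL,
        CONSTRAINT ReadingGlucose_PK PRIMARY KEY (PatientId, ReadingDtm),
        CONSTRAINT ReadingGlucose_Reading_FK
            FOREIGN KEY (PatientId, ReadingDtm)
            REFERENCES Reading (PatientId, ReadingDtm)
    );

    CREATE TABLE ReadingNote (
        PatientId   INT           NOT NULL,
        ReadingDtm  TIMESTAMP     NOT NULL,
        Note        VARCHAR(255)  NOT NULL,
        CONSTRAINT ReadingNote_PK PRIMARY KEY (PatientId, ReadingDtm),
        CONSTRAINT ReadingNote_Reading_FK
            FOREIGN KEY (PatientId, ReadingDtm)
            REFERENCES Reading (PatientId, ReadingDtm)
    );

(Full enforcement of the Exclusive rule - that only the Subtype named by the discriminator exists for a given Reading - needs the discriminator checked from the Subtype side, e.g. via a constraint on a function or a trigger; that is omitted here.)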

Data

  • Naming is very important. Name each table for what one row of it is, not for its content, value, or the action involved. You speak of events, but all I can see are Readings.

  • There needs to be some context for these Readings or events; I do not leave them hanging in the air on a bare EventId. I have assumed that the Readings belong to a particular Patient. Advise me if that is wrong and I will change the model.

  • Composite or compound keys are normal; SQL handles them quite well (the non-SQL products do not). PatientId already exists in Reading as an FK, and it is used to form its PK. There is no need for an additional ReadingId column, plus the additional index, which would be 100% redundant.

  • SQL is also quite capable of handling many tables (the database I am currently working on exceeds 500 tables); a large number of smaller tables is the nature of a relational database.

  • This is pure Fifth Normal Form (no duplicated columns, no Update Anomalies).

    • It can be further normalized to Sixth Normal Form, and additional benefits gained from that; and the 6NF can then be optimized, etc.; but none of that is required here.

    • Some tables end up in 6NF, but that is a consequence, not an intention, so it cannot be declared as such.

  • If you provide details of the constraints and overrides that concern you, I can provide a model that resolves them.

  • As modeled, it is already set up for very fast comparisons (generating alarms, etc.). A sample query across the cluster is sketched below.
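
Still using the illustrative names from the DDL sketch above (not the linked model's names), retrieving everything for a patient over a period is one pass across the Supertype, with the Subtype values picked up by outer joins:

    -- Illustrative names only: one row per Reading, whatever its Subtype.
    SELECT  r.ReadingDtm,
            r.ReadingTypeCode,
            g.Value AS GlucoseValue,
            n.Note
    FROM    Reading r
            LEFT JOIN ReadingGlucose g
                   ON  g.PatientId  = r.PatientId
                   AND g.ReadingDtm = r.ReadingDtm
            LEFT JOIN ReadingNote n
                   ON  n.PatientId  = r.PatientId
                   AND n.ReadingDtm = r.ReadingDtm
    WHERE   r.PatientId = 1
      AND   r.ReadingDtm BETWEEN TIMESTAMP '2011-02-01 00:00:00'
                             AND TIMESTAMP '2011-02-02 00:00:00'
    ORDER BY r.ReadingDtm;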

▶ Reading data model ◀

Readers who are not familiar with the standard for modeling relational databases may find the ▶ IDEF1X Notation ◀ useful.

Feel free to ask clarifying questions, either as comments or as an edit to your question.

Caveat

  • The OO and ORM crowds (Fowler and Ambler leading them) are unaware of relational technology and databases. Designing objects is significantly different from modeling data. If you apply your object design to a database, you get monsters that need to be "re-factored", and you have to buy another "book" that shows you how to do that efficiently. Meanwhile the "database" is crippled.

  • Relational databases that are correctly modeled (as data, not as objects) never need to be "re-factored". In highly Normalized databases, you can add tables, columns, and functions without having to change existing data or code.

  • Even the ORM concept is completely flawed. Data is more permanent than objects. If you model the data first, and then model the objects for that data, both are very stable. But if you model your objects first (strangely, without understanding the data), and then model the data after the objects, you will go back and forth, forever correcting both.

  • Relational databases have had perfectly ordinary structures such as Supertype-Subtype for over 30 years, and they work well when implemented as such. They are not "gen-spec" or "class inheritance" or any such OO thing; and if those OO or ORM structures are implemented without proper data modeling, the "database" will be crippled and re-factoring will be needed.

    • In addition, such designs do not implement the required data integrity constraints, so data quality is usually low. We do not allow bad data to enter the database; their "databases" are full of bad data, and they need yet another "book" on how to wash dirty data.
  • They have the sequence and the hierarchy mixed up. Done correctly, there is no "impedance mismatch", and no need for pseudo-technical names to mask plain confusion, or to justify repeating the same set of work over and over.

So run like hell from anyone using OO or ORM terminology when working with relational databases.

V1.1

Your edit provides much more detailed information, which of course is required: the context, the whole, is necessary if the data is to be modeled correctly. The model now includes all of that information. However, questions remain, and some of them will need answering before it can be completed. Feel free to ask about anything that is not entirely clear; I do not know exactly where the gaps are until I throw the model at you and you come back with questions.

▶ Event data model V1.1 ◀

  • All my models are pure Relational (they retain full relational power), IDEF1X, and Fifth Normal Form (no Update Anomalies). Every rule (business rule or data/referential integrity) drawn in the model can be implemented declaratively in ISO/IEC/ANSI SQL.

  • Never hard-code anything. My models do not require it, and no code that works against the database should do it. All fixed text is Normalized into reference or lookup tables. (This part is incomplete; you need to fill in the gaps.)

    • A short alphabetic code is much better than an enum; once you get used to them, the codes become immediately recognizable.

    • Since they are PKs, and therefore stable, you can safely code:

      ... WHERE EventTypeCode = 'P'
      or
      ... WHERE EventTypeCode LIKE 'T%'

  • I believe the DataTypes are self-evident or can easily be worked out. If not, please ask.

  • Everything you note as "finicky" is perfectly fair. The point is that, since you did not yet have a database to work against, you could not know what belongs in the database and what should, or could, be SQL code. I have therefore provided for all the "finicky" items on the database side; you need to build the code. Again, if there is a gap, ask.

    • That is to say, working in the traditional manner: I am the data modeler, you are the developer. You must make sure that every item you need is delivered, rather than relying on me to interpret your notes; I will supply a database that supports every requirement I can extract from your notes.
  • One patient per database. Let me assume that your system will be successful, and that in the future you will have one main workhorse database, rather than being limited to one database per patient, which would be an administration nightmare. Say you need to keep all your patient details in one place, one version of the truth: that is what I have provided. It does not stop you, in the short term, from deploying one db per patient; there is no problem at all with a Patient table holding a single row.

    • Alternatively, I can remove PatientId from all the tables, and when you grow into the central-database configuration, you will need a substantial database upgrade.

    • Similarly, if you have Sensors or Pumps that need to be tracked, identify their attributes, and any Sensor or Pump attribute will be normalized into those tables. If they are one per patient, that is fine: there will be one row in those tables, unless you need to keep a history of sensors or pumps.

  • In V1.0 the Subtypes were Exclusive; they are now Non-exclusive. This means we track the chronology of events without duplication, and any one event can consist of several Subtypes. For instance, a Note can be attached to any event.

    • Before this is finished, the EventType list needs to be laid out in a grid showing (a) the allowed and (b) the mandatory Subtypes for each EventType. That will be implemented as CHECK constraints in Event.
  • Naming is very important. I use the ISO 11179 standard (its guidelines and principles), plus my own conventions. Reading-type Events are prefixed as such. Feel free to suggest changes.

  • Units. Traditionally we use Metric xor US Imperial across the entire database, allow entry in whichever the user prefers, and convert before storage. If you need a mixture, then at the least we should have the UnitType specified at the patient or pump level, rather than allowing either UnitType to be stored anywhere. If you genuinely need a UnitType that switches back and forth, then yes, we need to store the UnitType with each such value.

  • Temporal database. You have a temporal series being recorded, and it is well handled and interrogated via SQL. It is a big subject, so read up on it. The minimum I would ask you to read and understand is:

    ▶ Temporal Database Performance (0NF vs 5NF) ◀

    ▶ Classic 5NF Temporal Database ◀ (inspect the data model carefully)

  • Basically the problem boils down to the following:

    • Either you have a true 5NF database, with no data duplication and no Update Anomalies:

      • This means that for contiguous time series, only the StartDateTime is recorded. The EndDateTime is easily derived from the StartDateTime of the next row, and is not stored. For instance, Event is a contiguous chronology; the EventType determines whether an event is a single DateTime or a period/duration. (A small sketch of the derive-the-end-time subquery appears after this list.)

      • An EndDateTime is stored only for non-contiguous periods, where legitimate gaps between Periods exist; in any case that is clearly identified via the EventType. For instance, Exercise, PumpSuspended. (By the way, I presume the patient knows the Exercise attributes only as actuals, at the end of the exercise period, not as planned values.)

      • Where there is no EndDateTime at all, the StartDateTime is just a DateTime. For instance, EventDtm.

      • This requires the use of ordinary SQL subqueries. That is actually quite simple once the coder grasps the concept. For those who have not, I have provided a full tutorial on subqueries in general, and on using them in the Temporal context in particular, in:

      ▶ Easy When You Know How ◀. It is no coincidence that it uses the same classic 5NF Temporal database linked above.

    • XOR you have a database with an EndDateTime stored (100% duplication) alongside every StartDateTime column, and you can use flat, slow queries: lots of manipulation of large result sets with GROUP BYs, instead of small result sets. Massive data duplication and Update Anomalies are introduced, reducing the database towards a flat file, in order to suit coders of limited capability (it is certainly not "ease of coding").

    • Therefore consider carefully, and choose for the long run, because this affects every code segment that touches the temporal data. You do not want to rewrite halfway down the track, when you realize that maintaining Update Anomalies is worse than writing subqueries.

      • Of course, I will provide whatever is explicitly required to support the 5NF Temporal database - the correct keys, DataTypes, etc. - for all of your identified requirements.

      • Likewise, if you choose the 0NF option, I will provide those columns, so that the data model is complete for your purpose.

      • Either way, you need to work out precisely the SQL code required for any given query.

  • Handling of DataTypes is important. Do not store Time (hours, etc.) as an integer or an offset; store it only as a TIME or DATETIME datatype. If it is an offset, store it as an offset from midnight. This allows the full range of SQL date-arithmetic functions to be used.

  • The task is now yours. Study the model carefully and make sure that:

    • each non-key attribute has a 1::1 relationship with its primary key

    • and that it does not have such a relationship with any other PK (in another table)

    And of course, check out the model and provide feedback.
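
To make the subquery approach concrete, here is a minimal sketch (assumed table and column names, not necessarily those of the model) that derives the end of each basal period from the StartDateTime of the next row, as described above:

    -- Sketch only: each basal rate runs from its own BasalDtm until the next one.
    -- Assumed table: ReadingBasal (PatientId, BasalDtm, Rate).
    SELECT  b.PatientId,
            b.BasalDtm  AS StartDtm,
            b.Rate,
            (SELECT MIN(b2.BasalDtm)
               FROM ReadingBasal b2
              WHERE b2.PatientId = b.PatientId
                AND b2.BasalDtm  > b.BasalDtm) AS EndDtm  -- NULL for the current, open period
    FROM    ReadingBasal b;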

Questions

Given the above explanations and recommendations:

  • What is ReadingBasalTemp.Type? Please list its values.

  • What is HbA1C?

  • What is a Ketone?

  • Do we need an EndDateTime (i.e. a Duration or Period) for:

    • ReadingBasalTempEnd
    • ReadingBolusEnd
    • Basal pattern
    • BasalTemp pattern
    • Actually, what is a pattern, and how is it derived and applied?
  • How is BasalTempEnd (or the Duration) determined?

  • Starting position: there is no need to store the Active Insulin duration. But you do need to determine how its EndDateTime is derived. On that basis, if it cannot be derived easily, or depends on too many factors, or changes all the time, then storing an EndDateTime may be the right choice.

  • Pump settings need clarification.

V1.2

Well, I have included all the information you provided in the question and the comments. Here is the progressed data model.

▶ Event data model V1.2 ◀

There are still some problems that need to be addressed.

  • Store only a percentage or a rate, not both plus an extra indicator; each can be derived from the other. I have used Rate consistently.

  • ... The only concern about the approach is that for many days the basal rate will be the same, hence the redundancy.

    • That is not "redundancy": it is the storage of a time series of facts which happen to be unchanged. The queries are straightforward.

    • However, for extended use, yes, you can avoid storing the unchanged fact again, and instead extend the duration to cover the new time interval.

  • I still do not understand your explanation of the Temp Basal. Study the new model. First, the patterns are now stored separately. Second, we record the Temp Basal start, with its rate. Do we need a Temp Basal end (with a rate) as well?

  • "GlucoseEventType can have more than one value for a glucose result" needs further definition. Do not worry about Id keys; just tell me about the data. For each ReadingGlucoseBlood, name the result values and which GlucoseEventType each applies to, and which are mandatory and which optional.

  • PumpHistory.InsulinEndDateTime is the ending Instant of a Duration. Of course it is generic; the starting instant, again, is whatever row you are comparing it against. Thus it should be so many seconds or minutes from midnight on 01 Jan 1900.

  • Check the new Event PK. If an incoming record identifies multiple events, you need to parse it and INSERT each Event and EventSubtype row using the same DateTime (see the sketch after this list).

  • With the exception of Patient, there are no Id keys in this database so far. Refer to the parent via its full PK.
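
A minimal sketch of that parsing, with assumed table and column names (the real names are those of the data model): a source line that carries, say, a glucose test and a bolus at the same second becomes one Event row plus one subtype row per item, all with the same DateTime, inside a single transaction.

    -- Assumed names only (Event, EventGlucose, EventBolus, PatientId, EventDtm).
    BEGIN TRANSACTION;

    -- One supertype row for the instant...
    INSERT INTO Event (PatientId, EventDtm)
    VALUES (1, TIMESTAMP '2011-02-05 07:30:00');

    -- ...and one subtype row per item carried on that source line.
    INSERT INTO EventGlucose (PatientId, EventDtm, Value)
    VALUES (1, TIMESTAMP '2011-02-05 07:30:00', 6.2);

    INSERT INTO EventBolus (PatientId, EventDtm, BolusTotal)
    VALUES (1, TIMESTAMP '2011-02-05 07:30:00', 4.5);

    COMMIT;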

February 05, 11

No feedback has been received on V1.2.

a lot of the data that I get is pulled from an external (and somewhat disorganized) csv, which groups certain types of events on one line and often has events on the same second, which is as granular as it gets

That is easy to accommodate. However, it means that an Instant is not quite an Instant. I could take you through the whole exercise, but the bottom line is simple.

  • Add a SequenceNo column to the Event PK, so that more than one event can be stored at the same DateTime (the granularity of the CSV); the EventTypeCode (FK to EventType) identifies what each one is.

  • The Instant is then not strictly an Instant; that is an ordinary situation in Temporal Databases.

  • The Event PK is then the DateTime plus the SequenceNo, with the EventTypeCode still classifying each row.

  • EventType remains the Supertype classifier.


February 06, 11

" , " FAQ , ( , SO, ). , , SO. , SO-. , , ▶ V1.16 ◀ .

  • The changes in this version concern Event.DateTime; the treatment of each Instant as an Event; the EventType classification; the Disjunct (exclusive) rule for the subtypes; and the EventTypeCode constraints applied per DateTime.
+12


" , , ... , ".

. , , .

. , . , , " ", , . - .

EAV, , test_id, metric_id, value. , , . , , .

. "" . Direct Marketing, , , .

So, somewhere in the middle: several tables, one table for each set of tests with the same values to store.

Hmmmm, this is exactly what you suggested. That sounds good!

+2

Take a look at these SO examples: one, two, three, four.

0

Source: https://habr.com/ru/post/1337098/

