In manufacturing, large amounts of data are often read at high frequency from several different data sources, such as NIR instruments and general-purpose instruments for measuring pH, temperature, and pressure. This data is often stored in a process historian, usually for a long time.
In this regard, process historians have different requirements than relational databases. Most queries against a process historian specify either a timestamp or a time range, together with a set of variables of interest.
The typical workload is frequent, numerous INSERTs, many SELECTs, few or no UPDATEs, and almost no DELETEs.
Q1. Are relational databases a good backend for the process historian?
A very naive implementation of a process historian in SQL might look something like this:
+ ------------------------------------------------ +
| Variable |
+ ------------------------------------------------ +
| Id: integer primary key |
| Name: nvarchar (32) |
+ ------------------------------------------------ +
+ ------------------------------------------------ +
| Data |
+ ------------------------------------------------ +
| Id: integer primary key |
| Time: datetime |
| VariableId: integer foreign key (Variable.Id) |
| Value: float |
+ ------------------------------------------------ +
This structure is very simple, but probably slow for the usual operations of process historians, since it lacks "sufficient" indexes.
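To make the naive schema and its typical query concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The choice of SQLite, the index name `IX_Data_Variable_Time`, the helper `read_range`, and the sample values are all assumptions made for illustration. The composite index on `(VariableId, Time)` is only one candidate for a "sufficient" index, not a measured recommendation.

```python
import sqlite3

# In-memory database, used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Variable (
        Id   INTEGER PRIMARY KEY,
        Name NVARCHAR(32)
    );
    CREATE TABLE Data (
        Id         INTEGER PRIMARY KEY,
        Time       DATETIME,
        VariableId INTEGER REFERENCES Variable(Id),
        Value      FLOAT
    );
    -- Assumption: one possible index for "these variables, this time range".
    CREATE INDEX IX_Data_Variable_Time ON Data (VariableId, Time);
""")

# Hypothetical sample data: two variables, five one-minute samples each.
conn.executemany(
    "INSERT INTO Variable (Id, Name) VALUES (?, ?)",
    [(1, "pH"), (2, "Temperature")],
)
rows = [
    (f"2024-01-01 00:{minute:02d}:00", variable_id, float(minute + variable_id))
    for minute in range(5)
    for variable_id in (1, 2)
]
conn.executemany(
    "INSERT INTO Data (Time, VariableId, Value) VALUES (?, ?, ?)", rows
)


def read_range(conn, variable_ids, start, end):
    """Return (Time, VariableId, Value) rows for the given variables
    in the half-open time range [start, end)."""
    placeholders = ",".join("?" * len(variable_ids))
    sql = (
        "SELECT Time, VariableId, Value FROM Data "
        f"WHERE VariableId IN ({placeholders}) AND Time >= ? AND Time < ? "
        "ORDER BY VariableId, Time"
    )
    return conn.execute(sql, [*variable_ids, start, end]).fetchall()


result = read_range(conn, [1], "2024-01-01 00:01:00", "2024-01-01 00:04:00")
for row in result:
    print(row)
```

Timestamps are stored here as ISO 8601 strings, which compare correctly as text in SQLite. A different database engine would use its native date/time type instead.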
For example, if the Variable table contains 1,000 rows (a rather optimistic number) and the data for all 1,000 variables is sampled once per minute (also an optimistic number), then the Data table grows by 1,440,000 rows per day. Continuing the example, assume that each row takes about 16 bytes; that gives about 23 megabytes per day, not counting the extra space for indexes and other overhead.
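The estimate above can be checked with a few lines of arithmetic. The constants are the figures from the example. The yearly projection is an added extrapolation that assumes the system runs every day of the year and ignores indexes and other overhead.

```python
# Figures taken from the example above.
VARIABLES = 1_000            # rows in the Variable table
SAMPLES_PER_DAY = 24 * 60    # one sample per minute
BYTES_PER_ROW = 16           # rough size of one Data row

rows_per_day = VARIABLES * SAMPLES_PER_DAY
bytes_per_day = rows_per_day * BYTES_PER_ROW
megabytes_per_day = bytes_per_day / 1_000_000

# Extrapolation (assumption): continuous operation, 365 days a year.
gigabytes_per_year = bytes_per_day * 365 / 1_000_000_000

print(f"{rows_per_day:,} rows/day")          # 1,440,000 rows/day
print(f"{megabytes_per_day:.1f} MB/day")     # 23.0 MB/day
print(f"{gigabytes_per_year:.1f} GB/year")   # 8.4 GB/year
```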
23 megabytes per day may not be much in itself, but keep in mind that the number of variables and the sampling rate in the example were optimistic, and that the system should run 24/7/365.
Of course, archiving and compression come to mind.
Q2. Is there a better way to do this? Perhaps using some other table structure?