A bit of an old question, but this probably applies to anyone who works with TSDB.
When I first started, my application usually consisted in the fact that each data point fell into one field dimension. The assumption was that I would match the data that I need in the SQL statement later. However, since anyone who has used TSDB as an influx knows that there are some serious limitations that can be made when retrieving data due to the design options used in implementing TSDB.
As I progressed in my project, here are the basic rules that I developed:
The dimension should contain all necessary for this dimension, but no more.
Example: imagine a gas flow meter that gives 3 signals:
- volumetric flow
- temperature
- total flow
In this case, the volumetric flow and temperature should be two fields of the same measurement, and the total flow should be its own measurement.
(if the reader doesn't like this example, think of a home electric meter that displays amplifiers and volts, as well as kw and pf).
Why would it be bad to store volume and temperature in different series?
Timing: if you save these two dimensions in different series, they will have different index values ββ(timestamp). If you do not make sure that they clearly indicate timestamps, you risk that they are slightly rejected. It can very well become a bad substance (tm) because you can introduce systematic measurements in your data. Even if it is not, it will be very unpleasant if you want to reuse this data later (for example, to dump it into a csv file).
Utility: if you want to display the volume flow, you will need to get constant * temp * volume to get the correct value. Doing this with two separate dimensions becomes a nightmare because, for example, influxdb does not even support the operation. But even if that were the case, you had to make sure that the missing values ββof one of the fields are not processed correctly and that grouping and aggregation are performed correctly.
Why would it be bad to store all three in one dimension?
You may well have a use case in which you want to constantly check all three values, but most likely it is not, and you do not need to measure the total volume at the same frequency that you would like to measure the stream itself.
Putting all fields in one dimension will force you to either place zeros in certain fields, or always register a variable that barely changes. In any case, it is inefficient.
An important understanding is that multidimensional objects require that all their measurements be meaningful at the same time .
source share