InfluxDB: single or multiple measurement

I am getting started with InfluxDB, and after reading the schema design documentation, one question remains.

How do I decide whether to use one measurement with several fields, or several measurements with one field each?

I have several IoT devices that send data every minute (temperature, humidity, pressure). All of this data has exactly the same timestamp.

So I was wondering whether I should create one measurement:

    timestamp,iotid,temperature,humidity,pressure
    -------------------------------------------------
    1501230195,iot1,70,45,850

Or three measurements (one per value), with the same tags but only one field in each?

    timestamp,iotid,temperature
    ----------------------------
    1501230195,iot1,70

    timestamp,iotid,humidity
    -------------------------
    1501230195,iot1,45

    timestamp,iotid,pressure
    -------------------------
    1501230195,iot1,850

When querying, I might fetch only one value, but I might also fetch all three at once.
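For concreteness, the two layouts above can be written out in InfluxDB line protocol (`measurement,tag_set field_set timestamp`). This is only a minimal sketch: the measurement name `weather`, the field name `value`, and the helper `to_line` are hypothetical and chosen for illustration.

```python
def to_line(measurement, tags, fields, timestamp):
    """Format one point as line protocol (numeric fields only, no escaping)."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement},{tag_part} {field_part} {timestamp}"

reading = {"temperature": 70, "humidity": 45, "pressure": 850}
tags = {"iotid": "iot1"}
ts = 1501230195  # seconds, so the write would need precision=s

# Option 1: one measurement ("weather" is an assumed name) with three fields.
single = [to_line("weather", tags, reading, ts)]

# Option 2: three measurements, each with one field (assumed name "value").
multiple = [
    to_line(name, tags, {"value": value}, ts) for name, value in reading.items()
]

for line in single + multiple:
    print(line)
```

Option 1 produces one line per device per minute; option 2 produces three lines that repeat the same tag set and timestamp.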

+8
4 answers

There is no right or wrong answer in schema design, but one field value per measurement is usually the better fit.

Why?

Storing multiple field values in one measurement is a very relational-database way of thinking. A measurement should not be thought of as a database table; it is a completely different thing.

A measurement should be reserved for describing one kind of data, such as temperature or CPU usage.

If we design our schema with one field value per measurement, we can describe the data in plain English, for example:

At some point in time, the temperature measurement has the data value=30. Notice the terms point, data and measurement used here.

If you put multiple field values into one measurement, it becomes difficult to describe the data in plain English.

InfluxDB is a time series database, so it makes sense to design the schema in a time-series fashion.

In addition, some time series data is recorded with microsecond precision. At that granularity, or even at millisecond granularity, it is unlikely that different values will share exactly the same timestamp. Therefore, designing each measurement as a single sequence of data points is always the best choice.

+5

It probably depends on your data: try both and compare the storage requirements. For example, if the humidity does not change much, it makes sense to store it separately. But if some variables change at the same intervals, it makes sense to combine them. It may also depend on your query patterns.

+2

This is a somewhat old question, but it probably applies to anyone who works with a TSDB.

When I first started, my approach was to put each data point into a measurement with a single field. The assumption was that I would combine the data I needed later, in the SQL statement. However, as anyone who has used a TSDB such as InfluxDB knows, there are some serious limitations on what you can do when retrieving data, because of the design choices made in implementing the TSDB.

As my project progressed, this is the basic rule I arrived at:

A measurement should contain everything needed to make sense of that measurement, but no more.

Example: imagine a gas flow meter that gives 3 signals:

  • volumetric flow
  • temperature
  • total flow

In this case, the volumetric flow and temperature should be two fields of the same measurement, and the total flow should be its own measurement.

(If the reader doesn't like this example, think of a home electricity meter that reports amps and volts, as well as kW and power factor.)

Why would it be bad to store volume and temperature in different series?

  1. Timing: if you store these two values in different series, they will have different index values (timestamps). Unless you set the timestamps explicitly, you risk them being slightly offset from each other. That can very well become a Bad Thing (tm), because you may introduce systematic errors into your data. Even if it does not, it will be very annoying if you want to reuse the data later (for example, to dump it into a CSV file).

  2. Utility: if you want to display the volume flow, you need to compute constant * temp * volume to get the correct value. Doing this across two separate measurements becomes a nightmare because, for example, InfluxDB does not even support the operation. But even if it did, you would have to make sure that missing values in one of the fields are handled correctly and that grouping and aggregation are done correctly.
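Both points can be illustrated with a small Python sketch that does not touch InfluxDB at all. The readings, the 5 ms tolerance, the value of `CONSTANT`, and the helper `join_with_tolerance` are all made up for illustration.

```python
CONSTANT = 0.5  # placeholder factor, not a real physical constant

# Same measurement: each point carries both fields under one timestamp (ms).
combined = [
    (1000, {"flow": 2.0, "temp": 300.0}),
    (2000, {"flow": 2.5, "temp": 301.0}),
]

# Separate series, each stamped on arrival: the temperature timestamps drift.
flow_series = {1000: 2.0, 2000: 2.5}
temp_series = {1003: 300.0, 2001: 301.0}

# With one measurement, the derived value is a simple per-point computation.
derived_a = [(ts, CONSTANT * f["temp"] * f["flow"]) for ts, f in combined]

# With two series, joining on the exact timestamp pairs up nothing.
derived_b = [
    (ts, CONSTANT * temp_series[ts] * flow)
    for ts, flow in flow_series.items()
    if ts in temp_series
]

def join_with_tolerance(left, right, tolerance_ms):
    """Pair each left sample with the nearest right sample within a tolerance."""
    pairs = []
    for ts, value in sorted(left.items()):
        nearest = min(right, key=lambda other: abs(other - ts))
        if abs(nearest - ts) <= tolerance_ms:
            pairs.append((ts, value, right[nearest]))
    return pairs

# The join only works once a tolerance is chosen and applied by hand.
derived_b_fixed = [
    (ts, CONSTANT * temp * flow)
    for ts, flow, temp in join_with_tolerance(flow_series, temp_series, 5)
]

print(derived_a)
print(derived_b)
print(derived_b_fixed)
```

The extra join logic, and the choice of a tolerance, is exactly the work that storing both fields on the same point avoids.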

Why would it be bad to store all three in one measurement?

You may well have a use case in which you always want to look at all three values together, but most likely you do not, and you do not need to record the total flow at the same frequency as the flow itself.

Putting all fields into one measurement forces you either to put zeros into certain fields or to keep recording a variable that barely changes. Either way, it is inefficient.

The key insight is that a measurement with several fields requires all of its fields to be meaningful at the same time.
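As a rough illustration of the overhead, the sketch below counts how many field values get written in one hour under each layout. The sampling rates (every minute for flow and temperature, every 10 minutes for total flow) and the field names are assumptions, and the count is only a proxy for actual storage cost.

```python
MINUTES = 60
TOTAL_EVERY = 10  # assumed: total flow is only wanted every 10 minutes

# Layout 1: one measurement, every point carries all three fields.
all_in_one = [
    (minute, {"volumetric_flow": 0.0, "temperature": 0.0, "total_flow": 0.0})
    for minute in range(MINUTES)
]

# Layout 2: flow and temperature share a measurement; total flow has its own.
flow_meter = [
    (minute, {"volumetric_flow": 0.0, "temperature": 0.0})
    for minute in range(MINUTES)
]
flow_total = [
    (minute, {"total_flow": 0.0}) for minute in range(0, MINUTES, TOTAL_EVERY)
]

def stored_values(points):
    """Count field values across a list of (timestamp, fields) points."""
    return sum(len(fields) for _, fields in points)

one = stored_values(all_in_one)
two = stored_values(flow_meter) + stored_values(flow_total)
print(one, two)
```

Under these assumptions the single measurement writes 180 field values, against 126 for the split layout.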

+2

It is worth mentioning that there is a valid third option:

    timestamp,iotid,measure,value
    ----------------------------
    1501230195,iot1,temperature,70
    1501230195,iot1,humidity,45
    1501230195,iot1,pressure,850
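
In InfluxDB terms, one way to read this layout is a single measurement in which `measure` is a tag and `value` is the only field. A minimal sketch follows; the measurement name `readings` is hypothetical, and treating `measure` as a tag rather than a field is an assumption about what the table intends.

```python
rows = [("temperature", 70), ("humidity", 45), ("pressure", 850)]

# One line-protocol point per row: "measure" is written as a tag, "value" as the field.
lines = [
    f"readings,iotid=iot1,measure={name} value={value} 1501230195"
    for name, value in rows
]

for line in lines:
    print(line)
```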
0

Source: https://habr.com/ru/post/1270313/

