Time Series Database Row Storage

I would like to store time series in a MySQL database. I would like to do it linearly, i.e. each row represents a unique observation (1 measure, 1 site, 1 timestamp). Currently, this requires 84 096 000 rows, and it will grow by approximately 2 102 400 rows per year.

What precautions should be taken to correctly design a time series table, its indexes, and the related queries (essentially, selecting data by measure, site, and time range)?

Edit:

Adding a table design suggestion:

 CREATE TABLE TimeSeries(
     Id INT NOT NULL AUTO_INCREMENT,
     MeasureTimeStamp DATETIME NOT NULL,
     MeasureId INT NOT NULL,
     SiteId INT NOT NULL,
     Measure FLOAT NOT NULL,
     Quality INT NOT NULL,
     PRIMARY KEY (Id),
     CONSTRAINT UNIQUE (MeasureTimeStamp,MeasureId,SiteId),
     FOREIGN KEY (MeasureId) REFERENCES Measure(Id),
     FOREIGN KEY (SiteId) REFERENCES Site(Id)
 );
 CREATE INDEX ChannelIndex ON TimeSeries(MeasureId,SiteId);

Assuming that tables called Measure and Site exist, what should be improved in this structure if my basic queries are:

 SELECT *
 FROM TimeSeries
 WHERE (MeasureId IN (?,?,?))
   AND (SiteId IN (?,?,?))
   AND (MeasureTimeStamp BETWEEN ? AND ?)
 ORDER BY MeasureId ASC, SiteId ASC, MeasureTimeStamp ASC;

Edit 2:

There are about 20 sites and about 50 measures. This leads to a maximum of 1000 channels (a channel being a pair of one site and one measure). It may increase slightly over several decades, but will not reach more than 10,000 channels. Most of the data has a time granularity of about 30 minutes. In any case, the granularity is not constant and will never be finer than one minute (some data is daily or weekly).

1 answer

Some tips:

  • An index in MySQL is a list of your primary keys, ordered by your index columns. You want to order this list in such a way that it is as easy as possible to find the values you need.
  • MySQL generally uses only one index per table in a query (index merge optimizations aside).
  • MySQL can use an index from left to right (MySQL Multi-Column Indexes). This means that an index on (A, B, C) can serve WHERE A=? AND B=? but not WHERE B=? AND C=?.
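
The left-to-right rule can be illustrated with a small Python sketch. It is a toy model rather than MySQL's actual storage: it assumes an index on (A, B, C) behaves like a sorted list of tuples, and the names index_abc and prefix_range are made up for the illustration.

```python
from bisect import bisect_left, bisect_right

# Toy model (an assumption, not MySQL internals): a composite index on
# (A, B, C) is a list of tuples kept in sorted order.
index_abc = sorted((a, b, c) for a in range(3) for b in range(3) for c in range(3))

def prefix_range(index, prefix):
    """Return (lo, hi) bounds of the contiguous run of entries starting with prefix."""
    pad = (float("inf"),) * (len(index[0]) - len(prefix))
    return bisect_left(index, prefix), bisect_right(index, prefix + pad)

# WHERE A=1 AND B=2: a leftmost prefix, so two binary searches find one run.
lo, hi = prefix_range(index_abc, (1, 2))
run = index_abc[lo:hi]

# WHERE B=2 AND C=1: no leading A, so the matching entries are scattered
# and every entry has to be inspected to find them.
scattered = [i for i, (a, b, c) in enumerate(index_abc) if b == 2 and c == 1]

print(run)        # entries for the prefix (1, 2), adjacent in the index
print(scattered)  # positions of the B=2, C=1 entries, spread across the index
```

In this model the prefix lookup touches one contiguous slice, while the lookup that skips the first column finds its matches at positions spread across the whole list.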

In your example, four indexes are created:

  • MeasureId,SiteId (ChannelIndex)
  • MeasureTimeStamp,MeasureId,SiteId (unique constraint)
  • MeasureId (foreign key)
  • SiteId (foreign key)

Simply put, ChannelIndex is sorted like a list of strings combining MeasureId and SiteId. For instance, for MeasureId = 12 and SiteId = 68, you can picture the sort value as 12_68 . Your unique constraint is sorted by values such as 2014-12-23 09:01:43_12_68 .
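
As a rough illustration of these two sort orders, here is a Python sketch over four hypothetical rows. It uses tuples instead of the underscore-joined strings so that numbers compare numerically; the sample values are invented.

```python
from datetime import datetime

# Hypothetical sample rows: (MeasureTimeStamp, MeasureId, SiteId)
rows = [
    (datetime(2014, 12, 23, 9, 1, 43), 12, 68),
    (datetime(2014, 12, 23, 9, 1, 43), 3, 70),
    (datetime(2014, 12, 23, 8, 31, 43), 12, 68),
    (datetime(2014, 12, 23, 8, 31, 43), 3, 70),
]

# ChannelIndex order: by (MeasureId, SiteId); the timestamp plays no part.
channel_order = sorted(rows, key=lambda r: (r[1], r[2]))

# Unique-constraint order: by (MeasureTimeStamp, MeasureId, SiteId).
unique_order = sorted(rows)

for ts, m, s in channel_order:
    print("ChannelIndex:", m, s, ts)
for ts, m, s in unique_order:
    print("Unique:      ", ts, m, s)
```

In the first ordering all rows of one channel sit together regardless of time; in the second, rows of the same instant sit together and the channels alternate.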

To resolve your query, MySQL can use either ChannelIndex or the unique constraint; which one it picks depends on the data in your table. However, neither of them is optimal. Using ChannelIndex, it will quickly find the blocks of the index that have the right MeasureId and SiteId , but it then needs to go to each matching row in the main table to check whether MeasureTimeStamp is in the range. Using the unique constraint, it can easily select the time range, but within that range the entries are ordered by MeasureTimeStamp first, so the wanted MeasureId and SiteId values are scattered throughout it.

To improve your structure, it will help to change your unique constraint to:
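
To make the cost of each option concrete, the following Python sketch counts how many entries each strategy would have to look at. The data set is invented (5 measures, 4 sites, one day at 30-minute steps) and the counting is a simplification of what a real query planner does.

```python
from datetime import datetime, timedelta
from itertools import product

# Hypothetical data: 5 measures x 4 sites, one reading every 30 minutes for a day.
start = datetime(2014, 12, 23)
slots = [start + timedelta(minutes=30 * i) for i in range(48)]
rows = [(ts, m, s) for ts, m, s in product(slots, range(1, 6), range(1, 5))]

# The query: MeasureId IN (1, 2) AND SiteId IN (1, 2) AND timestamp in a 10-slot window.
measures, sites = {1, 2}, {1, 2}
t_lo, t_hi = slots[10], slots[19]

wanted = [r for r in rows if r[1] in measures and r[2] in sites and t_lo <= r[0] <= t_hi]

# Via (MeasureId, SiteId): every row of the 4 channels is a candidate,
# and each one needs its timestamp checked.
examined_channel = [r for r in rows if r[1] in measures and r[2] in sites]

# Via (MeasureTimeStamp, MeasureId, SiteId): every entry in the time window
# is a candidate, whatever its channel.
examined_time = [r for r in rows if t_lo <= r[0] <= t_hi]

print(len(wanted), len(examined_channel), len(examined_time))
```

With these made-up numbers both strategies examine roughly five times more entries than the query actually returns.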

CONSTRAINT UNIQUE (MeasureId, SiteId, MeasureTimeStamp)

This index will now be sorted by values such as 12_68_2014-12-23 09:01:43 , which, I believe, will perform better, since MySQL can now select a discrete and predictable number of ranges within the index. This index serves your SELECT statement and at the same time makes ChannelIndex redundant.
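
A minimal Python sketch of why the reordered index helps, again treating the index as a sorted list of tuples (a simplification, with invented data): each (MeasureId, SiteId) pair in the query maps to exactly one contiguous range that binary search can locate.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta
from itertools import product

# Hypothetical data: 5 measures x 4 sites, one reading every 30 minutes for a day.
start = datetime(2014, 12, 23)
slots = [start + timedelta(minutes=30 * i) for i in range(48)]

# Toy model of the index on (MeasureId, SiteId, MeasureTimeStamp).
index = sorted((m, s, ts) for ts, m, s in product(slots, range(1, 6), range(1, 5)))

measures, sites = [1, 2], [1, 2]
t_lo, t_hi = slots[10], slots[19]

result = []
ranges = 0
for m, s in product(measures, sites):
    # One contiguous range per channel: from (m, s, t_lo) to (m, s, t_hi).
    lo = bisect_left(index, (m, s, t_lo))
    hi = bisect_right(index, (m, s, t_hi))
    result.extend(index[lo:hi])
    ranges += 1

print(ranges, len(result))
```

In this model the number of ranges equals the number of measures times the number of sites in the IN lists, every examined entry is a match, and the result comes out already sorted by MeasureId, SiteId, MeasureTimeStamp.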


Source: https://habr.com/ru/post/1206472/

