Django Database Planning - Time Series Data

I would like some tips on how best to organize my Django models / database tables for storing data in my web app.

I'm building a site that will store users' telemetry data from a racing sim game. A desktop companion app will capture game data every 0.1 seconds for a variety of channels (car, track, speed, gas, brake, clutch, RPM, etc.). For example, in a two-minute race, each of these variables will contain 1200 data points (10 samples per second * 120 seconds).

The important thing is that this list can consist of 20 variables and may grow in the future. So a single racing session holds 1200 samples * the number of variables. If one user sends 100 sessions, and there are 100 users... the amount of data adds up very quickly.

The application will then send all of this session data to the website's database. The data MUST be transferred between the game and the website via a CSV file, so structurally I am limited to what CSV can represent. The website will then let you select a racing session / lap and plot this information on separate time series graphs (one per variable) and, importantly, let you overlay one session against someone else's to see where the differences lie.
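
For illustration (the column names here are just examples I made up, not a fixed format), a session CSV might look something like this:

    time,speed,gas,brake,rpm
    0.0,70,100,0,6400
    0.1,72,100,0,6550
    0.2,74,80,10,6700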

My question is: how do you structure a database to store this much information?

The simplest structure I have in mind is a separate table for each racing track, where each row / record is a racing session on that track, and the fields in that table are the variables above.

I see two problems with this:

1) Most of the variables in the list above are time series, not single values (for example, the speed var may look like this: 70, 72, 74, 77, 72, 71, 65, where the values are samples taken at 0.1-second intervals over the whole lap). How do I store this kind of data in a table / field?

2) Within a single session, every var in the list has the same length (if your lap took 1 min 35 s, then all of your vars capture data for exactly that period), but lap times differ from session to session, and I want to be able to compare different laps against each other. In other words, however I store the time series data for these variables, it must handle variable lengths.

Any thoughts would be appreciated.

1 answer

One thing that can help with HUGE tables is partitioning. Judging by the postgresql tag on your question, look here: http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html

But for starters, I would go with one simple table backed by a reasonable set of indexes. As I understand it, each data item in the table will be identified by a race ID, a player ID, and a time marker. These columns should be covered by indexes according to your query requirements.

As for your two questions: 1) You store this data as plain integers. Remember to pick the correct data types for these columns. E.g. if you are 100% sure that some values will always be very small, you can use the smallint data type. Read more about integer data types here: http://www.postgresql.org/docs/9.3/static/datatype-numeric.html#DATATYPE-INT

2) This will not be a problem if each sample is a separate row in the table. You can insert as many rows per session as you need, so variable lap lengths come for free.

So, to summarize: I would start with a VERY simple table schema. From a Django perspective, it would look something like this:

    class RaceTelemetryData(models.Model):
        user = models.ForeignKey(..., on_delete=models.CASCADE, db_index=True)  # your user model
        race = models.ForeignKey(YourRaceModel, on_delete=models.CASCADE, db_index=True)
        time = models.IntegerField()  # sample index within the session
        gas = models.IntegerField()
        speed = models.SmallIntegerField()
        # and so on...
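
If it helps, here is a minimal sketch of loading one session CSV into these rows (the import_session helper and the CSV column names are my own illustration, assuming the model above):

    import csv

    def import_session(path, user, race):
        # Each CSV row is one 0.1 s sample; enumerate() provides the time index.
        rows = []
        with open(path, newline="") as f:
            for i, sample in enumerate(csv.DictReader(f)):
                rows.append(RaceTelemetryData(
                    user=user,
                    race=race,
                    time=i,
                    gas=int(sample["gas"]),
                    speed=int(sample["speed"]),
                ))
        # bulk_create avoids one INSERT per sample (1200+ rows per session)
        RaceTelemetryData.objects.bulk_create(rows, batch_size=1000)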

Additionally, you should create an index (manually) on the columns (user_id, race_id, time) so that looking up the data of one race session (and sorting it by time) is fast.
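
In recent Django versions (1.11+) you can also declare this composite index on the model itself rather than creating it by hand; a sketch, added inside the RaceTelemetryData class above:

    class Meta:
        indexes = [
            models.Index(fields=["user", "race", "time"]),
        ]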

In the future, if you find that the performance of this particular table is too slow, you can experiment with additional indexes or partitioning. PostgreSQL is quite flexible about modifying existing database structures, so you should not have many problems with it.

If you decide to add a new variable to the set, you just need to add a new column to the table.

EDIT:

As a result, you will get one table with at least these columns:

user_id - identifies which user this row's data belongs to.
race_id - identifies which race this row's data belongs to.
time - determines the correct ordering of the samples within a session.

So when you want the data for, say, race number 5, you search for rows with user_id = 'Joe_ID' and race_id = 5, and then sort all of those rows by the time column.
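
In Django ORM terms, that lookup might look like this (a sketch assuming the model above; joe is a hypothetical user object):

    session = (RaceTelemetryData.objects
               .filter(user=joe, race_id=5)
               .order_by("time"))
    speeds = [row.speed for row in session]  # e.g. the series for the speed graph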


Source: https://habr.com/ru/post/1210102/

