I am running Postgres 9.2 and I have a large table similar to
CREATE TABLE sensor_values
(
ts timestamp with time zone NOT NULL,
value double precision NOT NULL DEFAULT 'NaN'::real,
sensor_id integer NOT NULL
)
I have values coming into the system constantly, i.e. a lot per minute. I want to maintain the current standard deviation and mean of the last 200 values for each sensor, so I can determine whether new values coming into the system are within 3 standard deviations of the mean. To do this, I need the current standard deviation and mean to be kept up to date over the last 200 values. Since the table can have hundreds of millions of rows, I don't want to fetch the last 200 rows for a sensor ordered by time and then compute avg(value), var_samp(value) for each new value. I am assuming that updating the standard deviation and mean incrementally will be faster.
I started writing a PL/pgSQL function to update the rolling variance and mean for each new value that enters the system for a particular sensor.
I can do this using update formulas like
newavg = oldavg + (new_value - old_value)/window_size
new_variance += (new_value-old_value)*(new_value-newavg+old_value-oldavg)/(window_size-1)
This is based on
http://jonisalonen.com/2014/efficient-and-accurate-rolling-standard-deviation/
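As a sanity check, the two update formulas can be wrapped in a small PL/pgSQL function and tried on a tiny window. This is only a sketch: the function name rolling_update and its parameter names are made up for illustration, and it assumes the window is already full.

```sql
-- Hypothetical helper: applies the two update formulas above to one window slide.
CREATE OR REPLACE FUNCTION rolling_update(
    oldavg      double precision,   -- mean of the window before the slide
    oldvar      double precision,   -- sample variance before the slide
    old_value   double precision,   -- value leaving the window
    new_value   double precision,   -- value entering the window
    window_size integer,
    OUT newavg  double precision,
    OUT newvar  double precision)
AS $$
BEGIN
    newavg := oldavg + (new_value - old_value) / window_size;
    newvar := oldvar
              + (new_value - old_value)
              * (new_value - newavg + old_value - oldavg)
              / (window_size - 1);
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- Window of size 3 sliding from (1, 2, 3) to (2, 3, 6): old mean 2, old variance 1.
SELECT * FROM rolling_update(2, 1, 1, 6, 3);

-- Direct recomputation over the new window, for comparison.
SELECT avg(v), var_samp(v) FROM (VALUES (2.0), (3.0), (6.0)) AS w(v);
```

Both queries should give a mean of roughly 3.667 and a variance of roughly 4.333.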
Basically, the window has a size of 200. The old value is the first (oldest) value of the window. When a new value arrives, we move the window forward by one. After computing the result, I save the following values for the sensor:
The first value of the window.
The mean of the window values.
The variance of the window values.
Thus, I do not need to constantly get the last 200 values and do the sum, etc. I can reuse these values when a new sensor value arrives.
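The saved state could live in a small table with one row per sensor, roughly like the sketch below. The table name, column names and the prepared statement name are all hypothetical and not part of any existing schema.

```sql
-- Hypothetical table holding the saved window state, one row per sensor.
CREATE TABLE sensor_window_stats
(
    sensor_id          integer PRIMARY KEY,
    first_window_value double precision NOT NULL,  -- oldest value still inside the window
    window_avg         double precision NOT NULL,  -- mean of the window values
    window_var         double precision NOT NULL   -- sample variance of the window values
);

-- Checking a new reading against 3 standard deviations then only touches this one row.
PREPARE check_reading(double precision, integer) AS
SELECT abs($1 - window_avg) <= 3 * sqrt(window_var) AS within_3_sigma
FROM sensor_window_stats
WHERE sensor_id = $2;

-- Example call: is the reading 4.2 for sensor 1 within 3 standard deviations?
EXECUTE check_reading(4.2, 1);
```

Note that once the window moves, the stored first value has to be replaced by the next-oldest reading, which this sketch does not cover.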
My problem is the first run. I do not have previous window data for the sensor, i.e. none of the three values above, so I have to compute them the slow way.
Something like
WITH s AS
(SELECT value FROM sensor_values WHERE sensor_values.sensor_id = $1 AND ts >= (NOW() - INTERVAL '2 day')::timestamptz ORDER BY ts DESC LIMIT 200)
SELECT avg(value), var_samp(value) INTO last_window_average, last_window_variance FROM s;
How do I also get the first (earliest) value from the above select in PL/pgSQL?
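One possible way to pick up the earliest value alongside the two aggregates is to read the CTE a second time in a scalar subquery. The sketch below wraps the seeding query in a function; the function name seed_window_stats, the parameter p_sensor_id and the output first_window_value are assumptions made for illustration.

```sql
-- Sketch: seeds mean, variance and earliest value of the window in one statement.
CREATE OR REPLACE FUNCTION seed_window_stats(
    p_sensor_id              integer,
    OUT last_window_average  double precision,
    OUT last_window_variance double precision,
    OUT first_window_value   double precision)
AS $$
BEGIN
    WITH s AS (
        -- Latest 200 readings for the sensor; ts is kept so the window can be re-sorted.
        SELECT ts, value
        FROM sensor_values
        WHERE sensor_id = p_sensor_id
          AND ts >= now() - interval '2 days'
        ORDER BY ts DESC
        LIMIT 200
    )
    SELECT avg(value),
           var_samp(value),
           (SELECT value FROM s ORDER BY ts ASC LIMIT 1)  -- earliest row of the window
    INTO last_window_average, last_window_variance, first_window_value
    FROM s;
END;
$$ LANGUAGE plpgsql STABLE;

-- Example call for sensor 1.
SELECT * FROM seed_window_stats(1);
```

If the sensor has fewer than 200 readings in the last two days, the window is simply shorter, and var_samp returns NULL when there are fewer than two rows.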