Grouping data in kdb with a specific condition

I have a main table named tab that looks like this:

tab:([]date:2018.02.05 2018.02.05 2018.02.06 2018.02.06;time:01:30:25.000 02:30:45.000 04:15:15.000 02:15:15.000;vol:50 55 64 12; name:`A`B`B`A)

 date       time    vol name
 2018.02.05 1:30:25 50  A
 2018.02.05 2:30:45 55  B
 2018.02.06 4:15:15 64  B
 2018.02.06 2:15:15 12  A

I need to create a new table depending on conditions such as:

Between two specific dates, I need to find the times at which the cumulative volume for name B over a two-hour window exceeds 100.

The logic I think should work: sort the data in ascending order. For each row i, sum vol for name = `B over the window from time[i] to time[i] + 2 hours. If the cumulative vol exceeds 100, return the time interval and the corresponding dates, then continue with i+1. I'm new to kdb, so I find this hard to implement.

Output Example:

 time1    time2   date1      date2
 1:30:00  3:30:00 2018.02.05 2018.02.05
 23:00:00 1:00:00 2018.02.05 2018.02.06

Any guidance on this is appreciated. Thanks.

2 answers

I believe your problem can be solved with aj (asof join).

First, as you indicated, the table should be sorted by time:

 `time xasc `tab; 

Then compute the cumulative sum of volumes per name using sums:

 tab:update cumvol:sums vol by name from tab 

Then use aj to look up, for each row, the cumulative volume as of two hours earlier, i.e. the volume that falls outside that row's 2-hour window:

 aj[`name`time;tab;select time:time+02:00,name,cumvol2:cumvol from tab] 

Subtracting cumvol2 from cumvol then gives the total volume for each 2-hour window:

 tab:select time, name, runningvol:cumvol-0^cumvol2 from aj[`name`time;tab;select time:time+02:00,name,cumvol2:cumvol from tab] 

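Note that runningvol for a row is the volume in the two hours up to and including that row's time, so each window runs from time-02:00 to time. A simple select statement then returns the windows where runningvol is greater than 100:

 select time1:time-02:00,time2:time from tab where runningvol>100 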

One improvement would be to apply the grouped attribute (`g#) to the name column of the second table passed to aj. Another would be to combine date and time into a single timestamp or datetime column, so that windows spanning midnight are handled correctly.

More information about aj and sums can be found here:
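The two improvements above could look roughly like the following. This is only a sketch run against the sample table from the question: the column name ts and the table names t, w and r are assumptions made for illustration, and the window is taken to be the two hours up to and including each row's timestamp.

```q
/ sample table from the question
tab:([]date:2018.02.05 2018.02.05 2018.02.06 2018.02.06;time:01:30:25.000 02:30:45.000 04:15:15.000 02:15:15.000;vol:50 55 64 12;name:`A`B`B`A)

/ combine date and time into one timestamp column (ts is an assumed name), then sort by it
t:`ts xasc select ts:date+`timespan$time,name,vol from tab

/ cumulative volume per name
t:update cumvol:sums vol by name from t

/ lookup table shifted forward by 2 hours, with the grouped attribute on name
w:update `g#name from select name,ts:ts+0D02:00:00,cumvol2:cumvol from t

/ volume traded in the 2 hours up to and including each ts
r:select ts,name,runningvol:cumvol-0^cumvol2 from aj[`name`ts;t;w]
```

Because ts carries the date, a window that starts before midnight and ends after it needs no special handling, and filtering with where runningvol>100 works as before.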

http://code.kx.com/q/ref/joins/#aj-aj0-asof-join

http://code.kx.com/q/ref/arith-integer/#sums


You can also use the window join wj1. Take the example table:

 t:`time xasc ([]time:(1000?2018.02.05 2018.02.06)+1000?24:00:00;sym:1000?`A`B`C;vol:1000?10); 

The following function sums vol over the 2-hour window ending at each timestamp. It takes the table t, start date s, end date e and name n.

 fw:{[t;s;e;n]
   r:@[;`sym;`p#]`sym`time xasc select from t where time.date within(s;e),sym=n;
   :select from wj1[r[`time]-/:02:00 00:00;`time;r;(r;(sum;`vol))] where vol>100;
   };

Running it for the name/sym B gives:

 q)fw[t;2018.02.05;2018.02.06;`B]
 time                          sym vol
 -------------------------------------
 2018.02.05D18:12:39.000000000 B   104
 2018.02.05D18:35:47.000000000 B   101
 2018.02.05D18:40:17.000000000 B   102
 ...
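To make the window argument of wj1 more concrete, here is a small sketch on a hypothetical three-row table (the data and the names r, w and res are made up for illustration). Subtracting a pair of offsets with each-right produces two lists, the window starts and the window ends, which is the shape wj1 expects as its first argument.

```q
/ toy data (hypothetical), already sorted by time
r:([]time:2018.02.05D01:00:00 2018.02.05D02:30:00 2018.02.05D04:00:00;vol:60 50 70)

/ each-right subtraction yields a pair: (window starts; window ends)
w:r[`time]-/:0D02:00:00 0D00:00:00

/ sum vol over each row's trailing 2-hour window; vol in the result holds the window sum
res:wj1[w;`time;r;(r;(sum;`vol))]
```

Unlike wj, which also includes the value prevailing when the window opens, wj1 only aggregates rows that fall on or after the window start, which is what a plain 2-hour sum needs.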

It can also be modified to return results for all names/syms:

 fw1:{[t;s;e]
   r:@[;`sym;`p#]`sym`time xasc select from t where time.date within(s;e);
   :select from wj1[r[`time]-/:02:00 00:00;`sym`time;r;(r;(sum;`vol))] where vol>100;
   };

Running it without a name/sym this time:

 q)fw1[t;2018.02.05;2018.02.06]
 time                          sym vol
 -------------------------------------
 2018.02.05D02:01:36.000000000 A   106
 2018.02.05D02:52:23.000000000 A   103
 2018.02.05D03:06:51.000000000 A   105
 ...

Although this approach is less efficient than using aj, it still illustrates how you can achieve this with window joins.


Source: https://habr.com/ru/post/1275641/

