Create a new column based on a condition that exists over a rolling date

To make this question more general, I believe it can also be rephrased as: creating a rolling, time-sensitive factor variable. Although this is an unusual requirement, it could be useful for many different datasets.

I have irregular time-series data with more than one record per day for thousands of users. I want to create a new player_type column that tracks a rolling 30-day definition of each player's behavior. The behavior is determined by which games they play; the 'games' column is a factor with the levels GameA and GameB.

Thus, there are three types of behavior:

  • Exclusively plays GameA - 'A'
  • Exclusively plays GameB - 'B'
  • Plays both games - 'Hybrid'

I want to use this new column to see changes in their game behavior over time, and also count the number of players in each group over time to see how they change.

The time series is very irregular for each player. Players may play several games in one day or not play at all for many months. A record is created only when a player plays a game, so I expect the solution to use a filter something like:

interval(current_date - days(30), current_date) (using lubridate).

Here is an example dataset. Keep in mind that it is simplified and checks for a rolling change over 1 day, so methods that simply look at the previous record will not work on the real data. If you can make a better dataset, let me know and I will edit this post.

    p <- c(1, 1, 1, 2, 2, 2, 6, 6, 6)
    g <- c('A', 'B', 'B', 'A', 'B', 'A', 'A', 'B', 'B')
    d <- seq(as.Date('2014-10-01'), as.Date('2014-10-9'), by=1)
    df <- data.frame(player_id = p, date = d, games = g)

As output, I need:

      player_id       date games       type
    1         1 2014-10-01     A  A (OR NA)
    2         1 2014-10-02     B     Hybrid
    3         1 2014-10-03     B          B
    4         2 2014-10-04     A  A (OR NA)
    5         2 2014-10-05     B     Hybrid
    6         2 2014-10-06     A     Hybrid
    7         6 2014-10-07     A  A (OR NA)
    8         6 2014-10-08     B     Hybrid
    9         6 2014-10-09     B          B
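Once a type column like the one above exists, counting players per group over time could look roughly like the base R sketch below. The monthly bucket, the `month` column, and the `n_players` name are assumptions made for illustration; the type values are typed in by hand from the desired output.

```r
# Desired output from the example, with the type column filled in by hand
df <- data.frame(
  player_id = c(1, 1, 1, 2, 2, 2, 6, 6, 6),
  date = seq(as.Date("2014-10-01"), as.Date("2014-10-09"), by = 1),
  type = c("A", "Hybrid", "B", "A", "Hybrid", "Hybrid", "A", "Hybrid", "B"),
  stringsAsFactors = FALSE
)

# Assumption: report counts per calendar month
df$month <- format(df$date, "%Y-%m")

# Number of distinct players seen with each type in each month
counts <- aggregate(player_id ~ month + type, data = df,
                    FUN = function(x) length(unique(x)))
names(counts)[3] <- "n_players"
print(counts)
```

A player can appear in more than one group within the same month, because the type is attached to each record rather than to the player.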

I expect the solution to be something like applying a function over the rows that looks back 30 days in time, with an ifelse() statement to check which games the player played.
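A minimal base R sketch of that idea, using the example data. `rolling_type` and `window_days` are hypothetical names introduced here, and the window is 1 day to match the example (it would be 30 on the real data). Every row is compared against every other row, so this is only meant to pin down the logic, not to scale to thousands of users.

```r
p <- c(1, 1, 1, 2, 2, 2, 6, 6, 6)
g <- c("A", "B", "B", "A", "B", "A", "A", "B", "B")
d <- seq(as.Date("2014-10-01"), as.Date("2014-10-09"), by = 1)
df <- data.frame(player_id = p, date = d, games = g, stringsAsFactors = FALSE)

# Hypothetical helper: classify row i from the games its player
# played between (date - window_days) and date, inclusive
rolling_type <- function(i, data, window_days) {
  in_window <- data$player_id == data$player_id[i] &
    data$date <= data$date[i] &
    data$date >= data$date[i] - window_days
  played <- unique(data$games[in_window])
  if (length(played) > 1) "Hybrid" else played
}

df$type <- vapply(seq_len(nrow(df)), rolling_type, character(1),
                  data = df, window_days = 1)
print(df)
```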

This is a very similar post, and should help with solving this problem: How to make a notional amount that looks only between certain date criteria

I have also looked into rowwise() and conditional mutate() with dplyr; however, the catch for me is the historical time component.

Thanks for the help! I can't thank this forum enough. I will check back often.

1 answer

Assuming I understood the question correctly, here is a data.table solution using the foverlaps() function.

Create dt and set the key as shown below:

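    library(data.table)

    dt <- data.table(player_id = p, games = g, date = d, end_date = d)
    setkey(dt, player_id, date, end_date)

    hybrid_index <- function(dt, roll_days) {
        # shift each row's start date back by roll_days; end_date is unchanged
        ivals = copy(dt)[, date := date - roll_days]
        # by.x is given explicitly because := on a key column may drop or shorten the key of ivals
        olaps = foverlaps(ivals, dt, by.x = key(dt), type = "any", which = TRUE)
        # TRUE where a row's game differs from a game found in its window
        olaps[, val := dt$games[xid] != dt$games[yid]]
        # indices of the rows in dt that have at least one differing game
        olaps[, any(val), by = xid][(V1), xid]
    }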

We create a dummy data.table, ivals (for intervals), in which each row has a start date (date shifted back by roll_days) and an end date. Note that because end_date in ivals is identical to dt$end_date, every row is guaranteed to have at least one match, namely itself (this is intentional). This gives you the non-NA version of the output you asked for.

[With some minor changes you can get the NA version, but I will leave that to you (assuming this answer is correct).]
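One possible reading of the NA version, as a sketch only: a row gets NA when the only record in its window is the row itself. That interpretation, and the names `hybrid_rows` and `lonely_rows`, are assumptions rather than part of the answer. Here ivals is built as a plain unkeyed table and by.x is passed explicitly.

```r
library(data.table)

p <- c(1, 1, 1, 2, 2, 2, 6, 6, 6)
g <- c("A", "B", "B", "A", "B", "A", "A", "B", "B")
d <- seq(as.Date("2014-10-01"), as.Date("2014-10-09"), by = 1)

dt <- data.table(player_id = p, games = g, date = d, end_date = d)
setkey(dt, player_id, date, end_date)

roll_days <- 1L
# Window for each row of dt: [date - roll_days, end_date]
ivals <- data.table(player_id = dt$player_id,
                    date = dt$date - roll_days,
                    end_date = dt$end_date)

# Pairs (xid = row of ivals, yid = row of dt) whose intervals overlap
olaps <- foverlaps(ivals, dt,
                   by.x = c("player_id", "date", "end_date"),
                   by.y = key(dt), type = "any", which = TRUE)
olaps[, val := dt$games[xid] != dt$games[yid]]

# Rows that see a different game somewhere in their window
hybrid_rows <- olaps[, any(val), by = xid][(V1), xid]
# Assumption: rows that only match themselves have no history yet
lonely_rows <- olaps[, .N, by = xid][N == 1L, xid]

dt[, type := games][hybrid_rows, type := "hybrid"][lonely_rows, type := NA_character_]
print(dt)
```

If a player can have several records on the same day, those records match each other, so this rule would treat them as history; whether that is wanted depends on the real data.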

With this, we simply find which ranges from ivals overlap with dt, for each player_id, and get the corresponding indices. From there it's easy: if the games within a row's window are not all the same, hybrid_index returns that row's index in dt, and we set type to "hybrid" at those indices.

    # roll days = 1L
    dt[, type := games][hybrid_index(dt, 1L), type := "hybrid"]
    #    player_id games       date   end_date   type
    # 1:         1     A 2014-10-01 2014-10-01      A
    # 2:         1     B 2014-10-02 2014-10-02 hybrid
    # 3:         1     B 2014-10-03 2014-10-03      B
    # 4:         2     A 2014-10-04 2014-10-04      A
    # 5:         2     B 2014-10-05 2014-10-05 hybrid
    # 6:         2     A 2014-10-06 2014-10-06 hybrid
    # 7:         6     A 2014-10-07 2014-10-07      A
    # 8:         6     B 2014-10-08 2014-10-08 hybrid
    # 9:         6     B 2014-10-09 2014-10-09      B

    # roll days = 2L
    dt[, type := games][hybrid_index(dt, 2L), type := "hybrid"]
    #    player_id games       date   end_date   type
    # 1:         1     A 2014-10-01 2014-10-01      A
    # 2:         1     B 2014-10-02 2014-10-02 hybrid
    # 3:         1     B 2014-10-03 2014-10-03 hybrid
    # 4:         2     A 2014-10-04 2014-10-04      A
    # 5:         2     B 2014-10-05 2014-10-05 hybrid
    # 6:         2     A 2014-10-06 2014-10-06 hybrid
    # 7:         6     A 2014-10-07 2014-10-07      A
    # 8:         6     B 2014-10-08 2014-10-08 hybrid
    # 9:         6     B 2014-10-09 2014-10-09 hybrid

To illustrate the idea clearly, I wrote a function and copied dt inside it. You can avoid the copy by adding the ivals dates directly to dt and using the by.x and by.y arguments of foverlaps(). See ?foverlaps.
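A sketch of that copy-free variant, under the assumption that the window start is stored in an extra column (called `start_date` here, a name chosen for illustration). dt is then overlapped with itself: by.x points at the window columns and by.y at the keyed columns.

```r
library(data.table)

p <- c(1, 1, 1, 2, 2, 2, 6, 6, 6)
g <- c("A", "B", "B", "A", "B", "A", "A", "B", "B")
d <- seq(as.Date("2014-10-01"), as.Date("2014-10-09"), by = 1)

dt <- data.table(player_id = p, games = g, date = d, end_date = d)

roll_days <- 1L
# Assumed extra column holding the start of each row's window
dt[, start_date := date - roll_days]
setkey(dt, player_id, date, end_date)

# x uses [start_date, end_date]; y uses the keyed [date, end_date]
olaps <- foverlaps(dt, dt,
                   by.x = c("player_id", "start_date", "end_date"),
                   by.y = c("player_id", "date", "end_date"),
                   type = "any", which = TRUE)
olaps[, val := dt$games[xid] != dt$games[yid]]
idx <- olaps[, any(val), by = xid][(V1), xid]

dt[, type := games][idx, type := "hybrid"]
print(dt)
```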


Source: https://habr.com/ru/post/1245663/

