R - Extract multiple rows from column 1 if a specific value appears in column 2

My question is about extracting multiple values from a data.frame in R and putting them in a new data.frame.

I have a data.frame that looks like this (df):

    PRICE  EVENT
    1.50   0
    1.70   0
    1.65   0
    1.20   1
    0.90   0
    1.70   0
    1.55   0
    .      .
    .      .
    1.10   0
    1.20   0
    1.14   1
    0.90   0

My actual data.frame has these two columns and over 300,000 rows. The EVENT column only takes the values 0 or 1 (1 serves as a proxy for a specific event having occurred).

The first step in my research is to analyze the price when an event has occurred. That step is simple; I did it with:

 vector<-df[df$EVENT==1, "PRICE"] 

Now vector contains all prices on the event days (here: 1.20 and 1.14).

But the second step of my research is the interesting one:

Now I want not only the price on the event day, but also the prices x days before and after that day, and I want to put them in a matrix.

Example: I want prices two days before the event and one day after the event (including the day of the event)

Then the new data.frame that I am trying to create would look like this:

          Event 1   ...   Event n
    -2    1.70      ...   1.10
    -1    1.65      ...   1.20
     0    1.20      ...   1.14
    +1    0.90      ...   0.90

Please keep in mind that the 4-day interval [-2:1] is just an example. In my actual research, I have to cover a 91-day window [-30:60].

Thanks for the help:)

+5
4 answers

We can create a matrix containing the corresponding row numbers, and then use it as a mask to achieve the expected result:

    event_rows <- which(df$EVENT==1)
    mask <- sapply(event_rows, function(x) (x-2):(x+2))
    apply(mask, 2, function(x) df$PRICE[x])
    #      [,1] [,2]
    # [1,] 1.70 1.10
    # [2,] 1.65 1.20
    # [3,] 1.20 1.14
    # [4,] 0.90 0.90
    # [5,] 1.70   NA

Data

    df <- structure(list(
        PRICE = c(1.5, 1.7, 1.65, 1.2, 0.9, 1.7, 1.55, 1.1, 1.2, 1.14, 0.9),
        EVENT = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L)),
      .Names = c("PRICE", "EVENT"),
      class = "data.frame",
      row.names = c(NA, -11L))
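The index-matrix idea above can also be wrapped in a small helper so that the window bounds become parameters and positions outside the data are handled explicitly. The following is only a minimal sketch and not part of the original answer: the function name `event_window` and its arguments `before` and `after` are hypothetical. It replaces out-of-range positions with `NA`, because in R a zero index is silently dropped and mixing negative with positive indices raises an error.

```r
# Hypothetical helper (not from the original answer): returns a matrix with
# one row per offset and one column per event.
event_window <- function(price, event, before = 2, after = 1) {
  event_rows <- which(event == 1)
  offsets <- seq(-before, after)

  # Row position for every (offset, event) combination
  idx <- outer(offsets, event_rows, `+`)

  # Positions before the first or after the last row become NA
  idx[idx < 1 | idx > length(price)] <- NA

  matrix(price[idx],
         nrow = length(offsets),
         dimnames = list(as.character(offsets),
                         sprintf("event_%d", seq_along(event_rows))))
}

# Example data as given in this answer
df <- data.frame(
  PRICE = c(1.5, 1.7, 1.65, 1.2, 0.9, 1.7, 1.55, 1.1, 1.2, 1.14, 0.9),
  EVENT = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L)
)

res <- event_window(df$PRICE, df$EVENT, before = 2, after = 1)
print(res)
```

For the 91-day window from the question, the call would presumably be `event_window(df$PRICE, df$EVENT, before = 30, after = 60)`, assuming the real data uses the same column names.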
+4

For completeness, here is a base R solution:

    # example data
    set.seed(123)
    df <- data.frame(price = rnorm(100), event = rbinom(100, 1, 0.05))

    # create a vector of unique event positions with additional 2 positions before and 1 ahead
    offset <- unique(as.vector(sapply(which(df$event == 1), function(x) c((x-2):(x+1)))))

    # subset data
    df[offset[offset > 0 & offset <= 100],]

             price event
    1  -0.56047565     0
    2  -0.23017749     1
    3   1.55870831     0
    20 -0.47279141     0
    21 -1.06782371     0
    22 -0.21797491     1
    23 -1.02600445     0
    46 -1.12310858     0
    47 -0.40288484     0
    48 -0.46665535     1
    49  0.77996512     1
    50 -0.08336907     0
    62 -0.50232345     0
    63 -0.33320738     0
    64 -1.01857538     1
    65 -1.07179123     0
    75 -0.68800862     0
    76  1.02557137     0
    77 -0.28477301     1
    78 -1.22071771     0
    95  1.36065245     0
    96 -0.60025959     0
    97  2.18733299     1
    98  1.53261063     0

Edit: I did not see the expected output at first, see @mtoto's answer for this.
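Because `unique()` merges overlapping windows, the subset above no longer shows which event a given row belongs to. One possible way to keep that link, sketched here under the assumption that a long table is acceptable, is to enumerate every event and offset pair explicitly. The column names `event_id`, `offset` and `row` are illustrative and do not come from the original answer.

```r
# Same example data as in the answer above
set.seed(123)
df <- data.frame(price = rnorm(100), event = rbinom(100, 1, 0.05))

event_rows <- which(df$event == 1)
offsets <- -2:1

# One row per (event, offset) pair; offset varies fastest
long <- expand.grid(offset = offsets, event_id = seq_along(event_rows))

# Position in df that each pair refers to
long$row <- event_rows[long$event_id] + long$offset

# Keep only positions that actually exist in df
long <- long[long$row >= 1 & long$row <= nrow(df), ]

# Look up the price for each remaining position
long$price <- df$price[long$row]

head(long)
```

Rows that fall inside two overlapping windows appear once per event here, which the `unique()` approach deliberately avoids.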

+2

What I would do is extend the data frame with lagged columns and then filter the rows. With tidyverse, it would look something like this. (I highly recommend using tidyverse rather than base R, but that is up to you.)

    library(tidyverse)

    # generate example data frame
    df <- data.frame(price = rnorm(100), event = rbinom(100, 1, 0.5))

    # generate a vector from one to the desired number of lags.
    # we map over this vector with a function that returns the lagged
    # values of the price, then we join the results by columns
    lags <- map(1:3, function(x){lag(df$price, n = x)}) %>%
      reduce(cbind) %>%
      as.data.frame %>%
      set_names(paste('priceLag', 1:3, sep = ''))

    # bind lags to original data frame, keep rows where event == 1
    out <- cbind(df, lags) %>%
      filter(event == 1)
0
    library('tidyverse')

    df <- data.frame(
      price = seq_len(20),
      event = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0))

    df
    #    price event
    # 1      1     0
    # 2      2     0
    # 3      3     0
    # 4      4     0
    # 5      5     1
    # 6      6     0
    # 7      7     0
    # 8      8     0
    # 9      9     0
    # 10    10     0
    # 11    11     0
    # 12    12     1
    # 13    13     0
    # 14    14     0
    # 15    15     0
    # 16    16     1
    # 17    17     1
    # 18    18     0
    # 19    19     0
    # 20    20     0

You can use lag and lead to get offset values. Then use a combination of gather and spread to flip the data frame into the desired shape.

    df %>%
      mutate(
        `-2` = lag(price, 2),
        `-1` = lag(price),
        `0` = price,
        `+1` = lead(price)) %>%
      select(-price) %>%
      filter(event == 1) %>%
      mutate(event = paste0('event_', seq_along(event))) %>%
      gather('offset', 'value', -event) %>%
      spread(event, value) %>%
      arrange(as.numeric(offset))
    #   offset event_1 event_2 event_3 event_4
    # 1     -2       3      10      14      15
    # 2     -1       4      11      15      16
    # 3      0       5      12      16      17
    # 4     +1       6      13      17      18
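In current versions of tidyr, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. The same reshaping could be written roughly as follows. This is a sketch that assumes dplyr and tidyr are installed; the result name `wide` is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same example data as in the answer above
df <- data.frame(
  price = seq_len(20),
  event = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0))

wide <- df %>%
  # one column per offset relative to the current row
  mutate(
    `-2` = lag(price, 2),
    `-1` = lag(price),
    `0` = price,
    `+1` = lead(price)) %>%
  select(-price) %>%
  # keep only event rows and label them event_1, event_2, ...
  filter(event == 1) %>%
  mutate(event = paste0("event_", seq_along(event))) %>%
  # long format: one row per (event, offset) pair
  pivot_longer(-event, names_to = "offset", values_to = "value") %>%
  # wide format: one column per event, one row per offset
  pivot_wider(names_from = event, values_from = value) %>%
  arrange(as.numeric(offset))

wide
```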
0

Source: https://habr.com/ru/post/1274948/

