R - Extract multiple rows from column 1 if a specific value appears in column 2

Question


My question is about extracting multiple values from a data.frame in R and putting them into a new data.frame.

I have a data.frame (df) that looks like this:

    PRICE  EVENT
    1.50   0
    1.70   0
    1.65   0
    1.20   1
    0.90   0
    1.70   0
    1.55   0
    .      .
    .      .
    1.10   0
    1.20   0
    1.14   1
    0.90   0

My actual data.frame has these two columns and over 300,000 rows. The EVENT column only takes the values 0 or 1 (a 1 is a proxy indicating that a specific event occurred).

The first step in my research is to analyze the price on the days when an event occurred. That part is simple; I did it with:

    vector <- df[df$EVENT == 1, "PRICE"]

Now vector contains all prices on event days (here: 1.20 and 1.14).

The second stage of my research is the interesting part.

I now want not only the prices on the event days, but also the prices x days before and y days after each event, collected in a matrix.

Example: I want the prices from two days before the event to one day after the event (including the day of the event).

The new data.frame that I am trying to create would then look like this:

         Event 1  ...  Event n
    -2   1.70     ...  1.10
    -1   1.65     ...  1.20
     0   1.20     ...  1.14
    +1   0.90     ...  0.90

Please keep in mind that the 4-day window [-2, 1] is just an example. In my actual research, I have to cover a 91-day window [-30, 60].

Thanks for the help:)


r dataframe rows

Bit Jan 25 '18 at 8:53


4 answers

mtoto · Answer 1 · 2018-01-25T09:24:46+0000

We can create a matrix containing the corresponding row numbers, and then use it as a mask to achieve the expected result:

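    # row numbers of all events
    event_rows <- which(df$EVENT == 1)
    # one column of row indices per event (here: 2 rows before to 2 rows after)
    mask <- sapply(event_rows, function(x) (x-2):(x+2))
    # look up the prices for each column of indices
    apply(mask, 2, function(x) df$PRICE[x])
    #     [,1] [,2]
    #[1,] 1.70 1.10
    #[2,] 1.65 1.20
    #[3,] 1.20 1.14
    #[4,] 0.90 0.90
    #[5,] 1.70   NA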

Data

    df <- structure(list(
        PRICE = c(1.5, 1.7, 1.65, 1.2, 0.9, 1.7, 1.55, 1.1, 1.2, 1.14, 0.9),
        EVENT = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L)),
      .Names = c("PRICE", "EVENT"),
      class = "data.frame",
      row.names = c(NA, -11L))
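The index-matrix idea above can be generalized to an arbitrary window such as the question's [-30, 60]. The following is only a minimal sketch, not part of the original answer: `event_window` and its `before`/`after` arguments are hypothetical names introduced here. It sets out-of-range row numbers to NA explicitly, because R's vector indexing does not return NA for indices below 1, so events close to the first rows would otherwise misbehave.

```r
# Toy data matching the question's example (events on rows 4 and 10)
df <- data.frame(
  PRICE = c(1.5, 1.7, 1.65, 1.2, 0.9, 1.7, 1.55, 1.1, 1.2, 1.14, 0.9),
  EVENT = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L)
)

# Hypothetical helper (not from the thread): one column per event,
# one row per offset in before:after
event_window <- function(price, event, before = -2, after = 1) {
  offsets <- before:after
  event_rows <- which(event == 1)
  # idx[i, j] is the row number at offset i relative to event j
  idx <- outer(offsets, event_rows, `+`)
  # Row numbers outside 1..length(price) do not exist: mark them as NA
  idx[idx < 1 | idx > length(price)] <- NA
  matrix(price[as.vector(idx)],
         nrow = length(offsets),
         dimnames = list(as.character(offsets),
                         sprintf("Event_%d", seq_along(event_rows))))
}

win <- event_window(df$PRICE, df$EVENT, before = -2, after = 1)
print(win)
```

For the 91-day window from the question, the call would be `event_window(df$PRICE, df$EVENT, before = -30, after = 60)`; events too close to either end of the data simply get NA in the missing positions.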

Lap · Answer 2 · 2018-01-25T09:25:28+0000

For completeness, here is a base R solution:

    # example data
    set.seed(123)
    df <- data.frame(price = rnorm(100), event = rbinom(100, 1, 0.05))

    # create a vector of unique event positions with additional 2 positions before and 1 ahead
    offset <- unique(as.vector(sapply(which(df$event == 1), function(x) c((x-2):(x+1)))))

    # subset data
    df[offset[offset > 0 & offset <= 100], ]
    #          price event
    # 1  -0.56047565     0
    # 2  -0.23017749     1
    # 3   1.55870831     0
    # 20 -0.47279141     0
    # 21 -1.06782371     0
    # 22 -0.21797491     1
    # 23 -1.02600445     0
    # 46 -1.12310858     0
    # 47 -0.40288484     0
    # 48 -0.46665535     1
    # 49  0.77996512     1
    # 50 -0.08336907     0
    # 62 -0.50232345     0
    # 63 -0.33320738     0
    # 64 -1.01857538     1
    # 65 -1.07179123     0
    # 75 -0.68800862     0
    # 76  1.02557137     0
    # 77 -0.28477301     1
    # 78 -1.22071771     0
    # 95  1.36065245     0
    # 96 -0.60025959     0
    # 97  2.18733299     1
    # 98  1.53261063     0

Edit: I did not see the expected output at first, see @mtoto's answer for this.

Joaquin · Answer 3 · 2018-01-25T09:10:13+0000

What I would do is extend the original data frame with lagged price columns and then filter the rows. Using tidyverse, it would look something like this. (I highly recommend using tidyverse rather than base R, but it is up to you.)

    library(tidyverse)

    # generate example data frame
    df <- data.frame(price = rnorm(100), event = rbinom(100, 1, 0.5))

    # generate a vector from one to the desired number of lags.
    # we map this vector with a function that returns the lagged
    # values of the price. then we join by columns
    lags <- map(1:3, function(x){lag(df$price, n = x)}) %>%
      reduce(cbind) %>%
      as.data.frame %>%
      set_names(paste('priceLag', 1:3, sep = ''))

    # bind lags to original data frame, keep only rows where event == 1
    out <- cbind(df, lags) %>%
      filter(event == 1)
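The answer above only builds lags, while the question also asks for days after the event. A rough sketch of how the same idea might cover both directions with dplyr's `lag()` and `lead()` follows; the `offsets` vector, the `offset_*` column names, and the small deterministic data set are assumptions made for illustration.

```r
library(dplyr)

# Assumed toy data: prices 1..20, events on rows 5 and 12
df <- data.frame(price = seq_len(20), event = 0)
df$event[c(5, 12)] <- 1

# Negative offsets are days before the event, positive ones days after
offsets <- -2:1

# One shifted copy of the price column per offset
shifted <- lapply(offsets, function(k) {
  if (k < 0) {
    dplyr::lag(df$price, n = -k)   # look back |k| rows
  } else if (k > 0) {
    dplyr::lead(df$price, n = k)   # look ahead k rows
  } else {
    df$price                       # the event day itself
  }
})
names(shifted) <- paste0("offset_", offsets)

# Bind the shifted columns to the data and keep only the event rows
out <- cbind(df, as.data.frame(shifted, check.names = FALSE)) %>%
  filter(event == 1)
print(out)
```

Each event ends up as one row with one column per offset; setting `offsets <- -30:60` would give the 91-day window from the question, with NA wherever the window runs past the start or end of the data.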

Paul · Answer 4 · 2018-01-25T09:22:01+0000

    library('tidyverse')

    df <- data.frame(
      price = seq_len(20),
      event = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0))
    df
    #    price event
    # 1      1     0
    # 2      2     0
    # 3      3     0
    # 4      4     0
    # 5      5     1
    # 6      6     0
    # 7      7     0
    # 8      8     0
    # 9      9     0
    # 10    10     0
    # 11    11     0
    # 12    12     1
    # 13    13     0
    # 14    14     0
    # 15    15     0
    # 16    16     1
    # 17    17     1
    # 18    18     0
    # 19    19     0
    # 20    20     0

You can use lag and lead to get offset values. Then use a combination of gather and spread to flip the data frame into the desired shape.

    df %>%
      mutate(
        `-2` = lag(price, 2),
        `-1` = lag(price),
        `0` = price,
        `+1` = lead(price)) %>%
      select(-price) %>%
      filter(event == 1) %>%
      mutate(event = paste0('event_', seq_along(event))) %>%
      gather('offset', 'value', -event) %>%
      spread(event, value) %>%
      arrange(as.numeric(offset))
    #   offset event_1 event_2 event_3 event_4
    # 1     -2       3      10      14      15
    # 2     -1       4      11      15      16
    # 3      0       5      12      16      17
    # 4     +1       6      13      17      18
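Newer tidyr releases recommend `pivot_longer()` and `pivot_wider()` over `gather()` and `spread()`. As a sketch under that assumption (it is not part of the original answer), the same reshaping could be written like this; the result is stored in a variable called `wide`, a name chosen here for illustration.

```r
library(dplyr)
library(tidyr)

# Same example data as in the answer above
df <- data.frame(
  price = seq_len(20),
  event = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0))

wide <- df %>%
  mutate(
    `-2` = lag(price, 2),
    `-1` = lag(price),
    `0`  = price,
    `+1` = lead(price)) %>%
  select(-price) %>%
  filter(event == 1) %>%
  # label the events in order of appearance
  mutate(event = paste0("event_", seq_along(event))) %>%
  # long format: one row per (event, offset) pair
  pivot_longer(-event, names_to = "offset", values_to = "value") %>%
  # wide format: one column per event, one row per offset
  pivot_wider(names_from = event, values_from = value) %>%
  arrange(as.numeric(offset))

print(wide)
```

One practical difference: `pivot_wider()` keeps the event columns in order of first appearance by default, which matters once there are ten or more events and alphabetical sorting would put `event_10` before `event_2`.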
