Reducing the number of columns using a condition in R

I have a large matrix with over 1000 rows and 100 columns. Each row contains ONLY 6-10 non-zero values, and the rest are zeros. I want to create a matrix with only 5 columns, holding the values of the first 5 consecutive non-zero columns in each row. For instance:

    A = structure(c(0, 1L, 6L, 0, 2L, 0, 2L, 0, 1L, 4L, 1L, 3L, 7L, 2L, 6L,
                    2L, 4L, 0, 3L, 0, 3L, 5L, 1L, 4L, 0, 4L, 6L, 2L, 0, 0,
                    5L, 0, 3L, 5L, 0, 0, 0, 4L, 6L, 7L, 0, 7L, 5L, 7L, 8L,
                    6L, 0, 0, 8L, 9L, 0, 0, 0, 9L, 1L, 0, 0, 0, 0, 2L,
                    7L, 0, 2L, 0, 0, 1L, 8L, 4, 0, 0), .Dim = c(5L, 14L))
    # A =
    #      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
    # [1,]    0    0    1    2    3    4    5    0    0     6     0     0     7     1
    # [2,]    1    2    3    4    5    6    0    0    7     0     0     0     0     8
    # [3,]    6    0    7    0    1    2    3    4    5     0     0     0     2     4
    # [4,]    0    1    2    3    4    0    5    6    7     8     9     0     0     0
    # [5,]    2    4    6    0    0    0    0    7    8     9     1     2     0     0

I want this matrix:

    B = structure(c(1L, 1L, 1L, 5L, 7L, 2L, 2L, 2L, 6L, 8L, 3L, 3L, 3L, 7L, 9L,
                    4L, 4L, 4L, 8L, 1L, 5L, 5L, 5L, 9L, 2L), .Dim = c(5L, 5L))
    # B =
    #      [,1] [,2] [,3] [,4] [,5]
    # [1,]    1    2    3    4    5
    # [2,]    1    2    3    4    5
    # [3,]    1    2    3    4    5
    # [4,]    5    6    7    8    9
    # [5,]    7    8    9    1    2

My code is:

    df = data.frame(A)
    B = do.call(rbind, lapply(1:NROW(df), function(i) df[i,][(df[i,]) != 0][1:5]))
    # or
    B = t(apply(X = df, MARGIN = 1, function(x) x[x != 0][1:5]))

My code works fine for the first two rows of A, but not for the rest: x[x != 0][1:5] takes the first five non-zero values whether or not they are adjacent (row 3 gives 6 7 1 2 3). I also thought about getting the indices of the columns that are not zero, checking whether 5 of them are consecutive (with no gap between them), and taking their values. Any help is much appreciated!
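The index-based idea in the last paragraph can be sketched in base R roughly like this. It is a minimal sketch, assuming that "has a value" means "non-zero" and that the first qualifying run is the one wanted; first_run5 is a hypothetical helper name. Because which() returns strictly increasing column indices, five of them are consecutive exactly when the last minus the first equals 4.

```r
# Example matrix from the question, written row by row
A <- matrix(c(
  0, 0, 1, 2, 3, 4, 5, 0, 0, 6, 0, 0, 7, 1,
  1, 2, 3, 4, 5, 6, 0, 0, 7, 0, 0, 0, 0, 8,
  6, 0, 7, 0, 1, 2, 3, 4, 5, 0, 0, 0, 2, 4,
  0, 1, 2, 3, 4, 0, 5, 6, 7, 8, 9, 0, 0, 0,
  2, 4, 6, 0, 0, 0, 0, 7, 8, 9, 1, 2, 0, 0
), nrow = 5, byrow = TRUE)

# Hypothetical helper: values of the first run of n consecutive non-zero columns
first_run5 <- function(x, n = 5) {
  nz <- which(x != 0)                      # indices of non-zero columns, increasing
  if (length(nz) < n) return(rep(NA_real_, n))
  # nz[i], ..., nz[i + n - 1] are consecutive iff their span is exactly n - 1
  span <- nz[n:length(nz)] - nz[1:(length(nz) - n + 1)]
  i <- which(span == n - 1)[1]             # first window of indices with no gap
  if (is.na(i)) return(rep(NA_real_, n))   # no run of n non-zero values
  x[nz[i] + 0:(n - 1)]
}

B <- t(apply(A, 1, first_run5))
B
```

A row without a run of 5 comes back as a row of NA rather than raising an error.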

4 answers

You can use rollapply from the zoo package:

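    library(zoo)
    # !!x is TRUE where x is non-zero; rollapply(..., 5, all) flags every window of 5
    # that is entirely non-zero, and match() returns the start of the first one
    t(apply(A, 1, function(x) {x[match(TRUE, rollapply(!!x, 5, all)) + (0:4)]}))
    #      [,1] [,2] [,3] [,4] [,5]
    # [1,]    1    2    3    4    5
    # [2,]    1    2    3    4    5
    # [3,]    1    2    3    4    5
    # [4,]    5    6    7    8    9
    # [5,]    7    8    9    1    2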

If a row has no run of 5 non-zero values, match() returns NA and that row comes back as NAs instead of a useful result; please update your post if you want such rows handled differently.

Or the same thing, written more neatly with purrr:

    library(zoo)    # for rollapply()
    library(purrr)
    # a data.frame fits this data better conceptually: you have separate series,
    # and it is better to put them in columns
    Adf <- as.data.frame(t(A))
    res_df <- map_df(Adf, ~ {.x[match(TRUE, rollapply(.x != 0, 5, all)) + (0:4)]})
    res_mat <- as.matrix(t(unname(res_df)))  # if you want to go back to the ugly matrix :)
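If you would rather avoid the zoo and purrr dependencies, here is a base-R sketch of the same column-wise idea. It is not the answer's own code: stats::embed() builds the length-5 sliding windows that rollapply() walks over, vapply() stands in for map_df() and insists that every series yields exactly five numbers, and first_run_base is a hypothetical name. Rows with no run of 5 get an explicit row of NA.

```r
# Example matrix from the question, written row by row
A <- matrix(c(
  0, 0, 1, 2, 3, 4, 5, 0, 0, 6, 0, 0, 7, 1,
  1, 2, 3, 4, 5, 6, 0, 0, 7, 0, 0, 0, 0, 8,
  6, 0, 7, 0, 1, 2, 3, 4, 5, 0, 0, 0, 2, 4,
  0, 1, 2, 3, 4, 0, 5, 6, 7, 8, 9, 0, 0, 0,
  2, 4, 6, 0, 0, 0, 0, 7, 8, 9, 1, 2, 0, 0
), nrow = 5, byrow = TRUE)

# Hypothetical helper: first run of n consecutive non-zero values, or NA if none.
# Assumes length(x) >= n, which embed() requires.
first_run_base <- function(x, n = 5) {
  # one row per window; row i holds the flags for x[i + n - 1], ..., x[i]
  windows <- embed(as.integer(x != 0), n)
  start <- match(TRUE, rowSums(windows) == n)   # first all-non-zero window
  if (is.na(start)) return(rep(NA_real_, n))
  x[start:(start + n - 1)]
}

series <- as.data.frame(t(A))                      # one column per row of A
res <- vapply(series, first_run_base, numeric(5))  # one column of 5 values per series
B <- unname(t(res))                                # back to one row per original row
B
```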

Here is an option using rle:

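    t(apply(A, 1, function(x) {
      rl <- rle(x != 0)   # runs of zero (FALSE) and non-zero (TRUE) entries
      # keep only non-zero runs of length >= 5, then take the first 5 of those positions
      head(x[inverse.rle(within.list(rl, values[!(values & lengths >= 5)] <- FALSE))], 5)
    }))
    #      [,1] [,2] [,3] [,4] [,5]
    # [1,]    1    2    3    4    5
    # [2,]    1    2    3    4    5
    # [3,]    1    2    3    4    5
    # [4,]    5    6    7    8    9
    # [5,]    7    8    9    1    2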
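To see what the rle() answer does, here is the same computation unpacked step by step on row 4 of A, with plain list assignment in place of within.list(). The variable names (runs, long, mask) are illustrative only.

```r
x <- c(0, 1, 2, 3, 4, 0, 5, 6, 7, 8, 9, 0, 0, 0)   # row 4 of A

runs <- rle(x != 0)       # runs of zero (FALSE) and non-zero (TRUE) entries
runs$lengths              # 1 4 1 5 3
runs$values               # FALSE TRUE FALSE TRUE FALSE

long <- runs
long$values <- runs$values & runs$lengths >= 5   # keep only non-zero runs of length >= 5
mask <- inverse.rle(long)                         # expand back to one flag per column
which(mask)               # 7 8 9 10 11
head(x[mask], 5)          # 5 6 7 8 9
```

head(..., 5) picks the right values because x[mask] lists the qualifying runs in column order; if the first long run has more than 5 entries, only its first five are kept.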

EDIT: I had missed some details; here is my new attempt using apply and base R only:

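    cumfun <- function(x) {
      y <- ifelse(x != 0, 1, 0)   # 1 for non-zero entries, 0 otherwise
      tmp <- cumsum(y)
      # tmp - cummax((!y) * tmp) is the length of the non-zero run ending at each position;
      # [1] keeps only the first run of 5 if a row contains several
      pos <- which(tmp - cummax((!y) * tmp) == 5)[1]
      x[(pos - 4):pos]
    }
    B <- t(apply(A, 1, cumfun))
    B
    #      [,1] [,2] [,3] [,4] [,5]
    # [1,]    1    2    3    4    5
    # [2,]    1    2    3    4    5
    # [3,]    1    2    3    4    5
    # [4,]    5    6    7    8    9
    # [5,]    7    8    9    1    2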
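The key expression in cumfun() is tmp - cummax((!y) * tmp). Evaluated on row 4 of A, it gives at each position the length of the non-zero run ending there, so the first position where it reaches 5 marks the end of the wanted run. The names reset and run_len are illustrative only.

```r
x <- c(0, 1, 2, 3, 4, 0, 5, 6, 7, 8, 9, 0, 0, 0)   # row 4 of A

y   <- as.integer(x != 0)       # 0 1 1 1 1 0 1 1 1 1 1 0 0 0
tmp <- cumsum(y)                # 0 1 2 3 4 4 5 6 7 8 9 9 9 9
# (!y) * tmp keeps tmp only at zeros; cummax carries the count reached at the last zero forward
reset <- cummax((!y) * tmp)     # 0 0 0 0 0 4 4 4 4 4 4 9 9 9
run_len <- tmp - reset          # 0 1 2 3 4 0 1 2 3 4 5 0 0 0

pos <- which(run_len == 5)[1]   # 11: end of the first run of 5
x[(pos - 4):pos]                # 5 6 7 8 9
```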
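
Another option, using rollsum from zoo:

    library(zoo)
    # rollsum(!!x, n) counts the non-zero entries in each window of n;
    # a window is entirely non-zero when that count equals n
    t(apply(A, MARGIN = 1, function(x, n = 5) x[which(rollsum(!!x, n) == n)[1] + seq_len(n) - 1]))
    #      [,1] [,2] [,3] [,4] [,5]
    # [1,]    1    2    3    4    5
    # [2,]    1    2    3    4    5
    # [3,]    1    2    3    4    5
    # [4,]    5    6    7    8    9
    # [5,]    7    8    9    1    2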
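rollsum() is the only zoo function the last answer needs. As an alternative sketch (not zoo's own implementation), the same rolling count can be taken from a cumulative sum in base R: sum(v[i:(i + n - 1)]) is the difference of two cumulative sums n positions apart. roll_sum and first_run_cumsum are hypothetical names.

```r
# Example matrix from the question, written row by row
A <- matrix(c(
  0, 0, 1, 2, 3, 4, 5, 0, 0, 6, 0, 0, 7, 1,
  1, 2, 3, 4, 5, 6, 0, 0, 7, 0, 0, 0, 0, 8,
  6, 0, 7, 0, 1, 2, 3, 4, 5, 0, 0, 0, 2, 4,
  0, 1, 2, 3, 4, 0, 5, 6, 7, 8, 9, 0, 0, 0,
  2, 4, 6, 0, 0, 0, 0, 7, 8, 9, 1, 2, 0, 0
), nrow = 5, byrow = TRUE)

# Hypothetical helper: sums of every window v[i:(i + n - 1)], like a rolling sum without fill
roll_sum <- function(v, n) diff(c(0L, cumsum(v)), lag = n)

first_run_cumsum <- function(x, n = 5) {
  # a window is entirely non-zero when its count of non-zero entries equals n
  start <- which(roll_sum(x != 0, n) == n)[1]
  x[start + seq_len(n) - 1]   # if there is no run, start is NA and the row is all NA
}

B <- t(apply(A, 1, first_run_cumsum))
B
```

As with the rollsum() version, a row without a run of 5 yields a row of NAs.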

Source: https://habr.com/ru/post/1270968/

