Delete a bunch of rows by row names - how to initialize an empty string in R?

I have this sparse matrix I named N:

    4 x 4 sparse Matrix of class "dgCMatrix"
       C1 C2 C3 C4
    V1  .  3  5  2
    V2  .  5  1  .
    V3  .  .  .  .
    V4  .  .  4  .

I am trying to delete rows with two or more missing values. I expect the result to be:

       C1 C2 C3 C4
    V1  .  3  5  2

I wrote this piece of code:

    #iterate on rows and count:
    #how many values in row ri are bigger than 0
    # if count is not bigger than limit, remove row ri
    limit = 3
    for(ri in 1:nrow(N)){
      count <- length(which(N[ri,]>0))
      if (count <limit){
        tmp <- paste("V",ri,sep="")
        rmv <- paste (rmv, tmp, sep= " ")
      }
    }
    #now remove specific row names
    N <- N[!rownames(N) %in% rmv, ]

The problem is that this does not work, because rmv is not defined on the first iteration of the loop, and I get an error:

 "object 'rmv' not found" 

How can I initialize rmv? If I use:

 rmv <- "" 

Then I get a line that starts with empty space, for example:

    > rmv
    [1] " V2"

and then my last line does not work:

 N <- N[!rownames(N) %in% rmv, ] 

Also, this is the very first code I've ever written in R, so if I am missing anything important in the basic concepts, I would be glad to be pointed to something to read. (It took me 6 hours and a lot of reading on Stack Overflow and in various R tutorials, but I'm very proud to have gotten this far. This is my first question.)

Thanks!

2 answers

With a large sparse matrix, you will want to work with the summary of the matrix, because as.matrix converts it to a dense matrix and can run out of memory:

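    library(Matrix)
    M <- sparseMatrix(i = c(1, 1, 1, 2, 2, 4),
                      j = c(2, 3, 4, 2, 3, 2),
                      x = c(3, 5, 2, 5, 1, 4))
    # nbins = nrow(M) keeps the count vector as long as M has rows,
    # even when the last rows are empty
    M[tabulate(summary(M)$i, nbins = nrow(M)) > 2, , drop = FALSE]
    # 1 x 4 sparse Matrix of class "dgCMatrix"
    #
    # [1,] . 3 5 2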

A step-by-step description of how this works:

    summary(M)
    # 4 x 4 sparse Matrix of class "dgCMatrix", with 6 entries
    #   i j x
    # 1 1 2 3
    # 2 2 2 5
    # 3 4 2 4
    # 4 1 3 5
    # 5 2 3 1
    # 6 1 4 2

    # number of stored entries per row
    tabulate(summary(M)$i, nbins = nrow(M))
    # [1] 3 2 0 1

    tabulate(summary(M)$i, nbins = nrow(M)) > 2
    # [1]  TRUE FALSE FALSE FALSE
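One edge case is worth keeping in mind with this approach: unless `nbins` is given, `tabulate()` sizes its result by the largest index it sees, so a matrix whose last rows are completely empty yields a count vector shorter than `nrow()`. The sketch below illustrates this with a hypothetical matrix `M2` (not part of the question) whose fourth row has no entries.

```r
library(Matrix)

# Hypothetical 4 x 4 matrix whose last two rows are completely empty
M2 <- sparseMatrix(i = c(1, 1, 1, 2, 2),
                   j = c(2, 3, 4, 2, 3),
                   x = c(3, 5, 2, 5, 1),
                   dims = c(4, 4))

# Without nbins the result stops at the largest row index that occurs
counts_default <- tabulate(summary(M2)$i)
length(counts_default)

# With nbins there is one count per row of the matrix
counts_full <- tabulate(summary(M2)$i, nbins = nrow(M2))
length(counts_full)

# Keep only rows with more than 2 stored entries
res <- M2[counts_full > 2, , drop = FALSE]
```

Passing `nbins = nrow(M2)` makes the logical index line up with the rows one to one instead of relying on recycling.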

Assuming your sparse matrix is called N, this should do it:

 N[rowSums(as.matrix(N) == 0) < 2, ] 
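If the matrix is too large to convert with `as.matrix`, a similar filter can probably be written by counting the non-zero entries instead, since `N != 0` stays sparse. This is only a sketch: the matrix below is rebuilt from the printout in the question, and `n_missing` and `res` are names chosen here for illustration.

```r
library(Matrix)

# The matrix from the question, rebuilt here for illustration
N <- sparseMatrix(i = c(1, 1, 1, 2, 2, 4),
                  j = c(2, 3, 4, 2, 3, 3),
                  x = c(3, 5, 2, 5, 1, 4),
                  dimnames = list(paste0("V", 1:4), paste0("C", 1:4)))

# N != 0 is a sparse logical matrix, so no dense copy is created;
# missing values per row = number of columns minus non-zero entries
n_missing <- ncol(N) - Matrix::rowSums(N != 0)

# Keep rows with fewer than two missing values
res <- N[n_missing < 2, , drop = FALSE]
res
```

`drop = FALSE` keeps the result a matrix even when only one row survives.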

A small example with some data from ?xtabs:

    d.ergo <- data.frame(Type = paste0("T", rep(1:4, 9*4)),
                         Subj = gl(9, 4, 36*4))
    set.seed(15)
    # a subset of cases:
    N <- xtabs(~ Type + Subj, data = d.ergo[sample(36, 10), ], sparse = TRUE)
    N
    # 4 x 9 sparse Matrix of class "dgCMatrix"
    #    1 2 3 4 5 6 7 8 9
    # T1 . 1 . 1 . 1 . 1 .
    # T2 1 . . . . . 1 . 1
    # T3 . . . . 1 . . . .
    # T4 1 . . . . . 1 . .

    rowSums(as.matrix(N) == 0)  ## How many missing
    # T1 T2 T3 T4
    #  5  6  8  7

    ## Let's remove any row with 7 or more missing
    N[rowSums(as.matrix(N) == 0) < 7, ]
    # 2 x 9 sparse Matrix of class "dgCMatrix"
    #    1 2 3 4 5 6 7 8 9
    # T1 . 1 . 1 . 1 . 1 .
    # T2 1 . . . . . 1 . 1
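As for the literal question of how to initialize `rmv`: one common approach is to start from an empty character vector, `character(0)`, and append names with `c()` rather than gluing them into one string with `paste()`. With `paste()`, `rmv` ends up as a single string such as " V2 V3 V4", which `%in%` never matches against the individual row names. The sketch below uses a small dense stand-in for `N` (an assumption made so that it runs without extra packages); the loop itself follows the question.

```r
# Dense stand-in for the sparse matrix N from the question
N <- matrix(c(0, 3, 5, 2,
              0, 5, 1, 0,
              0, 0, 0, 0,
              0, 0, 4, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("V", 1:4), paste0("C", 1:4)))

limit <- 3
rmv <- character(0)                    # empty character vector of length 0
for (ri in seq_len(nrow(N))) {
  count <- sum(N[ri, ] > 0)            # values in row ri bigger than 0
  if (count < limit) {
    rmv <- c(rmv, rownames(N)[ri])     # append one element per row to remove
  }
}
rmv

# Each element of rmv is now a separate row name, so %in% works
res <- N[!rownames(N) %in% rmv, , drop = FALSE]
res
```

Growing a vector inside a loop is fine for small inputs, though the vectorized answers above are the more idiomatic way to do this in R.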

Source: https://habr.com/ru/post/1496553/

