Getting connected components in R

I have a matrix with values 0 or 1, and I would like to get a list of the groups of adjacent 1s.

For example, the matrix

    mat = rbind(c(1,0,0,0,0),
                c(1,0,0,1,0),
                c(0,0,1,0,0),
                c(0,0,0,0,0),
                c(1,1,1,1,1))
    > mat
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    0    0    0    0
    [2,]    1    0    0    1    0
    [3,]    0    0    1    0    0
    [4,]    0    0    0    0    0
    [5,]    1    1    1    1    1

should return the following 4 connected components:

C1 = {(1,1); (2,1)}

C2 = {(2,4)}

C3 = {(3,3)}

C4 = {(5,1); (5,2); (5,3); (5,4); (5,5)}

Does anyone have an idea how to do this quickly in R? My real matrix is quite large, around 2000x2000 (but I expect the number of connected components to be fairly small, around 200).

1 answer

Given the update to the question, you can turn your binary matrix into a raster object and use the clump function. A little data management then returns exactly the format you want. Example below:

    library(igraph)
    library(raster)

    mat = rbind(c(1,0,0,0,0),
                c(1,0,0,1,0),
                c(0,0,1,0,0),
                c(0,0,0,0,0),
                c(1,1,1,1,1))

    Rmat <- raster(mat)
    Clumps <- as.matrix(clump(Rmat, directions=4))

    #turn the clumps into a list
    tot <- max(Clumps, na.rm=TRUE)
    res <- vector("list",tot)
    for (i in 1:max(Clumps, na.rm=TRUE)){
      res[i] <- list(which(Clumps == i, arr.ind = TRUE))
    }

Printing res at the console then gives:

    > res
    [[1]]
         row col
    [1,]   1   1
    [2,]   2   1

    [[2]]
         row col
    [1,]   2   4

    [[3]]
         row col
    [1,]   3   3

    [[4]]
         row col
    [1,]   5   1
    [2,]   5   2
    [3,]   5   3
    [4,]   5   4
    [5,]   5   5

I would not be surprised if there is a better way to go from a raster object to your final goal. Again, a 2000x2000 matrix should not be a big problem for this.
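One possible shortcut for that last step, offered as a sketch rather than a definitive replacement for the loop: base R's split() can group the cell coordinates by clump ID in a single call. The Clumps matrix below is typed in by hand to mirror what clump() returned for the example (NA for background cells), so the snippet runs without raster. The names cells and res2 are only illustrative.

```r
# Hand-written stand-in for as.matrix(clump(Rmat, directions=4)) on the
# example matrix (assumption: background cells are NA, clumps numbered 1..4).
Clumps <- rbind(c( 1, NA, NA, NA, NA),
                c( 1, NA, NA,  2, NA),
                c(NA, NA,  3, NA, NA),
                c(NA, NA, NA, NA, NA),
                c( 4,  4,  4,  4,  4))

# Row/column indices of every labelled cell, in column-major order.
cells <- which(!is.na(Clumps), arr.ind = TRUE)

# Group the coordinates by clump ID: one data frame per component.
# Clumps[!is.na(Clumps)] is in the same column-major order as `cells`.
res2 <- split(as.data.frame(cells), Clumps[!is.na(Clumps)])
```

The result is a named list of data frames with row and col columns instead of a list of matrices, which may or may not suit what you do with the components afterwards.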


Old (wrong) answer, but it should still be useful for people who want the connected components of a graph.

You can use the igraph package to turn the adjacency matrix into a network and return components. Your sample graph is one component, so I removed one edge for illustration.

    library(igraph)

    mat = rbind(c(1,0,0,0,0),
                c(1,0,0,1,0),
                c(0,0,1,0,0),
                c(0,0,0,0,0),
                c(1,1,1,1,1))

    g <- graph.adjacency(mat) %>% delete_edges("5|3")
    plot(g)
    clu <- components(g)
    groups(clu)

The final line then prints the following at the command prompt:

    > groups(clu)
    $`1`
    [1] 1 2 4 5

    $`2`
    [1] 3

In my experience this algorithm is pretty fast, so I don't think 2000x2000 would be a problem.
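For completeness, igraph can also be pointed at the original grid problem by treating every cell equal to 1 as a vertex and joining cells that touch horizontally or vertically. The following is a minimal sketch under that assumption, not part of the answer above. The helper names (cells, id, down, right, res3) are made up for illustration, and the component numbering it produces need not match the numbering from clump().

```r
library(igraph)

mat <- rbind(c(1,0,0,0,0),
             c(1,0,0,1,0),
             c(0,0,1,0,0),
             c(0,0,0,0,0),
             c(1,1,1,1,1))

# One vertex per cell equal to 1; `id` maps a grid position to its vertex id
# (0 means "not a vertex").
cells <- which(mat == 1, arr.ind = TRUE)
id <- matrix(0L, nrow(mat), ncol(mat))
id[cells] <- seq_len(nrow(cells))

# Candidate edges: each cell paired with the cell below it and the cell to
# its right (4-connectivity), built without loops.
nr <- nrow(mat); nc <- ncol(mat)
down  <- cbind(as.vector(id[-nr, ]), as.vector(id[-1, ]))
right <- cbind(as.vector(id[, -nc]), as.vector(id[, -1]))
edges <- rbind(down, right)

# Keep only pairs where both ends are 1-cells.
edges <- edges[edges[, 1] > 0 & edges[, 2] > 0, , drop = FALSE]

# Undirected graph that also contains isolated cells as vertices.
g <- make_empty_graph(n = nrow(cells), directed = FALSE)
g <- add_edges(g, as.vector(t(edges)))

# Group the cell coordinates by component membership.
res3 <- split(as.data.frame(cells), components(g)$membership)
```

On the example matrix this yields four groups of sizes 2, 1, 1 and 5, the same cells as C1 to C4 in the question.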


Source: https://habr.com/ru/post/1244356/

