With the update, you can turn your binary matrix into a raster object and use the `clump` function from the raster package. A little data management then returns the exact format you want. Example below:
```r
library(igraph)  # raster::clump() needs the igraph package installed
library(raster)

mat = rbind(c(1,0,0,0,0),
            c(1,0,0,1,0),
            c(0,0,1,0,0),
            c(0,0,0,0,0),
            c(1,1,1,1,1))
Rmat <- raster(mat)
# Label 4-connected patches of non-zero cells (0 and NA are background)
Clumps <- as.matrix(clump(Rmat, directions=4))

#turn the clumps into a list
tot <- max(Clumps, na.rm=TRUE)
res <- vector("list", tot)
for (i in seq_len(tot)){
  # row/col positions of every cell in clump i
  res[[i]] <- which(Clumps == i, arr.ind = TRUE)
}
```
Printing `res` then gives:

```
> res
[[1]]
     row col
[1,]   1   1
[2,]   2   1

[[2]]
     row col
[1,]   2   4

[[3]]
     row col
[1,]   3   3

[[4]]
     row col
[1,]   5   1
[2,]   5   2
[3,]   5   3
[4,]   5   4
[5,]   5   5
```
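For readers without raster, here is a rough base-R sketch of what `clump(..., directions = 4)` does: a breadth-first flood fill that gives each 4-connected patch of non-zero cells its own ID. `label_clumps4` is a hypothetical helper, not part of any package. It treats 0 and NA as background and numbers clumps in row-major scan order, which happens to reproduce the IDs above for this example but is not guaranteed to match `clump()` in general. It is written for clarity rather than speed, so for a 2000 by 2000 matrix the raster route is probably the better choice.

```r
# Hypothetical helper (not from any package): label 4-connected clumps of
# non-zero cells with a breadth-first flood fill. 0 and NA are background.
label_clumps4 <- function(m) {
  nr <- nrow(m)
  nc <- ncol(m)
  is_fg <- function(r, k) !is.na(m[r, k]) && m[r, k] != 0
  lab <- matrix(NA_integer_, nr, nc)
  # Row/column offsets of the 4 neighbours: up, down, left, right
  steps <- list(c(-1, 0), c(1, 0), c(0, -1), c(0, 1))
  next_id <- 0L
  # Scan row by row so clump IDs follow row-major order
  for (i in seq_len(nr)) {
    for (j in seq_len(nc)) {
      if (is_fg(i, j) && is.na(lab[i, j])) {
        next_id <- next_id + 1L
        lab[i, j] <- next_id
        queue <- list(c(i, j))
        while (length(queue) > 0) {
          cell <- queue[[1]]
          queue <- queue[-1]
          for (s in steps) {
            r <- cell[1] + s[1]
            k <- cell[2] + s[2]
            # in bounds, foreground, and not yet labelled
            if (r >= 1 && r <= nr && k >= 1 && k <= nc &&
                is_fg(r, k) && is.na(lab[r, k])) {
              lab[r, k] <- next_id
              queue[[length(queue) + 1]] <- c(r, k)
            }
          }
        }
      }
    }
  }
  lab
}

mat <- rbind(c(1,0,0,0,0),
             c(1,0,0,1,0),
             c(0,0,1,0,0),
             c(0,0,0,0,0),
             c(1,1,1,1,1))
lab <- label_clumps4(mat)
# Same list-of-coordinates format as the raster version
res <- lapply(seq_len(max(lab, na.rm = TRUE)),
              function(i) which(lab == i, arr.ind = TRUE))
```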
I would not be surprised if there is a better way to go from the raster object to your final format. As with my old answer below, a 2000 by 2000 matrix should not be a big problem for this.
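One possibly tidier route from the clump matrix to the list, sketched here without a loop: take the positions of all non-background cells once, then `split()` them by clump ID. To keep the snippet self-contained, the `Clumps` matrix is typed in by hand to match the output above (NA = background); in practice it would be the `as.matrix(clump(...))` result.

```r
# Clump IDs for the example, written out by hand to match the output above
# (assumption: in practice this comes from as.matrix(clump(Rmat, directions = 4)))
Clumps <- matrix(NA_integer_, nrow = 5, ncol = 5)
Clumps[1:2, 1] <- 1L
Clumps[2, 4] <- 2L
Clumps[3, 3] <- 3L
Clumps[5, ] <- 4L

# Positions of every non-background cell, and the clump each belongs to
idx <- which(!is.na(Clumps), arr.ind = TRUE)
ids <- Clumps[idx]

# Group the row/col pairs by clump ID in one pass instead of one which() per clump
res <- lapply(split(seq_len(nrow(idx)), ids),
              function(k) idx[k, , drop = FALSE])
res
```

Unlike the loop, the resulting list is named by clump ID (`"1"`, `"2"`, ...); wrap it in `unname()` if you want the plain `[[1]]`, `[[2]]` layout.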
Old (wrong) answer, but it may still be useful for people who want to find the connected components of a graph.
You can use the igraph package to turn the adjacency matrix into a graph and return its connected components. Your sample graph is a single component, so I removed one edge for illustration.
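```r
library(igraph)  # igraph also re-exports the %>% pipe

mat = rbind(c(1,0,0,0,0),
            c(1,0,0,1,0),
            c(0,0,1,0,0),
            c(0,0,0,0,0),
            c(1,1,1,1,1))
# Build a directed graph from the adjacency matrix, then drop the
# edge 5 -> 3 so the example splits into two components
g <- graph_from_adjacency_matrix(mat) %>%
  delete_edges("5|3")
plot(g)
clu <- components(g)  # weakly connected components by default
groups(clu)
```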
The last line then prints:

```
> groups(clu)
$`1`
[1] 1 2 4 5

$`2`
[1] 3
```
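To show what `components()` computes here (weakly connected components, its default mode), a base-R sketch: ignore edge direction by symmetrising the adjacency matrix, then grow each component with a breadth-first search. `weak_components` is a hypothetical helper, not igraph's implementation, and setting `adj[5, 3] <- 0` is assumed to stand in for `delete_edges("5|3")`.

```r
# Hypothetical helper: weakly connected components of a directed graph
# given as an adjacency matrix, via breadth-first search (base R only).
weak_components <- function(adj) {
  und <- (adj + t(adj)) > 0        # ignore direction: i ~ j if i->j or j->i
  n <- nrow(und)
  memb <- rep(NA_integer_, n)       # component ID of each vertex
  comp <- 0L
  for (v in seq_len(n)) {
    if (is.na(memb[v])) {
      comp <- comp + 1L
      memb[v] <- comp
      frontier <- v
      while (length(frontier) > 0) {
        # unvisited vertices adjacent to any vertex in the frontier
        nb <- which(colSums(und[frontier, , drop = FALSE]) > 0 & is.na(memb))
        memb[nb] <- comp
        frontier <- nb
      }
    }
  }
  split(seq_len(n), memb)           # named list of vertex IDs, like groups()
}

mat <- rbind(c(1,0,0,0,0),
             c(1,0,0,1,0),
             c(0,0,1,0,0),
             c(0,0,0,0,0),
             c(1,1,1,1,1))
adj <- mat
adj[5, 3] <- 0   # drop the edge 5 -> 3, mirroring delete_edges("5|3")
grp <- weak_components(adj)
grp
```

On this example it gives the same two groups as igraph, `1 2 4 5` and `3`.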
In my experience this algorithm is pretty fast, so I don't think a 2000 by 2000 matrix would be a problem.