Combining List Elements in R

I have a list of the following form:

[[1]]
 [1] "10"  "719" "99"  

[[2]]
 [1] "10"  "624" "85"  "888" "624" 

[[3]]
 [1] "1"   "894" "110" "344" "634"  

I want to combine the vectors that share the same first element, keeping that element only once, i.e.

[[1]]
 [1] "10"  "719" "99" "624" "85"  "888" "624" 

[[2]]
 [1] "1"   "894" "110" "344" "634"

Is there a way to do this with the least memory usage?


I would approach this as follows:

x <- list(c("10",  "719", "99"),
          c("10",  "624", "85",  "888", "624"),
          c("1",   "894", "110", "344", "634"))
first_elems <- sapply(x, "[", 1) # get 1st elem of each vector
(first_elems <- as.factor(first_elems)) # convert to a factor (among other things, this finds all unique elems)
## [1] 10 10 1 
## Levels: 1 10
(group <- split(x, first_elems)) # split by 1st elem (divide into groups)
## $`1`
## $`1`[[1]]
## [1] "1"   "894" "110" "344" "634"
## 
## 
## $`10`
## $`10`[[1]]
## [1] "10"  "719" "99" 
## 
## $`10`[[2]]
## [1] "10"  "624" "85"  "888" "624"
## 
(result <- lapply(group, unlist)) # combine vectors in each group (list of vectors -> an atomic vector)
## $`1`
## [1] "1"   "894" "110" "344" "634"
## 
## $`10`
## [1] "10"  "719" "99"  "10"  "624" "85"  "888" "624"

EDIT: To keep each key only once in the combined vectors, use:

(result <- lapply(group, function(x) {
      c(x[[1]][1], unlist(lapply(x, "[", -1)))
   }))
## $`1`
## [1] "1"   "894" "110" "344" "634"
## 
## $`10`
## [1] "10"  "719" "99"  "624" "85"  "888" "624"

Very little extra memory is required. Apart from the resulting list, we only need to store the result of as.factor (its size is proportional to the number of classes plus the number of elements in x). split requires only a little extra memory, because the vectors in x are not deep-copied.
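
The steps above can be wrapped in a small helper. This is only a sketch: the name combine_by_first is hypothetical, and it assumes every vector in x is a non-empty character vector. Unlike the code above, it sets the factor levels with unique() so that the groups keep the order in which the keys first appear (as in the question), and it drops the names from the result.

```r
# Hypothetical helper (the name is an assumption, not part of the original answer).
combine_by_first <- function(x) {
  # First element of each vector; vapply enforces a length-1 character result.
  keys <- vapply(x, function(v) v[1], character(1))
  # Levels in order of first appearance, so groups follow the input order.
  groups <- split(x, factor(keys, levels = unique(keys)))
  # Keep the key once, then append the remaining elements of every vector.
  combined <- lapply(groups, function(g) {
    c(g[[1]][1], unlist(lapply(g, function(v) v[-1])))
  })
  unname(combined)
}

x <- list(c("10", "719", "99"),
          c("10", "624", "85", "888", "624"),
          c("1", "894", "110", "344", "634"))
result <- combine_by_first(x)
```

For the example list, result holds the "10" group first and the "1" group second, which matches the output requested in the question.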

In terms of performance, for a fairly large list:

set.seed(1L)
n <- 100000
x <- vector('list', n)
for (i in 1:n)
   x[[i]] <- as.character(sample(1:1000, ceiling(runif(1, 1, 1000)), replace=TRUE))
object.size(x) # 2GB
## 2175165880 bytes

Timing on a Linux machine:

system.time(local({
   first_elems <- as.factor(sapply(x, "[", 1))
   group <- split(x, first_elems)
   result <- lapply(group, function(x) {
     c(x[[1]][1], unlist(lapply(x, "[", -1)))
   })
}))

##    user  system elapsed 
##   4.119   0.001   4.149 



Alternatively, a for loop can be used:

x <- list(c("10",  "719", "99"),
          c("10",  "624", "85" , "888", "624"),
          c("1",   "894", "110", "344", "634"))  

y <- vector('list', length(x)) # allocate a list at least as long as x

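for(i in 2:length(x)){
  # merge when neighbouring vectors share the same first element;
  # note: each vector is compared with its immediate predecessor only
  if(x[[i-1]][1] == x[[i]][1]){
    y[[i-1]] <- c(x[[i-1]], x[[i]][-1]) # drop the repeated key
  } else {
    y[[i-1]] <- x[[i]]
  }
}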

z <- y[!sapply(y, is.null)]
z
# [[1]]
# [1] "10"  "719" "99"  "624" "85"  "888" "624"
# 
# [[2]]
# [1] "1"   "894" "110" "344" "634"
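
The loop above only looks at neighbouring vectors, so it relies on equal keys being adjacent, as they are in the example. A possible generalization, sketched here under the assumption that every vector is a non-empty character vector, accumulates the groups in a list named by key. The fourth vector in x is a hypothetical addition, included only to show a non-adjacent match.

```r
x <- list(c("10", "719", "99"),
          c("10", "624", "85", "888", "624"),
          c("1", "894", "110", "344", "634"),
          c("10", "5"))  # hypothetical extra vector with a non-adjacent key

y <- list()  # accumulator, named by the first element of each vector
for (v in x) {
  key <- v[1]
  if (is.null(y[[key]])) {
    y[[key]] <- v                   # first time this key is seen
  } else {
    y[[key]] <- c(y[[key]], v[-1])  # append everything except the key
  }
}
z <- unname(y)  # drop the names to get a plain list
```

Growing a vector with c() copies it on every append, so for many vectors per key the split-based approach is likely to scale better.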

Source: https://habr.com/ru/post/1540136/