A unique combination of all elements from two (or more) vectors

I am trying to create a unique combination of all elements from two vectors of different sizes in R.

For example, the first vector is

    > a <- c("ABC", "DEF", "GHI")

and the second holds dates, currently stored as character strings:

    > b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")

I need to create a data frame with two columns, like this:

    > data
         a          b
    1  ABC 2012-05-01
    2  ABC 2012-05-02
    3  ABC 2012-05-03
    4  ABC 2012-05-04
    5  ABC 2012-05-05
    6  DEF 2012-05-01
    7  DEF 2012-05-02
    8  DEF 2012-05-03
    9  DEF 2012-05-04
    10 DEF 2012-05-05
    11 GHI 2012-05-01
    12 GHI 2012-05-02
    13 GHI 2012-05-03
    14 GHI 2012-05-04
    15 GHI 2012-05-05

So, basically, I'm looking for every unique combination: each element of the first vector (a) paired with each element of the second vector (b).

An ideal solution would generalize to a larger number of input vectors.
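For reference, the target layout shown above can be built by hand in base R with rep(): repeat each element of a once per element of b, and cycle through b once per element of a. This is only a sketch for exactly two vectors; the name data simply matches the example output.

```r
a <- c("ABC", "DEF", "GHI")
b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")

# Each element of a is repeated length(b) times in a row ("each"),
# while the whole of b is recycled once per element of a ("times").
data <- data.frame(
  a = rep(a, each = length(b)),
  b = rep(b, times = length(a)),
  stringsAsFactors = FALSE  # keep both columns as character vectors
)

nrow(data)  # 15, i.e. length(a) * length(b)
```

This does not scale gracefully past two inputs, which is why a general-purpose function is preferable.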




See also:
How to create a combination matrix

Tags: r, r-faq
Jul 09 '12 at 2:10

Is this what you are after?

    > expand.grid(a, b)
       Var1       Var2
    1   ABC 2012-05-01
    2   DEF 2012-05-01
    3   GHI 2012-05-01
    4   ABC 2012-05-02
    5   DEF 2012-05-02
    6   GHI 2012-05-02
    7   ABC 2012-05-03
    8   DEF 2012-05-03
    9   GHI 2012-05-03
    10  ABC 2012-05-04
    11  DEF 2012-05-04
    12  GHI 2012-05-04
    13  ABC 2012-05-05
    14  DEF 2012-05-05
    15  GHI 2012-05-05

If the resulting order is not what you want, you can sort it afterwards. If you name the arguments to expand.grid, the names become the column names:

    df <- expand.grid(a = a, b = b)
    df[order(df$a), ]

And expand.grid generalizes to any number of input vectors.
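As a sketch of that generalization, expand.grid() also accepts a single named list, so the inputs can be assembled programmatically. The third vector, shift, is a hypothetical example; stringsAsFactors = FALSE keeps the columns as character instead of factor.

```r
a <- c("ABC", "DEF", "GHI")
b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")
shift <- c("AM", "PM")  # hypothetical third input vector

# A named list of inputs; the names become the column names.
inputs <- list(a = a, b = b, shift = shift)

# expand.grid() takes the list directly; the first column varies fastest.
df3 <- expand.grid(inputs, stringsAsFactors = FALSE)

nrow(df3)  # 30 = 3 * 5 * 2
```

Adding another input is then just another entry in the list.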

Jul 09 2018-12-12T00:

The tidyr package provides a nice alternative, crossing(), that works better than the classic expand.grid() function because (1) strings are not converted to factors and (2) the sort order is more intuitive:

    library(tidyr)

    a <- c("ABC", "DEF", "GHI")
    b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")

    crossing(a, b)
    # A tibble: 15 x 2
       a     b
       <chr> <chr>
     1 ABC   2012-05-01
     2 ABC   2012-05-02
     3 ABC   2012-05-03
     4 ABC   2012-05-04
     5 ABC   2012-05-05
     6 DEF   2012-05-01
     7 DEF   2012-05-02
     8 DEF   2012-05-03
     9 DEF   2012-05-04
    10 DEF   2012-05-05
    11 GHI   2012-05-01
    12 GHI   2012-05-02
    13 GHI   2012-05-03
    14 GHI   2012-05-04
    15 GHI   2012-05-05
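If tidyr is not available, the same row order (first vector varying slowest, character columns kept as character) can be approximated in base R. This is a sketch, not how crossing() is implemented: it reverses the inputs so that expand.grid(), which varies its first argument fastest, varies the last vector fastest, then reverses the columns back. Unlike crossing(), it does not sort or de-duplicate the inputs. The helper name cross_rows is made up for this example.

```r
# Hypothetical helper: all combinations of the named vectors in `...`,
# with the first vector varying slowest (row order like crossing()).
# Inputs are used as given: no sorting, no de-duplication.
cross_rows <- function(...) {
  vecs <- list(...)
  grid <- expand.grid(rev(vecs), stringsAsFactors = FALSE,
                      KEEP.OUT.ATTRS = FALSE)
  rev(grid)  # restore the original column order
}

a <- c("ABC", "DEF", "GHI")
b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")

res <- cross_rows(a = a, b = b)
head(res, 3)
```

Pass the vectors as named arguments so the column names survive the two reversals.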
Jun 20 '18 at 21:37

You can use the order function to sort by any number of columns. For your example:

    > df <- expand.grid(a, b)
    > df
       Var1       Var2
    1   ABC 2012-05-01
    2   DEF 2012-05-01
    3   GHI 2012-05-01
    4   ABC 2012-05-02
    5   DEF 2012-05-02
    6   GHI 2012-05-02
    7   ABC 2012-05-03
    8   DEF 2012-05-03
    9   GHI 2012-05-03
    10  ABC 2012-05-04
    11  DEF 2012-05-04
    12  GHI 2012-05-04
    13  ABC 2012-05-05
    14  DEF 2012-05-05
    15  GHI 2012-05-05
    > df[order(df[, 1], df[, 2]), ]
       Var1       Var2
    1   ABC 2012-05-01
    4   ABC 2012-05-02
    7   ABC 2012-05-03
    10  ABC 2012-05-04
    13  ABC 2012-05-05
    2   DEF 2012-05-01
    5   DEF 2012-05-02
    8   DEF 2012-05-03
    11  DEF 2012-05-04
    14  DEF 2012-05-05
    3   GHI 2012-05-01
    6   GHI 2012-05-02
    9   GHI 2012-05-03
    12  GHI 2012-05-04
    15  GHI 2012-05-05
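To sort by every column without listing them one by one, order() can be called on all columns at once through do.call(). This is a sketch, assuming the columns should be sorted left to right; unname() keeps the column names from being mistaken for order()'s own arguments such as decreasing or method.

```r
a <- c("ABC", "DEF", "GHI")
b <- c("2012-05-01", "2012-05-02", "2012-05-03", "2012-05-04", "2012-05-05")

df <- expand.grid(a, b, stringsAsFactors = FALSE)

# Pass every column to order(): ties in column 1 are broken by column 2,
# and so on, for however many columns the grid has.
ord <- do.call(order, unname(as.list(df)))
sorted <- df[ord, ]
```

With character columns the sort is alphabetical; with the default factor columns, order() follows the factor levels instead.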
Jun 03 '18 at 18:32

None of the answers here mention the CJ function from the data.table package. Running

    library(data.table)
    CJ(a = a, b = b, unique = TRUE)

gives:

          a          b
     1: ABC 2012-05-01
     2: ABC 2012-05-02
     3: ABC 2012-05-03
     4: ABC 2012-05-04
     5: ABC 2012-05-05
     6: DEF 2012-05-01
     7: DEF 2012-05-02
     8: DEF 2012-05-03
     9: DEF 2012-05-04
    10: DEF 2012-05-05
    11: GHI 2012-05-01
    12: GHI 2012-05-02
    13: GHI 2012-05-03
    14: GHI 2012-05-04
    15: GHI 2012-05-05

In recent versions of data.table, CJ names the columns after the input vectors, so you can simply use:

    CJ(a, b, unique = TRUE)
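To see what unique = TRUE guards against, here is a base R illustration (not data.table code): if an input vector contains repeated values, a plain cross join repeats the matching rows, while de-duplicating each input first gives one row per distinct combination. The vector a_dup is a made-up example.

```r
a_dup <- c("ABC", "ABC", "DEF")  # hypothetical input with a repeated value
b <- c("2012-05-01", "2012-05-02")

# Cross join of the raw inputs: "ABC" appears twice, so its rows double up.
raw <- expand.grid(a = a_dup, b = b, stringsAsFactors = FALSE)

# De-duplicate (and sort) each input first, similar in spirit to
# CJ(..., unique = TRUE).
dedup <- expand.grid(a = sort(unique(a_dup)), b = sort(unique(b)),
                     stringsAsFactors = FALSE)

nrow(raw)    # 6
nrow(dedup)  # 4
```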

Jan 29 '19 at 8:50


