All letter / number combinations under certain conditions

I created these vectors:

    Letters <- c("A","C","E","G","H","J","K")
    Numbers <- c(0,1,2,3,4,6,7,9)
    AlphaNumeric <- c(Letters, Numbers)

I would like to get a data frame of all three-element combinations (for example, AA1, G26, etc.) built from the elements above, subject to the following three conditions:

1.) The first element is a letter.

2.) The second element is a number or the SAME letter as the first element.

3.) The third element is a number.
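The three conditions can also be stated as a check on a single code. This is a minimal sketch, not part of the original question: `is_valid_code()` is a hypothetical helper, and it assumes every letter and number is a single character, which holds for the vectors above.

```r
Letters <- c("A","C","E","G","H","J","K")
Numbers <- c(0,1,2,3,4,6,7,9)

# Hypothetical helper: does a 3-character code satisfy conditions 1.) to 3.)?
# Assumes every letter and number is a single character.
is_valid_code <- function(code) {
  chars  <- strsplit(code, "")[[1]]
  digits <- as.character(Numbers)
  length(chars) == 3 &&
    chars[1] %in% Letters &&                            # 1.) a letter
    (chars[2] %in% digits || chars[2] == chars[1]) &&   # 2.) a number or the SAME letter
    chars[3] %in% digits                                # 3.) a number
}

is_valid_code("AA1")  # TRUE
is_valid_code("G26")  # TRUE
is_valid_code("AC1")  # FALSE: second letter differs from the first
is_valid_code("A51")  # FALSE: 5 is not in Numbers
```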

Approach: I tried using expand.grid() and successfully got ALL three-element combinations. Then I tried expand.grid(x = Letters, y = AlphaNumeric, z = Numbers), which satisfies 1.) and 3.), but I could not enforce 2.).
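Counting rows shows what condition 2.) has to remove. With the question's 7 letters and 8 numbers, expand.grid(x = Letters, y = AlphaNumeric, z = Numbers) yields 7 × 15 × 8 = 840 rows, while condition 2.) allows only the 8 numbers plus the one matching letter in position 2, i.e. 7 × 9 × 8 = 504 rows. A quick check of that arithmetic (`grid` and `n_wanted` are illustrative names only):

```r
Letters <- c("A","C","E","G","H","J","K")
Numbers <- c(0,1,2,3,4,6,7,9)
AlphaNumeric <- c(Letters, Numbers)

# Every letter in position 1, any of the 15 symbols in position 2, any number in position 3
grid <- expand.grid(x = Letters, y = AlphaNumeric, z = Numbers,
                    stringsAsFactors = FALSE)
nrow(grid)  # 840 = 7 * 15 * 8

# Condition 2.) keeps the 8 numbers plus the one matching letter
n_wanted <- length(Letters) * (length(Numbers) + 1) * length(Numbers)
n_wanted    # 504
```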

Unsatisfactory solution: I found a way to do this with a for loop, but I think there should be a simpler way:

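    # all letter-number-number combinations
    LNN <- expand.grid(x = Letters, y = Numbers, z = Numbers)
    for (Element in Letters) {
      # add the letter-letter-number combinations for this letter
      currentLLN <- expand.grid(x = Element, y = Element, z = Numbers)
      LNN <- merge(LNN, currentLLN, all = TRUE)
    }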

Any help would be greatly appreciated. Thanks, Christian


Just subset the output of your expand.grid() call, using grepl() to detect digits:

    df <- expand.grid(x = Letters, y = AlphaNumeric, z = Numbers,
                      stringsAsFactors = FALSE)
    # keep rows whose second element repeats the letter or is a digit
    sub <- subset(df, (x == y | grepl("[0-9]", y)))
    sub <- with(sub, sub[order(x, y, z), ])   # SORT DATAFRAME
    rownames(sub) <- NULL                     # RESET ROWNAMES

    head(sub, 10)
    #    x y z
    # 1  A 0 0
    # 2  A 0 1
    # 3  A 0 2
    # 4  A 0 3
    # 5  A 0 4
    # 6  A 0 6
    # 7  A 0 7
    # 8  A 0 9
    # 9  A 1 0
    # 10 A 1 1
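A quick way to sanity-check this subset, sketched here with the question's vectors: every kept row should satisfy condition 2.), and the row count should equal 7 × 9 × 8 = 504. The check tests digits with `%in%` rather than the answer's regular expression, so the condition is verified independently of the filter itself (`ok` is an illustrative name).

```r
Letters <- c("A","C","E","G","H","J","K")
Numbers <- c(0,1,2,3,4,6,7,9)
AlphaNumeric <- c(Letters, Numbers)

df  <- expand.grid(x = Letters, y = AlphaNumeric, z = Numbers,
                   stringsAsFactors = FALSE)
sub <- subset(df, x == y | grepl("[0-9]", y))

# Condition 2.): position 2 repeats the letter or is one of the numbers
ok <- sub$y == sub$x | sub$y %in% as.character(Numbers)
all(ok)    # TRUE
nrow(sub)  # 504, i.e. 7 letters * (8 numbers + 1 letter) * 8 numbers
```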

You can create two data frames, one where the second element is the same as the first element and one where the second element is a number, and then rbind() them. An example is below; note that I have reduced your example data for illustration.

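    Letters <- LETTERS[1:3]
    Numbers <- c(1, 2)

    # second element repeats the letter: A A 1, B B 1, ...
    df1 = expand.grid(v1 = Letters, v3 = Numbers, stringsAsFactors = FALSE)
    df1$v2 = df1$v1
    df1 = df1[, c('v1', 'v2', 'v3')]

    # second element is a number: A 1 1, B 1 1, ...
    df2 = expand.grid(v1 = Letters, v2 = as.character(Numbers), v3 = Numbers,
                      stringsAsFactors = FALSE)

    df = rbind(df1, df2)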

Output:

    > df
       v1 v2 v3
    1   A  A  1
    2   B  B  1
    3   C  C  1
    4   A  A  2
    5   B  B  2
    6   C  C  2
    7   A  1  1
    8   B  1  1
    9   C  1  1
    10  A  2  1
    11  B  2  1
    12  C  2  1
    13  A  1  2
    14  B  1  2
    15  C  1  2
    16  A  2  2
    17  B  2  2
    18  C  2  2
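Because every row of df1 has a letter in position 2 and every row of df2 has a number there, the two parts cannot overlap, so the rbind() result needs no de-duplication. A sketch applying the same two-part approach to the question's full vectors:

```r
Letters <- c("A","C","E","G","H","J","K")
Numbers <- c(0,1,2,3,4,6,7,9)

# Part 1: second element repeats the letter (letter-letter-number)
df1 <- expand.grid(v1 = Letters, v3 = Numbers, stringsAsFactors = FALSE)
df1$v2 <- df1$v1
df1 <- df1[, c("v1", "v2", "v3")]

# Part 2: second element is a number (letter-number-number)
df2 <- expand.grid(v1 = Letters, v2 = as.character(Numbers), v3 = Numbers,
                   stringsAsFactors = FALSE)

df <- rbind(df1, df2)
nrow(df)           # 504 = 7 * 8 + 7 * 8 * 8
anyDuplicated(df)  # 0: the two parts never share a row
```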

Hope this helps!


Although both answers work and Parfait's solution is a neat way to solve your problem (and I certainly don't want to discredit his answer), it is worth noting that generating surplus combinations and then subsetting them becomes a more serious problem as your data grows. Below is a benchmark comparison.
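To make the point about surplus combinations concrete, here is the row arithmetic for the benchmark sizes used below (30 letters, 30 numbers). This only counts rows, so treat it as a rough indication of the extra work rather than a model of the timings; `rows_generated` and `rows_wanted` are illustrative names.

```r
# Benchmark sizes from the code below
Letters <- c(LETTERS[1:26], letters[1:4])
Numbers <- seq(30)
AlphaNumeric <- c(Letters, Numbers)

# Rows built by the full expand.grid() before subsetting
rows_generated <- length(Letters) * length(AlphaNumeric) * length(Numbers)
# Rows that satisfy the conditions (numbers plus the matching letter in position 2)
rows_wanted <- length(Letters) * (length(Numbers) + 1) * length(Numbers)

rows_generated                # 54000
rows_wanted                   # 27900
rows_generated - rows_wanted  # 26100 rows built and then discarded
```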

    # Larger test data: 30 letters and 30 numbers
    Letters <- c(LETTERS[1:26], letters[1:4])
    Numbers <- seq(30)
    AlphaNumeric <- c(Letters, Numbers)

    # Two data frames (repeated letter / number), combined with rbind()
    f_flo <- function() {
      df1 = expand.grid(v1 = Letters, v3 = Numbers, stringsAsFactors = FALSE)
      df1$v2 = df1$v1
      df1 = df1[, c('v1', 'v2', 'v3')]
      df2 = expand.grid(v1 = Letters, v2 = as.character(Numbers), v3 = Numbers,
                        stringsAsFactors = FALSE)
      df = rbind(df1, df2)
    }

    # Full grid, then subset with grepl()
    f_parfait <- function() {
      df <- expand.grid(x = Letters, y = AlphaNumeric, z = Numbers,
                        stringsAsFactors = FALSE)
      sub <- subset(df, (x == y | grepl("[0-9]", y)) & grepl("[0-9]", z))
      sub <- with(sub, sub[order(x, y, z), ])  # SORT DATAFRAME
      rownames(sub) <- NULL                    # RESET ROWNAMES
    }

    # One expand.grid() per letter, combined with bind_rows()
    library(dplyr)
    one_letter <- function(l) {
      expand.grid(l, c(l, Numbers), Numbers, stringsAsFactors = FALSE)
    }
    f_stibu <- function() {
      df <- bind_rows(lapply(Letters, one_letter))
    }

    library(microbenchmark)
    library(ggplot2)
    run_times = microbenchmark(f_flo(), f_parfait(), f_stibu())
    autoplot(run_times)

Results:

    Unit: milliseconds
            expr        min         lq       mean     median         uq       max neval cld
         f_flo()   1.900719   2.047591   3.666935   2.314258   3.922053  78.74793   100  a
     f_parfait() 138.028364 142.529904 152.876116 144.159444 146.835958 246.92318   100   b
       f_stibu()   4.130464   4.333130   5.169664   4.585028   6.209233  10.23139   100  a



For a single letter the problem is easy to solve: the second column is either that letter or any number, and the third column is any number:

    one_letter <- function(l) {
      # columns: the letter; the same letter or any number; any number
      expand.grid(l, c(l, Numbers), Numbers, stringsAsFactors = FALSE)
    }

Then you simply apply this function to each of the letters and combine the resulting data frames into one:

    library(dplyr)
    df <- bind_rows(lapply(Letters, one_letter))
    head(df)
    ##   Var1 Var2 Var3
    ## 1    A    A    0
    ## 2    A    0    0
    ## 3    A    1    0
    ## 4    A    2    0
    ## 5    A    3    0
    ## 6    A    4    0

The dplyr package is used because it provides bind_rows(), which combines a list of data frames into a single data frame.
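If you would rather avoid the dplyr dependency, base R can do the same combining step: do.call(rbind, ...) passes the list of data frames to rbind() as separate arguments. This is a sketch that swaps base rbind() in for dplyr::bind_rows(), assuming the question's vectors (`df_base` is an illustrative name):

```r
Letters <- c("A","C","E","G","H","J","K")
Numbers <- c(0,1,2,3,4,6,7,9)

one_letter <- function(l) {
  expand.grid(l, c(l, Numbers), Numbers, stringsAsFactors = FALSE)
}

# do.call() calls rbind(df_A, df_C, ...) with one data frame per letter
df_base <- do.call(rbind, lapply(Letters, one_letter))
nrow(df_base)   # 504
names(df_base)  # "Var1" "Var2" "Var3"
```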


Using only the first 3 letters and the first 2 numbers, you get the following results:

    > Numbers=c(0,1)
    > Letters=c("A","C","E")
    > A=outer(Letters,outer(Numbers,Numbers,paste0),paste0)
    > B=outer(paste0(Letters,Letters),Numbers,paste0)
    > sort(c(A,B))
     [1] "A00" "A01" "A10" "A11" "AA0" "AA1" "C00" "C01" "C10" "C11" "CC0" "CC1" "E00" "E01" "E10"
    [16] "E11" "EE0" "EE1"
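The same outer() idea works on the question's full vectors. As a cross-check (a sketch; the variable names are only for illustration), the codes it produces can be compared with those from the expand.grid() plus subset() approach in the first answer:

```r
Letters <- c("A","C","E","G","H","J","K")
Numbers <- c(0,1,2,3,4,6,7,9)

# outer() approach: letter-number-number codes plus letter-letter-number codes
A <- outer(Letters, outer(Numbers, Numbers, paste0), paste0)
B <- outer(paste0(Letters, Letters), Numbers, paste0)
codes_outer <- sort(c(A, B))

# expand.grid() + subset() approach, pasted into the same string form
g <- expand.grid(x = Letters, y = c(Letters, Numbers), z = Numbers,
                 stringsAsFactors = FALSE)
g <- subset(g, x == y | grepl("[0-9]", y))
codes_grid <- sort(paste0(g$x, g$y, g$z))

length(codes_outer)                 # 504
identical(codes_outer, codes_grid)  # TRUE
```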

Source: https://habr.com/ru/post/1275542/

