How to write an apply equivalent for any for loop

Most pro R users have advised me never to use loops in R and to use the apply functions instead. The problem is that it is not intuitive to write an apply equivalent for every for / while loop if you are not familiar with functional programming. Take the example below.

    F <- data.frame(name = c("a", "b", "c", "d"),
                    var1 = c(1, 0, 0, 1),
                    var2 = c(0, 0, 1, 1),
                    var3 = c(1, 1, 1, 1),
                    clus = c("one", "two", "three", "four"))
    F$ObjTrim <- ""
    for (i in 1:nrow(F)) {
      for (j in 2:(ncol(F) - 1)) {
        if (F[i, j] == 1) {
          F$ObjTrim[i] <- paste(F$ObjTrim[i], colnames(F)[j], sep = " ")
        }
      }
      print(i)
    }

The goal here is to create the "ObjTrim" variable, which holds the names of all columns whose value == 1 in that row. Can anyone suggest a good apply equivalent to this loop?

The above code, for example, will give:

      name var1 var2 var3  clus         ObjTrim
    1    a    1    0    1   one       var1 var3
    2    b    0    0    1   two            var3
    3    c    0    1    1 three       var2 var3
    4    d    1    1    1  four  var1 var2 var3

Thanks!

+4
3 answers

Here you can avoid for loops by using vectorization: colSums is vectorized, and it is used here mainly to convert the logical values (TRUE, FALSE) to 1 and 0 and count them per column.

    colnames(F)[colSums(F == 1) != 0]

Here is a test using my reproducible example:

    set.seed(1234)
    ## create a 2 x 10 matrix
    F <- matrix(sample(c(1:5), 20, rep = TRUE), nrow = 2,
                dimnames = list(c('row1', 'row2'), paste0('col', 1:10)))
    #      col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
    # row1    1    4    5    1    4    4    2    2    2     1
    # row2    4    4    4    2    3    3    5    5    2     2
    colnames(F)[colSums(F == 1) != 0]
    # [1] "col1"  "col4"  "col10"
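To see why the one-liner works, it can help to unpack it step by step. The following is only a sketch on a small hand-written matrix (the object names `m`, `is_one`, `hits` and `selected` are made up for illustration and do not come from the answer):

```r
# Hypothetical 2 x 3 matrix; matrix() fills column by column,
# so col1 = (1, 4), col2 = (5, 2), col3 = (3, 1).
m <- matrix(c(1, 4, 5, 2, 3, 1), nrow = 2,
            dimnames = list(c("row1", "row2"), c("col1", "col2", "col3")))

is_one <- m == 1            # logical matrix, TRUE where a cell equals 1
hits <- colSums(is_one)     # TRUE counts as 1, so this counts the 1s per column
selected <- colnames(m)[hits != 0]   # keep columns with at least one 1

print(selected)
```

Each step operates on the whole matrix at once, which is what "vectorized" means here: no explicit index variables are needed.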

PS. As a rule, replacing for loops with an "R-style solution" is easy, but there are some cases where it is difficult or impossible, particularly when recursion is involved.

EDIT

After the OP's clarification, here is an apply solution:

    F$ObjTrim <- apply(F, 1, function(x) paste(colnames(F)[x == 1], collapse = ' '))
    #   name var1 var2 var3  clus        ObjTrim
    # 1    a    1    0    1   one      var1 var3
    # 2    b    0    0    1   two           var3
    # 3    c    0    1    1 three      var2 var3
    # 4    d    1    1    1  four var1 var2 var3
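One detail worth knowing about this call: apply first converts the data frame to a matrix, and because F mixes character and numeric columns, every row reaches the function as a character vector. The comparison with 1 still works for this data, but with other data the numbers may arrive as padded strings. A cautious sketch, assuming the indicator columns are exactly var1 to var3 (the names `ind_cols` and `ind` are made up here), restricts the call to those columns:

```r
F <- data.frame(name = c("a", "b", "c", "d"),
                var1 = c(1, 0, 0, 1),
                var2 = c(0, 0, 1, 1),
                var3 = c(1, 1, 1, 1),
                clus = c("one", "two", "three", "four"))

# The full data frame becomes a character matrix when converted.
print(is.character(as.matrix(F)))

# Assumption: only these columns hold the 0/1 indicators.
ind_cols <- c("var1", "var2", "var3")
ind <- as.matrix(F[, ind_cols])      # purely numeric matrix

# Same idea as above, but each row is now a numeric vector.
F$ObjTrim <- apply(ind, 1, function(x) paste(colnames(ind)[x == 1], collapse = " "))
print(F)
```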
+5

As your comment on @agstudy's answer says you want this for each row, maybe this will help you:

    df <- F[, 2:4]
    df
    #   var1 var2 var3
    # 1    1    0    1
    # 2    0    0    1
    # 3    0    1    1
    # 4    1    1    1
    ones <- which(df == 1, arr.ind = TRUE)
    ones
    #      row col
    # [1,]   1   1
    # [2,]   4   1
    # [3,]   3   2
    # [4,]   4   2
    # [5,]   1   3
    # [6,]   2   3
    # [7,]   3   3
    # [8,]   4   3

This can be aggregated by row:

    aggregate(col ~ row, ones, paste)
    #   row     col
    # 1   1    1, 3
    # 2   2       3
    # 3   3    2, 3
    # 4   4 1, 2, 3

If you insist on having column names instead of column indices, first replace the indices in ones:

    ones <- as.data.frame(ones)
    ones$col <- colnames(df)[ones$col]
    aggregate(col ~ row, ones, paste)
    #   row              col
    # 1   1       var1, var3
    # 2   2             var3
    # 3   3       var2, var3
    # 4   4 var1, var2, var3

Of course, you can also use apply along the rows:

    apply(df, 1, function(x) paste(colnames(df)[x == 1], collapse = " "))
    # [1] "var1 var3"      "var3"           "var2 var3"      "var1 var2 var3"

There are vectorized functions for your problem, so no for or apply loops are required.

However, there are cases where a for loop is the clearer (faster to read) and sometimes also the faster-to-compute alternative. This is particularly true when looping only a few times lets you use vectorized functions and saves you from applying some other function over a large margin.
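As an illustration of that last point, here is a sketch (not part of the original answer) that loops over the three indicator columns only, while every operation inside the loop is vectorized over all rows. The column list `ind_cols` and the helper vector `ObjTrim` are assumptions made for this example:

```r
F <- data.frame(name = c("a", "b", "c", "d"),
                var1 = c(1, 0, 0, 1),
                var2 = c(0, 0, 1, 1),
                var3 = c(1, 1, 1, 1),
                clus = c("one", "two", "three", "four"))

# Assumption: these are the 0/1 indicator columns.
ind_cols <- c("var1", "var2", "var3")

ObjTrim <- character(nrow(F))          # one empty string per row
for (col in ind_cols) {                # 3 iterations, regardless of row count
  hit <- F[[col]] == 1                 # vectorized comparison over all rows
  ObjTrim[hit] <- paste(ObjTrim[hit], col)   # vectorized paste on matching rows
}
F$ObjTrim <- trimws(ObjTrim)           # drop the leading space left by paste
print(F)
```

The loop body runs once per column rather than once per cell, so the cost of the loop itself stays small even when the data frame has many rows.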

+5

To answer what seems to be your general question, and not the example you pointed out - how to convert a for loop into an apply call - the following may be some useful pointers:

  • Consider the structure of the object you are iterating over. There may be different types, for example:

    a) Elements of a vector / matrix.
    b) Rows / columns of a matrix.
    c) Dimensions of a multidimensional array.
    d) List items (which themselves may be one of the items listed above).
    e) Corresponding elements of multiple lists / vectors.

    In each case, the function you use may be slightly different, but the usage strategy is the same. Also, study the apply family. The various *pply functions are based on a similar abstraction, but differ in what they take as input and what they return as output.

  • For the cases listed above, for example:

    a) Elements of a vector: look for already existing vectorized solutions (as indicated above), which are the main strength of R. In addition to this, consider matrix algebra. Most problems that seem to require loops (or nested loops) can be written as equations in matrix algebra.

    b) Rows / columns of a matrix: use apply. Use the correct value for the MARGIN argument. Similarly for c), for higher-dimensional arrays.

    d) Use lapply. If the result you return is a "simple" structure (scalar or vector), you can consider sapply, which is simply simplify2array(lapply(...)) and returns an array of appropriate dimensions.

    e) Use mapply. The "m" can stand for multivariate apply.

  • Once you understand the object you are iterating over and the corresponding tool, simplify your problem. Think not about the whole object you are iterating over, but about one instance of it. For example, when iterating over the rows of a matrix, forget about the matrix and remember only the row.

    Now write a function (or lambda) that works with only one instance (element) of your iterand and simply "apply" it using the correct member of the *pply family.
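The cases a) to e) above can be sketched roughly as follows. All objects here (`v`, `m`, `arr`, `lst` and the result names) are hypothetical and exist only to show which member of the family fits which structure:

```r
# a) Elements of a vector: prefer an existing vectorized operation.
v <- c(1, 2, 3)
squares <- v^2                           # 1 4 9

# b) Rows / columns of a matrix: apply with MARGIN = 1 (rows) or 2 (columns).
m <- matrix(1:6, nrow = 2)               # rows are (1, 3, 5) and (2, 4, 6)
row_totals <- apply(m, 1, sum)           # 9 12
col_totals <- apply(m, 2, sum)           # 3 7 11

# c) One dimension of a multidimensional array: same function, other MARGIN.
arr <- array(1:24, dim = c(2, 3, 4))
slice_totals <- apply(arr, 3, sum)       # one total per 2 x 3 slice

# d) List items: lapply returns a list, sapply simplifies when it can.
lst <- list(a = 1:3, b = 4:6)
totals_list <- lapply(lst, sum)          # list with a = 6, b = 15
totals_vec <- sapply(lst, sum)           # named vector with a = 6, b = 15

# e) Corresponding elements of several vectors: mapply.
pair_sums <- mapply(function(x, y) x + y, 1:3, 4:6)   # 5 7 9

print(list(squares, row_totals, col_totals, slice_totals,
           totals_list, totals_vec, pair_sums))
```

In every case the function passed in (here mostly sum) only ever sees one instance: one row, one slice, one list item or one pair of elements.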

Now let's work through your sample problem using this strategy and reproduce the clean solution given by @agstudy.

  • The first thing to determine is that you iterate through the rows of the matrix. It is clear that you understand this, since your loop solution starts with for (i in 1:nrow(F)) .

  • Identify apply as your friend.

  • Understand what you need to do with this row. First of all, you want to know which values are 1. Then you need to find the colnames of these values. And then find a way to combine these names. If I may take the liberty of rewriting @agstudy's solution to help explain:

     process.row <- function(arow) {
       ones <- arow == 1                         # Returns a logical vector.
       cnames <- colnames(F)[ones]               # Logical subsetting of F's column names.
       cnames <- paste(cnames, collapse = ' ')   # Paste the names together.
       cnames                                    # Return the combined string.
     }

    And you will get a solution:

     F$ObjTrim = apply(X=F, MARGIN=1, FUN=process.row) 

    Then, when such thinking becomes instinctive, you can use R's features to write dense expressions, such as:

     F$ObjTrim = apply(F,1,function(x) paste(colnames(F)[x==1],collapse=' ')) 

which uses an anonymous function (a lambda) defined on the fly to complete the task.
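A possible variant, which is not taken from the answers above, avoids the reference to the global F inside the row function. It relies on apply passing each row as a vector named by the column names, so the function can read the names from its own argument. The name `process_row` is made up for this sketch:

```r
F <- data.frame(name = c("a", "b", "c", "d"),
                var1 = c(1, 0, 0, 1),
                var2 = c(0, 0, 1, 1),
                var3 = c(1, 1, 1, 1),
                clus = c("one", "two", "three", "four"))

# Works on a single row; the column names travel with the row as names(arow).
process_row <- function(arow) {
  paste(names(arow)[arow == 1], collapse = " ")
}

F$ObjTrim <- apply(F, 1, process_row)
print(F)
```

Because the function depends only on its argument, it can be tried on a single hand-made named vector before being applied to the whole data frame.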

+4

Source: https://habr.com/ru/post/1485262/

