Loop, create a new variable as a function of an existing variable with a conditional

I have a data set with more than 400 columns and about 80 observations. I would like to loop over the columns with a for loop and, for each column whose name has the prefix exp_, create a new column holding that column's values divided by a reference column, saved under the same name with the suffix _pp. I would also like to do something else for columns with a different prefix, rev_, but I think that once I have the first problem solved I can work out the rest myself. Here is some sample data:

exp_alpha     exp_bravo    rev_charlie     rev_delta     pupils
10            28           38              95            2
24            56           39              24            5
94            50           95              45            3
15            93           72              83            9
72            66           10              12            3
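
For anyone who wants to reproduce this, the sample above could be entered as a data frame roughly as follows. This is only a sketch: the name test is taken from the loop further down, and the real data has far more columns.

```r
# Sample data from the question, stored under the name `test`
test <- data.frame(
  exp_alpha   = c(10, 24, 94, 15, 72),
  exp_bravo   = c(28, 56, 50, 93, 66),
  rev_charlie = c(38, 39, 95, 72, 10),
  rev_delta   = c(95, 24, 45, 83, 12),
  pupils      = c(2, 5, 3, 9, 3)
)

# Quick check of the structure: 5 observations of 5 numeric variables
str(test)
```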

The first time I tried this, the loop ran, but it only kept the last column for which the if statement was true instead of storing every such column. I made some changes and lost that code; the current version runs without errors but does not change the data frame.

for (i in colnames(test)) {
  if(grepl("exp_", colnames(test)[i])) {
    test[paste(i,"pp", sep="_")] <- test[i] / test$pupils
  }
}

My understanding of what this does is:

  • loop through the vector of column names
  • check whether the substring "exp_" is in the ith element of the colnames vector (== TRUE)
  • if so, create a new column in the dataset, which is the ith column divided by the reference column (pupils), named after the ith element of colnames with "_pp" added at the end
  • otherwise do nothing

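In your loop, i is already a column name, not a position, so colnames(test)[i] returns NA and the condition inside if() is never true. Test i itself instead. Also, "== TRUE" is implied inside if(), so you do not need it.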
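
A small sketch of why the original condition never fires. The vector cols is a hypothetical stand-in for colnames(test), and i holds the kind of value that for (i in colnames(test)) produces:

```r
cols <- c("exp_alpha", "exp_bravo", "pupils")  # stand-in for colnames(test)
i <- "exp_alpha"                               # the loop yields names, not positions

# Indexing an unnamed character vector by a name finds nothing
is.na(cols[i])          # TRUE

# grepl() treats a missing value as "no match"
grepl("exp_", cols[i])  # FALSE

# Testing the name itself gives the intended result
grepl("exp_", i)        # TRUE
```

So the loop runs to completion without an error, but the body of the if() is never reached.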


For colnames(test)[i] to work, i has to be a position, so loop over the column indices instead:

for (i in seq_along(colnames(test))) {
  if (grepl("exp_", colnames(test)[i])) {
    # name the new column after the original column, not after its position
    test[paste(colnames(test)[i], "pp", sep = "_")] <- test[i] / test$pupils
  }
}

Similar to @timfaber's answer, but testing i directly, since it already holds the column name:

for (i in colnames(test)) {
  if(grepl("exp_", i)) {
    print(i)
    test[paste(i,"pp", sep="_")] <- test[i] / test$pupils
  }
}

You do not need a loop for this! R can work on all matching columns at once. For example:

# Extract column names
cNames <- colnames(test)
# Find the exp_ columns by their prefix
foo <- grep("^exp_", cNames)
# Divide by reference: ALL columns at the SAME time
bar <- test[, foo, drop = FALSE] / test$pupils
# Add the _pp suffix: ALL columns at the SAME time
colnames(bar) <- paste0(cNames[foo], "_pp")
# Add to original dataset instead of iteratively appending
test <- cbind(test, bar)
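
The same idea could be wrapped in a small function so it can be reused for other prefixes such as rev_. This is only a sketch: add_ratio_cols and its arguments are hypothetical names, not something from the question, and it assumes the reference column contains no zeros.

```r
# Hypothetical helper: divide every column whose name starts with `prefix`
# by the column `by`, and append the results with `suffix` added to the names.
add_ratio_cols <- function(df, prefix, by = "pupils", suffix = "_pp") {
  hits <- startsWith(colnames(df), prefix)       # anchored prefix match
  if (!any(hits)) return(df)                     # nothing to add
  ratios <- df[, hits, drop = FALSE] / df[[by]]  # each row divided by its own `by` value
  colnames(ratios) <- paste0(colnames(ratios), suffix)
  cbind(df, ratios)
}

# Reduced version of the sample data
test <- data.frame(
  exp_alpha = c(10, 24, 94), exp_bravo = c(28, 56, 50),
  rev_charlie = c(38, 39, 95), pupils = c(2, 5, 3)
)

test <- add_ratio_cols(test, "exp_")
colnames(test)
```

Using startsWith() rather than an unanchored pattern avoids picking up columns that merely contain "exp_" somewhere in the middle of the name, and drop = FALSE keeps the result a data frame even when only one column matches.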

Source: https://habr.com/ru/post/1675818/