I have a data set with more than 400 columns and about 80 observations. I would like to use a for loop to go through each column and, if its name contains the prefix exp_, create a new column holding that column's values divided by a reference column, saved under the same name with the suffix _pp. I would also like to do something else for columns with a different prefix, rev_, but I think that once I solve the first problem I can work out the rest myself. Here is some sample data:
exp_alpha exp_bravo rev_charlie rev_delta pupils
10 28 38 95 2
24 56 39 24 5
94 50 95 45 3
15 93 72 83 9
72 66 10 12 3
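For reproducibility, the sample above could be loaded as a data frame. This is a minimal sketch that assumes the data frame is called test, the name used in the loop further down, and reads the whitespace-separated text with read.table():

```r
# Build the sample data as a data frame named `test`
# (the name is taken from the loop in the question).
test <- read.table(header = TRUE, text = "
exp_alpha exp_bravo rev_charlie rev_delta pupils
10 28 38 95 2
24 56 39 24 5
94 50 95 45 3
15 93 72 83 9
72 66 10 12 3
")

# Inspect the column names and types
str(test)
```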
The first time I tried this, the loop ran correctly but kept only the last column for which the if statement was true, instead of storing every such column. I then made some changes and lost that code. The current version runs without errors but does not change the data frame at all.
for (i in colnames(test)) {
  if (grepl("exp_", colnames(test)[i])) {
    test[paste(i, "pp", sep = "_")] <- test[i] / test$pupils
  }
}
My understanding of what this does is:
- loop through the vector of column names
- check whether the substring "exp_" occurs in the ith element of colnames(test), i.e. whether the condition is TRUE
- if so, create a new column in the data set: the ith column divided by the reference column (pupils), named with "_pp" appended
- otherwise, do nothing
I suspect the problem is in the if() condition. Adding "== TRUE" inside the if() does not help.
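A likely explanation for the silent no-op: in for (i in colnames(test)), i is already a column name (a string), so colnames(test)[i] indexes an unnamed character vector by name and returns NA. grepl() returns FALSE for NA, so the body of the if never runs. The steps listed above could be sketched as follows. This is one possible version, not the only way to do it; the loop variable name col and the anchored pattern "^exp_" (match only at the start of the name) are choices made here for illustration:

```r
# Sample data from the question
test <- data.frame(
  exp_alpha   = c(10, 24, 94, 15, 72),
  exp_bravo   = c(28, 56, 50, 93, 66),
  rev_charlie = c(38, 39, 95, 72, 10),
  rev_delta   = c(95, 24, 45, 83, 12),
  pupils      = c(2, 5, 3, 9, 3)
)

# colnames(test) is evaluated once, before the loop starts,
# so columns added inside the loop are not visited again.
for (col in colnames(test)) {
  # `col` is the column name itself, so test it directly
  if (grepl("^exp_", col)) {
    # [[ ]] extracts the column as a vector and assigns the new one by name
    test[[paste(col, "pp", sep = "_")]] <- test[[col]] / test$pupils
  }
}

names(test)
```

The same pattern should extend to the rev_ columns by adding a second branch, for example else if (grepl("^rev_", col)), with whatever computation is needed there.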