How to create variables in a data frame in a for loop?

I have an R data frame called mydata that counts people of a certain age and a certain height. Within the data frame, I have the variables mydata$ageto10 (= the number of people under the age of ten), mydata$ageto20 (= the number of people under the age of twenty), etc., for ages 35, 42, and 65. The same goes for height (and several other variables).

I want to create new variables holding the number of people aged 10 to 25, 25 to 35, 35 to 42, and 42 to 65. So, for the first case, I want:

 mydata$age10to25 <- mydata$ageto25 - mydata$ageto10 

This works, but I want to do it for all ranges and do the same for height and the other variables. There should be an easier way than copying this line 40 times and changing the variable names manually! :)

I thought it should be something like this:

    for (i in c("age", "height")) {
      for (k in c(10,20,35,42, 65)) {
        assign(paste("mydata$", i, k, "to", <<next k here>>, sep=""),
               get(paste("mydata$", i, <<next k here>>, , sep="")) - get(paste("mydata$", i, k, , sep=""))
      }
    }

But obviously this does not work (even if I fill in k manually). It seems that the assign command is not intended for creating variables inside an existing data frame.

What is the best way to do this?

1 answer

I assume that you are a refugee from another statistical package (perhaps Stata or SAS). You cannot use assign to create columns with $ and paste. In general, if you find yourself using assign for a standard task, you are doing something that is not idiomatic R, and there is usually a better solution.
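To illustrate why the assign approach fails, here is a minimal sketch on a made-up two-column data frame (the counts are invented for illustration). It shows that assign creates a separate variable whose name is the literal string, and that `[[` indexing is one way to address a column by a name built with paste0:

```r
# Toy data; the counts are invented for illustration only
mydata <- data.frame(ageto10 = c(3, 5), ageto25 = c(8, 9))

# assign() creates a separate variable literally named "mydata$age10to25";
# it does not add a column to mydata
assign(paste0("mydata$", "age10to25"), mydata$ageto25 - mydata$ageto10)
"age10to25" %in% names(mydata)   # FALSE
exists("mydata$age10to25")       # TRUE

# [[ ]] accepts a column name built as a string
new_name <- paste0("age", 10, "to", 25)
mydata[[new_name]] <- mydata[["ageto25"]] - mydata[["ageto10"]]
mydata$age10to25                 # 5 4
```

This form would also work inside a for loop, although a vectorized version is usually shorter.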

Something like this:

    lower <- c(10, 25, 35, 42)
    upper <- c(25, 35, 42, 65)
    # create the differences
    newData <- mydata[, paste0('ageto', upper)] - mydata[, paste0('ageto', lower)]
    # name them with valid names (not starting with numbers)
    names(newData) <- paste0('from', lower, 'to', upper)
    # add as columns to the original
    mydata <- cbind(mydata, newData)

No loops required!
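The question also asks about height and other variables. One possible extension, sketched under the assumption that every variable follows the same `<prefix>to<cutoff>` naming pattern (for example heightto10, which the question implies but does not show), wraps the subtraction in a small helper and applies it per prefix with lapply. The helper name range_counts, the shortened cutoff list, and the toy values are made up for illustration:

```r
# Toy data; column names follow the assumed <prefix>to<cutoff> pattern,
# and the values are invented for illustration
mydata <- data.frame(
  ageto10 = c(1, 2), ageto25 = c(4, 6), ageto35 = c(7, 9),
  heightto10 = c(0, 1), heightto25 = c(2, 5), heightto35 = c(3, 8)
)

# Hypothetical helper: difference columns for one variable prefix
range_counts <- function(prefix, df, cutoffs) {
  lower <- head(cutoffs, -1)   # every cutoff except the last
  upper <- tail(cutoffs, -1)   # every cutoff except the first
  out <- df[paste0(prefix, "to", upper)] - df[paste0(prefix, "to", lower)]
  names(out) <- paste0(prefix, lower, "to", upper)
  out
}

# One data frame of differences per prefix, then bind them all on
new_cols <- lapply(c("age", "height"), range_counts,
                   df = mydata, cutoffs = c(10, 25, 35))
mydata <- cbind(mydata, do.call(cbind, new_cols))
names(mydata)
```

Here the new columns are named age10to25, height10to25, and so on, which are valid R names because they start with a letter.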


Source: https://habr.com/ru/post/1443108/

