Loop to create dummy variables in R

I am trying to create dummy variables (coded 1/0) with a loop, based on the most common responses to a variable. After many searches, I could not find a solution. I extracted the most common answers (strings, for example the top 5: "A", "B", ..., "E") using

top5 <- names(head(sort(table(data$var1), decreasing = TRUE), 5))

I would like the loop to check whether another variable ("var2") equals each of these answers (for example "A"), set the dummy to 1 if it does and 0 otherwise, and then give a summary using aggregate(). In Stata, I can refer to the loop variable i using `i', but not in R. Code that doesn't work:

    for (i in top5) {
      data$i.dummy <- ifelse(data$var2 == "i", 1, 0)
      aggregate(data$i.dummy ~ data$age + data$year, data, mean)
    }

Any suggestions?

2 answers

If you need one column for each element in the top 5, I would use sapply over the elements of top5. There is no need for ifelse, because == already returns TRUE where the values match and FALSE otherwise, and these convert to 1 and 0.

Here we bind a matrix of 5 columns to data, one for each element of top5, each containing 1 where the row of data$var2 equals the corresponding element of top5 and 0 otherwise:

 data <- cbind( data , sapply( top5 , function(x) as.integer( data$var2 == x ) ) ) 

If you instead need a single column that flags a match with any element of top5:

 data$dummies <- as.integer( data$var2 %in% top5 ) 

as.integer() is used in both cases to turn TRUE and FALSE into 1 and 0 respectively.

A shortened example to illustrate how it works:

    set.seed(123)
    top2 <- c("A", "B")
    data <- data.frame(var2 = sample(LETTERS[1:4], 6, replace = TRUE))

    # Make dummy variables, one column for each element in topX vector
    data <- cbind(data, sapply(top2, function(x) as.integer(data$var2 == x)))
    data
    #   var2 A B
    # 1    B 0 1
    # 2    D 0 0
    # 3    B 0 1
    # 4    D 0 0
    # 5    D 0 0
    # 6    A 1 0

    # Make single column for all elements in topX vector
    data$ANY <- as.integer(data$var2 %in% top2)
    data
    #   var2 A B ANY
    # 1    B 0 1   1
    # 2    D 0 0   0
    # 3    B 0 1   1
    # 4    D 0 0   0
    # 5    D 0 0   0
    # 6    A 1 0   1
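
The question also asks for a summary by age and year. A minimal sketch of one way to get it from the dummy columns, assuming the data frame has age and year columns as in the question; the values below are made up for illustration, and group_means is a hypothetical name:

```r
# Hypothetical toy data; the age and year values are invented for illustration.
data <- data.frame(
  var2 = c("A", "B", "C", "A", "B", "A"),
  age  = c(20, 20, 30, 30, 20, 30),
  year = c(2001, 2001, 2001, 2002, 2002, 2002)
)
top2 <- c("A", "B")

# One 0/1 column per element of top2, built the same way as in the answer
data <- cbind(data, sapply(top2, function(x) as.integer(data$var2 == x)))

# Mean of every dummy column within each age/year group, in a single call
group_means <- aggregate(data[top2], by = data[c("age", "year")], FUN = mean)
```

Because the dummies are 0/1, each mean is the share of rows in that group giving the corresponding answer, so no loop over top2 is needed for the summary.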

See fortune(312) (from the fortunes package), then read the help page ?"[[", and possibly the help for paste0.
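
What those help pages point towards can be sketched as follows: inside a loop, data$i.dummy always refers to a column literally named "i.dummy", whereas data[[name]] accepts a name built with paste0. This is only an illustration on made-up data; the summaries list and the toy age and year values are assumptions, not part of the original question:

```r
# Hypothetical toy data; the real `data` and `top5` come from the question.
data <- data.frame(
  var2 = c("A", "B", "C", "A", "B", "A"),
  age  = c(20, 20, 30, 30, 20, 30),
  year = c(2001, 2001, 2001, 2002, 2002, 2002)
)
top2 <- c("A", "B")

summaries <- list()
for (i in top2) {
  col <- paste0(i, ".dummy")                  # e.g. "A.dummy"
  data[[col]] <- as.integer(data$var2 == i)   # compare with the value of i, not the string "i"
  # Build the formula <col> ~ age + year from strings, then average by group
  summaries[[i]] <- aggregate(reformulate(c("age", "year"), response = col),
                              data = data, FUN = mean)
}
```

Storing each result in a list matters because a bare aggregate() call inside a for loop is evaluated but not printed.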

Then perhaps consider using other tools like model.matrix and sapply, rather than doing everything with loops.
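
A minimal sketch of the model.matrix route, on made-up data. The names mm and dummies are hypothetical, and the sketch assumes every element of the top vector actually occurs in var2 (otherwise the column selection would fail):

```r
# Hypothetical toy data standing in for the question's data frame.
data <- data.frame(var2 = c("A", "B", "C", "A", "D", "B"))
top2 <- c("A", "B")

# One indicator column per level of var2; "- 1" drops the intercept so that
# no level is absorbed as the reference category.
mm <- model.matrix(~ var2 - 1, data = data)
colnames(mm) <- sub("^var2", "", colnames(mm))   # "var2A" -> "A"

# Keep only the columns for the most frequent answers
dummies <- mm[, top2, drop = FALSE]
```

model.matrix builds indicators for every level at once, so the top answers are picked out afterwards by column name rather than by looping.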


Source: https://habr.com/ru/post/1485851/

