Create a variable that identifies the original data.frame after the rbind command in R

I'm relatively new to R, and I would like to know how I can create a variable (sequence of numbers) that identifies each of the original data frames before combining with the rbind command.

Each original data frame has a variable that serves as the row identifier, so I think a loop that assigns a new number to the new variable every time it encounters the number 1 in the row identifier should work.

Thanks.

6 answers

The gdata package has a combine function that does just that.

    df1 <- data.frame(a = seq(1, 5, by = 1), b = seq(21, 25, by = 1))
    df2 <- data.frame(a = seq(6, 10, by = 1), b = seq(26, 30, by = 1))
    library(gdata)
    combine(df1, df2)
        a  b source
    1   1 21    df1
    2   2 22    df1
    3   3 23    df1
    4   4 24    df1
    5   5 25    df1
    6   6 26    df2
    7   7 27    df2
    8   8 28    df2
    9   9 29    df2
    10 10 30    df2

It looks like bind_rows from the dplyr package will do this too. Using Maloneypatr's example:

    df1 <- data.frame(a = seq(1, 5, by = 1), b = seq(21, 25, by = 1))
    df2 <- data.frame(a = seq(6, 10, by = 1), b = seq(26, 30, by = 1))
    dplyr::bind_rows(df1, df2, .id = "source")
    # Source: local data frame [10 x 3]
    #    source     a     b
    #     (chr) (dbl) (dbl)
    # 1       1     1    21
    # 2       1     2    22
    # 3       1     3    23
    # 4       1     4    24
    # 5       1     5    25
    # 6       2     6    26
    # 7       2     7    27
    # 8       2     8    28
    # 9       2     9    29
    # 10      2    10    30
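
If the data frame names are wanted in the identifier column rather than the numbers 1 and 2, one option is to pass a named list, since bind_rows takes the .id values from the list names. A minimal sketch, assuming dplyr is installed; the name combined is only illustrative:

```r
df1 <- data.frame(a = seq(1, 5, by = 1), b = seq(21, 25, by = 1))
df2 <- data.frame(a = seq(6, 10, by = 1), b = seq(26, 30, by = 1))

# The list names ("df1", "df2") become the values of the "source" column
combined <- dplyr::bind_rows(list(df1 = df1, df2 = df2), .id = "source")

# Count how many rows came from each original data frame
table(combined$source)
```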

Why not just:

    rbind(cbind(df1, origin = "df1"),
          cbind(df2, origin = "df2"))

Or, if you want to preserve the row names:

    rbind(cbind(df1, origin = paste("df1", rownames(df1), sep = "_")),
          cbind(df2, origin = paste("df2", rownames(df2), sep = "_")))

You can use:

 transform(dat, newCol = cumsum(ID == 1)) 

where dat is the name of your data frame and ID is the name of the identifier column.
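
As a rough illustration of how this behaves, here is a small made-up example. It assumes the identifier column is called ID and restarts at 1 at the beginning of each original data frame; the data values are hypothetical:

```r
# Hypothetical combined data: ID restarts at 1 where each original data frame begins
dat <- data.frame(ID = c(1, 2, 3, 1, 2, 1, 2, 3, 4),
                  value = c(10, 11, 12, 20, 21, 30, 31, 32, 33))

# ID == 1 is TRUE on the first row of each original data frame;
# cumsum() turns those TRUE values into a running count
dat <- transform(dat, newCol = cumsum(ID == 1))

dat$newCol
# 1 1 1 2 2 3 3 3 3
```

Note that this only works if the value 1 appears exactly once per original data frame, as its first row identifier.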


Pretty extensible solution:

    # test data:
    df1 <- data.frame(id = letters[1:2])
    df2 <- data.frame(id = letters[1:2])

Collect your data into a list, then rbind all at once:

    dfs <- c("df1", "df2")
    do.call(rbind, Map("[<-", mget(dfs), TRUE, "source", dfs))
    #       id source
    # df1.1  a    df1
    # df1.2  b    df1
    # df2.1  a    df2
    # df2.2  b    df2

Also note in this example that when you rbind a named list, the resulting row names refer to the source data. This means that you can get almost what you want simply with:

    dfs <- c("df1", "df2")
    do.call(rbind, mget(dfs))
    #       id
    # df1.1  a
    # df1.2  b
    # df2.1  a
    # df2.2  b
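
If a real column is needed rather than row names, the source name could be recovered from those row names afterwards. This is only a sketch: the name combined is illustrative, and it assumes the row names have the "name.rownumber" form shown above and that the original row names contain no dots themselves.

```r
df1 <- data.frame(id = letters[1:2])
df2 <- data.frame(id = letters[1:2])
dfs <- c("df1", "df2")

combined <- do.call(rbind, mget(dfs))

# Row names look like "df1.1", "df2.2"; strip the trailing ".<row name>" part
combined$source <- sub("\\.[^.]*$", "", rownames(combined))

combined$source
# "df1" "df1" "df2" "df2"
```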

Thanks everyone! Working with a friend, I ended up with a simple solution that builds an index, for example (where dat is the combined data frame):

    index <- rep(1, times = nrow(dat))
    for (i in seq_len(nrow(dat) - 1)) {
      # a non-increasing ID marks the start of the next original data frame
      if (dat$ID[i + 1] <= dat$ID[i]) {
        index[i + 1] <- index[i] + 1
      } else {
        index[i + 1] <- index[i]
      }
    }
    new.dat <- cbind(index, dat)

Source: https://habr.com/ru/post/979076/

