Create a variable that identifies the original data.frame after the rbind command in R

Question

I'm relatively new to R, and I would like to know how I can create a variable (sequence of numbers) that identifies each of the original data frames before combining with the rbind command.

Each of the original data frames has a row-identifier variable that starts at 1, so I think a loop that assigns a new number to the new variable every time it encounters the number 1 in the row identifier should work.

Thanks.

+6

loops r rbind

Francisco beca Dec 03 '14 at 22:18

6 answers

It looks like bind_rows from the dplyr package will do this too. Using maloneypatr's example:

 df1 <- data.frame(a = seq(1, 5, by = 1), b = seq(21, 25, by = 1))
 df2 <- data.frame(a = seq(6, 10, by = 1), b = seq(26, 30, by = 1))
 dplyr::bind_rows(df1, df2, .id = "source")
 # Source: local data frame [10 x 3]
 #    source     a     b
 #     (chr) (dbl) (dbl)
 # 1       1     1    21
 # 2       1     2    22
 # 3       1     3    23
 # 4       1     4    24
 # 5       1     5    25
 # 6       2     6    26
 # 7       2     7    27
 # 8       2     8    28
 # 9       2     9    29
 # 10      2    10    30

+6

Jake fisher Mar 15 '16 at 15:37

Why not just:

  rbind( cbind(df1, origin="df1"), cbind(df2, origin='df2') )

Or, if you want to keep the row names:

  rbind( cbind(df1, origin=paste("df1",rownames(df1), sep="_") ), cbind(df2, origin=paste("df2",rownames(df2), sep="_") ) )

+2

42- Dec 03 '14 at 22:22

You can use

 transform(dat, newCol = cumsum(ID == 1))

where dat is the name of your data frame and ID is the name of the identifier column.
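A minimal sketch of how this plays out on combined data, assuming (as the question describes) that every original data frame has an ID column that starts at 1. The data frames df1 and df2, the column x, and their values are made up for illustration.

```r
# Hypothetical inputs: each data frame has a row identifier starting at 1
df1 <- data.frame(ID = 1:3, x = c(10, 11, 12))
df2 <- data.frame(ID = 1:2, x = c(20, 21))

dat <- rbind(df1, df2)

# ID == 1 marks the first row of each original data frame;
# the running total of those marks numbers the sources 1, 2, ...
res <- transform(dat, newCol = cumsum(ID == 1))
res$newCol
```

This relies on the value 1 appearing exactly once per original data frame, as its first row; if an ID of 1 can occur elsewhere, the counter would advance too often.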

+2

Sven hohenstein Dec 03 '14 at 22:25

Pretty extensible solution:

 # test data:
 df1 <- data.frame(id=letters[1:2])
 df2 <- data.frame(id=letters[1:2])

Collect your data into a list, then rbind all at once:

 dfs <- c("df1","df2")
 do.call(rbind, Map("[<-", mget(dfs), TRUE, "source", dfs) )
 #      id source
 #df1.1  a    df1
 #df1.2  b    df1
 #df2.1  a    df2
 #df2.2  b    df2

Also note in this example that when you rbind using a named list, the row names of the result refer to the source data. This means that you can get almost what you want, simply:

 dfs <- c("df1","df2")
 do.call(rbind, mget(dfs) )
 #      id
 #df1.1  a
 #df1.2  b
 #df2.1  a
 #df2.2  b
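If an actual column is wanted rather than just informative row names, one possible follow-up is to strip the trailing row number from those row names. This is only a sketch: the column name source and the regular expression are my own choices, and it assumes the original data frames have default (numeric) row names.

```r
df1 <- data.frame(id = letters[1:2])
df2 <- data.frame(id = letters[1:2])
dfs <- c("df1", "df2")

res <- do.call(rbind, mget(dfs))

# Row names look like "df1.1", "df2.2"; drop the trailing ".<row number>"
res$source <- sub("\\.[0-9]+$", "", rownames(res))
res
```

If the inputs carry their own non-numeric row names, the pattern would need adjusting, so adding the column before binding is likely the more robust route.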

+1

thelatemail Dec 03 '14 at 10:54

Thanks everyone! I ended up with a simple solution, working with my friend, creating an index, for example:

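 # dat is the combined data frame; ID is its row identifier
 index <- rep(1, times = nrow(dat))
 for (i in seq_len(nrow(dat) - 1)) {
   # an ID that does not increase means a new original data frame has started
   if (dat$ID[i + 1] <= dat$ID[i]) {
     index[i + 1] <- index[i] + 1
   } else {
     index[i + 1] <- index[i]
   }
 }
 new.data.frame <- cbind(index, dat)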
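The same idea can probably be written without a loop. The sketch below is a vectorized variant, not the code from the post; the data frame dat and its ID values are hypothetical, and it assumes the ID strictly increases within each original data frame.

```r
# Hypothetical combined data: ID restarts whenever a new source begins
dat <- data.frame(ID = c(1, 2, 3, 1, 2, 1, 2, 3, 4))

# diff(dat$ID) <= 0 is TRUE wherever the ID fails to increase,
# i.e. at the first row of each subsequent source; the leading TRUE
# starts the counter at 1 for the first source
index <- cumsum(c(TRUE, diff(dat$ID) <= 0))
new.dat <- cbind(index, dat)
new.dat$index
```

Unlike testing for ID == 1, this comparison also copes with sources whose identifier does not start at 1, as long as the identifier drops or repeats at each boundary.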

0

Francisco beca Dec 04 '14 at 17:01

maloneypatr · Accepted Answer · 2014-12-03T22:49:27+0000

The gdata package has a combine function that does just that.

 df1 <- data.frame(a = seq(1, 5, by = 1), b = seq(21, 25, by = 1))
 df2 <- data.frame(a = seq(6, 10, by = 1), b = seq(26, 30, by = 1))
 library(gdata)
 combine(df1, df2)
 #     a  b source
 # 1   1 21    df1
 # 2   2 22    df1
 # 3   3 23    df1
 # 4   4 24    df1
 # 5   5 25    df1
 # 6   6 26    df2
 # 7   7 27    df2
 # 8   8 28    df2
 # 9   9 29    df2
 # 10 10 30    df2
