R: counting occurrences of similar rows of a data frame

Question

I have data in a data frame in the following format (this is just a simplified version):

eval.num  eval.count  fitness  fitness.mean  green.h.0  green.v.0  offset.0  random
1         1           1500     1500          100        120        40        232342
2         2           1000     1250          100        120        40        11843
3         3           1250     1250          100        120        40        981340234
4         4           1000     1187.5        100        120        40        4363453
5         1           2000     2000          200        100        40        345902
6         1           3000     3000          150        90         10        943
7         1           2000     2000          90         90         100       9304358
8         2           1800     1900          90         90         100       284333

However, the eval.count column is incorrect and I need to fix it. It should report the number of rows with the same values for green.h.0, green.v.0 and offset.0, looking only at the previous rows.

The example above shows the expected values, but assume that they are incorrect.

How can I add a new column (say, "count") that counts all previous rows that have the same values for the specified variables?

I got help on a similar problem by simply selecting all rows with the same values for the specified columns, so I assumed that I could just write a loop around this, but it seems inefficient.

r duplicates count dataframe

Matt 03 . '10 20:25

Jonathan Chang · Answer 1 · 2010-04-03T21:52:10+0000

A small example vector with runs of repeated values:

> data <- rep(sample(1000, 5),
              sample(5, 5))
> head(data)
[1] 435 435 435 278 278 278

rle gives the length of each run, and sequence numbers the elements within each run:

> sequence(rle(data)$lengths)
[1] 1 2 3 1 2 3 4 5 1 2 3 4 1 2 1

Side by side with the data:

> head(cbind(data, sequence(rle(data)$lengths)))
[1,]  435 1
[2,]  435 2
[3,]  435 3
[4,]  278 1
[5,]  278 2
[6,]  278 3

To apply this to several columns at once, paste can combine them into a single key.
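As a minimal sketch of the paste idea, the block below rebuilds the relevant columns of the question's sample data, pastes the three columns into one key per row, and applies the rle/sequence step to that key. The names `key` and `count` are illustrative, and the approach assumes that matching rows are adjacent, as they are in the sample data.

```r
# Relevant columns of the sample data from the question.
DF <- data.frame(
  eval.num  = 1:8,
  green.h.0 = c(100, 100, 100, 100, 200, 150, 90, 90),
  green.v.0 = c(120, 120, 120, 120, 100, 90, 90, 90),
  offset.0  = c(40, 40, 40, 40, 40, 10, 100, 100)
)

# Combine the three columns into a single key per row.
key <- paste(DF$green.h.0, DF$green.v.0, DF$offset.0, sep = "_")

# Number the rows within each run of identical consecutive keys.
DF$count <- sequence(rle(key)$lengths)
print(DF$count)
# [1] 1 2 3 4 1 1 1 2
```

The result matches the expected eval.count column of the sample. If matching rows were interleaved with other rows, rle would start a new run each time, so the data would need to be ordered by the key first.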

Matt · Answer 2 · 2010-04-03T22:36:29+0000

The following loops over all earlier rows and counts those that match the current row in the compared columns:

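# For one row r, count how many earlier rows match it in columns 27:51.
# Relies on column 1 (eval.num) being equal to the row index.
cmpfun2 <- function(r) {
    count <- 0
    if (r[1] > 1) {
        for (row in 1:(r[1] - 1)) {
            # compare the current row with one earlier row
            if (all(r[27:51] == unlist(DF[row, 27:51]))) {
                count <- count + 1
            }
        }
    }
    return(count)
}
brows <- apply(DF, 1, cmpfun2)
print(brows)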

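For comparison, here is a sketch of a loop-free way to get a running count per key, using ave with seq_along. This is a swapped-in technique, not the code from the answer above. The small data frame is made up for illustration (its matching rows are deliberately not adjacent), and `key` and `count` are illustrative names.

```r
# Hypothetical data in which matching rows are interleaved.
DF <- data.frame(
  eval.num  = 1:6,
  green.h.0 = c(100, 200, 100, 90, 100, 200),
  green.v.0 = c(120, 100, 120, 90, 120, 100),
  offset.0  = c(40, 40, 40, 100, 40, 40)
)

# One key per row, built from the compared columns.
key <- paste(DF$green.h.0, DF$green.v.0, DF$offset.0, sep = "_")

# Within each group of identical keys, number the rows in order of appearance.
DF$count <- ave(seq_along(key), key, FUN = seq_along)
print(DF$count)
# [1] 1 1 2 1 3 2

# Number of earlier matching rows, as computed by the loop version.
DF$previous <- DF$count - 1
```

Because ave groups by key rather than by run, this does not depend on the rows being sorted.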

Matt · Answer 3 · 2011-02-16T16:36:43+0000

A function that prints how many rows match a given row (bind) in the compared columns:

checkIt <- function(bind) {

    print(bind)

    cmpfun <- function(r) {all(r == heeds.data[bind,23:47,drop=FALSE])}
    brows <- apply(heeds.data[,23:47], 1, cmpfun)

    #print(heeds.data[brows,c("eval.num","fitness","green.h.1","green.h.2","green.v.5")])
    print(nrow(heeds.data[brows,c("eval.num","fitness","green.h.1","green.h.2","green.v.5")]))
}

Here heeds.data is the data frame, and 23:47 are the columns being compared.
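The same count can be sketched without apply by comparing row keys. In the block below, `countMatches` and `cols` are hypothetical names, and `heeds.data` is a small stand-in built from the question's sample columns rather than the real data with columns 23:47.

```r
# Stand-in for heeds.data, using the sample columns from the question.
heeds.data <- data.frame(
  eval.num  = 1:8,
  green.h.0 = c(100, 100, 100, 100, 200, 150, 90, 90),
  green.v.0 = c(120, 120, 120, 120, 100, 90, 90, 90),
  offset.0  = c(40, 40, 40, 40, 40, 10, 100, 100)
)
cols <- c("green.h.0", "green.v.0", "offset.0")

# Count the rows of df that equal row `bind` in the columns `cols`
# (the count includes row `bind` itself).
countMatches <- function(df, bind, cols) {
  key <- do.call(paste, c(unname(as.list(df[cols])), list(sep = "_")))
  sum(key == key[bind])
}

print(countMatches(heeds.data, 1, cols))
# [1] 4
```

Comparing pasted keys treats the values as strings, so this sketch assumes the compared columns print identically whenever they are equal.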
