R: index duplicates in a data frame with a unique value

I have a data frame containing a set of parts and test results. Parts are tested at 3 sites (North, Center and South). Sometimes these parts are retested. Ultimately I want to create several plots that compare the results from the first time a part was tested with the second (or third, etc.) time it was tested, e.g. to look at the repeatability of the tester.

As an example, I have come up with the code below. I deliberately removed the Expt column from the morley dataset, as this is the column I am effectively trying to recreate. The code works, but it seems there should be a more elegant way to approach this problem. Any thoughts?

Edit - I realized that the example above was too simplistic for my real needs (I was trying to make the reproducible example as simple as possible).

New example:

part<-as.factor(c("A","A","A","B","B","B","A","A","A","C","C","C"))
site<-as.factor(c("N","C","S","C","N","S","N","C","S","N","S","C"))
result<-c(17,20,25,51,50,49,43,45,47,52,51,56)

data<-data.frame(part,site,result)
data$index<-1
repeat {
    if(!anyDuplicated(data[,c("part","site","index")]))
    { break }
    data$index<-ifelse(duplicated(data[,c("part","site","index")]),data$index+1,data$index)
}
data

   part site result index
1     A    N     17     1
2     A    C     20     1
3     A    S     25     1
4     B    C     51     1
5     B    N     50     1
6     B    S     49     1
7     A    N     43     2
8     A    C     45     2
9     A    S     47     2
10    C    N     52     1
11    C    S     51     1
12    C    C     56     1
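
Once every row carries such an index, the first test and the retest can be put side by side. The following is only a minimal sketch in base R, assuming the same `data` as above; the names `dup`, `first`, `second` and `paired` are illustrative and not part of the original code, and `merge` is just one possible way to pair the rows.

```r
# Same example data as above
data <- data.frame(
  part   = c("A","A","A","B","B","B","A","A","A","C","C","C"),
  site   = c("N","C","S","C","N","S","N","C","S","N","S","C"),
  result = c(17,20,25,51,50,49,43,45,47,52,51,56)
)

# Build the index: bump it for every row whose (part, site, index) was already seen
data$index <- 1
repeat {
  dup <- duplicated(data[, c("part", "site", "index")])
  if (!any(dup)) break
  data$index[dup] <- data$index[dup] + 1
}

# Pair the first test with the first retest of the same part at the same site
first  <- data[data$index == 1, c("part", "site", "result")]
second <- data[data$index == 2, c("part", "site", "result")]
paired <- merge(first, second, by = c("part", "site"),
                suffixes = c(".first", ".second"))
paired
```

Only parts that were actually retested (here part A) end up in `paired`, so its two result columns can be plotted against each other directly.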

Old example:

#Generate a trial data frame from the morley dataset
df<-morley[,c(2,3)]

#Set up an iterative variable
#Create the index column and initialise to 1
df$index<-1

# Loop through the dataframe looking for duplicate pairs of
# Runs and Indices and increment the index if it is a duplicate
repeat {
    if(!anyDuplicated(df[,c(1,3)]))
    { break }
    df$index<-ifelse(duplicated(df[,c(1,3)]),df$index+1,df$index)
}

# Check - every element of the vector below should be TRUE
df$index==morley$Expt
3 answers

We can use diff and cumsum on the "Run" column to get the expected result. With this method we do not need to create a column of 1s (i.e. 'index'), but we do assume that the sequence in "Run" is ordered, as shown in the OP's example.

indx <- cumsum(c(TRUE,diff(df$Run)<0))
identical(indx, morley$Expt)
#[1] TRUE
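
To see what the two steps do, here is a small sketch on a made-up vector (`run` and `restart` are hypothetical names, not from the question). It assumes, as above, that the values increase within each block, so that a drop marks the start of a new block.

```r
# Hypothetical run numbers: three blocks, each counting up from 1
run <- c(1, 2, 3, 1, 2, 3, 1, 2)

# diff() is negative exactly where the sequence falls back to a lower value
restart <- diff(run) < 0

# Prepend TRUE for the first row, then count the restarts seen so far
indx <- cumsum(c(TRUE, restart))
indx
#[1] 1 1 1 2 2 2 3 3
```

If the rows were shuffled, the drops would no longer line up with block boundaries and this approach would give the wrong index.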

Or we can use ave

indx2 <- with(df, ave(Run, Run, FUN=seq_along))
identical(indx2, morley$Expt)
#[1] TRUE

Update

Using a new example

with(data, ave(seq_along(part), part, site, FUN=seq_along))
#[1] 1 1 1 1 1 1 2 2 2 1 1 1
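
Unlike the diff/cumsum approach, this does not seem to depend on the row order. A small sketch with a hypothetical data frame `d` (not from the question), in which one part/site pair occurs three times and the repeats are not adjacent:

```r
# Hypothetical data: part A at site N is tested three times (rows 1, 3 and 5)
d <- data.frame(
  part = c("A", "A", "A", "B", "A"),
  site = c("N", "S", "N", "N", "N")
)

# Within each part/site group, number the rows in order of appearance
d$index <- with(d, ave(seq_along(part), part, site, FUN = seq_along))
d$index
#[1] 1 1 2 1 3
```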

Or we can use getanID from library(splitstackshape)

library(splitstackshape)
getanID(data, c('part', 'site'))[]

Two more options for the same data.frame:

#this works if each group starts with 1:
df$index<-cumsum(df$Run==1)
#this is maybe more general, with data.table
require(data.table)
dt<-as.data.table(df)
dt[,index:=seq_along(Speed),by=Run]

Another option uses make.unique:

index <- 1L + as.integer(sub("\\d+(\\.)?","",make.unique(as.character(morley$Run))))
index <- ifelse(is.na(index),1L,index)
identical(index,morley$Expt)
#[1] TRUE
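
A step-by-step sketch of the same idea on a made-up vector may help (`run`, `labels` and `suffix` are hypothetical names). It assumes the values are plain integers; labels that themselves contain dots or trailing digits would probably confuse the regular expression.

```r
# Hypothetical run numbers with repeats
run <- c(1, 2, 1, 2, 1)

# make.unique() appends ".1", ".2", ... to the 2nd, 3rd, ... occurrence
labels <- make.unique(as.character(run))
labels
#[1] "1"   "2"   "1.1" "2.1" "1.2"

# Strip the original value and the dot, keeping only the appended counter
suffix <- sub("\\d+(\\.)?", "", labels)

# First occurrences have an empty suffix, which becomes NA and is set to 1
index <- 1L + as.integer(suffix)
index[is.na(index)] <- 1L
index
#[1] 1 1 2 2 3
```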

Source: https://habr.com/ru/post/1605908/

