Reducing the "while loop" with conditions

Question

My overall goal is to print only the rows whose name (in the same field) also appears in another row, without repeats. That is, if three rows are duplicates of one another, print each of them only once, not once per pairwise comparison.

A minimal data set and the library to play with:

library(stringdist)
trye <-  data.frame(names = c('aa','aa','aa','bb','bb','cc'),
                    values = 1:6,
                    id = c('row 1', 'row 2', 'row 3', 'row 4', 'row 5', 'row 6'), 
                    stringsAsFactors = FALSE)

My expected result consists of the rows that have the same or a similar name (rows 1, 2, 3, 4 and 5):

trye 
#   names values    id
# 1    aa      1 row 1
# 2    aa      2 row 2
# 3    aa      3 row 3
# 4    bb      4 row 4
# 5    bb      5 row 5

Here are two attempts that did not work (some other modifications caused errors):

# this one prints rows 1,2,3,3,5,5
i <- 1
while (i < length(trye$names)) {

  dupe <- amatch(trye$names[[i]],trye$names[-i], maxDist = 1)

  if(dupe  + 1 > 0) {
    print(trye[i,])
    duperow <- dupe + 1
    print(trye[duperow,])
    trye <- trye[-c(i), ]
    i <- i + 1


  } else {
    i <- i + 1
    trye <- trye[-c(i), ]
  }

}



# this one prints rows 1,2,4,5, which is almost correct;
# it is missing row 3 (which shares its name with rows 1 and 2).
i <- 1
while (i < length(trye$names)) {

  dupe <- amatch(trye$names[[i]],trye$names[-i], maxDist = 1)

  if(dupe  + 1 > 0) {
    print(trye[i,])
    duperow <- dupe + 1
    print(trye[duperow,])
    trye <- trye[-c(i,duperow), ]
    i <- i + 1


  } else {
    i <- i + 1
    trye <- trye[-c(i,duperow), ]
  }

}

Note that the actual dataset is huge, so deleting rows to make the comparisons smaller seems (or seemed) like a good idea to me. Also, the maximum distance in the actual dataset is more than 1.

r control-flow

erasmortg · 25 Jan '16 15:37

jeremycg · Accepted Answer · 2016-01-25T15:59:55+0000

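You can use adist to compute the distances, then flag each row that has more than one match at distance 0 (every row matches itself):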

sapply(1:nrow(trye), function(x) sum(adist(trye[x,1], trye[,1])==0)>1)
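The same idea can be sketched without the per-row sapply by computing the whole distance matrix in one adist call. This is only an illustration on the question's sample data: max_dist is a hypothetical threshold name (not from the original answer), set to 0 for exact matches and raised for fuzzy ones.

```r
# Sample data from the question
trye <- data.frame(names = c('aa', 'aa', 'aa', 'bb', 'bb', 'cc'),
                   values = 1:6,
                   id = paste('row', 1:6),
                   stringsAsFactors = FALSE)

# Hypothetical threshold: 0 means exact matches only; raise it for fuzzy matching
max_dist <- 0

# Pairwise edit distances between all names (an n x n matrix)
d <- adist(trye$names)

# Every row matches itself, so a row has a duplicate only if it has more than one match
keep <- rowSums(d <= max_dist) > 1

# Each matching row appears once, in its original order
trye[keep, ]
```

The matrix has one cell per pair of rows, so on a very large dataset it may not fit in memory; running it on unique(trye$names) first is one way to shrink it.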

Or, without adist, if you only need exact matches:

# keep rows whose name also occurs earlier or later in the column
trye[duplicated(trye$names) | rev(duplicated(rev(trye$names))), ]
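For the exact-match case, the forward and backward scans can also be written with the fromLast argument of duplicated(), which avoids reversing the vector twice. A small sketch on the question's sample data (dup_any is just an illustrative variable name):

```r
# Sample data from the question
trye <- data.frame(names = c('aa', 'aa', 'aa', 'bb', 'bb', 'cc'),
                   values = 1:6,
                   id = paste('row', 1:6),
                   stringsAsFactors = FALSE)

# TRUE for every row whose name occurs at least twice:
# the forward scan flags later copies, the backward scan flags earlier ones
dup_any <- duplicated(trye$names) | duplicated(trye$names, fromLast = TRUE)

trye[dup_any, ]
#   names values    id
# 1    aa      1 row 1
# 2    aa      2 row 2
# 3    aa      3 row 3
# 4    bb      4 row 4
# 5    bb      5 row 5
```

Both scans are single passes over the column, so this does not need a pairwise comparison, but it only finds identical names, not similar ones.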

You may also want to look at OpenRefine, which has clustering tools for near-duplicate values.
