Find the beginning and end of ranges where the data is uppercase

I have a data.frame ystr:

    v1
1    a
2    B
3    B
4    C
5    d
6    a
7    B
8    D

I want to find the beginning and end of each group of letters in CAPS, so my output will be as follows:

    groupId startPos    endPos
1   1       2           4
2   2       7           8

I was able to do this with a for loop by looking at each element in order and comparing it with the previous one as follows:

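# TRUE where the row holds an uppercase letter
isUpper <- grepl("[[:upper:]]", ystr[, 1])

# Result table, filled one row per group
mygroups <- data.frame(groupId = integer(0), startPos = integer(0), endPos = integer(0))
currentGroupId <- 0
startCounter <- 0  # 1 while inside an uppercase group

for (i in seq_along(isUpper)) {
  if (isUpper[i]) {
    if (startCounter == 0) {
      # First uppercase row of a new group
      currentGroupId <- currentGroupId + 1
      startCounter <- 1
      mygroups[currentGroupId, ] <- c(currentGroupId, i, 0)
    }
  } else if (startCounter == 1) {
    # First lowercase row after a group closes that group
    startCounter <- 0
    mygroups[currentGroupId, 3] <- i - 1
  }
}

# Close a group that runs through the last row
if (startCounter == 1) mygroups[currentGroupId, 3] <- length(isUpper)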

Is there an easy way to do this in R?

This may be similar to "Mark the beginning and end of groups", but I could not understand how it applies in this case.

2 answers

One way to do this is with run-length encoding (rle):

with(rle(d[,1] == toupper(d[,1])),
     data.frame(start=cumsum(lengths)[values]-lengths[values]+1,
                end=cumsum(lengths)[values]))
#   start end
# 1     2   4
# 2     7   8

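rle returns the length and value of each run, so cumsum(lengths) gives the end position of every run, and subtracting the run length and adding 1 gives its start. Indexing by values keeps only the uppercase runs.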

Data:

d <- data.frame(v1=c("a", "B", "B", "C", "d", "a", "B", "D"))
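To make the intermediate values visible, here is a step-by-step sketch of the same rle idea. The names is_upper, r, ends, starts and res are only illustrative, and the column names are borrowed from the desired output in the question.

```r
# Same sample data as in the question (strings kept as character)
d <- data.frame(v1 = c("a", "B", "B", "C", "d", "a", "B", "D"),
                stringsAsFactors = FALSE)

# TRUE where the letter equals its own uppercase form
is_upper <- d$v1 == toupper(d$v1)

# Run-length encode the logical vector
r <- rle(is_upper)
r$lengths  # 1 3 2 2
r$values   # FALSE TRUE FALSE TRUE

# End position of every run, then the start derived from the run length
ends   <- cumsum(r$lengths)      # 1 4 6 8
starts <- ends - r$lengths + 1   # 1 2 5 7

# Keep only the uppercase runs and number them
res <- data.frame(groupId  = seq_len(sum(r$values)),
                  startPos = starts[r$values],
                  endPos   = ends[r$values])
res
```

Note that the toupper comparison also treats non-letters (digits, punctuation) as "uppercase", since they are unchanged by toupper; for letter-only data such as this it makes no difference.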

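Another option is the IRanges package (from Bioconductor): build a width-1 range for each uppercase position, then merge adjacent ranges with reduce.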

d <- data.frame(v1=c("a", "B", "B", "C", "d", "a", "B", "D"))
d.idx <- which(d$v1 %in% LETTERS)
d.idx
# [1] 2 3 4 7 8

library(IRanges)
d.idx.ir <- IRanges(d.idx, d.idx)
reduce(d.idx.ir)
# IRanges of length 2
#     start end width
# [1]     2   4     3
# [2]     7   8     2
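If installing IRanges is not an option, the same merging of consecutive indices can be sketched in base R. This is not what reduce does internally, only an equivalent for this simple case; the names idx, grp and ranges are illustrative, and the sketch assumes at least one uppercase row exists.

```r
# Same sample data as in the question
d <- data.frame(v1 = c("a", "B", "B", "C", "d", "a", "B", "D"),
                stringsAsFactors = FALSE)

# Positions of the uppercase letters
idx <- which(d$v1 %in% LETTERS)          # 2 3 4 7 8

# A new group starts wherever the step from the previous index is not 1
grp <- cumsum(c(TRUE, diff(idx) != 1))   # 1 1 1 2 2

# First and last index of each group
ranges <- data.frame(groupId  = unique(grp),
                     startPos = as.vector(tapply(idx, grp, min)),
                     endPos   = as.vector(tapply(idx, grp, max)))
ranges
```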

Source: https://habr.com/ru/post/1621044/

