Retrieving the first and last positions in a dataset

I have this data set that I am trying to convert in order to get the "from" and "to" positions within a specific grouping of data points that pass the test.

Here's what the data looks like:

    pos <- seq(from = 10, to = 100, by = 10)
    test <- c(1, 1, 1, 0, 0, 0, 1, 1, 1, 0)
    df <- data.frame(pos, test)

So, you can see that positions 10, 20 and 30, as well as 70, 80 and 90, pass the test (because test = 1), but the rest of the points do not. The answer I'm looking for is a data frame that looks something like the "answer" data frame in the code below:

    peaknum <- c(1, 2)
    from <- c(10, 70)
    to <- c(30, 90)
    answer <- data.frame(peaknum, from, to)

Any suggestions on how I can convert the dataset? I'm at a dead end.

Thanks, Steve

2 answers

We can use data.table. The rleid function creates run-length group ids ("peaknum") from runs of adjacent identical values in "test". Grouping by "peaknum", we take the min and max of "pos", with "i" set to test == 1 so that only the rows that pass the test are used. If necessary, the values of "peaknum" can then be renumbered as a sequence (seq_len(.N)).

    library(data.table)
    setDT(df)[, peaknum := rleid(test)
              ][test == 1, list(from = min(pos), to = max(pos)), peaknum
                ][, peaknum := seq_len(.N)]
    #    peaknum from to
    # 1:       1   10 30
    # 2:       2   70 90
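To see what the run-length ids look like before filtering, here is a minimal base R sketch that mimics the grouping rleid produces for this data. The helper name run_id is hypothetical and is not part of data.table; it is only meant to illustrate the idea.

```r
pos  <- seq(from = 10, to = 100, by = 10)
test <- c(1, 1, 1, 0, 0, 0, 1, 1, 1, 0)

# Hypothetical helper: give each run of identical consecutive values its own id
run_id <- function(x) {
  runs <- rle(x)
  rep(seq_along(runs$lengths), times = runs$lengths)
}

ids <- run_id(test)
ids
# [1] 1 1 1 2 2 2 3 3 3 4
```

The runs where test is 1 get ids 1 and 3, which is why the last step renumbers "peaknum" to 1 and 2.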

We can do this with dplyr, although constructing the grouping variable is a bit ugly:

    library(dplyr)
    df %>%
      group_by(peaknum = rep(seq(rle(test)[['lengths']]), rle(test)[['lengths']])) %>%
      filter(test == 1) %>%
      summarise(from = min(pos), to = max(pos)) %>%
      mutate(peaknum = seq_along(peaknum))
    # Source: local data frame [2 x 3]
    #   peaknum  from    to
    #     (int) (dbl) (dbl)
    # 1       1    10    30
    # 2       2    70    90

What it does:

  • first, group_by uses rle to add a column that numbers each run of repeated values in test, and groups by it for summarise later;
  • filter keeps only the rows where test is 1;
  • summarise collapses the groups and adds the min and max of pos for each;
  • and finally mutate cleans up the peaknum numbering so it runs 1, 2, ...
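For comparison, the same steps can be sketched in base R without any packages. This is an alternative to the dplyr code, not a restatement of it: it takes the first and last row of each passing run instead of min and max, so it assumes pos is already sorted in ascending order. The names runs, ends, starts and keep are chosen here for illustration.

```r
pos  <- seq(from = 10, to = 100, by = 10)
test <- c(1, 1, 1, 0, 0, 0, 1, 1, 1, 0)
df   <- data.frame(pos, test)

runs   <- rle(df$test)
ends   <- cumsum(runs$lengths)      # row index where each run ends
starts <- ends - runs$lengths + 1   # row index where each run starts
keep   <- runs$values == 1          # runs that pass the test

# Assumes pos is sorted, so the first/last row of a run holds its min/max
answer <- data.frame(
  peaknum = seq_len(sum(keep)),
  from    = df$pos[starts[keep]],
  to      = df$pos[ends[keep]]
)
answer
#   peaknum from to
# 1       1   10 30
# 2       2   70 90
```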

Source: https://habr.com/ru/post/1245266/

