Retrieving the first and last positions in a dataset

Question

I have a data set that I am trying to convert so that I get the "from" and "to" positions of each contiguous group of data points that pass the test.

Here's what the data looks like:

    pos <- seq(from = 10, to = 100, by = 10)
    test <- c(1, 1, 1, 0, 0, 0, 1, 1, 1, 0)
    df <- data.frame(pos, test)

So, you can see that positions 10, 20 and 30, as well as 70, 80 and 90, pass the test (because test = 1), but the rest of the points do not. The result I'm looking for is a data frame like the "answer" data frame in the code below:

    peaknum <- c(1, 2)
    from <- c(10, 70)
    to <- c(30, 90)
    answer <- data.frame(peaknum, from, to)

Any suggestions on how I can convert the dataset? I'm at a dead end.

Thanks, Steve

+5

r dplyr

Steven Mar 17 '16 at 19:58

2 answers

We can do this with dplyr, although the grouping step is a bit ugly:

    library(dplyr)

    df %>%
      # seq_along() rather than seq(): seq() misbehaves when test is one single run
      group_by(peaknum = rep(seq_along(rle(test)[['lengths']]), rle(test)[['lengths']])) %>%
      filter(test == 1) %>%
      summarise(from = min(pos), to = max(pos)) %>%
      mutate(peaknum = seq_along(peaknum))
    # Source: local data frame [2 x 3]
    #
    #   peaknum  from    to
    #     (int) (dbl) (dbl)
    # 1       1    10    30
    # 2       2    70    90

What it does:

group_by uses rle to add a column of run identifiers (one number per run of identical consecutive values in test) and groups by it for summarise later;
filter subsets the rows to those where test is 1;
summarise collapses the groups and computes the min and max of pos for each;
and finally mutate renumbers peaknum so that it runs 1, 2, ... without gaps.
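
To make the grouping step less opaque, here is a minimal base R sketch of the run identifiers that the rle expression produces for the question's test vector. The names runs, run_id and passing_ids are only illustrative and do not appear in the answer above.

```r
# Illustrative sketch: the run identifiers behind the group_by step.
test <- c(1, 1, 1, 0, 0, 0, 1, 1, 1, 0)

runs <- rle(test)  # lengths: 3 3 3 1, values: 1 0 1 0

# Repeat each run's index as many times as the run is long
run_id <- rep(seq_along(runs$lengths), runs$lengths)
run_id
# 1 1 1 2 2 2 3 3 3 4

# Runs that pass the test keep their original ids, which leaves gaps
passing_ids <- unique(run_id[test == 1])
passing_ids
# 1 3
```

Because the passing runs are numbered 1 and 3 here, the final mutate is what turns them into 1 and 2.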

+3

alistaire Mar 17 '16 at 20:03

akrun · Accepted Answer · 2016-03-17T20:05:43+0000

We can use data.table. The rleid function creates run-length group identifiers ("peaknum") from runs of adjacent identical values in "test". Using "peaknum" as the grouping variable, we get the "min" and "max" of "pos", specifying "i" as "test == 1" to subset the rows. If necessary, the values of "peaknum" can be renumbered as a sequence ("seq_len(.N)").

    library(data.table)

    setDT(df)[, peaknum := rleid(test)][
      test == 1, list(from = min(pos), to = max(pos)), peaknum][
      , peaknum := seq_len(.N)]
    #    peaknum from to
    # 1:       1   10 30
    # 2:       2   70 90
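
For comparison, the same result can probably be obtained without any packages. The following is a sketch in base R, not part of either answer; it assumes that pos is sorted in ascending order, so that the first and last positions of a run are also its minimum and maximum. The names runs, ends, starts, keep and result are illustrative.

```r
# Base R sketch (assumes df$pos is sorted ascending).
pos <- seq(from = 10, to = 100, by = 10)
test <- c(1, 1, 1, 0, 0, 0, 1, 1, 1, 0)
df <- data.frame(pos, test)

runs <- rle(df$test)

# Row index at which each run ends and starts
ends <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# Keep only the runs where the test passed
keep <- runs$values == 1

result <- data.frame(
  peaknum = seq_len(sum(keep)),
  from = df$pos[starts[keep]],
  to = df$pos[ends[keep]]
)
result
# from is 10, 70 and to is 30, 90 for the two passing runs
```

If pos were not sorted, taking min and max within each run, as both answers do, would be the safer choice.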
