Here is a small example:

    Mark <- paste("SN", 1:400, sep = "")
    highway <- rep(1:4, each = 100)
    set.seed(1234)
    MAF <- rnorm(400, 0.3, 0.1)
    PPC <- abs(ceiling(rnorm(400, 5, 5)))
    set.seed(1234)
    Position <- round(c(cumsum(rnorm(100, 5, 3)), cumsum(rnorm(100, 10, 3)),
                        cumsum(rnorm(100, 8, 3)), cumsum(rnorm(100, 6, 3))), 1)
    mydf <- data.frame(Mark, highway, Position, MAF, PPC)
I want to keep only the rows where PPC is less than 10 and, at the same time, MAF is greater than 0.3.
    # filter PPC < 10 & MAF > 0.3
    filtered <- mydf[mydf$PPC < 10 & mydf$MAF > 0.3, ]
I have a grouping variable, highway, and each Mark has a Position on its highway. For example, the first five marks on highway 1:
    1.4   7.2     15.5    13.4  19.7
    |-----|.......|.......|.....|.....|
    "SN1" "SN2"   "SN3"   "SN4" "SN5"
Now I want to select about 30 marks in total so that they are well distributed along each highway based on their Position (taking the different highway lengths into account), with a minimum distance of at least 10 between any two selected marks.
Edit: idea (rough sketch)
I have thought a little about how to approach this problem. Please help me evaluate it.
Edit: here is what I have worked out so far:
    # The maximum position (length) of each highway is:
    out <- tapply(mydf$Position, mydf$highway, max)
    out
         1      2      3      4
     453.0 1012.4  846.4  597.6
    min(out)
    [1] 453
For highway 1, the theoretical mark positions would be:
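96.98, 2 × 96.98, 3 × 96.98, and so on, for as long as the value stays below the maximum (length) of highway 1. Here 96.98 is the total length of the four highways divided by the 30 marks wanted: (453.0 + 1012.4 + 846.4 + 597.6) / 30 = 2909.4 / 30 = 96.98.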
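A minimal sketch of how these theoretical positions could be computed for all four highways at once. The names `n_target`, `spacing` and `targets` are my own, and the sketch assumes every highway is at least one `spacing` long (otherwise `seq()` stops with an error).

```r
# Rebuild the positions from the example so this sketch runs on its own
set.seed(1234)
Position <- round(c(cumsum(rnorm(100, 5, 3)), cumsum(rnorm(100, 10, 3)),
                    cumsum(rnorm(100, 8, 3)), cumsum(rnorm(100, 6, 3))), 1)
highway <- rep(1:4, each = 100)

n_target <- 30                                    # desired total (only approximate)
out <- sapply(split(Position, highway), max)      # length of each highway
spacing <- sum(out) / n_target                    # total length shared over 30 marks

# Theoretical positions: multiples of `spacing` that fit within each highway
targets <- lapply(out, function(len) seq(spacing, len, by = spacing))
lengths(targets)                                  # number of targets per highway
```

With the highway lengths listed earlier (453.0, 1012.4, 846.4, 597.6), a spacing of 96.98 fits 4, 10, 8 and 6 times respectively, so this gives 28 theoretical positions, which is within the "about 30" I am after.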
Thus, in theory, we need to select a mark every 96.98 units. But the actual marks on the highway may not coincide with these theoretical positions.
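One possible way around this, as a rough sketch rather than a finished solution: snap each theoretical position to the nearest actual mark, then walk along the highway and drop any pick that is closer than 10 to the previously kept one. `pick_marks` is a hypothetical helper I made up for illustration, and it assumes each highway is longer than one `spacing`.

```r
# Rebuild the relevant columns of the example so this sketch runs on its own
set.seed(1234)
Position <- round(c(cumsum(rnorm(100, 5, 3)), cumsum(rnorm(100, 10, 3)),
                    cumsum(rnorm(100, 8, 3)), cumsum(rnorm(100, 6, 3))), 1)
mydf <- data.frame(Mark = paste("SN", 1:400, sep = ""),
                   highway = rep(1:4, each = 100), Position)

# Hypothetical helper: row indices of the marks chosen on one highway
pick_marks <- function(pos, spacing, min_gap = 10) {
  targets <- seq(spacing, max(pos), by = spacing)
  # nearest actual mark for every theoretical position
  idx <- unique(vapply(targets, function(t) which.min(abs(pos - t)), integer(1)))
  idx <- idx[order(pos[idx])]
  keep <- integer(0)
  last <- -Inf
  for (i in idx) {
    # keep a mark only if it is at least min_gap beyond the last kept one
    if (pos[i] - last >= min_gap) {
      keep <- c(keep, i)
      last <- pos[i]
    }
  }
  keep
}

spacing <- sum(tapply(mydf$Position, mydf$highway, max)) / 30
selected <- do.call(rbind, lapply(split(mydf, mydf$highway), function(d) {
  d[pick_marks(d$Position, spacing), ]
}))
nrow(selected)            # total number of selected marks
table(selected$highway)   # how they are spread over the highways
```

The same call should work on `filtered` instead of `mydf` if the selection is meant to happen after the PPC/MAF filter. Because the marks are much denser than the 96.98 spacing, the distance check will rarely drop anything here; it is only a safeguard.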
Note: the total number of selected marks does not need to be exactly 30 (about 30 is fine).