Conditional filtering / subsetting of data based on linear distance in R

Here is my small example:

    Mark <- paste("SN", 1:400, sep = "")
    highway <- rep(1:4, each = 100)
    set.seed(1234)
    MAF <- rnorm(400, 0.3, 0.1)
    PPC <- abs(ceiling(rnorm(400, 5, 5)))
    set.seed(1234)
    Position <- round(c(cumsum(rnorm(100, 5, 3)),
                        cumsum(rnorm(100, 10, 3)),
                        cumsum(rnorm(100, 8, 3)),
                        cumsum(rnorm(100, 6, 3))), 1)
    mydf <- data.frame(Mark, highway, Position, MAF, PPC)

I want to keep only the rows where PPC is less than 10 and, at the same time, MAF is greater than 0.3.

    # keep rows with PPC < 10 and MAF > 0.3
    filtered <- mydf[mydf$PPC < 10 & mydf$MAF > 0.3, ]
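The same filter can also be written with base R's subset(), which looks the column names up inside the data frame so mydf$ does not have to be repeated. A sketch that rebuilds the example data from the question and checks that both forms select the same rows:

```r
# Rebuild the example data from the question
Mark <- paste("SN", 1:400, sep = "")
highway <- rep(1:4, each = 100)
set.seed(1234)
MAF <- rnorm(400, 0.3, 0.1)
PPC <- abs(ceiling(rnorm(400, 5, 5)))
set.seed(1234)
Position <- round(c(cumsum(rnorm(100, 5, 3)), cumsum(rnorm(100, 10, 3)),
                    cumsum(rnorm(100, 8, 3)), cumsum(rnorm(100, 6, 3))), 1)
mydf <- data.frame(Mark, highway, Position, MAF, PPC)

# Bracket indexing, as in the question
filtered <- mydf[mydf$PPC < 10 & mydf$MAF > 0.3, ]

# Equivalent call with subset(): PPC and MAF are evaluated inside mydf
filtered2 <- subset(mydf, PPC < 10 & MAF > 0.3)

# Both versions should keep exactly the same marks
stopifnot(nrow(filtered) == nrow(filtered2),
          all(filtered$Mark == filtered2$Mark))
```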

I have a grouping variable, highway, and each Mark has a Position on its highway. For example, the first five marks on highway 1:

    Mark      SN1   SN2   SN3    SN4    SN5
    Position  1.4   7.2   15.5   13.4   19.7
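To check these values, the first five positions on highway 1 can be regenerated from the question's code. Only Position is rebuilt here, because the second set.seed(1234) makes it independent of the earlier MAF and PPC draws. Note that the positions are not strictly increasing: a step drawn from rnorm(100, 5, 3) can be negative, which is why SN4 (13.4) lies before SN3 (15.5).

```r
# Same seed and same draws as the Position line in the question
set.seed(1234)
Position <- round(c(cumsum(rnorm(100, 5, 3)), cumsum(rnorm(100, 10, 3)),
                    cumsum(rnorm(100, 8, 3)), cumsum(rnorm(100, 6, 3))), 1)
Mark <- paste("SN", 1:400, sep = "")
highway <- rep(1:4, each = 100)

# First five marks on highway 1
first5 <- data.frame(Mark, highway, Position)[1:5, ]
first5
```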

Now I want to select about 30 marks in total so that they are evenly spread along each highway according to their positions (taking into account that the highways have different lengths), with a distance of at least 10 between any two selected marks.

Edit: my idea as a rough sketch (image not preserved).

I have thought a little about how to solve this; please help me evaluate the approach.

Edit: here is what I have worked out so far:

    # The maximum position (length) of each highway:
    out <- tapply(mydf$Position, mydf$highway, max)
    out
    #      1      2      3      4
    #  453.0 1012.4  846.4  597.6
    min(out)
    # [1] 453
    # Total length of all highways
    totallength <- sum(out)
    # Thus the average distance at which marks need to be placed:
    totallength / 30
    # [1] 96.98

For highway 1, the theoretical mark positions would then be:

    96.98, 96.98 + 96.98, 96.98 + 96.98 + 96.98, ... and so on, as long as the value stays below the maximum position (length) of highway 1.
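These theoretical positions can be generated for every highway at once with seq(). A minimal sketch, using the highway lengths printed in the output above (copied in by hand so the block runs on its own):

```r
# Maximum positions (lengths) from the question's output
out <- c(`1` = 453.0, `2` = 1012.4, `3` = 846.4, `4` = 597.6)
spacing <- sum(out) / 30   # 96.98

# Multiples of the spacing that do not exceed each highway's length
targets <- lapply(out, function(len) seq(spacing, len, by = spacing))

targets[["1"]]             # 96.98 193.96 290.94 387.92
lengths(targets)           # 4 10 8 6
sum(lengths(targets))      # 28
```

This gives 4 + 10 + 8 + 6 = 28 theoretical positions, slightly fewer than 30 because the last partial interval on each highway is dropped, which fits the "about 30" requirement.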

Thus, in theory, we need to select a mark every 96.98 units. But the marks on the highway may not coincide with these theoretical positions.

Note: the total number of selected marks does not have to be exactly 30; about 30 is fine.
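One way to satisfy the minimum-distance rule by construction, not taken from the question or the answer, is a greedy pass along each highway: walk the marks in order of position and keep a mark only if it lies at least a given spacing beyond the last kept one. The helper select_spaced() is hypothetical, and the spacing max(10, total length / 30) is an assumption that combines the question's two requirements.

```r
# Example data and filter from the question
Mark <- paste("SN", 1:400, sep = "")
highway <- rep(1:4, each = 100)
set.seed(1234)
MAF <- rnorm(400, 0.3, 0.1)
PPC <- abs(ceiling(rnorm(400, 5, 5)))
set.seed(1234)
Position <- round(c(cumsum(rnorm(100, 5, 3)), cumsum(rnorm(100, 10, 3)),
                    cumsum(rnorm(100, 8, 3)), cumsum(rnorm(100, 6, 3))), 1)
mydf <- data.frame(Mark, highway, Position, MAF, PPC)
filtered <- mydf[mydf$PPC < 10 & mydf$MAF > 0.3, ]

# Hypothetical helper: returns a logical flag per mark; a mark is kept only
# if it is at least `spacing` beyond the previously kept mark.
select_spaced <- function(pos, spacing) {
  keep <- logical(length(pos))
  last <- -Inf                      # so the first mark is always kept
  for (i in order(pos)) {           # visit marks from smallest position up
    if (pos[i] - last >= spacing) {
      keep[i] <- TRUE
      last <- pos[i]
    }
  }
  keep
}

# Assumed spacing: the average gap for ~30 marks, but never below 10
out <- tapply(mydf$Position, mydf$highway, max)
spacing <- max(10, sum(out) / 30)

# Run per highway, then put the flags back in the row order of `filtered`
sel <- unsplit(lapply(split(filtered$Position, filtered$highway),
                      select_spaced, spacing = spacing),
               filtered$highway)

filtered[sel, c("Mark", "highway", "Position")]
table(filtered$highway[sel])        # number of selected marks per highway
```

Because each highway gets its own pass, the total is only approximately 30, which the question allows; unlike snapping to target points, this never keeps two marks closer together than the spacing.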

Answer

Since we don't need the other columns, the code is a bit simpler if we use split() to get a list of positions for each highway.

    filtered$highway <- factor(filtered$highway)
    positions <- with(filtered, split(Position, highway))

A suitable number of marks for each highway can be derived from that highway's share of the total length.

    highway_lengths <- sapply(positions, max)
    total_length <- sum(highway_lengths)
    n_marks_per_highway <- round(30 * highway_lengths / total_length)
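As a worked example of this allocation, here it is applied to the full highway lengths from the question. The answer's code uses the maxima of the filtered positions, which can be slightly smaller, so its counts may differ a little.

```r
# Highway lengths from the question's output
highway_lengths <- c(`1` = 453.0, `2` = 1012.4, `3` = 846.4, `4` = 597.6)

# Each highway's share of 30 marks, proportional to its length
share <- 30 * highway_lengths / sum(highway_lengths)
round(share, 2)            # 4.67 10.44 8.73 6.16

n_marks_per_highway <- round(share)
n_marks_per_highway        # 5 10 9 6
sum(n_marks_per_highway)   # 30 here; rounding does not guarantee 30 in general
```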

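We can use the quantile function to get target points spread along each highway. Strictly speaking, quantiles split the marks into groups of equal size, so the targets are evenly spaced by rank; because the marks here are roughly evenly spaced, this is close to even spacing in distance.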

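    target_mark_points <- mapply(
      function(pos, n) {
        # n probabilities from 0 to 1; length.out also handles n = 1
        quantile(pos, seq(0, 1, length.out = n))
      },
      positions, n_marks_per_highway,
      SIMPLIFY = FALSE  # always a list, one element per highway
    )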
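If the marks were clustered, rank-based targets and distance-based targets could differ a lot. A small comparison on hypothetical positions, contrasting quantile() with equal steps from seq(min, max, length.out = n):

```r
# Hypothetical positions: many marks near the start, few further along
pos <- c(0, 1, 2, 3, 4, 5, 50, 100)
n <- 3

# Rank-based targets (the answer's approach): equal numbers of marks between them
q <- quantile(pos, probs = seq(0, 1, length.out = n), names = FALSE)
q   # 0, 3.5, 100

# Distance-based targets: equal steps between the first and last position
d <- seq(min(pos), max(pos), length.out = n)
d   # 0, 50, 100
```

To use distance-based targets in the answer, the function passed to mapply could return seq(min(pos), max(pos), length.out = n) instead of the quantiles; the nearest-mark step afterwards stays the same.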

For each target point, we find the nearest existing mark on that highway.

    actual_mark_points <- mapply(
      function(pos, target) {
        # for each target, the position of the closest existing mark
        sapply(target, function(tgt) {
          d <- abs(tgt - pos)
          pos[which.min(d)]
        })
      },
      positions, target_mark_points,
      SIMPLIFY = FALSE
    )

To check that it works, you can plot the marks and colour the selected ones.

    is_mark_point <- mapply(
      function(pos, mark) {
        pos %in% mark
      },
      positions, actual_mark_points,
      SIMPLIFY = FALSE
    )
    filtered$is.mark.point <- unsplit(is_mark_point, filtered$highway)

    library(ggplot2)
    (p <- ggplot(filtered, aes(Position, highway, colour = is.mark.point)) +
       geom_point())
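The answer's steps do not enforce the question's minimum distance of 10 explicitly, so it may be worth checking the result. A small hypothetical helper, shown on made-up selections; unique() is there because the nearest-mark step can return the same mark for two different targets:

```r
# Hypothetical helper: smallest distance between distinct selected positions
min_gap <- function(x) {
  x <- sort(unique(x))
  if (length(x) < 2) return(Inf)   # nothing to compare
  min(diff(x))
}

# Made-up selections per highway, for illustration only
selected <- list(`1` = c(1.4, 95.2, 190.3), `2` = c(12.0, 18.5, 120.7))
gaps <- sapply(selected, min_gap)
gaps          # about 93.8 and 6.5
gaps >= 10    # highway 2 breaks the 10-unit rule
```

On the answer's output it would be called as sapply(actual_mark_points, min_gap).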

Source: https://habr.com/ru/post/919850/

