Selecting and averaging values in a vector that are too close together

Question

I have an ordered vector such as:

c(2, 2.8, 2.9, 3.3, 3.5, 4.7, 5.5, 7.2, 7.3, 8.7, 8.7, 10)

I want to not only remove duplicates (which is easy with unique()), but also average values that are too close to each other, based on a proximity threshold.

So, for the above example, if the difference between two values is, say, <= 0.4, average them. The vector should become:

c(2, 2.85, 3.4, 4.7, 5.5, 7.25, 8.7, 10)

The check should be performed on pairs of numbers, until no more averaging is needed.

EDIT: note that 2.9 and 3.3 should not be averaged, since 2.9 has already been averaged with 2.8, and once that is done, the distance to 3.3 is greater than 0.4. Thus, the cluster 2.8, 2.9, 3.3, 3.5 ends up as 2.85, 3.4, not 3.125.

Is there an easy way to do this?

r rounding

AF7 10 '17 8:06

liborm · Accepted Answer · 2017-05-10T08:24:19+0000

One way to do this is hierarchical clustering: cut the tree at the threshold and average each group:

library(tidyverse)

data.frame(
  nums = c(2, 2.8, 2.9, 3.3, 3.5, 4.7, 5.5, 7.2, 7.3, 8.7, 8.7, 10)) %>%
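  # cluster on pairwise distances, then cut the tree at height 0.4
  mutate(group = nums %>% dist %>% hclust %>% cutree(h = 0.4)) %>%
  group_by(group) %>%
  # average the values within each group
  summarise(result = mean(nums)) %>%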
  .$result

Note that this relies heavily on the magrittr pipe %>%. Also, dist is O(N^2), so it may be slow for very long vectors.
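If the quadratic distance matrix is a concern, a single left-to-right pass over the sorted vector might be enough. The following is only a sketch in base R, not part of the clustering approach above. It assumes the input is already sorted in ascending order and that "averaging in pairs" means replacing the last kept value with the mean of itself and the next value whenever they are within the threshold. The name merge_close and its tol argument are hypothetical, chosen for illustration.

```r
# Hypothetical helper: greedy pairwise merge of a sorted numeric vector.
# Assumes x is sorted ascending; tol is the proximity threshold.
merge_close <- function(x, tol = 0.4) {
  if (length(x) == 0) return(x)
  out <- numeric(length(x))  # preallocate; there are at most length(x) results
  n <- 1                     # number of values kept so far
  out[1] <- x[1]
  for (v in x[-1]) {
    if (v - out[n] <= tol) {
      # close enough: replace the last kept value with the pair's average
      out[n] <- (out[n] + v) / 2
    } else {
      # too far apart: keep v as a new value
      n <- n + 1
      out[n] <- v
    }
  }
  out[seq_len(n)]
}

nums <- c(2, 2.8, 2.9, 3.3, 3.5, 4.7, 5.5, 7.2, 7.3, 8.7, 8.7, 10)
result <- merge_close(nums)
```

Because the input is sorted, a merged value can only move further from the value kept before it, so one pass should leave no adjacent pair within the threshold. Unlike the clustering version, a run of three or more close values is averaged pair by pair rather than as one group mean, so the two approaches can give different results on other data.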
