How to cluster within clusters

I have a set of points on a map, each with a given parameter value. I would like to:

  • Cluster them spatially and ignore any clusters with fewer than 10 points. My df should have a column (Clust) giving the cluster each point belongs to [DONE]
  • Sub-cluster the parameter values within each cluster, and add a column (subClust) to my df that classifies each point by subcluster.

I do not know how to do the second part, except perhaps with loops.

The image shows the spatially distributed points (upper left), color-coded by cluster, and the points sorted by parameter value (upper right). The bottom row shows the clusters with more than 10 points (left) and a facet for each cluster, sorted by parameter value (right). It is these facets that I would like to color-code by subcluster, according to a minimum cluster separation distance (d = 1).
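To make the d = 1 criterion concrete, here is a minimal sketch on made-up values (the vector v is hypothetical and not taken from the data in this post). With single linkage, cutting the tree at h = 1 keeps values in the same group as long as the gap to the nearest neighbour does not exceed 1. For sorted one-dimensional data this should match a simple gap-based grouping, which the sketch computes alongside for comparison.

```r
# Hypothetical parameter values, sorted for readability
v <- c(0.1, 0.5, 1.2, 3.0, 3.4, 7.0)

# Single-linkage clustering on the 1-D values, cut at height h = 1
sub <- cutree(hclust(dist(v), method = "single"), h = 1)

# For sorted 1-D data, a new group starts wherever the gap
# to the previous value is larger than 1
by_gap <- cumsum(c(TRUE, diff(v) > 1))

print(sub)
print(by_gap)
```

Both vectors label the first three values as one group, the next two as a second group, and the last value as a group of its own.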

Any pointers or help would be appreciated. My reproducible code is below.


# TESTING
library(tidyverse)
library(gridExtra)

# Create a random (X, Y, Value) dataset
set.seed(36)
x_ex <- round(rnorm(200,50,20))
y_ex <- round(runif(200,0,85))
values <- rexp(200, 0.2)
df_ex <- data.frame(ID=1:length(y_ex),x=x_ex,y=y_ex,Test_Param=values)

# Cluster data by (X,Y) location
d = 4
chc <- hclust(dist(df_ex[,2:3]), method="single")

# Distance with a d threshold - used d=40 at one time but that changes...
chc.d40 <- cutree(chc, h=d) 
# max(chc.d40)

# Join results 
xy_df <- data.frame(df_ex, Clust=chc.d40)

# Plot results
breaks = max(chc.d40)
xy_df_filt <- xy_df %>% dplyr::group_by(Clust) %>% dplyr::mutate(n=n()) %>% dplyr::filter(n>10)# %>% nrow

p1 <- ggplot() +
  geom_point(data=xy_df, aes(x=x, y=y, colour = Clust)) +
  scale_color_gradientn(colours = rainbow(breaks)) +
  xlim(0,100) + ylim(0,100) 

p2 <- xy_df %>% dplyr::arrange(Test_Param) %>%
ggplot() +
  geom_point(aes(x=1:length(Test_Param),y=Test_Param, colour = Test_Param)) +
  scale_colour_gradient(low="red", high="green")

p3 <- ggplot() +
  geom_point(data=xy_df_filt, aes(x=x, y=y, colour = Clust)) +
  scale_color_gradientn(colours = rainbow(breaks)) +
  xlim(0,100) + ylim(0,100) 

p4 <- xy_df_filt %>% dplyr::arrange(Test_Param) %>%
ggplot() +
  geom_point(aes(x=1:length(Test_Param),y=Test_Param, colour = Test_Param)) +
  scale_colour_gradient(low="red", high="green") +
  facet_wrap(~Clust, scales="free")

grid.arrange(p1, p2, p3, p4, ncol=2, nrow=2)

THIS SNIPPET DOES NOT WORK - I cannot get it through dplyr mutate() ...

# Second Hierarchical Clustering: Try to sub-cluster by Test_Param within the individual clusters I've already defined above
xy_df_filt %>% # This part does not work
  dplyr::group_by(Clust) %>% 
  dplyr::mutate(subClust = hclust(dist(.$Test_Param), method="single") %>% 
                  cutree(, h=1))

Below is a way to do it with a loop, but I would rather learn how to do this with dplyr or some other non-looping method. The updated image shows the subclusters.

sub_df <- data.frame()
for (i in unique(xy_df_filt$Clust)) {
  temp_df <- xy_df_filt %>% dplyr::filter(Clust == i)
  # Sub-cluster this spatial cluster by Test_Param
  a_d = 1
  a_chc <- hclust(dist(temp_df$Test_Param), method="single")

  # Cut the tree at the distance threshold a_d
  a_chc.d40 <- cutree(a_chc, h=a_d) 
  # max(a_chc.d40)

  # Join results to main df
  sub_df <- bind_rows(sub_df, data.frame(temp_df, subClust=a_chc.d40)) %>% dplyr::select(ID, subClust)
}
xy_df_filt_2 <- left_join(xy_df_filt,sub_df, by=c("ID"="ID"))

p4 <- xy_df_filt_2 %>% dplyr::arrange(Test_Param) %>%
ggplot() +
  geom_point(aes(x=1:length(Test_Param),y=Test_Param, colour = subClust)) +
  scale_colour_gradient(low="red", high="green") +
  facet_wrap(~Clust, scales="free")

grid.arrange(p1, p2, p3, p4, ncol=2, nrow=2)



...

xy_df_filt_2 <- xy_df_filt %>% 
                group_by(Clust) %>% 
                mutate(subClust = tibble(Test_Param) %>% 
                                  dist() %>% 
                                  hclust(method="single") %>% 
                                  cutree(h=1))

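The problem is that . inside mutate() refers to the whole data frame rather than the current group, so dist(.$Test_Param) is computed over every row. Refer to the Test_Param column directly instead; wrapping it in tibble() hands dist() a one-column table that holds only the current group's values.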
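As a small illustration of sub-clustering inside a grouped mutate(), here is a sketch on a toy data frame. The data frame toy and its values are made up for this example; only the column names Clust, Test_Param and subClust follow the post. Inside mutate(), a bare column name refers to the current group's slice, so each group gets its own tree.

```r
library(dplyr)

# Hypothetical data: two spatial clusters with a few parameter values each
toy <- data.frame(
  Clust      = c(1, 1, 1, 2, 2, 2, 2),
  Test_Param = c(0.2, 0.6, 5.0, 1.0, 1.5, 4.0, 4.3)
)

toy_sub <- toy %>%
  group_by(Clust) %>%
  # Test_Param here is only the current group's values
  mutate(subClust = cutree(hclust(dist(Test_Param), method = "single"), h = 1)) %>%
  ungroup()

print(toy_sub)
```

Note that hclust() needs at least two observations, so a group with a single row would raise an error. That should not come up here, since clusters with 10 or fewer points have already been filtered out.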



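There is probably a way to do this with do and tidy, but I always have a hard time getting do to work the way I want. Instead, I usually combine split from base R with map_dfr from purrr. split splits the data frame by Clust and gives you a list of data frames that you can then map over. map_dfr maps over each of those data frames and returns a single data frame.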
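To show what each of the two steps returns, here is a minimal sketch on a toy data frame. The data frame toy, the intermediate list pieces and all values are hypothetical and exist only for this illustration.

```r
library(dplyr)
library(purrr)

# Hypothetical data: two spatial clusters
toy <- data.frame(
  ID         = 1:7,
  Clust      = c(1, 1, 1, 2, 2, 2, 2),
  Test_Param = c(0.2, 0.6, 5.0, 1.0, 1.5, 4.0, 4.3)
)

# split() returns a named list with one data frame per Clust value
pieces <- split(toy, toy$Clust)
print(names(pieces))

# map_dfr() applies the function to each piece and row-binds the results
toy_sub <- map_dfr(pieces, function(df) {
  df$subClust <- cutree(hclust(dist(df$Test_Param), method = "single"), h = 1)
  df
})

print(toy_sub)
```

In purrr 1.0.0 and later, map_dfr() is marked as superseded in favour of map() followed by list_rbind(); it still works, so which form to use is a matter of preference.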

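Starting from your xy_df_filt, the code below produces what should be the same xy_df_filt_2 that you got from the for loop. It is followed by two plots of the result.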

xy_df_filt_2 <- xy_df_filt %>%
    split(.$Clust) %>%
    map_dfr(function(df) {
        subClust <- hclust(dist(df$Test_Param), method = "single") %>% cutree(., h = 1)

        bind_cols(df, subClust = subClust)
    })

ggplot(xy_df_filt_2, aes(x = x, y = y, color = as.factor(subClust), shape = as.factor(Clust))) +
    geom_point() +
    scale_color_brewer(palette = "Set2")

ggplot(xy_df_filt_2, aes(x = x, y = y, color = as.factor(subClust), shape = as.factor(Clust))) +
    geom_point() +
    scale_color_brewer(palette = "Set2") +
    facet_wrap(~ Clust)

Created on 2018-04-14 by the reprex package (v0.2.0).


Source: https://habr.com/ru/post/1696076/

