I have a set of points on a map, each with a given parameter value. I would like to:
- Cluster them spatially and ignore any clusters with fewer than 10 points. My df should have a column (Clust) giving the cluster each point belongs to [DONE]
- Sub-cluster the points within each cluster by parameter value, adding a column (subClust) to my df that gives the sub-cluster each point belongs to.
I do not know how to do the second part, except perhaps with loops.
The image shows the spatially distributed points (upper left), color-coded by cluster, and the same points sorted by parameter value (upper right). The bottom row shows only the clusters with more than 10 points: the points themselves (left) and one facet per cluster, sorted by parameter value (right). It is these facets that I would like to color-code by sub-cluster, using a minimum separation distance between sub-clusters (d = 1).
Any pointers or help would be appreciated. My reproducible code is below.

library(tidyverse)
library(gridExtra)
set.seed(36)
x_ex <- round(rnorm(200,50,20))
y_ex <- round(runif(200,0,85))
values <- rexp(200, 0.2)
df_ex <- data.frame(ID=1:length(y_ex),x=x_ex,y=y_ex,Test_Param=values)
d = 4
chc <- hclust(dist(df_ex[,2:3]), method="single")
chc.d40 <- cutree(chc, h=d)
xy_df <- data.frame(df_ex, Clust=chc.d40)
breaks = max(chc.d40)
xy_df_filt <- xy_df %>% dplyr::group_by(Clust) %>% dplyr::mutate(n=n()) %>% dplyr::filter(n>10)
p1 <- ggplot() +
  geom_point(data=xy_df, aes(x=x, y=y, colour = Clust)) +
  scale_color_gradientn(colours = rainbow(breaks)) +
  xlim(0,100) + ylim(0,100)
p2 <- xy_df %>% dplyr::arrange(Test_Param) %>%
  ggplot() +
  geom_point(aes(x=1:length(Test_Param),y=Test_Param, colour = Test_Param)) +
  scale_colour_gradient(low="red", high="green")
p3 <- ggplot() +
  geom_point(data=xy_df_filt, aes(x=x, y=y, colour = Clust)) +
  scale_color_gradientn(colours = rainbow(breaks)) +
  xlim(0,100) + ylim(0,100)
p4 <- xy_df_filt %>% dplyr::arrange(Test_Param) %>%
  ggplot() +
  geom_point(aes(x=1:length(Test_Param),y=Test_Param, colour = Test_Param)) +
  scale_colour_gradient(low="red", high="green") +
  facet_wrap(~Clust, scales="free")
grid.arrange(p1, p2, p3, p4, ncol=2, nrow=2)
This snippet does not work; I cannot get it to run inside dplyr::mutate():
xy_df_filt %>%
dplyr::group_by(Clust) %>%
dplyr::mutate(subClust = hclust(dist(.$Test_Param), method="single") %>%
cutree(, h=1))
Below is a way to do it with a loop, but I would rather learn how to do this with dplyr or some other loop-free method. The updated image shows the sub-clusters.
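sub_df <- data.frame()
for (i in unique(xy_df_filt$Clust)) {
  # Points belonging to spatial cluster i
  temp_df <- xy_df_filt %>% dplyr::filter(Clust == i)
  a_d <- 1  # cut height for the sub-clusters
  # Single-linkage clustering on the parameter values only
  a_chc <- hclust(dist(temp_df$Test_Param), method="single")
  a_chc.d40 <- cutree(a_chc, h=a_d)
  # Keep only ID and subClust so they can be joined back on by ID
  sub_df <- bind_rows(sub_df, data.frame(ID=temp_df$ID, subClust=a_chc.d40))
}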
xy_df_filt_2 <- left_join(xy_df_filt,sub_df, by=c("ID"="ID"))
p4 <- xy_df_filt_2 %>% dplyr::arrange(Test_Param) %>%
  ggplot() +
  geom_point(aes(x=1:length(Test_Param),y=Test_Param, colour = subClust)) +
  scale_colour_gradient(low="red", high="green") +
  facet_wrap(~Clust, scales="free")
grid.arrange(p1, p2, p3, p4, ncol=2, nrow=2)
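For comparison with the loop, here is a minimal sketch of what a loop-free version might look like. It assumes the goal is exactly what the loop does: single-linkage clustering of Test_Param within each Clust, cut at h = 1. The helper name sub_cluster and the result name xy_df_sub are made up for illustration and are not part of the code above. The data is rebuilt inside the block so that it runs on its own.

```r
library(dplyr)

# Hypothetical helper (not from the code above): single-linkage clustering of
# a numeric vector, cut at height h. hclust() needs at least two observations,
# so shorter inputs are put into a single sub-cluster.
sub_cluster <- function(v, h = 1) {
  if (length(v) < 2) return(rep(1L, length(v)))
  as.integer(cutree(hclust(dist(v), method = "single"), h = h))
}

# Same example data and spatial clustering as in the reproducible code
set.seed(36)
df_ex <- data.frame(ID = 1:200,
                    x = round(rnorm(200, 50, 20)),
                    y = round(runif(200, 0, 85)),
                    Test_Param = rexp(200, 0.2))
df_ex$Clust <- cutree(hclust(dist(df_ex[, c("x", "y")]), method = "single"), h = 4)

xy_df_sub <- df_ex %>%
  group_by(Clust) %>%
  filter(n() > 10) %>%                                # keep clusters with more than 10 points
  mutate(subClust = sub_cluster(Test_Param, h = 1)) %>%  # evaluated once per Clust
  ungroup()
```

The difference from the failing snippet is that Test_Param is referenced by its bare column name, so mutate() sees only the current group's values. In a magrittr pipe, `.$Test_Param` refers to the whole piped data frame, so the result has one entry per row of the full table rather than per row of the group.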
