Compute modes in multimodal distribution in R

I measured the height of the body of all my children. When I draw all the heights along the length axis, this result is as follows:

enter image description here

Each red (boys) or purple (girls) tick is one child. If two children have the same height (in millimeters), the tics add up. There are currently seven children with the same height. (Tick height and width do not make sense. They are scaled to be visible.)

As you can see, different heights are distributed unevenly along the axis, and the cluster is around certain values.

Graphs of histograms and data density look like this (with two density estimates, constructed as follows this answer ):

enter image description here

As you can see, this is a multimodal distribution.

How to calculate modes (in R)?


Here is the source data for you:

mm <- c(418, 527, 540, 553, 554, 558, 613, 630, 634, 636, 645, 648, 708, 714, 715, 725, 806, 807, 822, 823, 836, 837, 855, 903, 908, 910, 911, 913, 915, 923, 935, 945, 955, 957, 958, 1003, 1006, 1015, 1021, 1021, 1022, 1034, 1043, 1048, 1051, 1054, 1058, 1100, 1102, 1103, 1117, 1125, 1134, 1138, 1145, 1146, 1150, 1152, 1210, 1211, 1213, 1223, 1226, 1334) 
+5
source share
2 answers

I built something on my own using your mm data.

First, let's plot the density of mm to visualize the modes:

 plot(density(mm)) 

enter image description here

So, we see that in this distribution there are 2 modes. One is about 600 and one is about 1000. See how to find them.

To find the mode indices, I made this function:

 find_modes<- function(x) { modes <- NULL for ( i in 2:(length(x)-1) ){ if ( (x[i] > x[i-1]) & (x[i] > x[i+1]) ) { modes <- c(modes,i) } } if ( length(modes) == 0 ) { modes = 'This is a monotonic distribution' } return(modes) } 

Let's try it by our density:

 mymodes_indices <- find_modes(density(mm)$y) #you need to try it on the y axis 

Now mymodes_indices contains the indices of our ie modes:

 > density(mm)$y[mymodes_indices] #just to confirm that those are the correct [1] 0.0008946929 0.0017766183 > density(mm)$x[mymodes_indices] #the actual modes [1] 660.2941 1024.9067 

Hope this helps!

+4
source

I changed Jeffrey Evans answer to Peak for kernel density estimation to allow changing bw parameter and getting more or less peaks accordingly. This is necessary for other cases that will give many peaks with the accepted answer. The signifi parameter allows you to handle communications.

 library(dplyr) library(tidyr) get.modes2 <- function(x,bw,signifi) { den <- density(x, kernel=c("gaussian"),bw=bw) den.s <- smooth.spline(den$x, den$y, all.knots=TRUE, spar=0.1) s.1 <- predict(den.s, den.s$x, deriv=1) s.0 <- predict(den.s, den.s$x, deriv=0) den.sign <- sign(s.1$y) a<-c(1,1+which(diff(den.sign)!=0)) b<-rle(den.sign)$values df<-data.frame(a,b) df = df[which(df$b %in% -1),] modes<-s.1$x[df$a] density<-s.0$y[df$a] df2<-data.frame(modes,density) df2$sig<-signif(df2$density,signifi) df2<-df2[with(df2, order(-sig)), ] print(df2) df<-as.data.frame(df2 %>% mutate(m = min_rank(desc(sig)) ) %>% #, count = sum(n)) %>% group_by(m) %>% summarize(a = paste(format(round(modes,2),nsmall=2), collapse = ',')) %>% spread(m, a, sep = '')) colnames(df)<-paste0("m",1:length(colnames(df))) print(df) } mm <- c(418, 527, 540, 553, 554, 558, 613, 630, 634, 636, 645, 648, 708, 714, 715, 725, 806, 807, 822, 823, 836, 837, 855, 903, 908, 910, 911, 913, 915, 923, 935, 945, 955, 957, 958, 1003, 1006, 1015, 1021, 1021, 1022, 1034, 1043, 1048, 1051, 1054, 1058, 1100, 1102, 1103, 1117, 1125, 1134, 1138, 1145, 1146, 1150, 1152, 1210, 1211, 1213, 1223, 1226, 1334) mmdf<-data.frame(mm=mm) library(ggplot2) #0.25 defines the number of peaks. ggplot(mmdf,aes(mm)) + geom_density(adjust=0.25) modes<-get.modes2(mm,bw.nrd0(mm)*0.25,2) m1 m2 m3 m4 m5 m6 m7 m8 1 1032.08 921.91 1134.09 636.27,826.01 1217.74 548.54 715.84 420.00,1334.04 

enter image description here

+2
source

Source: https://habr.com/ru/post/1208787/


All Articles