Defining a 2-parameter distribution for tabular data

I am trying to tune Weibull distribution to one tabular data. After the process of my point cloud, I get columns with the number of returns to each 1-meter piece of height. Example:

a = matrix(c(7,12,10,10,20,3,15,40,33,57,58,60,79,132,174,201,191,184,115,70,22,2,0),1,23) colnames(a) <- c(13.5,14.5,15.5,16.5,17.5,18.5,19.5,20.5,21.5,22.5,23.5,24.5,25.5,26.5,27.5,28.5,29.5,30.5,31.5,32.5,33.5,34.5,35.5) 

In my example above, a height class centered at 13.5 meters has 7 points inside.

If I build matrix a, you can visualize the distribution of data:

 barplot(a) 

enter image description here

Does anyone have a suggestion on how to put 2 digit data in this tabular data?

Thanks in advance!

+4
source share
2 answers

You can make the maximum probability for censored data.

 a = matrix(c(7,12,10,10,20,3,15,40,33,57,58,60,79,132,174,201,191,184,115,70,22,2,0),1,23) colnames(a) <- c(13.5,14.5,15.5,16.5,17.5,18.5,19.5,20.5,21.5,22.5,23.5,24.5,25.5,26.5,27.5, 28.5,29.5,30.5,31.5,32.5,33.5,34.5,35.5) centers <- as.numeric(colnames(a)) low <- centers - .5 up <- centers + .5 ll.weibullCensored <- function(par, dat){ shape <- par[1] scale <- par[2] # Get the probability for each 'bin' and take the log log.ps <- log(pweibull(up, shape, scale) - pweibull(low, shape, scale)) # Sum the logs of the bin probabilities as many times # as they should be as dictated by the data sum(rep(log.ps, dat)) } # Use optim or any other function to find a set # of parameters that maximizes the log likelihood o.optim <- optim(c(9, 28), ll.weibullCensored, dat = as.numeric(a), # this tells it to find max instead of a min control=list(fnscale=-1)) 

This gives essentially the same estimates as AndresT, but their approach was simply to assume that all the data fell in the center of the bins and fulfill the maximum probability for this imputed data set. It doesn't really matter, but with this method you don't need other packages.

Edit: The fact that the AndresT solution and mine give ratings that are very similar make a lot of sense if we look at what we maximize for each method. In the mine, we look at the probability of getting into each of the "bunkers". The AndreT solution uses the distribution density at the center of the hopper as a substitute for this probability. We can look at the ratio of the probability of getting into the bunker and the density value in the center of the bunker for each of the bins (using the shape and scale obtained from my solution), which gives:

 # Probability of each bin > ps [1] 0.0005495886 0.0009989085 0.0017438767 0.0029375471 0.0047912909 [6] 0.0075863200 0.0116800323 0.0174991532 0.0255061344 0.0361186335 [11] 0.0495572085 0.0656015797 0.0832660955 0.1004801353 0.1139855466 [16] 0.1197890284 0.1144657811 0.0971503491 0.0711370586 0.0433654456 [21] 0.0210758647 0.0077516837 0.0020274896 # Density evaluated at the center of the bin > ps.cent [1] 0.0005418957 0.0009868040 0.0017254545 0.0029103746 0.0047524364 [6] 0.0075325510 0.0116083397 0.0174078328 0.0253967142 0.0359988789 [11] 0.0494450583 0.0655288551 0.0832789134 0.1006305707 0.1143085230 [16] 0.1202647955 0.1149865305 0.0975322358 0.0712125315 0.0431169222 [21] 0.0206762531 0.0074246320 0.0018651941 # Ratio of the probability and the density > ps/ps.cent [1] 1.0141963 1.0122663 1.0106767 1.0093364 1.0081757 1.0071382 1.0061760 [8] 1.0052459 1.0043084 1.0033266 1.0022682 1.0011098 0.9998461 0.9985051 [15] 0.9971745 0.9960440 0.9954712 0.9960845 0.9989402 1.0057639 1.0193271 [22] 1.0440495 1.0870127 

All these relationships are close to 1 - so both methods basically try to maximize the same value.

+6
source

I am sure there is a way to make the change in the best possible way, but it might work;

 library('fitdistrplus') library('reshape2') a = matrix(c(7,12,10,10,20,3,15,40,33,57,58,60,79,132,174,201,191,184,115,70,22,2,0),1,23) colnames(a) <- c(13.5,14.5,15.5,16.5,17.5,18.5,19.5,20.5,21.5,22.5,23.5,24.5,25.5,26.5,27.5, 28.5,29.5,30.5,31.5,32.5,33.5,34.5,35.5) barplot(a) a2 = melt(a) a3= (rep(a2[,2],a2[,3])) fitdist(a3, "weibull") descdist(a3,boot=5000) 
+2
source

Source: https://habr.com/ru/post/1479184/


All Articles