Defining a 2-parameter distribution for tabular data

Question

Defining a 2-parameter distribution for tabular data

I am trying to tune Weibull distribution to one tabular data. After the process of my point cloud, I get columns with the number of returns to each 1-meter piece of height. Example:

a = matrix(c(7,12,10,10,20,3,15,40,33,57,58,60,79,132,174,201,191,184,115,70,22,2,0),1,23) colnames(a) <- c(13.5,14.5,15.5,16.5,17.5,18.5,19.5,20.5,21.5,22.5,23.5,24.5,25.5,26.5,27.5,28.5,29.5,30.5,31.5,32.5,33.5,34.5,35.5)

In my example above, a height class centered at 13.5 meters has 7 points inside.

If I build matrix a, you can visualize the distribution of data:

 barplot(a)

enter image description here

Does anyone have a suggestion on how to put 2 digit data in this tabular data?

Thanks in advance!

+4

r

Gorgens May 04, '13 at 22:15

source share

2 answers

I am sure there is a way to make the change in the best possible way, but it might work;

 library('fitdistrplus') library('reshape2') a = matrix(c(7,12,10,10,20,3,15,40,33,57,58,60,79,132,174,201,191,184,115,70,22,2,0),1,23) colnames(a) <- c(13.5,14.5,15.5,16.5,17.5,18.5,19.5,20.5,21.5,22.5,23.5,24.5,25.5,26.5,27.5, 28.5,29.5,30.5,31.5,32.5,33.5,34.5,35.5) barplot(a) a2 = melt(a) a3= (rep(a2[,2],a2[,3])) fitdist(a3, "weibull") descdist(a3,boot=5000)

+2

aatrujillob May 04 '13 at 23:03

source share

Dason · Accepted Answer · 2013-05-04T23:47:57+0000

You can make the maximum probability for censored data.

 a = matrix(c(7,12,10,10,20,3,15,40,33,57,58,60,79,132,174,201,191,184,115,70,22,2,0),1,23) colnames(a) <- c(13.5,14.5,15.5,16.5,17.5,18.5,19.5,20.5,21.5,22.5,23.5,24.5,25.5,26.5,27.5, 28.5,29.5,30.5,31.5,32.5,33.5,34.5,35.5) centers <- as.numeric(colnames(a)) low <- centers - .5 up <- centers + .5 ll.weibullCensored <- function(par, dat){ shape <- par[1] scale <- par[2] # Get the probability for each 'bin' and take the log log.ps <- log(pweibull(up, shape, scale) - pweibull(low, shape, scale)) # Sum the logs of the bin probabilities as many times # as they should be as dictated by the data sum(rep(log.ps, dat)) } # Use optim or any other function to find a set # of parameters that maximizes the log likelihood o.optim <- optim(c(9, 28), ll.weibullCensored, dat = as.numeric(a), # this tells it to find max instead of a min control=list(fnscale=-1))

This gives essentially the same estimates as AndresT, but their approach was simply to assume that all the data fell in the center of the bins and fulfill the maximum probability for this imputed data set. It doesn't really matter, but with this method you don't need other packages.

Edit: The fact that the AndresT solution and mine give ratings that are very similar make a lot of sense if we look at what we maximize for each method. In the mine, we look at the probability of getting into each of the "bunkers". The AndreT solution uses the distribution density at the center of the hopper as a substitute for this probability. We can look at the ratio of the probability of getting into the bunker and the density value in the center of the bunker for each of the bins (using the shape and scale obtained from my solution), which gives:

 # Probability of each bin > ps [1] 0.0005495886 0.0009989085 0.0017438767 0.0029375471 0.0047912909 [6] 0.0075863200 0.0116800323 0.0174991532 0.0255061344 0.0361186335 [11] 0.0495572085 0.0656015797 0.0832660955 0.1004801353 0.1139855466 [16] 0.1197890284 0.1144657811 0.0971503491 0.0711370586 0.0433654456 [21] 0.0210758647 0.0077516837 0.0020274896 # Density evaluated at the center of the bin > ps.cent [1] 0.0005418957 0.0009868040 0.0017254545 0.0029103746 0.0047524364 [6] 0.0075325510 0.0116083397 0.0174078328 0.0253967142 0.0359988789 [11] 0.0494450583 0.0655288551 0.0832789134 0.1006305707 0.1143085230 [16] 0.1202647955 0.1149865305 0.0975322358 0.0712125315 0.0431169222 [21] 0.0206762531 0.0074246320 0.0018651941 # Ratio of the probability and the density > ps/ps.cent [1] 1.0141963 1.0122663 1.0106767 1.0093364 1.0081757 1.0071382 1.0061760 [8] 1.0052459 1.0043084 1.0033266 1.0022682 1.0011098 0.9998461 0.9985051 [15] 0.9971745 0.9960440 0.9954712 0.9960845 0.9989402 1.0057639 1.0193271 [22] 1.0440495 1.0870127

All these relationships are close to 1 - so both methods basically try to maximize the same value.

Defining a 2-parameter distribution for tabular data

More articles: