How to fit a distribution to frequency data in R?

Is there a function that can be used to fit distributions to frequency data in R? I know about fitdistr, but as far as I can tell it only works on data vectors (random samples). I also know that converting between the two formats is trivial, but the frequencies are so large that memory becomes a problem.

For example, fitdistr can be used as follows:

    library(MASS)  # fitdistr lives in MASS
    x <- rpois(100, lambda=10)
    fitdistr(x, "poisson")

Is there a function that will do the same on a frequency table? Something along the lines of:

    freqt <- as.data.frame(table(x))
    fitfreqtable(freqt$x, weights=freqt$Freq, "poisson")

Thanks!

2 answers

There is no built-in function that I know of for fitting a distribution to a frequency table. Note that, strictly speaking, a continuous distribution is inappropriate for a table, because the tabulated data are discrete. Of course, for sufficiently large N and a sufficiently fine grid this can be ignored.
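One way to respect the discreteness of binned data is to use the probability mass of each bin (a difference of CDF values) rather than the density at a point. The following is only a sketch under stated assumptions: the data, the unit-width `breaks`, and the helper name `negll_binned` are hypothetical and not part of the original answer.

```r
# Hypothetical binned data: a gamma sample tabulated into unit-width bins.
set.seed(42)
obs    <- rgamma(5000, shape = 2, rate = 0.5)
breaks <- seq(0, ceiling(max(obs)), by = 1)
counts <- as.vector(table(cut(obs, breaks)))  # one count per bin, zeros kept

# Grouped-data negative log-likelihood: each observation contributes the
# log-probability of the bin it fell into, so bins are weighted by count.
negll_binned <- function(par, breaks, counts) {
  p <- diff(pgamma(breaks, shape = par[1], rate = par[2]))
  p <- pmax(p, .Machine$double.xmin)  # guard against log(0)
  -sum(counts * log(p))
}

fit <- optim(c(1, 1), negll_binned, breaks = breaks, counts = counts,
             method = "L-BFGS-B", lower = c(0.001, 0.001))
fit$par  # estimated shape and rate
```

Only the bin edges and counts enter the objective, so the raw sample never has to be held in memory once the table exists.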

You can write your own objective function and minimize it with optim or any other optimizer, provided you know the density you are interested in. I did this here for a gamma distribution (which was a poor assumption for that particular dataset, but never mind).

The code is reproduced below.

    # Objective: -2 * Poisson log-likelihood of the observed counts y,
    # with expected counts given by the gamma density scaled to sum(y).
    negll <- function(par, x, y) {
        shape <- par[1]
        rate <- par[2]
        mu <- dgamma(x, shape, rate) * sum(y)
        -2 * sum(dpois(y, mu, log=TRUE))
    }

    optim(c(1, 1), negll, x=seq_along(g$count), y=g$count,
          method="L-BFGS-B", lower=c(.001, .001))
    $par
    [1] 0.73034879 0.00698288

    $value
    [1] 62983.18

    $counts
    function gradient
          32       32

    $convergence
    [1] 0

    $message
    [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
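A different objective that works directly on a frequency table is the ordinary log-likelihood with each distinct value weighted by its frequency; this equals the log-likelihood of the expanded sample without ever expanding it. The sketch below is an illustration for the Poisson case from the question, not the answerer's code; `negll_freq`, `vals` and `freq` are hypothetical names, and the search interval is an assumption.

```r
# Hypothetical frequency table built from a simulated sample.
set.seed(1)
x     <- rpois(1000, lambda = 10)
freqt <- as.data.frame(table(x), stringsAsFactors = FALSE)
vals  <- as.numeric(freqt$x)  # distinct observed values
freq  <- freqt$Freq           # how often each value occurred

# Negative log-likelihood: log-density of each value times its frequency.
negll_freq <- function(lambda, vals, freq) {
  -sum(freq * dpois(vals, lambda, log = TRUE))
}

# One parameter, so a one-dimensional optimizer is enough.
fit <- optimize(negll_freq, interval = c(0.001, 100),
                vals = vals, freq = freq)
fit$minimum  # numerical estimate of lambda
mean(x)      # estimate from the raw sample, for comparison
```

For a distribution with several parameters, the same weighted sum could be passed to optim with the corresponding density function in place of dpois.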

To fit a Poisson distribution you only need the mean of your sample. The sample mean is the maximum likelihood estimate of lambda, which is the only parameter of the Poisson distribution. Example:

    set.seed(1111)
    sample <- rpois(n=10000, l=10)
    mean(sample)
    [1] 10.0191

which is almost equal to the lambda value used to generate the sample (l = 10). The slight difference (0.0191) is due to sampling variability in the randomly generated Poisson draws. As n increases, the difference tends to shrink. Alternatively, you can fit the distribution by numerical optimization:

    library(fitdistrplus)
    fitdist(sample,"pois")
    set.seed(1111)
    Fitting of the distribution ' pois ' by maximum likelihood
    Parameters:
           estimate Std. Error
    lambda  10.0191 0.03165296

but this is just a waste of time, since the sample mean already gives the same estimate. For theoretical background on fitting frequency data, you can see my answer here.
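Applied to a frequency table rather than a raw vector, the same idea reduces to a weighted mean, and the standard error follows from the Poisson variance as sqrt(lambda / n); for the output above, sqrt(10.0191 / 10000) is approximately 0.03165, which agrees with the reported Std. Error. A minimal sketch, assuming a table with one row per distinct value (the names `smp`, `vals`, `freq`, `lambda_hat` and `se_hat` are illustrative):

```r
# Simulated sample, tabulated into a frequency table.
set.seed(1111)
smp   <- rpois(n = 10000, lambda = 10)
freqt <- as.data.frame(table(smp), stringsAsFactors = FALSE)
vals  <- as.numeric(freqt$smp)  # distinct observed values
freq  <- freqt$Freq             # frequency of each value

n          <- sum(freq)               # total number of observations
lambda_hat <- sum(vals * freq) / n    # weighted mean = MLE of lambda
se_hat     <- sqrt(lambda_hat / n)    # asymptotic standard error of the MLE

lambda_hat
se_hat
```

Only the table is needed for the estimate, so very large frequencies cost no extra memory.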


Source: https://habr.com/ru/post/1487748/

