How can I easily get the mean, median, quartiles, etc., given the count of each value, in R?

Suppose I have a data frame with a column for values and another column for the number of times each value has been observed:

    x <- data.frame(value=c(1,2,3), count=c(4,2,1))
    x
    #   value count
    # 1     1     4
    # 2     2     2
    # 3     3     1

I know that I can get the weighted mean using weighted.mean, and the weighted median using the weighted.median function provided by several packages (for example, limma), but how can I get other weighted statistics for my data, such as the 1st and 3rd quartiles, and perhaps the standard deviation? "Expanding" the data with rep is not an option, because sum(x$count) is about 3 billion (the size of the human genome).

4 answers

Have you tried these packages:

  • Hmisc - It has several weighted statistics, including weighted quantiles.

  • laeken - it has weighted quantiles.
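If installing a package is not an option, the idea behind a count-weighted quantile can be sketched in base R. The helper below, `count_quantile`, is a hypothetical name and not part of Hmisc or laeken. It assumes the counts are non-negative whole numbers and reproduces the default `quantile()` definition (type 7) on the expanded data by looking up positions in the cumulative counts instead of calling `rep`:

```r
# Hypothetical helper (not from any package): quantiles of the expanded data,
# computed from value/count pairs without ever building rep(value, count).
count_quantile <- function(value, count, probs = c(0.25, 0.5, 0.75)) {
  ord   <- order(value)
  value <- value[ord]
  count <- as.numeric(count[ord])  # doubles avoid integer overflow for huge totals
  cum   <- cumsum(count)           # last position occupied by each value
  n     <- cum[length(cum)]        # size of the (virtual) expanded vector

  # Fractional position in the sorted expanded data, as in quantile(type = 7)
  h  <- (n - 1) * probs + 1
  lo <- floor(h)
  hi <- ceiling(h)

  # Map a 1-based position k to the value stored there
  value_at <- function(k) value[findInterval(k, cum, left.open = TRUE) + 1]

  v_lo <- value_at(lo)
  v_hi <- value_at(hi)
  v_lo + (h - lo) * (v_hi - v_lo)  # linear interpolation between neighbours
}

x <- data.frame(value = c(1, 2, 3), count = c(4, 2, 1))
count_quantile(x$value, x$count)
```

Memory use depends on the number of distinct values rather than on sum(count), so a total of about 3 billion observations should not be a problem for this approach.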


Or try expanding the counts back into the raw observations and running the analysis in the usual way:

    dtf <- data.frame(value = 1:3, count = c(4, 2, 1))
    x <- with(dtf, rep(value, count))
    summary(x)
    #    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    #   1.000   1.000   1.000   1.571   2.000   3.000
    fivenum(x)
    # [1] 1 1 1 2 3

For completeness, I note that the S4Vectors package in Bioconductor provides an answer in the form of the "Rle" class, which lets you build a run-length encoded vector that supports all the usual operations:

    library(S4Vectors)
    x <- data.frame(value=c(1,2,3), count=c(4,2,1))
    y <- Rle(x$value, x$count)
    mean(y)
    median(y)
    quantile(y)

To complement Prasad Chalasani's answer, here is code that computes the weighted median given a column of values and another column with the number of times each value was observed. Note that it uses the wtd.quantile function from the Hmisc package.

    require(Hmisc)
    x <- data.frame(value=c(1,2,3), count=c(4,2,1))
    ##   value count
    ## 1     1     4
    ## 2     2     2
    ## 3     3     1
    wtd.quantile(x$value, x$count, probs = 0.5)
    ## 50%
    ##   1
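The question also asks about the standard deviation. A minimal base-R sketch follows, assuming the counts are plain frequencies, so that the result should match `sd()` on the expanded data. `count_sd` is a hypothetical helper name, not a function from Hmisc:

```r
# Hypothetical helper: sample standard deviation from value/count pairs,
# using the usual n - 1 denominator where n is the total count.
count_sd <- function(value, count) {
  count <- as.numeric(count)          # doubles avoid integer overflow in sum()
  n <- sum(count)                     # total number of observations
  m <- sum(value * count) / n         # frequency-weighted mean
  sqrt(sum(count * (value - m)^2) / (n - 1))
}

x <- data.frame(value = c(1, 2, 3), count = c(4, 2, 1))
count_sd(x$value, x$count)
```

This only touches one row per distinct value, so it avoids expanding the data with rep.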

Source: https://habr.com/ru/post/1343708/

