Is the Boost library's weighted median broken?

I admit that I am not an expert in C++.

I am looking for a quick way to calculate a weighted median, which Boost apparently provides. But I can't seem to get it to work.

```cpp
#include <iostream>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/stats.hpp>
#include <boost/accumulators/statistics/median.hpp>
#include <boost/accumulators/statistics/weighted_median.hpp>

using namespace boost::accumulators;

int main()
{
    // Define an accumulator set
    accumulator_set<double, stats<tag::median > > acc1;
    accumulator_set<double, stats<tag::median >, float> acc2;

    // push in some data ...
    acc1(0.1);
    acc1(0.2);
    acc1(0.3);
    acc1(0.4);
    acc1(0.5);
    acc1(0.6);

    acc2(0.1, weight=0.);
    acc2(0.2, weight=0.);
    acc2(0.3, weight=0.);
    acc2(0.4, weight=1.);
    acc2(0.5, weight=1.);
    acc2(0.6, weight=1.);

    // Display the results ...
    std::cout << " Median: " << median(acc1) << std::endl;
    std::cout << "Weighted Median: " << median(acc2) << std::endl;

    return 0;
}
```

It outputs the following, which is clearly incorrect:

```
 Median: 0.3
Weighted Median: 0.3
```

Am I doing something wrong? Any help would be greatly appreciated.

*However, the weighted sum works correctly.*

@glowcoder: the weighted sum works just fine, like this:

```cpp
#include <iostream>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/stats.hpp>
#include <boost/accumulators/statistics/sum.hpp>
#include <boost/accumulators/statistics/weighted_sum.hpp>

using namespace boost::accumulators;

int main()
{
    // Define an accumulator set
    accumulator_set<double, stats<tag::sum > > acc1;
    accumulator_set<double, stats<tag::sum >, float> acc2;
    // accumulator_set<double, stats<tag::median >, float> acc2;

    // push in some data ...
    acc1(0.1);
    acc1(0.2);
    acc1(0.3);
    acc1(0.4);
    acc1(0.5);
    acc1(0.6);

    acc2(0.1, weight=0.);
    acc2(0.2, weight=0.);
    acc2(0.3, weight=0.);
    acc2(0.4, weight=1.);
    acc2(0.5, weight=1.);
    acc2(0.6, weight=1.);

    // Display the results (labels match the output shown below)
    std::cout << " Sum: " << sum(acc1) << std::endl;
    std::cout << "Weighted Sum: " << sum(acc2) << std::endl;

    return 0;
}
```

and the result:

```
 Sum: 2.1
Weighted Sum: 1.5
```

5 answers
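As a sanity check, the weighted sum is just the sum of w_i * x_i. With weights 0, 0, 0, 1, 1, 1 only 0.4, 0.5 and 0.6 contribute, giving 1.5; with all weights equal to 1 it is the plain sum 2.1. A hand-written sketch using only the standard library (the name `plain_weighted_sum` is a hypothetical helper, not a Boost extractor):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Weighted sum: sum over i of w[i] * x[i].
// Hypothetical helper for checking the accumulator's output by hand.
double plain_weighted_sum(const std::vector<double>& x,
                          const std::vector<double>& w) {
    assert(x.size() == w.size());  // one weight per value
    double total = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        total += w[i] * x[i];
    }
    return total;
}
```

Because this is a simple running total, it is exact (up to floating-point rounding) after any number of observations, unlike the median, which Boost estimates.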

The Boost function is not broken.

The problem is that you are not providing enough data for the P^2 estimator. If you put a loop around the data entry, for example:

```cpp
for (int i = 0; i < 100000; i++) {
    acc2(0.1, weight=0.);
    acc2(0.2, weight=0.);
    acc2(0.3, weight=0.);
    acc2(0.4, weight=1.);
    acc2(0.5, weight=1.);
    acc2(0.6, weight=1.);
}
```

you get the correct result:

```
 Median: 0.3
Weighted Median: 0.5
```
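For comparison, an exact weighted median can be computed by sorting the values and accumulating weights. The sketch below assumes the "lower weighted median" convention (the first value at which the cumulative weight reaches half of the total weight); with that convention it returns 0.5 for the question's weighted data and 0.3 when all weights are 1, matching the output above. The helper name `plain_weighted_median` is hypothetical, and unlike the accumulator it has to keep every observation in memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Exact (lower) weighted median: sort by value, then return the first value
// at which the running total of weights reaches half of the total weight.
// Hypothetical helper for illustration; not part of Boost.
double plain_weighted_median(const std::vector<double>& values,
                             const std::vector<double>& weights) {
    assert(values.size() == weights.size() && !values.empty());
    std::vector<std::pair<double, double>> items;  // (value, weight)
    items.reserve(values.size());
    double total = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        assert(weights[i] >= 0.0);  // negative weights are not meaningful here
        items.emplace_back(values[i], weights[i]);
        total += weights[i];
    }
    assert(total > 0.0);
    std::sort(items.begin(), items.end());  // ascending by value
    double running = 0.0;
    for (const auto& item : items) {
        running += item.second;
        if (running >= total / 2) {
            return item.first;
        }
    }
    return items.back().first;  // only reachable through rounding error
}
```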

Alternatively, you can specify the implementation explicitly and set the number of cells:

```cpp
accumulator_set<double, stats<tag::weighted_median(with_p_square_cumulative_distribution) >, double>
    acc2(p_square_cumulative_distribution_num_cells = 5);
```

which gives "Weighted Median: 0.55", even with only the 6 data points from your question.

What does a weighted median even mean? The median considers only the order of the items, not their values. Weights don't change the order (they can change the mean or the sum, though). If you used occurrence counts (natural numbers) instead of floats as weights, you could extend the definition of the median, but I don't think that's what you're trying to do here.
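The integer-count case mentioned here can be made concrete: repeat each value as many times as its count, then take the ordinary median. A minimal sketch, assuming the lower-median convention (element at index (n - 1) / 2 of the sorted data); `expand_by_counts` and `lower_median` are hypothetical helpers. Since the question's weights happen to be 0 and 1, the expansion gives {0.4, 0.5, 0.6}, whose median is 0.5.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Repeat values[i] exactly counts[i] times (a count of 0 drops the value).
// Hypothetical helper for illustration.
std::vector<double> expand_by_counts(const std::vector<double>& values,
                                     const std::vector<unsigned>& counts) {
    assert(values.size() == counts.size());
    std::vector<double> expanded;
    for (std::size_t i = 0; i < values.size(); ++i) {
        expanded.insert(expanded.end(), counts[i], values[i]);
    }
    return expanded;
}

// Lower median: the element at index (n - 1) / 2 of the sorted data.
// nth_element places that element correctly without a full sort.
double lower_median(std::vector<double> data) {
    assert(!data.empty());
    const auto mid = data.begin() + static_cast<std::ptrdiff_t>((data.size() - 1) / 2);
    std::nth_element(data.begin(), mid, data.end());
    return *mid;
}
```

This only works for whole-number weights, which is the limitation the answer points out; fractional weights need the cumulative-weight definition instead.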

What about:

```cpp
accumulator_set<double, stats<tag::weighted_median(with_weighted_density) >, float> acc2;
```

It looks like you call median twice. Perhaps you intended to call weighted_median the second time?

According to the documentation, it uses the P^2 estimator to calculate the median. I did a Google search and found Jain and Chlamtac, "The P^2 algorithm for dynamic calculation of quantiles and histograms without storing observations." To my surprise, it seems that the Boost Accumulators median is just an estimate, not an exact value. It should be called an approximate median, not simply median.
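To illustrate why the result is only an estimate, here is a minimal, unweighted sketch of the P^2 estimator written from the description in the Jain and Chlamtac paper. It is not Boost's code, and the class name `P2Quantile` is hypothetical. The estimator keeps just five markers (heights and positions) and, after each observation, nudges the three inner markers toward their desired positions using a piecewise-parabolic prediction with a linear fallback. With only a handful of observations the markers have barely moved, which fits the point that six data points are not enough.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

// Unweighted P^2 estimator for a single quantile p (0 < p < 1).
// Illustrative sketch only; not Boost's implementation.
class P2Quantile {
public:
    explicit P2Quantile(double p) : p_(p), dn_{0.0, p / 2, p, (1 + p) / 2, 1.0} {
        assert(p > 0.0 && p < 1.0);
    }

    void add(double x) {
        if (count_ < 5) {  // buffer the first five observations
            q_[count_++] = x;
            if (count_ == 5) {
                std::sort(q_.begin(), q_.end());
                n_ = {1, 2, 3, 4, 5};
                np_ = {1, 1 + 2 * p_, 1 + 4 * p_, 3 + 2 * p_, 5};
            }
            return;
        }
        ++count_;
        std::size_t k = 0;  // cell index: q_[k] <= x < q_[k + 1]
        if (x < q_[0]) {
            q_[0] = x;  // new minimum
        } else if (x >= q_[4]) {
            q_[4] = x;  // new maximum
            k = 3;
        } else {
            while (x >= q_[k + 1]) ++k;
        }
        for (std::size_t i = k + 1; i < 5; ++i) n_[i] += 1;   // markers above x shift
        for (std::size_t i = 0; i < 5; ++i) np_[i] += dn_[i];  // desired positions
        for (std::size_t i = 1; i <= 3; ++i) adjust(i);
    }

    // Middle marker height; exact lookup while fewer than five samples exist.
    double estimate() const {
        if (count_ == 0) return std::numeric_limits<double>::quiet_NaN();
        if (count_ < 5) {
            std::array<double, 5> tmp = q_;
            std::sort(tmp.begin(), tmp.begin() + count_);
            return tmp[static_cast<std::size_t>(p_ * (count_ - 1))];
        }
        return q_[2];
    }

private:
    // Move marker i by one position if it lags or leads its desired position.
    void adjust(std::size_t i) {
        const double d = np_[i] - n_[i];
        if ((d >= 1 && n_[i + 1] - n_[i] > 1) || (d <= -1 && n_[i - 1] - n_[i] < -1)) {
            const int s = d > 0 ? 1 : -1;
            const double qp = parabolic(i, s);
            q_[i] = (q_[i - 1] < qp && qp < q_[i + 1]) ? qp : linear(i, s);
            n_[i] += s;
        }
    }
    // Piecewise-parabolic prediction of the new marker height.
    double parabolic(std::size_t i, int s) const {
        return q_[i] + s / (n_[i + 1] - n_[i - 1]) *
            ((n_[i] - n_[i - 1] + s) * (q_[i + 1] - q_[i]) / (n_[i + 1] - n_[i]) +
             (n_[i + 1] - n_[i] - s) * (q_[i] - q_[i - 1]) / (n_[i] - n_[i - 1]));
    }
    // Linear fallback when the parabola would break marker ordering.
    double linear(std::size_t i, int s) const {
        const std::size_t j = s > 0 ? i + 1 : i - 1;
        return q_[i] + s * (q_[j] - q_[i]) / (n_[j] - n_[i]);
    }

    double p_;
    std::array<double, 5> dn_;    // per-observation increments of desired positions
    std::array<double, 5> q_{};   // marker heights
    std::array<double, 5> n_{};   // actual marker positions
    std::array<double, 5> np_{};  // desired marker positions
    std::size_t count_ = 0;
};
```

Memory use stays constant no matter how many observations are added, which is the trade-off for getting an approximation rather than the exact median.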

And it does seem that the weighted median is broken; it does not take the weights into account.

Source: https://habr.com/ru/post/1341226/

