Search for which bin

Question

I am trying to find which category a double x belongs to. My categories are defined as string names and double values in a file like this:

    A 1.0
    B 2.5
    C 7.0

which should be interpreted as follows:

    "A": 0 < x <= 1.0
    "B": a < x <= 2.5
    "C": b < x <= 7.0

(the input can be of arbitrary length and can be assumed to be sorted by value). I just need a function along these lines:

    std::string findCategory(categories_t categories, double x)
    {
        ...insert magic here
    }

so for this example I would expect

    findCategory(categories, 0.5) == "A"
    findCategory(categories, 1.9) == "B"
    findCategory(categories, 6.0) == "C"

So my question is: a) how to write the function, and b) what the best choice for categories_t would be (using the STL in pre-C++11). I have made several attempts, all of which were ... less than successful.

+4

c++ algorithm data-structures stl range

nxpnsv Jan 19 '12 at 20:49

2 answers

One option is to use a std::map with doubles as keys, where each value is the category assigned to the range whose upper endpoint is that key. For example, given your file, you would have a map like this:

    std::map<double, std::string> lookup;
    lookup[1.0] = "A";
    lookup[2.5] = "B";
    lookup[7.0] = "C";

Then you can call std::map::lower_bound with some point to get back the key/value pair whose key (the upper endpoint) is the first key in the map that is at least as large as the point in question. For example, with the map above, lookup.lower_bound(1.37) will return an iterator whose value is "B", and lookup.lower_bound(2.56) will return an iterator whose value is "C". These lookups are fast; they take O(log n) time for a map with n elements.

In the above, I am assuming that the values you look up are non-negative. If negative values are possible, you can add a quick test to check whether the value is negative before doing any lookup. That way you can eliminate spurious results.
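As a minimal sketch of how the map lookup and the range checks might fit together (C++03-compatible): the typedef categories_t follows the name used in the question, and returning an empty string for out-of-range values is an assumption here, not something the question specifies.

```cpp
#include <map>
#include <string>

// Key: upper endpoint of a bin. Value: the bin's name.
typedef std::map<double, std::string> categories_t;

// Returns the name of the bin whose upper endpoint is the first key >= x.
// Assumption: an empty string signals "x is outside every bin".
std::string findCategory(const categories_t& categories, double x)
{
    if (x <= 0.0) {
        return "";  // assumption: the first bin is 0 < x <= first key
    }
    categories_t::const_iterator it = categories.lower_bound(x);
    if (it == categories.end()) {
        return "";  // x is larger than the largest upper endpoint
    }
    return it->second;
}
```

With the three-entry map shown earlier, findCategory(lookup, 0.5) gives "A", findCategory(lookup, 1.9) gives "B", and findCategory(lookup, 7.5) gives the empty string, because lower_bound returns end() when no key is at least as large as x.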

For what it's worth, if you know something about the distribution of your lookups (for example, that they are uniformly distributed), you can build a special data structure called an optimal binary search tree, which will give better access times than std::map. Also, depending on your application, there may be even faster options. For example, if you are doing this because you want to randomly choose one of the outcomes with different probabilities, I would suggest looking into the alias method, which lets you generate random values in O(1) time.

Hope this helps!

+6

templatetypedef Jan 19 '12 at 20:55

Johan lundberg · Accepted Answer · 2012-01-19T20:57:52+0000

You can use the pair type and 'lower_bound' from <algorithm> http://www.cplusplus.com/reference/algorithm/lower_bound/ .

Define your categories in terms of their upper edges: typedef pair<double,string> category_t;

Then just create a vector of these edges and do a search using binary search. See the full example below.

    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <utility>
    #include <vector>
    using namespace std;

    // Each category is stored as (upper edge, name).
    typedef pair<double,string> category_t;

    std::string findCategory(const vector<category_t> &categories, double x)
    {
        // Binary search for the first category whose upper edge is >= x.
        vector<category_t>::const_iterator it =
            std::lower_bound(categories.begin(), categories.end(), category_t(x,""));
        if (it == categories.end()) {
            return "";  // x lies above the last upper edge
        }
        return it->second;
    }

    int main()
    {
        // The edges must be sorted by upper edge for lower_bound to work.
        vector<category_t> edges;
        edges.push_back(category_t(0,"bin n with upper edge at 0 (underflow)"));
        edges.push_back(category_t(1,"bin A with upper edge at 1"));
        edges.push_back(category_t(2.5,"bin B with upper edge at 2.5"));
        edges.push_back(category_t(7,"bin C with upper edge at 7"));
        edges.push_back(category_t(8,"bin D with upper edge at 8"));
        edges.push_back(category_t(9,"bin E with upper edge at 9"));
        edges.push_back(category_t(10,"bin F with upper edge at 10"));

        vector<double> examples;
        examples.push_back(1);
        examples.push_back(3.3);
        examples.push_back(7.4);
        examples.push_back(-5);
        examples.push_back(15);

        for (vector<double>::const_iterator eit = examples.begin(); eit != examples.end(); ++eit) {
            cout << "value " << *eit << " : " << findCategory(edges, *eit) << endl;
        }
    }

The comparison works the way we want because the double is the first element of the pair, and pairs are compared by their first components and then, only if those are equal, by their second. Otherwise, we would have to define a comparison predicate, as described on the page linked above.
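To illustrate the predicate case, here is a hedged sketch in which the upper edge is deliberately not the first member, so the default ordering cannot be relied on. The names Bin, ByUpperEdge, and findBin are hypothetical and appear in neither answer. The code sticks to C++03 features.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record type: the name comes first, so a default
// lexicographic comparison would order by name, not by edge.
struct Bin {
    std::string name;
    double upperEdge;
    Bin(const std::string& n, double e) : name(n), upperEdge(e) {}
};

// Comparison predicate that orders bins by upper edge only.
struct ByUpperEdge {
    bool operator()(const Bin& a, const Bin& b) const
    {
        return a.upperEdge < b.upperEdge;
    }
};

// Assumes bins is sorted by upperEdge. Returns an empty string when x
// is above the last upper edge (an assumed convention).
std::string findBin(const std::vector<Bin>& bins, double x)
{
    std::vector<Bin>::const_iterator it =
        std::lower_bound(bins.begin(), bins.end(), Bin("", x), ByUpperEdge());
    return it == bins.end() ? std::string() : it->name;
}
```

The same predicate can also be passed to std::sort to put the vector in the order that lower_bound requires. Unlike the example above with an explicit underflow bin, this sketch has no lower guard, so any x at or below the first upper edge lands in the first bin.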
