Renouncement
Fenwick trees are new to me, I just discovered them by finding this post. The results presented here are based on my understanding and some research, but I am by no means an expert on wood, I could have missed something.
Reference materials
An explanation of how fenwick tree works
fooobar.com/questions/217858 / ... reproduced from https://cs.stackexchange.com/a/10541/38148
https://cs.stackexchange.com/a/42816/38148
Using fenwick trees
https://en.wikipedia.org/wiki/Fenwick_tree
https://en.wikipedia.org/wiki/Prefix_sum
Task 1, finding the weight of a given record
Given the following table
CREATE TABLE `entries` ( `id` int(11) NOT NULL AUTO_INCREMENT, `weight` decimal(9,3) DEFAULT NULL, `fenwick` decimal(9,3) NOT NULL DEFAULT '0.000', PRIMARY KEY (`id`) ) ENGINE=INNODB;
and the data is already populated (see http://sqlfiddle.com/#!9/be1f2/1 provided by concat), how to read the weight for a given @entryid record?
The key concept to understand here is that the fenwick index structure is based on math and bitwise operations on the id values โโthemselves.
Queries should usually use only primary search queries ( WHERE ID = value ).
Any query using sorting ( ORDER BY ) or ranges ( WHERE (value1 < ID) AND (ID < value2)) skips the point and does not cross the tree in the given order.
For example, with key 60:
SET @entryid := 60;
Expand the value 60 in binary
mysql> SELECT (@entryid & 0x0080) as b8, -> (@entryid & 0x0040) as b7, -> (@entryid & 0x0020) as b6, -> (@entryid & 0x0010) as b5, -> (@entryid & 0x0008) as b4, -> (@entryid & 0x0004) as b3, -> (@entryid & 0x0002) as b2, -> (@entryid & 0x0001) as b1; +------+------+------+------+------+------+------+------+ | b8 | b7 | b6 | b5 | b4 | b3 | b2 | b1 | +------+------+------+------+------+------+------+------+ | 0 | 0 | 32 | 16 | 8 | 4 | 0 | 0 | +------+------+------+------+------+------+------+------+ 1 row in set (0.00 sec)
In other words, keeping only the set bits, we have
32 + 16 + 8 + 4 = 60
Now remove the low-order bits set one after the other to move through the tree:
32 + 16 + 8 + 4 = 60 32 + 16 + 8 = 56 32 + 16 = 48 32
This gives a path (32, 48, 56, 60) for access to element 60.
Note that converting 60 to (32, 48, 56, 60) only requires bit math for the ID value itself: there is no access to the table or database, and this calculation can be performed in the client issuing the request.
Icon fenwick element 60 then
mysql> select sum(fenwick) from entries where id in (32, 48, 56, 60); +--------------+ | sum(fenwick) | +--------------+ | 32.434 | +--------------+ 1 row in set (0.00 sec)
Check
mysql> select sum(weight) from entries where id <= @entryid; +-------------+ | sum(weight) | +-------------+ | 32.434 | +-------------+ 1 row in set (0.00 sec)
Now compare the effectiveness of these queries.
mysql> explain select sum(fenwick) from entries where id in (32, 48, 56, 60); +----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+-------------+ | 1 | SIMPLE | entries | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 4 | 100.00 | Using where | +----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
or presented otherwise
explain format=json select sum(fenwick) from entries where id in (32, 48, 56, 60); { "query_block": { "select_id": 1, "cost_info": { "query_cost": "5.61" }, "table": { "table_name": "entries", "access_type": "range", "possible_keys": [ "PRIMARY" ], "key": "PRIMARY", "used_key_parts": [ "id" ], "key_length": "4", "rows_examined_per_scan": 4, "rows_produced_per_join": 4, "filtered": "100.00", "cost_info": { "read_cost": "4.81", "eval_cost": "0.80", "prefix_cost": "5.61", "data_read_per_join": "64" }, "used_columns": [ "id", "fenwick" ], "attached_condition": "(`test`.`entries`.`id` in (32,48,56,60))" } }
So, the optimizer extracted 4 rows using the primary key (there are 4 values โโin the IN clause).
If you do not use the fenwick index, we have
mysql> explain select sum(weight) from entries where id <= @entryid; +----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+-------------+ | 1 | SIMPLE | entries | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 60 | 100.00 | Using where | +----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
Or presented otherwise
explain format=json select sum(weight) from entries where id <= @entryid; { "query_block": { "select_id": 1, "cost_info": { "query_cost": "25.07" }, "table": { "table_name": "entries", "access_type": "range", "possible_keys": [ "PRIMARY" ], "key": "PRIMARY", "used_key_parts": [ "id" ], "key_length": "4", "rows_examined_per_scan": 60, "rows_produced_per_join": 60, "filtered": "100.00", "cost_info": { "read_cost": "13.07", "eval_cost": "12.00", "prefix_cost": "25.07", "data_read_per_join": "960" }, "used_columns": [ "id", "weight" ], "attached_condition": "(`test`.`entries`.`id` <= (@`entryid`))" } }
Here, the optimizer scanned the index by reading 60 lines.
With ID = 60, the advantage of fenwick is 4 samples compared to 60.
Now consider how it scales with values โโup to 64K, for example.
With fenwick, a value of 16 bits will have no more than 16 bits, so the number of items to search will be no more than 16.
Without fenwick, a scan can read up to 64 thousand records (and will read an average of 32 KB).
Task 2, finding a record for a given weight
The OP's task was to find a record for a given weight.
for instance
SET @search_weight := 35.123;
To illustrate the algorithm, this post describes how search queries are performed (sorry if this is too verbose)
SET @found_id := 0;
First find the number of records.
SET @max_id := (select id from entries order by id desc limit 1);
In the test data, max_id is 156.
Since 128 <= max_id <256, the most significant bit to start the search is 128.
mysql> set @search_id := @found_id + 128; mysql> select id, fenwick, @search_weight, -> if (fenwick <= @search_weight, "keep", "discard") as action -> from entries where id = @search_id; +-----+---------+----------------+---------+ | id | fenwick | @search_weight | action | +-----+---------+----------------+---------+ | 128 | 66.540 | 35.123 | discard | +-----+---------+----------------+---------+
The weight of 66.540 is more than our search, so 128 is discarded, go to the next bit.
mysql> set @search_id := @found_id + 64; mysql> select id, fenwick, @search_weight, -> if (fenwick <= @search_weight, "keep", "discard") as action -> from entries where id = @search_id; +----+---------+----------------+--------+ | id | fenwick | @search_weight | action | +----+---------+----------------+--------+ | 64 | 33.950 | 35.123 | keep | +----+---------+----------------+--------+
Here we need to save this bit (64) and calculate the found weight:
set @found_id := @search_id, @search_weight := @search_weight - 33.950;
Then go to the following bits:
mysql> set @search_id := @found_id + 32; mysql> select id, fenwick, @search_weight, -> if (fenwick <= @search_weight, "keep", "discard") as action -> from entries where id = @search_id; +----+---------+----------------+---------+ | id | fenwick | @search_weight | action | +----+---------+----------------+---------+ | 96 | 16.260 | 1.173 | discard | +----+---------+----------------+---------+ mysql> set @search_id := @found_id + 16; mysql> select id, fenwick, @search_weight, -> if (fenwick <= @search_weight, "keep", "discard") as action -> from entries where id = @search_id; +----+---------+----------------+---------+ | id | fenwick | @search_weight | action | +----+---------+----------------+---------+ | 80 | 7.394 | 1.173 | discard | +----+---------+----------------+---------+ mysql> set @search_id := @found_id + 8; mysql> select id, fenwick, @search_weight, -> if (fenwick <= @search_weight, "keep", "discard") as action -> from entries where id = @search_id; +----+---------+----------------+---------+ | id | fenwick | @search_weight | action | +----+---------+----------------+---------+ | 72 | 3.995 | 1.173 | discard | +----+---------+----------------+---------+ mysql> set @search_id := @found_id + 4; mysql> select id, fenwick, @search_weight, -> if (fenwick <= @search_weight, "keep", "discard") as action -> from entries where id = @search_id; +----+---------+----------------+---------+ | id | fenwick | @search_weight | action | +----+---------+----------------+---------+ | 68 | 1.915 | 1.173 | discard | +----+---------+----------------+---------+ mysql> set @search_id := @found_id + 2; mysql> select id, fenwick, @search_weight, -> if (fenwick <= @search_weight, "keep", "discard") as action -> from entries where id = @search_id; +----+---------+----------------+--------+ | id | fenwick | @search_weight | action | +----+---------+----------------+--------+ | 66 | 1.146 | 1.173 | keep | +----+---------+----------------+--------+
We found another bit here
set @found_id := @search_id, @search_weight := @search_weight - 1.146; mysql> set @search_id := @found_id + 1; mysql> select id, fenwick, @search_weight, -> if (fenwick <= @search_weight, "keep", "discard") as action -> from entries where id = @search_id; +----+---------+----------------+--------+ | id | fenwick | @search_weight | action | +----+---------+----------------+--------+ | 67 | 0.010 | 0.027 | keep | +----+---------+----------------+--------+
And one more
set @found_id := @search_id, @search_weight := @search_weight - 0.010;
The end result of the search:
mysql> select @found_id, @search_weight; +-----------+----------------+ | @found_id | @search_weight | +-----------+----------------+ | 67 | 0.017 | +-----------+----------------+
Check
mysql> select sum(weight) from entries where id <= 67; +-------------+ | sum(weight) | +-------------+ | 35.106 | +-------------+ mysql> select sum(weight) from entries where id <= 68; +-------------+ | sum(weight) | +-------------+ | 35.865 | +-------------+
And indeed
35.106 (fenwick[67]) <= 35.123 (search) <= 35.865 (fenwick[68])
The search searches for values โโto resolve 1 bit at a time, and each search result decides the value of the next identifier to search.
The queries shown here are for illustration. In a real application, the code should just be a loop that contains:
SELECT fenwick from entries where id = ?;
with application code (or a stored procedure) that implements the logic associated with @found_id, @search_id and @search_weight.
General comments
- Fenwick trees are used to calculate the prefix. It makes sense to use these trees only if the problem for the solution includes prefixes in the first place. Wikipedia has several pointers to applications. The OP did not describe why, for example, the fenwick tree was used.
- Fenwick trees are a trade-off between search complexity and update complexity, so they only make sense for data, not static. For static data, calculating the full prefix will one day be more efficient.
- In our tests, we used the INNODB table, for which the primary key index is sorted, so calculating max_id is a simple O (1) operation. If max_id is already available elsewhere, I think that even using a MEMORY table with a HASH index for ID will work, since only primary searches are used.
PS
sqlfiddle does not work today, so publishing raw data (originally provided by concat) so that people polled can rerun tests.
INSERT INTO `entries` VALUES (1,0.480,0.480),(2,0.542,1.022),(3,0.269,0.269),(4,0.721,2.012),(5,0.798,0.798),(6,0.825,1.623),(7,0.731,0.731),(8,0.181,4.547),(9,0.711,0.711),(10,0.013,0.724),(11,0.930,0.930),(12,0.613,2.267),(13,0.276,0.276),(14,0.539,0.815),(15,0.867,0.867),(16,0.718,9.214),(17,0.991,0.991),(18,0.801,1.792),(19,0.033,0.033),(20,0.759,2.584),(21,0.698,0.698),(22,0.212,0.910),(23,0.965,0.965),(24,0.189,4.648),(25,0.049,0.049),(26,0.678,0.727),(27,0.245,0.245),(28,0.190,1.162),(29,0.214,0.214),(30,0.502,0.716),(31,0.868,0.868),(32,0.834,17.442),(33,0.566,0.566),(34,0.327,0.893),(35,0.939,0.939),(36,0.713,2.545),(37,0.747,0.747),(38,0.595,1.342),(39,0.733,0.733),(40,0.884,5.504),(41,0.218,0.218),(42,0.437,0.655),(43,0.532,0.532),(44,0.350,1.537),(45,0.154,0.154),(46,0.721,0.875),(47,0.140,0.140),(48,0.538,8.594),(49,0.271,0.271),(50,0.739,1.010),(51,0.884,0.884),(52,0.203,2.097),(53,0.361,0.361),(54,0.197,0.558),(55,0.903,0.903),(56,0.923,4.481),(57,0.906,0.906),(58,0.761,1.667),(59,0.089,0.089),(60,0.161,1.917),(61,0.537,0.537),(62,0.201,0.738),(63,0.397,0.397),(64,0.381,33.950),(65,0.715,0.715),(66,0.431,1.146),(67,0.010,0.010),(68,0.759,1.915),(69,0.763,0.763),(70,0.537,1.300),(71,0.399,0.399),(72,0.381,3.995),(73,0.709,0.709),(74,0.401,1.110),(75,0.880,0.880),(76,0.198,2.188),(77,0.348,0.348),(78,0.148,0.496),(79,0.693,0.693),(80,0.022,7.394),(81,0.031,0.031),(82,0.089,0.120),(83,0.353,0.353),(84,0.498,0.971),(85,0.428,0.428),(86,0.650,1.078),(87,0.963,0.963),(88,0.866,3.878),(89,0.442,0.442),(90,0.610,1.052),(91,0.725,0.725),(92,0.797,2.574),(93,0.808,0.808),(94,0.648,1.456),(95,0.817,0.817),(96,0.141,16.260),(97,0.256,0.256),(98,0.855,1.111),(99,0.508,0.508),(100,0.976,2.595),(101,0.353,0.353),(102,0.840,1.193),(103,0.139,0.139),(104,0.178,4.105),(105,0.469,0.469),(106,0.814,1.283),(107,0.664,0.664),(108,0.876,2.823),(109,0.390,0.390),(110,0.323,0.713),(111,0.442,0.442),(112,0.241,8.324),(113,0.881,0.881),(114,0.681,1.562),(115,0.760,0.760),(116,0.760,3.082),(117,0.518,0.518),(118,0.313,0.831),(119,0.008,0.008),(120,0.103,4.024),(121,0.488,0.488),(122,0.135,0.623),(123,0.207,0.207),(124,0.633,1.463),(125,0.542,0.542),(126,0.812,1.354),(127,0.433,0.433),(128,0.732,66.540),(129,0.358,0.358),(130,0.594,0.952),(131,0.897,0.897),(132,0.701,2.550),(133,0.815,0.815),(134,0.973,1.788),(135,0.419,0.419),(136,0.175,4.932),(137,0.620,0.620),(138,0.573,1.193),(139,0.004,0.004),(140,0.304,1.501),(141,0.508,0.508),(142,0.629,1.137),(143,0.618,0.618),(144,0.206,8.394),(145,0.175,0.175),(146,0.255,0.430),(147,0.750,0.750),(148,0.987,2.167),(149,0.683,0.683),(150,0.453,1.136),(151,0.219,0.219),(152,0.734,4.256),(153,0.016,0.016),(154,0.874,0.891),(155,0.325,0.325),(156,0.002,1.217);
PS 2
Now with full sqlfiddle:
http://sqlfiddle.com/#!9/d2c82/1