Effective design for finding objects by several parameters with a range

Question


I have a set of objects of the same type in memory, and each of them has several immutable int properties (among other properties).

I need to find an object (or several) in that set whose properties lie within a small range around given values. For instance: a == 5+-1 && b == 21+-2 && c == 9 && any d.

What is the best way to store objects so that I can retrieve them efficiently?

I was thinking of creating a SortedList for each property and using BinarySearch, but I have many properties, so I would prefer a more general approach that does not need so many SortedLists.

It is important that the set itself is mutable: I need to be able to add and remove elements.

Is there something like an in-memory database for objects (not just data)?


c# algorithm search indexing in-memory-database

Vlad Jan 31 '16 at 21:28


2 answers

First, having a large number of SortedLists is a good design. In essence, this is how all modern RDBMSs solve the same problem.

Beyond that: if there were a simple, general, close-to-optimal way of answering such queries, RDBMSs would not bother with the relatively complex and slow hack of query plan optimization, i.e. generating large numbers of candidate plans and then heuristically estimating which one will take the least time.

Admittedly, queries with many joins between tables are what usually make the space of possible plans huge in practice with RDBMSs, and you don't seem to have any joins here. But even with just one table (set of objects), if there are k fields that can be used to select rows (objects), then theoretically you can have k! different indices to choose from. Each index is a SortedList of (key, value) pairs in which the key is the sequence of the k field values in some order, and the value is, for example, a memory pointer to the object. If the query result is a single object (or, alternatively, if the query contains a non-range condition for every one of the k fields), then it does not matter which index is used. In every other case, each index will generally perform differently, so a query planner needs accurate selectivity estimates for each condition in order to choose the best index.
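As a rough illustration of one such single-property index, here is a minimal sketch. It assumes a hypothetical Item type with a unique Id, and it swaps SortedList plus BinarySearch for SortedSet.GetViewBetween, because SortedList<TKey, TValue> rejects duplicate keys while a set of (key, id) pairs tolerates equal property values and supports adding and removing elements. PropertyIndex and IdsInRange are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical object type; Id is an assumed unique identifier used as a tie-breaker.
public sealed class Item
{
    public int Id { get; }
    public int A { get; }
    public int B { get; }
    public int C { get; }

    public Item(int id, int a, int b, int c)
    {
        Id = id; A = a; B = b; C = c;
    }
}

// Index over one int property: (Key, Id) pairs kept in sorted order.
public sealed class PropertyIndex
{
    private readonly Func<Item, int> _key;
    private readonly SortedSet<(int Key, int Id)> _entries =
        new SortedSet<(int Key, int Id)>();

    public PropertyIndex(Func<Item, int> key)
    {
        _key = key;
    }

    public void Add(Item item)
    {
        _entries.Add((_key(item), item.Id));
    }

    public void Remove(Item item)
    {
        _entries.Remove((_key(item), item.Id));
    }

    // Ids of all items whose key lies in [lo, hi], both bounds inclusive,
    // returned in (Key, Id) order. Requires lo <= hi.
    public IEnumerable<int> IdsInRange(int lo, int hi)
    {
        return _entries
            .GetViewBetween((lo, int.MinValue), (hi, int.MaxValue))
            .Select(e => e.Id);
    }
}
```

With an index built as new PropertyIndex(i => i.A), the condition a == 5+-1 becomes IdsInRange(4, 6). The ids would still have to be mapped back to objects (for example through a Dictionary<int, Item>) and checked against the remaining conditions.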


j_random_hacker Feb 01 '16 at 1:34


Steven graves · Accepted Answer · 2016-02-01T20:13:18+0000

Just to expand on @j_random_hacker's answer a bit: the usual approach to "selectivity estimates" is to build a histogram for the index. But you may already know intuitively which criterion in "a == 5+-1 && b == 21+-2 && c == 9" will give the smallest initial result set. Most likely it is "c == 9", unless c has an exceptionally large number of duplicate values and a small universe of possible values.

So a simple analysis of the predicates makes a reasonable starting point. Equality conditions are likely to be the most selective.

From there, an RDBMS would sequentially scan the records in that initial result set and filter them on the remaining predicates. This may well be your best approach.

Alternatively, there are any number of embedded in-memory SQL DBMSs that will do the hard work for you (eXtremeDB, SQLite, RDM, ... google is your friend), and/or that have lower-level interfaces that will not do all of the work for you (though still most of it) but also will not impose SQL on you.
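The predicate-driven approach described above (start from the most selective condition, then scan its candidates and filter on the rest) could be sketched roughly as follows. This is only an illustration under stated assumptions: Record, Store and the (Prop, Lo, Hi) predicate tuples are hypothetical names, properties are addressed by position (0 = a, 1 = b, ...), and hash buckets per property value stand in for sorted indexes, which is only reasonable because the ranges are assumed to be narrow. The bucket sizes act as an exact histogram.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record; Props[0] = a, Props[1] = b, and so on.
public sealed class Record
{
    public int[] Props { get; }
    public Record(params int[] props) { Props = props; }
}

public sealed class Store
{
    // _buckets[p][v] holds every record whose property p equals v.
    private readonly Dictionary<int, HashSet<Record>>[] _buckets;

    public Store(int propertyCount)
    {
        _buckets = new Dictionary<int, HashSet<Record>>[propertyCount];
        for (int p = 0; p < propertyCount; p++)
            _buckets[p] = new Dictionary<int, HashSet<Record>>();
    }

    public void Add(Record r)
    {
        for (int p = 0; p < _buckets.Length; p++)
        {
            if (!_buckets[p].TryGetValue(r.Props[p], out var set))
                _buckets[p][r.Props[p]] = set = new HashSet<Record>();
            set.Add(r);
        }
    }

    public void Remove(Record r)
    {
        for (int p = 0; p < _buckets.Length; p++)
            if (_buckets[p].TryGetValue(r.Props[p], out var set))
                set.Remove(r);
    }

    // Number of records matching one predicate, read off the bucket sizes.
    private int Count((int Prop, int Lo, int Hi) q)
    {
        int n = 0;
        for (int v = q.Lo; v <= q.Hi; v++)
            if (_buckets[q.Prop].TryGetValue(v, out var set)) n += set.Count;
        return n;
    }

    // Each predicate is an inclusive range; equality is Lo == Hi.
    // At least one predicate is required.
    public List<Record> Find(params (int Prop, int Lo, int Hi)[] predicates)
    {
        // Drive the search from the predicate with the fewest candidates.
        var driver = predicates.OrderBy(q => Count(q)).First();
        var result = new List<Record>();
        for (int v = driver.Lo; v <= driver.Hi; v++)
        {
            if (!_buckets[driver.Prop].TryGetValue(v, out var set)) continue;
            foreach (var r in set)
                if (predicates.All(q => q.Lo <= r.Props[q.Prop] && r.Props[q.Prop] <= q.Hi))
                    result.Add(r);
        }
        return result;
    }
}
```

The query from the question would then read store.Find((0, 4, 6), (1, 19, 23), (2, 9, 9)), with property d simply left out. For wide ranges, a sorted index per property would be a better fit than enumerating every value in the range.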
