Are there specialized databases for aggregated queries?

Are there any specialized databases (RDBMS, NoSQL, key-value, or anything else) that are optimized for running fast aggregate queries or map-reduce jobs like the one below over very large data sets:

select date, count(*) from Sales where [various combinations of filters] group by date 

So far I have run tests on MongoDB and SQL Server, but I wonder if there is a more specialized solution, preferably one that can scale the data horizontally.

+6
5 answers

For some types of data (large volumes, time series), kx.com provides perhaps the best solution: kdb+. If that looks like your data, give it a try. Note: it does not use SQL, but rather a more general, more powerful, and crazier set-theoretic language.

+1

In my experience, the real problem has less to do with aggregate query performance, which I have found to be good in all the major databases I have tried, than with the way the queries are written.

I have lost count of the number of times I have seen enormous report queries with a huge number of joins and inline subquery aggregates all over the place.

Off the top of my head, the typical steps to make these things faster are:

  • Use window functions where available and applicable (i.e. the over () clause). There is absolutely no point in re-querying the same data several times.

  • Use common table expressions (with queries) where available and applicable (i.e. for sets that you know will be reasonably small).

  • Use temporary tables for large subtotals, and create indexes on them (and analyze them).

  • Work with smaller result sets by filtering rows as early as possible: select id, aggregate from (aggregate on id) where id in (?) group by id can be made much faster by rewriting it as select id, aggregate from (aggregate on id where id in (?)) group by id.

  • Use union/except/intersect all rather than union/except/intersect where applicable. This removes the pointless sorting (deduplication) of result sets.

As a bonus, the first three steps tend to make report queries more readable and therefore easier to maintain.
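
A minimal sketch of three of the steps above (window function, early filtering, union all), using Python's built-in sqlite3 module so it can be run as is. The sales table, its columns, and the sample rows are made up for illustration, and window functions assume SQLite 3.25 or newer; other databases may plan these queries differently.

```python
import sqlite3

# Hypothetical in-memory table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, date TEXT, region TEXT, amount INTEGER);
    INSERT INTO sales (date, region, amount) VALUES
        ('2024-01-01', 'north', 10),
        ('2024-01-01', 'south', 20),
        ('2024-01-02', 'north', 30),
        ('2024-01-02', 'south', 40);
""")

# Window function: each row together with its per-date total, without
# joining the table to a separate aggregate subquery.
windowed = conn.execute("""
    SELECT date, region, amount,
           SUM(amount) OVER (PARTITION BY date) AS day_total
    FROM sales
    ORDER BY date, region
""").fetchall()

# Early filtering: the WHERE clause sits inside the aggregating subquery,
# so only the requested regions are aggregated at all.
filtered = conn.execute("""
    SELECT region, total
    FROM (SELECT region, SUM(amount) AS total
          FROM sales
          WHERE region IN (?)
          GROUP BY region)
""", ("north",)).fetchall()

# UNION ALL keeps duplicate rows and skips the deduplication step of UNION.
union_all = conn.execute("""
    SELECT date FROM sales WHERE region = 'north'
    UNION ALL
    SELECT date FROM sales WHERE region = 'south'
""").fetchall()

print(windowed)
print(filtered)
print(len(union_all))
```

Whether any of these rewrites actually helps depends on the database and its optimizer, so it is worth comparing query plans before and after.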

+3

Pretty much any OLAP database will do; this is exactly the kind of workload they are designed for.

+2

OLAP data cubes are designed for this. You denormalize the data into forms that can be computed quickly. The denormalization and precomputation steps can be time-consuming, so these databases are usually built only for reporting and kept separate from the real-time transactional data.
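
As a rough illustration of the precomputation idea (not a real OLAP cube), the sketch below builds a small rollup table ahead of time and answers the report from it. It uses Python's built-in sqlite3 module; the sales and sales_daily tables, their columns, and the sample rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical transactional table: one row per sale.
    CREATE TABLE sales (date TEXT, region TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('2024-01-01', 'north', 10),
        ('2024-01-01', 'south', 20),
        ('2024-01-02', 'north', 30);

    -- Precomputation step: one row per (date, region) instead of one per sale.
    -- In practice this would be refreshed on a schedule, apart from live traffic.
    CREATE TABLE sales_daily AS
        SELECT date, region, COUNT(*) AS n_sales, SUM(amount) AS total
        FROM sales
        GROUP BY date, region;
""")

# The report reads the small rollup table and re-aggregates the stored counts.
per_day = conn.execute("""
    SELECT date, SUM(n_sales)
    FROM sales_daily
    WHERE region IN ('north', 'south')
    GROUP BY date
    ORDER BY date
""").fetchall()

print(per_day)
```

The trade-off is the one described above: the rollup is only as fresh as its last rebuild, and it can only serve filters on the columns it was grouped by.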

+2

Oracle, DB2 Warehouse Edition and, to a lesser extent, SQL Server all handle these aggregate queries very well. Of course, these are expensive solutions, and whether they are worth it depends on your budget and business case.

+1

Source: https://habr.com/ru/post/887603/

