This is a data volume issue, not necessarily an architecture issue.
Whether you are on 1 machine or 1,000 machines, if you end up summing 1,000,000 rows, you will have problems.
Instead of normalizing the data, you need to de-normalize it.
You note in a comment that your database is “perfect for your purpose,” but obviously it is not: it is too slow.
So, something has to give. Your ideal model does not work, because you need to process too much data in too short a time. It sounds like you need several higher-level datasets summarized from your original data. Perhaps a data warehousing solution. Who knows, there is not enough information to really say.
But there are many things you can do to satisfy a specific subset of queries with good response times, while still supporting the special-case queries that respond "in 10-20 minutes."
Edit Comment:
I am not familiar with GridSQL or what it does.
If you send several identical SQL queries to separate "shard" databases, each of which holds a subset of the data, then a simple selection query will scale out across the network (i.e. you eventually become network bound against a controller), since the per-shard work is, in essence, parallel and stateless.
The problem becomes, as you mention, the post-processing, particularly the sorting and the aggregates, since those can only be performed on the final, combined “raw” result set.
This means your controller inevitably becomes your bottleneck and, in the end, no matter how you "scale", you still have to deal with the data volume problem. If you fan a query out to 1,000 nodes and each node returns a 1,000-row result set that must be summed or sorted, that is 1,000,000 rows: you still have a long result time and a lot of data processing to do on a single machine.
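To make that bottleneck concrete, here is a rough Python sketch of the scatter/gather pattern (purely an illustration with simulated shards, not how GridSQL or any particular product actually works): the per-shard SELECT parallelizes nicely, but the final sort and sum over the combined rows still run on one coordinator.

```python
# Scatter/gather sketch: parallel per-shard queries, serial merge at the coordinator.
from concurrent.futures import ThreadPoolExecutor
import random

NUM_SHARDS = 1000
ROWS_PER_SHARD = 1000  # 1,000 shards x 1,000 rows = 1,000,000 rows at the coordinator


def query_shard(shard_id):
    """Stand-in for running the same SELECT against one shard database."""
    return [(shard_id, random.random()) for _ in range(ROWS_PER_SHARD)]


# Scatter: the per-shard work runs in parallel and scales out.
with ThreadPoolExecutor(max_workers=32) as pool:
    partial_results = list(pool.map(query_shard, range(NUM_SHARDS)))

# Gather: sorting and aggregating the combined result set is serial work on the
# coordinator -- this is the bottleneck described above.
all_rows = [row for part in partial_results for row in part]
all_rows.sort(key=lambda row: row[1])          # ORDER BY over 1,000,000 rows
total = sum(value for _, value in all_rows)    # SUM() over 1,000,000 rows
print(len(all_rows), total)
```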
I don’t know which database you are using, and I don’t know the specifics of individual products, but you can see how, if you actually partition your data across several disk spindles and have a decent, modern, multi-core processor, the database implementation itself can handle much of this scaling for you in terms of parallel disk-spindle requests. Which implementations actually do this, I can’t say; I am just suggesting that it is possible (and some of them can do it).
But my general sense is that if you are working with aggregates in particular, you are probably processing too much data if you go back to the raw source data every time. If you analyze your queries, you may be able to "pre-sum" your data at different levels of granularity to avoid the data-saturation problem.
For example, if you store individual web hits but are mostly interested in activity by hour of the day (rather than at the sub-second resolution you may be recording), summarizing down to just hours of the day can reduce your data volume dramatically.
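As a rough illustration of that pre-summing idea (the table and column names here are made up, and SQLite is only used to keep the snippet self-contained), you could roll the raw hits up into one row per URL per hour and answer hour-of-day questions from the summary table instead of re-scanning every hit:

```python
# Pre-summing raw hits into an hourly summary table.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE hits (url TEXT, hit_time TEXT)")  # raw, sub-second detail
db.executemany(
    "INSERT INTO hits VALUES (?, ?)",
    [
        ("/home", "2010-01-04 09:15:02.123"),
        ("/home", "2010-01-04 09:47:55.004"),
        ("/about", "2010-01-04 10:01:30.500"),
    ],
)

# One summary row per (url, hour) instead of one row per hit.
db.execute(
    """
    CREATE TABLE hourly_hits AS
    SELECT url,
           strftime('%Y-%m-%d %H:00', hit_time) AS hit_hour,
           COUNT(*) AS hit_count
    FROM hits
    GROUP BY url, hit_hour
    """
)

# Queries about activity by hour now touch far fewer rows.
for row in db.execute("SELECT * FROM hourly_hits ORDER BY hit_hour, url"):
    print(row)
```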
So, scaling out can certainly help, but it may not be the whole solution to the problem; rather, it will be one component of it. Data warehousing is designed to address exactly these issues, but it does not cope well with ad hoc queries. Instead, you need a reasonable idea of which queries you want to support and design toward them.