500,000 events per minute works out to about 8,333 events per second, which should be quite easy for a small cluster (3-5 machines) to handle.
The problem is that 720M documents per day kept open for 60 days adds up to 43B documents. If each of the 10 fields is 32 bytes, that is 13.8 TB of disk space (about 28 TB with a single replica).
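To make that arithmetic explicit, here is a quick back-of-the-envelope sizing sketch in Python; the 32 bytes per field, 60-day retention, and single replica are the assumptions stated above, not measured numbers:

```python
# Back-of-the-envelope sizing for the figures discussed above.
EVENTS_PER_MINUTE = 500_000
DOCS_PER_DAY = 720_000_000      # 720M documents indexed per day
RETENTION_DAYS = 60
FIELDS_PER_DOC = 10
BYTES_PER_FIELD = 32            # rough per-field assumption
REPLICAS = 1                    # one replica doubles on-disk size

events_per_second = EVENTS_PER_MINUTE / 60
total_docs = DOCS_PER_DAY * RETENTION_DAYS
primary_bytes = total_docs * FIELDS_PER_DOC * BYTES_PER_FIELD
total_bytes = primary_bytes * (1 + REPLICAS)

print(f"{events_per_second:,.0f} events/sec")             # ~8,333
print(f"{total_docs / 1e9:,.1f}B documents retained")      # ~43.2B
print(f"{primary_bytes / 1e12:,.1f} TB primary storage")   # ~13.8 TB
print(f"{total_bytes / 1e12:,.1f} TB with one replica")    # ~27.6 TB
```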
For comparison, I have 5 maxed-out nodes (64 GB of RAM, 31 GB heap) holding 1.2B documents that consume 1.2 TB of disk space (doubled by one replica). That cluster could not handle the load with only 32 GB of RAM per machine, but it is happy now with 64 GB. This is 10 days of data for us.
Roughly, you can expect to have about 40x the number of documents and 10x the disk space of my cluster.
I don't have the exact numbers in front of me, but our pilot project using doc_values is giving us something like 90% heap savings.
If all that math holds, and doc_values really is that good, you could be fine with a similarly sized cluster as far as the raw bytes are concerned. I would want more information about the overhead of having that many individual documents, though.
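If you do go the doc_values route, it is enabled per field in the index mapping. Below is a minimal sketch using the plain REST API through `requests`; the index name, type name, and field names are placeholders for your own schema, and note that on Elasticsearch 2.x and later doc_values is already the default for not_analyzed / keyword and numeric fields, so the explicit flag mostly matters on 1.x:

```python
import requests

# Hypothetical index, type, and field names; adjust to your own schema.
# Pre-5.x mapping syntax shown ("string" + "not_analyzed"); on newer
# versions use "keyword" and drop the type level.
mapping = {
    "mappings": {
        "event": {
            "properties": {
                "timestamp": {"type": "date", "doc_values": True},
                "host": {"type": "string", "index": "not_analyzed",
                         "doc_values": True},
                "bytes": {"type": "long", "doc_values": True},
            }
        }
    }
}

resp = requests.put("http://localhost:9200/events-2015.06.01", json=mapping)
print(resp.json())
```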
We have done some Elasticsearch tuning, but there is probably more that could be done.
I would advise starting with a handful of 64 GB machines; you can add more as needed. Throw in a couple of (smaller) client nodes as the front end for index and search requests.
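As a sketch of that topology: the client nodes hold no data and are not master-eligible, and your indexing/search code only talks to them while they fan requests out to the data nodes. The hostnames below are placeholders; the `_bulk` request format itself is stable across versions:

```python
import json
import requests

# All traffic goes through the (smaller) client nodes. In their
# elasticsearch.yml those nodes would have node.master: false and
# node.data: false (pre-5.x syntax; newer versions use node.roles).
CLIENT_NODES = ["http://es-client-1:9200", "http://es-client-2:9200"]

def bulk_index(docs, index):
    """Send a batch of documents through a client node via the _bulk API."""
    lines = []
    for doc in docs:
        # Older versions also require a "_type" in the action line.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    body = "\n".join(lines) + "\n"
    resp = requests.post(
        CLIENT_NODES[0] + "/_bulk",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
    )
    return resp.json()

print(bulk_index([{"host": "web-1", "bytes": 1234}], "events-2015.06.01"))
```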