MongoDB scaling and memory usage with very large datasets

I am currently working on a MongoDB-based system that will store at least a billion documents, growing by about 50 million every month.

The main collection's identifier is YYYYMM_SOURCEID_DOCTYPE_UUID and serves as the shard key. Each entry contributes approximately 1 KB of index. 99% of queries will hit the last three months of data. We would like keyword searches over documents to perform very well on the last three months of data, and at least semi-decently on older data.
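For concreteness, here is a minimal mongosh sketch of what such a document and its search index might look like. The field names (month, source_id, doc_type, keywords) and the collection name fragments are assumptions for illustration; the question only gives the composite identifier:

```js
// Hypothetical document shape inferred from the composite _id;
// all field names here are assumptions, not from the original question.
db.fragments.insertOne({
  _id: "202401_SRC42_REPORT_9f1c2e7a",  // YYYYMM_SOURCEID_DOCTYPE_UUID
  month: "202401",
  source_id: "SRC42",
  doc_type: "REPORT",
  keywords: ["mongodb", "sharding", "memory"]
})

// A compound (multikey) index so keyword lookups scoped to recent
// months touch a small, hot portion of the index that can stay in RAM:
db.fragments.createIndex({ keywords: 1, month: -1 })
```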

Does MongoDB sound like a sensible choice, assuming I can keep the active end of the index in memory?

1 answer

I would advise you to change your shard key: with the current one, it seems everything will land on the last shard, since the leading YYYYMM portion of the key makes all new inserts fall into the "rightmost" chunk. See http://www.mongodb.org/display/DOCS/Choosing+a+Shard+Key#ChoosingaShardKey-Cardinality for more on this.
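To make the hotspot concrete, here is a sketch of the problematic setup, assuming the namespace is mydb.fragments (a name made up for illustration):

```js
// Range-sharding on a key that begins with YYYYMM means every new
// document compares greater than all existing ones, so every insert
// goes to the chunk bounded above by maxKey -- i.e. one shard takes
// the entire write load while the others sit idle.
sh.enableSharding("mydb")
sh.shardCollection("mydb.fragments", { _id: 1 })  // _id = YYYYMM_SOURCEID_DOCTYPE_UUID
```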

Depending on the cardinality of the "keywords" field, you could choose it as your shard key. That way MongoDB can retrieve all documents belonging to a keyword from a single shard, while writes are still spread across all shards, since they are partitioned by keyword.

If the number of distinct keywords is not very large (i.e. < 100), it is not a good shard key on its own; however, you can combine it with the year and month, e.g. keywords_YYYYMM, as sketched below.
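A sketch of that suggestion as a compound shard key, assuming each document stores its keyword in a scalar field and its month in a separate field (field names are assumptions; note that MongoDB cannot use an array field in a shard key, so a multi-valued keywords array would have to be modeled differently, e.g. one document per keyword):

```js
// Compound shard key: keyword first for targeted reads, month second
// so that low-cardinality keywords can still be split into chunks
// rather than forming one unsplittable "jumbo" chunk.
sh.shardCollection("mydb.fragments", { keyword: 1, month: 1 })
```

Reads for a given keyword over recent months are then routed to a single shard, while the overall write load remains distributed across shards by keyword.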

