Controlling SSTable size in Cassandra

Is there a way to control the maximum SSTable size, for example 100 MB, so that when the CF has more than 100 MB of data, Cassandra creates the next SSTable?

1 answer

Unfortunately, the answer is not so simple: the size of your SSTables depends on your compaction strategy, and there is no direct way to control the maximum SSTable size.

SSTables are initially created when memtables are flushed to disk. The initial size of these tables depends on your memtable settings and the size of your heap ( memtable_total_space_in_mb being a big influence). Usually these flushed SSTables are quite small. SSTables are then merged together as part of a process called compaction .
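As a rough illustration, the flush threshold mentioned above lives in cassandra.yaml. This is a sketch only; the value shown is illustrative, not a recommendation, and the exact setting names vary between Cassandra versions:

```yaml
# cassandra.yaml (fragment) — illustrative values only.
# Total memory allowed for all memtables before a flush is triggered;
# defaults to a fraction of the heap if left unset.
memtable_total_space_in_mb: 2048
```

Smaller values here mean more frequent flushes and therefore smaller initial SSTables, at the cost of more compaction work later.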

If you use the size-tiered compaction strategy (STCS), you have the potential to end up with very large SSTables. STCS triggers a minor compaction when there are at least min_threshold (default 4) SSTables of similar size, merging them into one file while expiring tombstones and merging keys. This can produce very large SSTables over time.
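A minimal sketch of configuring STCS via CQL; the keyspace/table name ks.cf is hypothetical:

```sql
-- Illustrative only: set size-tiered compaction on a table.
-- min_threshold is the number of similarly sized SSTables
-- that must accumulate before a minor compaction merges them.
ALTER TABLE ks.cf WITH compaction = {
  'class': 'SizeTieredCompactionStrategy',
  'min_threshold': 4
};
```

Raising min_threshold delays merges (more, smaller SSTables); lowering it merges sooner.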

With the leveled compaction strategy (LCS), there is an sstable_size_in_mb option that controls the target size for SSTables. In general, SSTables will be less than or equal to this size unless you have a partition key with a lot of data ("wide rows").
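This is the closest thing to the size cap asked about in the question. A sketch, again with a hypothetical table name; the 160 MB target matches the default in recent Cassandra versions, but check your version's documentation:

```sql
-- Illustrative only: leveled compaction with a target SSTable size.
-- SSTables will generally stay at or below this size,
-- except for unusually wide partitions.
ALTER TABLE ks.cf WITH compaction = {
  'class': 'LeveledCompactionStrategy',
  'sstable_size_in_mb': 160
};
```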

I have not experimented much with the date-tiered compaction strategy (DTCS), but it is similar to STCS in that it merges files of similar size; however, it keeps data together in chronological order and has a setting to stop compacting old data ( max_sstable_age_days ), which may be interesting.
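For completeness, a DTCS sketch under the same assumptions (hypothetical table name, illustrative value); note that DTCS was deprecated in later Cassandra releases in favor of TimeWindowCompactionStrategy:

```sql
-- Illustrative only: date-tiered compaction for time-series data.
-- SSTables older than max_sstable_age_days are no longer compacted.
ALTER TABLE ks.cf WITH compaction = {
  'class': 'DateTieredCompactionStrategy',
  'max_sstable_age_days': 365
};
```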

The key is to find the compaction strategy that works best for your data, and then tune its properties around what works best for your data model and environment.

You can read more about the configuration settings for compaction here, and read this guide to see whether STCS or LCS is right for you.


Source: https://habr.com/ru/post/984558/

