What actually happens when I compact a CouchDB database?

I noticed that every time I compact a CouchDB database after inserting some data, its size drops quite a lot (sometimes by as much as 20%).

I do not delete or modify any data; all I do is insert new records, compact, and the size goes down.

What happens when I compact a database? Is it some kind of data compression? Or does each new record carry some kind of overhead that compaction later removes?

1 answer

CouchDB uses an append-only file format. The code never seeks back to overwrite existing data. Any prefix of a .couch file, truncated at any point, is a valid database file. (CouchDB scans backward from the end of the file to find its header.)
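
To make the recovery trick concrete, here is a minimal sketch in Python. It is not CouchDB's actual on-disk format (CouchDB is written in Erlang, and the HDR1 marker is a made-up stand-in for its real header layout); it only shows why scanning backward from the end of an append-only file always lands on the last committed state:

    import os

    MAGIC = b"HDR1"  # hypothetical marker; CouchDB's real header layout differs

    def find_last_header(path, block=4096):
        """Scan backward from end-of-file for the newest header marker.

        Every committed state ends with a header, so a file truncated at
        any point still contains an earlier header; recovery just keeps
        scanning backward until it finds one.
        """
        pos = os.path.getsize(path)
        with open(path, "rb") as f:
            while pos > 0:
                start = max(0, pos - block)
                f.seek(start)
                chunk = f.read(pos - start)
                idx = chunk.rfind(MAGIC)
                if idx != -1:
                    return start + idx  # byte offset of the last header
                pos = start  # no marker in this block; step further back
        return None  # empty file, or no header survived

    # (For brevity this ignores a marker that straddles a block boundary.)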

The cost of this architecture is that a lot of duplicate data gets written every time you make a change. Basically, CouchDB writes your new data at the end of the file, then writes whatever metadata updates are needed to incorporate that data into the document tree, and finally writes a new header to commit it all permanently.
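
In sketch form, one commit in such an append-only file looks roughly like this (again a toy model, not CouchDB's real record layout; the JSON encoding and the offsets packed into the header are assumptions for illustration):

    import json
    import os
    import struct

    MAGIC = b"HDR1"  # same hypothetical marker as above

    def commit(path, doc, updated_tree_nodes):
        """Append-only commit: new data, then metadata, then a header.

        Nothing already in the file is ever overwritten, which is why
        every change also re-writes the tree nodes above it.
        """
        with open(path, "ab") as f:
            doc_offset = f.tell()
            f.write(json.dumps(doc).encode())                 # 1. document data
            meta_offset = f.tell()
            f.write(json.dumps(updated_tree_nodes).encode())  # 2. rewritten tree nodes
            # 3. the header commits the change; a reader scanning backward
            #    from end-of-file will find this header first
            f.write(MAGIC + struct.pack(">QQ", doc_offset, meta_offset))
            f.flush()
            os.fsync(f.fileno())  # make it durable before returning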

So you get a lot of duplicate metadata (inner b-tree nodes, etc.), not to mention old document revisions, accumulating in the .couch file. Again, this is the price paid for a bulletproof design that never overwrites any data.

Compaction scans only the live data out of the old .couch file and writes just that into a new .couch file. The b-trees end up balanced, and the old document revisions are gone. It is nice and clean.
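
You can watch this from the outside through CouchDB's HTTP API. A sketch, assuming a server on localhost:5984, a database named mydb, admin credentials, and the third-party requests library; the sizes field names are from CouchDB 2.x and later (1.x reported disk_size and data_size instead):

    import time

    import requests  # third-party; pip install requests

    BASE = "http://admin:secret@localhost:5984"  # hypothetical credentials
    DB = BASE + "/mydb"                          # hypothetical database name

    def sizes():
        info = requests.get(DB).json()
        # sizes.file = bytes on disk; sizes.active = live data that
        # compaction will keep
        return info["sizes"]["file"], info["sizes"]["active"]

    on_disk, live = sizes()
    print("before: %d bytes on disk, %d bytes live" % (on_disk, live))

    # Trigger compaction; it runs asynchronously on the server.
    requests.post(DB + "/_compact",
                  headers={"Content-Type": "application/json"})

    # The database info document reports compact_running while it works.
    while requests.get(DB).json().get("compact_running"):
        time.sleep(1)

    print("after: %d bytes on disk" % sizes()[0])

After the loop exits, sizes.file should have dropped toward sizes.active, which is exactly the shrinkage described in the question.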


Source: https://habr.com/ru/post/1402212/

