Larger-than-memory data structures and how they are usually processed

Say I have a file-based data structure such as a B+ tree. I understand that the data is expected to be stored on disk, but the index is usually loaded into memory. What if the file is so large that even its index does not fit into memory? How is this usually handled? Secondly, since the index is a tree rather than a linear data set, how is it usually laid out on disk?

I am mostly interested in how this is done in real projects (such as Berkeley DB). Obviously, I am only after broad strokes. I hope to get enough context before I delve into the B-tree section of my database book (or jog my memory from CS XYZ from years ago).

+3
3 answers

B-trees are designed for page-based systems where each node fits into a single page. To find an entry in a B-tree, you only need to load one page at a time, so very little of the tree has to be resident in memory.
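To make that concrete, here is a minimal sketch of a one-page-at-a-time lookup. The on-disk layout (a 1-byte leaf flag, a 2-byte key count, then 64-bit keys followed by child page numbers or values) is entirely hypothetical, not the format of any real database, but it shows how a search touches exactly one page per tree level:

```python
import struct, io

PAGE_SIZE = 4096  # one B-tree node per fixed-size page (hypothetical layout)

def write_page(f, page_no, is_leaf, keys, ptrs):
    """Pack a node into its page: 1-byte leaf flag, 2-byte key count,
    then the keys, then child page numbers (internal) or values (leaf)."""
    buf = bytearray(PAGE_SIZE)
    struct.pack_into("<BH", buf, 0, is_leaf, len(keys))
    struct.pack_into(f"<{len(keys)}q", buf, 3, *keys)
    struct.pack_into(f"<{len(ptrs)}q", buf, 3 + 8 * len(keys), *ptrs)
    f.seek(page_no * PAGE_SIZE)
    f.write(buf)

def search(f, page_no, key):
    """Walk down from the root, reading exactly one page per tree level."""
    while True:
        f.seek(page_no * PAGE_SIZE)
        page = f.read(PAGE_SIZE)
        is_leaf, n = struct.unpack_from("<BH", page, 0)
        keys = struct.unpack_from(f"<{n}q", page, 3)
        if is_leaf:
            vals = struct.unpack_from(f"<{n}q", page, 3 + 8 * n)
            for k, v in zip(keys, vals):
                if k == key:
                    return v
            return None
        children = struct.unpack_from(f"<{n + 1}q", page, 3 + 8 * n)
        i = 0
        while i < n and key >= keys[i]:  # separator = smallest key of right child
            i += 1
        page_no = children[i]

# Tiny two-level tree: root (page 0) splits keys at 50 into two leaves.
f = io.BytesIO()
write_page(f, 1, 1, [10, 20, 30], [100, 200, 300])   # left leaf
write_page(f, 2, 1, [50, 60, 70], [500, 600, 700])   # right leaf
write_page(f, 0, 0, [50], [1, 2])                    # root: keys < 50 -> page 1
print(search(f, 0, 20))   # -> 200
print(search(f, 0, 60))   # -> 600
print(search(f, 0, 99))   # -> None
```

Because only the pages on the root-to-leaf path are ever loaded, the memory needed per lookup is proportional to the tree's height, not to the size of the index.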

Even updating a B-tree does not require a large number of pages to be in memory at once. The most involved operation is probably deletion, where nodes get reorganized, but a careful implementation can do even that with relatively few pages resident.

+2

Broadly, there are two approaches to file-based structures: paging, where the file is divided into fixed-size blocks, and streaming, where the data is written and read sequentially (some formats, XML for example, are naturally stream-oriented).

This also answers your second question. If you use paging, the file structure is simply a sequence of same-size pages; if you use streaming, the data should be laid out in the order you are going to consume it (for a tree that will probably be either DFS or BFS order, depending on your application).
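The difference between the two orderings is easy to see if you number a tree's nodes the way each layout would place their pages on disk. The `Node` class and both functions below are illustrative, not from any real library; the point is that BFS keeps siblings adjacent, while DFS (preorder) keeps each root-to-leaf path contiguous:

```python
from collections import deque

class Node:
    def __init__(self, keys, children=()):
        self.keys, self.children = keys, list(children)

def layout_bfs(root):
    """Assign page numbers level by level: a node's children end up
    stored close together, which suits breadth-first traversals."""
    order, queue = {}, deque([root])
    while queue:
        node = queue.popleft()
        order[id(node)] = len(order)
        queue.extend(node.children)
    return order

def layout_dfs(root):
    """Assign page numbers in preorder: each root-to-leaf path is laid
    out contiguously, which suits top-down searches."""
    order = {}
    def visit(node):
        order[id(node)] = len(order)
        for child in node.children:
            visit(child)
    visit(root)
    return order

leaf_a, leaf_b, leaf_c = Node([1]), Node([2]), Node([3])
mid = Node([10], [leaf_a, leaf_b])
root = Node([20], [mid, leaf_c])

print([layout_bfs(root)[id(n)] for n in (root, mid, leaf_c, leaf_a, leaf_b)])
# BFS places them at pages [0, 1, 2, 3, 4]
print([layout_dfs(root)[id(n)] for n in (root, mid, leaf_a, leaf_b, leaf_c)])
# DFS places them at pages [0, 1, 2, 3, 4]
```

With DFS order, a search that descends root → mid → leaf_a reads pages 0, 1, 2 sequentially; with BFS order the same search reads pages 0, 1, 3, jumping over the rest of the level.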

+1

Source: https://habr.com/ru/post/1706648/
