Permanent (disk-based) R-tree (or R * tree)

How can R * Tree be implemented as persistent (disk-based)? What is the architecture of the file for storing the index of the R * tree or for storing sheet values?

Notes: Also, what insert, update, and delete operations can be performed in such a constant R * Tree?

Notes II: I implemented a built-in R-Tree with bulk upload function. But I think that it doesn’t matter at all when we talk about disk.

+5
source share
3 answers

If you need an R-Tree index on disk, I would suggest using Spatialite or Postgis . Spatialite is a lightweight and easy to integrate into a standalone application. Alternatively, did you view the C # Spatial Index project ? . I wrote an implementation of R-Tree in Java a few years ago, and I do not recommend doing this if something already exists.

+4
source

File architecture

Well, these are pages (= blocks). Pages should have a multiple page size of the underlying storage, so probably 1kb or 8kb blocks. Each block has a number and can be a link in this way.

Catalog pages store bounding fields for children and their page numbers.

Child pages store actual data objects.

Tree management

Well, theoretically: whenever you change a page in memory, you write the changes to disk. What is it.

In practice, you may need to use the cache to improve performance, and you may have transactions to save your tree in the event of an application failure.

In both of these articles you can find a lot of literature in the field of RDBMS architecture.

The key advantage of R * -tree is that it is a regular page-oriented tree, as you will have them in database systems everywhere. If you have a good implementation on a B + -tree disk, you can reuse most of your code for R * -tree.

How to start

To get started, you need to get used to indexing data on disks, as is done in classic RDBMSs. I suggest starting with a B-tree or B + -tree drive. Allow deletion because you need to think about managing remote pages and all that.

Once you have identified the B-Tree on the disk (and perhaps spend some time optimizing it!), The execution of the R-tree on the disk should be fairly obvious.

I have not looked at the code, but this may be a good starting point: http://www.die-schoens.de/prg/ or some of the other related ones. Looking for a discrete implementation of the B + tree in C ++ or C

+8
source

If you already have an implementation of main memory, you can reuse the same code by simply adding writes to the disk. You must consider the page size and optimize the tree nodes to fit on the page (you can read it at a time).

It would be better (in terms of performance) to have snapshots of the main memory tree stored on disk (snapshots can be taken when the tree is not under high pressure), and write each change to disk.

In the question, you indicate that a tree query is of higher importance, so you should be better with R * -tree, as it minimizes overlap between tree nodes. However, if your requirements are focused on update operations (insert / delete), I would suggest taking a look at Maintain frequent updates in R-trees: a bottom-up approach .

+2
source

Source: https://habr.com/ru/post/981000/


All Articles