Database Implementation Details - Row Header Overhead

The problem that prompted this question is the construction of very large inverted indices, similar to those used to build IR systems. A common mantra in the IR community is that a relational database is not suitable for building an IR system. In any case, if you look at Postgres, the per-tuple overhead is 23 bytes + padding (see "How much database disk space is required to store data from a regular text file?" in the Postgres Frequently Asked Questions). Without some mitigation, this is prohibitive for my workload.
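To see why 23 bytes is painful for an inverted index, here is a rough sketch of the arithmetic. The 8-byte MAXALIGN value and the rounding of the header to 24 bytes are assumptions based on typical 64-bit Postgres builds; page headers, item pointers, and TOAST are ignored.

```python
MAXALIGN = 8   # typical alignment on 64-bit platforms (assumption)
HEADER = 23    # per-tuple header size, per the Postgres FAQ

def padded(n, align=MAXALIGN):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align

def row_size(payload_bytes):
    """Estimated on-disk size of one heap tuple.

    Ignores item pointers, page headers, and TOAST; this is only
    a lower-bound sketch, not Postgres's actual storage logic.
    """
    return padded(HEADER) + padded(payload_bytes)

# For a tiny posting-list entry (say, two 4-byte integers),
# the header dominates the row:
payload = 8
print(row_size(payload))                   # 32 bytes total
print(padded(HEADER) / row_size(payload))  # 0.75 -> 75% of the row is overhead
```

For postings that are just a couple of integers, three quarters of every row is header, which is the scaling problem described above.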

By the way, my data set is 17 lines of text, requiring 4-5 tables depending on how the problem is cut up. I remember trying one schema in SQLite, and the DB file grew past 100 GB.

I am very interested to know the per-row overhead for SQL Server, MySQL, SQLite, Berkeley DB (all its access methods), the Berkeley DB sqlite3 interface, Kyoto/Tokyo Cabinet, and Firebird. I doubt any one person can answer for all of them, unless someone has been as curious about this as I have.

Edit:

  • Postgres: 23-byte (!) header + padding.
  • BDB hash: 26 bytes of page overhead, 6 bytes of key/data overhead (combined).
  • BDB btree: 26 bytes of page overhead, 10 bytes of key/data overhead (combined).
  • MySQL InnoDB: analyzed here (5-byte header + transaction ID + roll pointer = 18 bytes per row, AFAIK). Note to self: why does the transaction ID appear on disk? What are roll pointers?
  • SQL Server: from here. It stores the lengths of variable-length columns; rows with only fixed-size data types carry very modest overhead. Overhead estimates depend heavily on the nature of the schema and the data, and grow with the number of variable-length columns.
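Taking the per-row figures above at face value, a back-of-the-envelope comparison shows how much raw header space each engine would burn on a large posting table. This is only a sketch: the numbers below are the fixed per-row costs from the list (the BDB page overhead and all index structures are ignored), and real on-disk sizes will be larger.

```python
# Fixed per-row overhead in bytes, taken from the findings above (assumption:
# these are accurate and no other per-row costs apply).
PER_ROW_OVERHEAD = {
    "postgres": 24,    # 23-byte header rounded up by alignment padding
    "bdb-btree": 10,   # key/data overhead only; ignores 26-byte page overhead
    "innodb": 18,      # 5-byte header + transaction ID + roll pointer
}

def total_overhead_gib(rows, engine):
    """Total header bytes for `rows` records, expressed in GiB."""
    return rows * PER_ROW_OVERHEAD[engine] / 2**30

# For an inverted index with a billion postings, the headers alone
# cost tens of gigabytes:
for engine in PER_ROW_OVERHEAD:
    print(engine, round(total_overhead_gib(1_000_000_000, engine), 1), "GiB")
```

At a billion rows, Postgres spends roughly 22 GiB on tuple headers alone before storing any actual posting data, which is why row-per-posting schemas blow up the way the SQLite experiment above did.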

Source: https://habr.com/ru/post/904909/
