Why do SQL databases use a write-ahead log rather than a command log?

I read about VoltDB's command log. The command log records the transaction invocations rather than each row change, as a write-ahead log does. By recording only the invocation, command logs are kept to a minimum, limiting the impact of disk I/O on performance.

Can someone explain the database theory behind why VoltDB uses a command log and why standard SQL databases such as Postgres, MySQL, SQL Server, and Oracle use write-ahead logs?

+42
sql database logging transactions voltdb
Jan 06 '13 at
5 answers

I think it's better to rephrase:

Why does the new distributed VoltDB use a command log instead of a write-ahead log?

Let's do a thought experiment: imagine you are going to write your own storage/database implementation. Presumably you are advanced enough to abstract away the file system and use block storage along with some additional optimizations.

Some basic terminology:

  • State: the information stored at a given point in time
  • Command: a directive to the storage to change its state

So your database might look like this:

[image]

The next step is to execute some command:

[image]

Pay attention to several important aspects:

  • A command can affect many stored entities, so many blocks will become dirty.
  • The next state is a function of the current state and the command.

Some intermediate states can be skipped, because the chain of commands alone is enough to reproduce them.

[image]
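To make "the next state is a function of the current state and the command" concrete, here is a minimal Python sketch. The account-balance state, the command tuples, and the apply function are all hypothetical names chosen for illustration; they are not taken from any real database.

```python
from functools import reduce

# Hypothetical model: the state is a dict of balances and a command is a
# (name, args) tuple. None of these names come from VoltDB or any other DB.
def apply(state, command):
    """Return the next state; the input state is left untouched."""
    name, args = command
    next_state = dict(state)
    if name == "deposit":
        account, amount = args
        next_state[account] = next_state.get(account, 0) + amount
    elif name == "transfer":
        source, target, amount = args
        next_state[source] = next_state.get(source, 0) - amount
        next_state[target] = next_state.get(target, 0) + amount
    else:
        raise ValueError(f"unknown command: {name}")
    return next_state

initial = {"alice": 100, "bob": 0}
commands = [
    ("deposit", ("alice", 50)),
    ("transfer", ("alice", "bob", 30)),
    ("deposit", ("bob", 5)),
]

# Folding the chain of commands over the initial state yields the final
# state without any intermediate state ever being stored.
final = reduce(apply, commands, initial)
```

Only the initial state and the command chain are kept here; every intermediate state can be recomputed on demand by folding a prefix of the chain.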

Finally, you need to guarantee data integrity. There are two approaches:

  • Write-ahead logging: the central concept is that state changes should be logged before any heavy update to permanent storage. Following our idea, we can log the incremental changes for each block.
  • Command logging: the central concept is to log only the command, which is then used to produce the state.

[image]

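There are pros and cons to both approaches. The write-ahead log contains all the changed data, so it is larger but can be reapplied directly; the command log is small and cheap to write, but requires additional processing on recovery because the commands must be re-executed.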
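The trade-off can be sketched in a few lines of Python. This is only an illustrative in-memory model under my own assumptions: the transfer command, the log record layouts, and the recovery helpers are hypothetical and do not mirror the on-disk format of any real database.

```python
# Hypothetical in-memory model of the two logging styles.
def transfer(state, source, target, amount):
    """One command that dirties two stored entries; returns the new values."""
    return {source: state[source] - amount, target: state[target] + amount}

def run_with_wal(state, wal, source, target, amount):
    changes = transfer(state, source, target, amount)
    # Write-ahead: log every changed value first, then touch the storage.
    for key, new_value in changes.items():
        wal.append((key, new_value))
    state.update(changes)

def run_with_command_log(state, command_log, source, target, amount):
    # Command logging: log only the invocation, then execute it.
    command_log.append(("transfer", source, target, amount))
    state.update(transfer(state, source, target, amount))

def recover_from_wal(snapshot, wal):
    state = dict(snapshot)
    for key, new_value in wal:          # cheap redo: reapply logged values
        state[key] = new_value
    return state

def recover_from_command_log(snapshot, command_log):
    state = dict(snapshot)
    for _name, source, target, amount in command_log:
        # Heavier redo: the command logic has to run again.
        state.update(transfer(state, source, target, amount))
    return state

snapshot = {"alice": 100, "bob": 0, "carol": 10}
wal, command_log = [], []
state_a, state_b = dict(snapshot), dict(snapshot)
for args in [("alice", "bob", 30), ("bob", "carol", 5)]:
    run_with_wal(state_a, wal, *args)
    run_with_command_log(state_b, command_log, *args)
```

In this toy run the write-ahead log ends up with one record per changed entry, while the command log has one record per invocation. Both recover the same state, but the command log has to re-execute the commands to get there.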

VoltDB: Command Logging and Recovery

The key to command logging is that it records the invocations, not the consequences, of transactions. By recording only the invocation, command logs are kept to a minimum, limiting the impact of disk I/O on performance.

Additional notes

SQLite: Write-Ahead Logging

A traditional rollback journal works by writing a copy of the original, unchanged database content to a separate rollback journal file and then writing the changes directly to the database file.

A COMMIT occurs when a special record indicating a commit is appended to the WAL. Thus, a COMMIT can happen without ever writing to the original database, which allows readers to continue working with the original unaltered database while changes are simultaneously being committed to the WAL.

PostgreSQL: Write-Ahead Logging (WAL)

Using WAL results in a significantly reduced number of disk writes, because only the log file needs to be flushed to disk to guarantee that a transaction is committed, rather than every data file changed by the transaction.

The log file is written sequentially, so the cost of syncing the log is much less than the cost of flushing the data pages. This is especially true for servers handling many small transactions that touch different parts of the data store. Furthermore, when the server is processing many small concurrent transactions, one fsync of the log file may suffice to commit many transactions.
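The "one fsync for many transactions" idea can be illustrated with a small Python sketch. GroupCommitLog and its methods are hypothetical names of my own; this is not how PostgreSQL implements its WAL, only a toy showing sequential appends followed by a single sync.

```python
import os
import tempfile

class GroupCommitLog:
    """Hypothetical append-only log that batches commits behind one fsync."""

    def __init__(self, path):
        self._file = open(path, "ab")
        self._pending = 0
        self.fsync_calls = 0
        self.committed = 0

    def append_commit(self, record: bytes):
        # Sequential append; nothing is guaranteed durable yet.
        self._file.write(record + b"\n")
        self._pending += 1

    def flush(self):
        # One fsync makes every pending commit record durable at once.
        self._file.flush()
        os.fsync(self._file.fileno())
        self.fsync_calls += 1
        self.committed += self._pending
        self._pending = 0

    def close(self):
        self._file.close()

with tempfile.TemporaryDirectory() as directory:
    log = GroupCommitLog(os.path.join(directory, "wal.log"))
    for txid in range(100):
        log.append_commit(f"COMMIT {txid}".encode())
    log.flush()
    log.close()
    fsync_calls, committed = log.fsync_calls, log.committed
```

Here 100 small transactions become durable after a single fsync of one sequential file, instead of one or more random page writes per transaction.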

Conclusion

Command logging:

  • is faster
  • has a smaller footprint
  • has a heavier "replay" procedure
  • requires frequent snapshots
Write-ahead logging is a technique for providing atomicity. Better command-logging performance should also improve transaction processing.

Databases per 1 foot

[image]

Confirmation

VoltDB Blog: Intro to VoltDB Command Logging

One advantage of command logging over ARIES-style logging is that a transaction can be logged before execution begins, instead of executing the transaction and waiting for the log data to flush to disk. Another advantage is that the I/O throughput necessary for a command log is bounded by the network used to relay commands and, in the case of Gig-E, this throughput can be satisfied by cheap commodity disks.

It is important to remember that VoltDB is distributed by nature. Transactions are therefore somewhat tricky to handle, and the performance impact is noticeable.

VoltDB Blog: VoltDB's New Command Logging Feature

The command log in VoltDB consists of stored procedure invocations and their parameters. A log is created on each node, and each log is replicated because all work is replicated to multiple nodes. This results in a replicated command log that can be de-duplicated at replay time. Because VoltDB transactions are strongly ordered, the command log contains ordering information as well. Thus, replay can occur in the exact order in which the original transactions ran, with the full transaction isolation VoltDB offers. Since the invocations themselves are often smaller than the modified data, and can be logged before they are committed, this approach has a very modest effect on performance. This means that VoltDB users can achieve the same kind of stratospheric performance numbers, with additional durability guarantees.
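A rough sketch of what "de-duplicated, ordered replay" could look like, in Python. The record layout (txn_id, procedure_name, params), the assumption that txn_id encodes the global order, and the two toy procedures are all hypothetical; VoltDB's actual log format and recovery code are not shown in the quoted posts.

```python
# Hypothetical record layout: (txn_id, procedure_name, params).
# txn_id is assumed to encode the global transaction order.
def add_stock(state, item, quantity):
    state[item] = state.get(item, 0) + quantity

def rename_item(state, old, new):
    state[new] = state.pop(old)

PROCEDURES = {"add_stock": add_stock, "rename_item": rename_item}

def replay(snapshot, node_logs):
    """Restore a snapshot, then re-run each logged invocation exactly once."""
    state = dict(snapshot)
    unique = {}
    for log in node_logs:
        for txn_id, name, params in log:
            unique[txn_id] = (name, params)   # replicas logged the same call
    for txn_id in sorted(unique):             # original execution order
        name, params = unique[txn_id]
        PROCEDURES[name](state, *params)
    return state

snapshot = {"bolt": 10}
node_a = [(1, "add_stock", ("bolt", 5)),
          (2, "rename_item", ("bolt", "screw"))]
node_b = [(2, "rename_item", ("bolt", "screw")),
          (1, "add_stock", ("bolt", 5)),
          (3, "add_stock", ("screw", 1))]
recovered = replay(snapshot, [node_a, node_b])
```

Both the de-duplication and the ordering matter in this toy: running rename_item twice, or before add_stock, would either fail or produce a different result.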

+65
Jan 10 '13 at 14:58

From the description of Postgres' write-ahead log http://www.postgresql.org/docs/9.1/static/wal-intro.html and VoltDB's command log (which you referenced), I cannot see much difference at all. It appears to be the same concept under a different name.

Both sync only the log file to disk, not the data, so that the data can be recovered by replaying the log file.

Section 10.4 of the VoltDB documentation explains that their community edition does not have command logging, so it would not pass the ACID test. Even for the enterprise edition, I don't see the details of their transaction isolation (for example, http://www.postgresql.org/docs/9.1/static/transaction-iso.html ) that would make me comfortable that VoltDB is as serious as Postgres.

+1
Jan 09 '13 at 3:57

The way I read it is as follows: (My own opinion)

The command log, as described here, only records transactions as they are invoked, and not what happens within them or to the data they touch. So here's the catch: if you want to restore, you need to restore the last snapshot and then replay all the transactions that were invoked after it (described in the link above). In effect you restore a backup and re-apply all your scripts, only VoltDB now automates this for you.

The real difference I see is that you cannot logically roll back to a point in time as you can with a normal transaction log. Conventional transaction logs (MSSQL, MySQL, etc.) can easily be rolled back to a point in time (in the correct setup), because transactions can be "undone".

This raises a question, referring to the post by pedz: will it always pass the ACID test, even when using the command log? Will do some more reading ...

Add: I read more, and I don't think this is a good idea for very large and busy transactional databases. DB snapshots are created automatically when the command logs fill up, to save you from large transaction logs and the I/O they use? You will incur a large amount of I/O when snapshots are taken at regular intervals, and you also push your memory to the brink. Also, in my opinion, you lose the ability to easily roll back to a point in time before the last automatic snapshot; I think this would be very tricky to manage.

I would rather stick with transaction logs for transactional systems. They are proven and they work.

0
Jan 10 '13 at 9:47

This is really a matter of granularity. They log operations at the level of stored procedures; most RDBMSs log at the level of individual statements (and "lower"). Also, their advertisement of the benefits is a bit of a red herring:

One advantage of command logging over ARIES-style logging is that a transaction can be logged before execution begins, instead of executing the transaction and waiting for the log data to flush to disk.

They still have to wait for the command to be logged; it's just a lot less data.

If I am not mistaken, VoltDB's unit of transaction is a stored procedure. A traditional RDBMS usually needs to support ad hoc transactions containing any number of statements, so procedure-level logging is out of the question. In addition, stored procedures in a traditional RDBMS are often not truly deterministic (that is, params + log + data always producing the same result), which they would have to be for this to work.
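To show why determinism matters for replay, here is a small Python sketch. The now() function is a hypothetical stand-in for a wall clock (or any other input that is not in the log), and the procedure names are invented for the example.

```python
import itertools

# Hypothetical stand-in for a wall clock: every call returns a new value,
# so a replay never sees the value the original execution saw.
_ticks = itertools.count(1)

def now():
    return next(_ticks)

def touch_nondeterministic(state, key):
    state[key] = now()          # depends on something outside the log

def touch_deterministic(state, key, timestamp):
    state[key] = timestamp      # everything it needs is a logged parameter

def replay(log, procedures):
    state = {}
    for name, params in log:
        procedures[name](state, *params)
    return state

PROCEDURES = {"bad": touch_nondeterministic, "good": touch_deterministic}

# Original execution: run each procedure and log only the invocation.
live_bad, live_good = {}, {}
bad_log = [("bad", ("row1",))]
touch_nondeterministic(live_bad, "row1")

stamp = now()                   # value captured once, before logging
good_log = [("good", ("row1", stamp))]
touch_deterministic(live_good, "row1", stamp)

replayed_bad = replay(bad_log, PROCEDURES)
replayed_good = replay(good_log, PROCEDURES)
```

Replaying the deterministic procedure reproduces the original state, while replaying the one that reads the clock does not. A log of changed values would not have this problem, because it records the result rather than the call.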

However, the performance gain would be significant for this restricted RDBMS model.

0
May 18 '13 at 21:49

With WAL, readers read pages from the unflushed log. No changes are made to the main database. With command logging, you cannot read from the command log.

Therefore, command logging is significantly different. VoltDB uses command logging to create recovery points and to ensure durability, but it writes to the main db storage (RAM) in real time, with all the attendant locking problems, etc.

0
Feb 03 '16 at 19:34