Must-Read / Favorite Papers in Databases and Related Areas

Please recommend what you consider the most important, or simply your favorite, papers in the fields of database management, information systems, data mining, etc.

Here are a couple that I consider important milestones:

2 answers

Since I'm new to Stack Overflow, I cannot post more than two hyperlinks, so the version with links is at http://www.reddit.com/r/compsci/comments/ghc0w/please_recommend_must_read_favorite_papers_in/c1no849

The book that zamanbakshi recommends, Transaction Processing: Concepts and Techniques by Gray and Reuter, is really, really good. I used my copy so heavily that the cover fell off, and it is a hardcover. It is somewhat dated on some topics, but it is a much better read than most later books, such as Weikum and Vossen's Transactional Information Systems, which is a good book but, in my opinion, makes my eyes glaze over.

If I recall correctly, the Gray and Reuter text does not cover Mohan's repeating-history recovery technique, which is very important. See ARIES/NT: A Recovery Method Based on Write-Ahead Logging for Nested Transactions and ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging at a minimum, but most of Mohan's papers are worth a read.

The book Concurrency Control and Recovery in Database Systems by Bernstein et al. is out of print, but you can download it from his Microsoft Research page.

There are also many good papers by David Lomet and the late (or missing) Jim Gray.

Some important papers that are not included in the second edition of the Red Book (the edition that I have):

  • A Critique of ANSI SQL Isolation Levels (1995), Gray et al.
  • The Dangers of Replication and a Solution (1996), Gray, Helland et al.
  • Generalized Isolation Level Definitions (2000), Adya et al.

A recent paper that I think deserves more attention is Serializable Isolation for Snapshot Databases (2009) by Cahill, Röhm and Fekete. It is a really simple technique that works surprisingly well. I hope it gets implemented in some DBMSs. While looking for related material, I came across this interesting reading list. It is mainly about flash memory, but it includes some papers of general interest, including some recent work by Stonebraker.

I recommend skipping Date's The Third Manifesto. I was very disappointed with it. I do not think he has ever done any object-oriented programming. His early books and articles on relational DBMSs are good, if repetitive.

A good description of a main-memory DBMS is The Architecture of the Dalí Main-Memory Storage Manager. The inconsistent, non-WAL checkpointing blew my mind at first.

Here are a couple on weakly consistent distributed data management (for very, very big data):

  • BASE: An Acid Alternative (2008), Pritchett
  • Life beyond Distributed Transactions: an Apostate's Opinion (2007), Helland

  • First of all, the most important collection of papers on DBMS theory that you should read is "Readings in Database Systems", 4th edition - Stonebraker (a.k.a. the Red Book)

    Each paper in this book is a milestone; otherwise it would not have made it into the book. :-)

  • Stonebraker also co-authored an excellent overview of DBMS architecture, "Architecture of a Database System", Foundations and Trends in Databases 1(2), 2007

  • THE book for DBMS implementers: "Transaction Processing: Concepts and Techniques" - Jim Gray (his magnum opus)

  • This can be seen as one large paper: "The Relational Model for Database Management: Version 2" - Codd

  • "Foundation for Object/Relational Databases: The Third Manifesto" - C. J. Date

  • "Readings in Object-Oriented Database Systems" - Zdonik

  • "Concurrency Control and Recovery in Database Systems" - Bernstein


Source: https://habr.com/ru/post/1344538/
