Since I'm new to stackoverflow, I cannot post more than two hyperlinks! So the version with links is at http://www.reddit.com/r/compsci/comments/ghc0w/please_recommend_must_read_favorite_papers_in/c1no849
The book that zamanbakshi recommends, Transaction Processing: Concepts and Techniques by Gray and Reuter, is really, really good. I've worn my copy so hard that the cover has fallen off, and it's a hardcover. Granted, it is somewhat dated on some topics, but it is a much better read than most later books, such as Weikum and Vossen's Transactional Information Systems, which is a good book but, in my opinion, makes my eyes glaze over.
If I recall correctly, the Gray and Reuter text does not cover Mohan's repeating-history technique, which is very important. See ARIES/NT: A Recovery Method Based on Write-Ahead Logging for Nested Transactions and ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging, at least, but most of Mohan's papers are worth reading.
The book Concurrency Control and Recovery in Database Systems by Bernstein et al. is out of print, but you can download it from his Microsoft Research page.
There are also many good publications from David Lomet and the late (or missing) Jim Gray.
Some important papers that are not included in the second edition of the Red Book (the edition that I have):
- A Critique of ANSI SQL Isolation Levels (1995) by Gray et al.
- The Dangers of Replication and a Solution (1996) by Gray and Helland
- Generalized Isolation Level Definitions (2000) by Adya et al.
A recent paper that I think deserves more attention is Serializable Isolation for Snapshot Databases (2009) by Cahill, Röhm and Fekete. It is a really simple technique that works surprisingly well. I hope that it will be implemented in some DBMSs. While looking for related material, I came across this interesting reading list. It is mainly about flash memory, but it includes some papers of general interest, including some recent work by Stonebraker.
I recommend skipping Date's Third Manifesto. I was very disappointed with it. I do not think that he has ever done any object-oriented programming. His early books and papers on relational DBMSs are good, if repetitive.
A good description of a main-memory DBMS is The Architecture of the Dalí Main-Memory Storage Manager. Its fuzzy, non-WAL checkpointing blew my mind at first.
Here are a couple on unconventional distributed data management (for very, very big data):
- BASE: An Acid Alternative (2008) by Pritchett
- Life beyond Distributed Transactions: an Apostate's Opinion (2007) by Helland