Moving database joins onto the web server

Today I found an article on the Internet that discussed Facebook's architecture (although it is a bit outdated). While reading, I noticed that in the section on the software that helps Facebook scale, the third point says:

Facebook uses MySQL, but primarily as a key-value persistent store, moving joins and logic onto the web servers, since optimizations are easier to perform there (on the "other side" of the Memcached layer).

Why move complex joins to the web server? Aren't databases optimized to perform join logic? This approach seems to contradict what I have learned so far, so perhaps the explanation simply eludes me.

If possible, can someone explain this (an example would help a lot), or point me to a good article (or two) about the advantages of doing this, and how and why you would?

+4
1 answer

I'm not sure about Facebook, but we have several applications where we follow a similar model. The rationale is pretty simple.

The database contains a huge amount of data. Performing the joins at the database level really slows down any queries we run on that data, even when we return only a small subset. (Say, 100 rows of parent data and 1000 rows of child data in a parent-child relationship.)

However, when we use .NET DataSet objects to select just the rows we need and then build DataRelation objects within the DataSet, we see a dramatic increase in performance.
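The same application-side join pattern can be sketched outside of .NET. This is a minimal illustration in Python with an in-memory SQLite database; the table names and schema are invented for the example, and the dictionary lookup stands in for what a DataRelation does in a DataSet:

```python
# Sketch of the "join in the application" pattern: run two cheap,
# index-friendly queries instead of one server-side JOIN, then
# relate the rows in application memory. Schema is hypothetical.
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE child  (id INTEGER PRIMARY KEY,
                         parent_id INTEGER, payload TEXT);
""")
conn.executemany("INSERT INTO parent VALUES (?, ?)",
                 [(1, "alpha"), (2, "beta")])
conn.executemany("INSERT INTO child VALUES (?, ?, ?)",
                 [(10, 1, "a1"), (11, 1, "a2"), (12, 2, "b1")])

# Two simple selects, no JOIN sent to the database server.
parents = conn.execute("SELECT id, name FROM parent").fetchall()
children = conn.execute("SELECT parent_id, payload FROM child").fetchall()

# The "DataRelation" step: group child rows by parent key in memory.
by_parent = defaultdict(list)
for parent_id, payload in children:
    by_parent[parent_id].append(payload)

joined = {name: by_parent[pid] for pid, name in parents}
print(joined)  # {'alpha': ['a1', 'a2'], 'beta': ['b1']}
```

The point of the pattern is that each query stays a trivial scan or index lookup for the database, and the relating of rows, which is pure in-memory work, happens on the web/application tier, which is usually easier to scale horizontally than the database.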

I cannot say for certain why this is, since I am not familiar with the inner workings, but I can venture a guess...

The RDBMS (SQL Server in our case) stores its data in files. These files are very large, and only so much of them can be loaded into memory, even on our heavy-duty SQL Server machines, so there is a disk I/O penalty.

When we load a small portion of the data into a DataSet, the join happens entirely in memory, so we avoid the disk I/O penalty.

While I cannot fully explain the reason for the performance increase (and I would welcome someone more knowledgeable telling me whether my guess is right), I can tell you that in certain cases, when there is a VERY large amount of data but your application only needs to pull a small subset of it, there is a noticeable performance gain from following the model described. We have seen it turn applications that simply crawl into fast, responsive ones.

But there is a penalty if it is done incorrectly: if you overload the machine's RAM, or apply this approach in the wrong situation, you will run into crashes or performance problems of your own.

+3

Source: https://habr.com/ru/post/1384411/

