How bad is it to have “extra” database queries?

I come from a front-end background in web development, where we try very hard to limit the number of HTTP requests issued (by consolidating CSS files, JS files, images, etc.).

With a DB (MySQL) connection, you obviously don't want unnecessary connections, but as a general rule, how bad is it to have a few extra small queries? (They run fast.)

I ask because I'm moving my application to a clustered environment: where before I cached some things in the server's memory (since everything ran on the same server), I'm now trying to make my application “stateless”, which in my current implementation means more small DB calls. This helps with load balancing (no sticky sessions) and also reduces server memory usage.

We are not talking about tons of queries, perhaps 6-8 DB calls instead of 2-4, returning anywhere from a few records to a few thousand records. Each one is fast, under 30 ms (often much less), but I don't know if there is some “connection latency” I should worry about.

Thanks for any insight.

+5
2 answers

Short answer: (1) make sure the extra queries keep you at the same big-O level, reuse connections, and measure performance; (2) think about how much you care about data consistency.

Long answer:

Performance

Strictly in terms of performance, and generally speaking, as long as you are not close to maxing out database resources such as max connections, this is unlikely to have a big impact. But there are a few things you should keep in mind:

  • do the “6-8” queries that replace the “2-4” queries run with the same time complexity? For example, if the current database interaction is O(1), will it become O(n)? Or will the current O(n) become O(n^2)? If so, you should consider what this means for your application.
  • most application servers can reuse existing database connections or maintain persistent connection pools; make sure your application does not open a new connection for every query, otherwise the change will be even less efficient (a pooling sketch follows this list).
  • in many common cases, mainly with large tables that have complex indexes and joins, running several queries by primary key can be more efficient than joining those tables in a single query; this happens when such joins make the server not only spend longer executing the complex query, but also block other queries against the affected tables.
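
For illustration, here is a minimal pooling sketch in Python, assuming mysql-connector-python; the credentials, the pool size, the `fetch_user` helper, and the `users` table are all made-up placeholders:

```python
# Minimal connection-pooling sketch (mysql-connector-python assumed).
# The pool is created once at application startup; request handlers
# borrow a connection instead of opening a new one per query.
from mysql.connector import pooling

DB_CONFIG = {                 # placeholder credentials
    "host": "db.example.internal",
    "user": "app",
    "password": "secret",
    "database": "appdb",
}

pool = pooling.MySQLConnectionPool(
    pool_name="web_pool",
    pool_size=10,             # sized to stay well below MySQL's max_connections
    **DB_CONFIG,
)

def fetch_user(user_id):
    # get_connection() hands out an already-open pooled connection,
    # so each request skips the TCP + auth handshake.
    conn = pool.get_connection()
    try:
        cur = conn.cursor(dictionary=True)
        cur.execute("SELECT id, name FROM users WHERE id = %s", (user_id,))
        return cur.fetchone()
    finally:
        conn.close()          # returns the connection to the pool, not a real close
```

The point is simply that the per-query cost stays at “send SQL, get rows” and never includes connection setup.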

Generally speaking, when it comes to performance, the rule of thumb is: always measure.
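
As a rough illustration of “always measure”, a harness like the following (PyMySQL assumed, and the queries and schema are invented stand-ins for your real 2-4 vs 6-8 query paths) compares the two approaches end to end:

```python
# Rough timing harness: average wall time of the old "few joined queries"
# path vs the new "more small queries" path. Schema is hypothetical.
import time
import pymysql

conn = pymysql.connect(host="db.example.internal", user="app",
                       password="secret", database="appdb")

def run_all(queries):
    with conn.cursor() as cur:
        for sql, params in queries:
            cur.execute(sql, params)
            cur.fetchall()

def timed(label, queries, repeats=100):
    start = time.perf_counter()
    for _ in range(repeats):
        run_all(queries)
    elapsed = (time.perf_counter() - start) / repeats
    print(f"{label}: {elapsed * 1000:.2f} ms per page")

old_path = [
    ("SELECT u.id, u.name, p.bio FROM users u "
     "JOIN profiles p ON p.user_id = u.id WHERE u.id = %s", (42,)),
]
new_path = [
    ("SELECT id, name FROM users WHERE id = %s", (42,)),
    ("SELECT bio FROM profiles WHERE user_id = %s", (42,)),
    ("SELECT role FROM roles WHERE user_id = %s", (42,)),
]
timed("old", old_path)
timed("new", new_path)
```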

Coherence

However, performance is not the only aspect. Also think about how much you care about data consistency in your application.

For example, consider the simple case of tables A and B that have a one-to-one relationship, where you request a single record by primary key. If you join these tables and fetch the result with a single query, you will either get the record from both A and B or no record at all, which is what your application expects. Now consider splitting this into 2 queries (without using transactions with suitable isolation levels): you get the record from table A, but before you can fetch the corresponding record from table B it is deleted or updated by another process. Your application now has a record from A but none from B.
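
One hedged way to handle this, assuming PyMySQL, InnoDB's default REPEATABLE READ isolation, and made-up table names, is to read both rows inside a single transaction so they come from the same snapshot:

```python
# Sketch: read the A and B rows inside one transaction so both SELECTs
# see the same consistent snapshot. Table and column names are illustrative.
import pymysql

conn = pymysql.connect(host="db.example.internal", user="app",
                       password="secret", database="appdb")

def load_pair(pk):
    conn.begin()                      # start an explicit transaction
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT id, x FROM a WHERE id = %s", (pk,))
            row_a = cur.fetchone()
            # Under REPEATABLE READ, this read uses the same snapshot as the
            # one above, so a concurrent delete/update of B's row between the
            # two statements is not visible here.
            cur.execute("SELECT id, y FROM b WHERE a_id = %s", (pk,))
            row_b = cur.fetchone()
        conn.commit()
        return row_a, row_b
    except Exception:
        conn.rollback()
        raise
```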

The general question here is: do you care about the ACID properties of your relational data as they apply to the queries you are splitting up? If the answer is yes, you need to think about how your application logic will react in these specific cases.

+5

6-8 queries for one web page? That is usually fine. I do it all the time.

Thousands of rows returned? Ouch! What is the client going to do with all of them? Can the SQL do more of the processing and then return fewer rows?
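
For illustration only (the `orders` schema here is invented), compare a query that ships every row to the client with one that lets MySQL do the aggregation and return a handful of summary rows:

```python
# Before: ships every matching row; the client then loops and sums.
FETCH_ROWS = "SELECT customer_id, amount FROM orders WHERE order_date >= %s"

# After: MySQL aggregates, sorts, and truncates; the client gets at most 50 rows.
FETCH_SUMMARY = """
    SELECT customer_id, COUNT(*) AS order_count, SUM(amount) AS total
    FROM orders
    WHERE order_date >= %s
    GROUP BY customer_id
    ORDER BY total DESC
    LIMIT 50
"""
```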

With rare exceptions, use only 1 connection per web page.

Each query has a fair amount of overhead. For example, INSERTing 100 rows into a table as 100 single-row INSERT statements will take roughly 10 times as long as one 100-row INSERT. So, where practical, reduce the number of calls to the server. This becomes very important when the network spans the globe: the other side of the globe is 250 ms away in latency alone, whereas a server in the same data center is probably close enough that the latency can be ignored. Across a global network, use stored procedures to minimize round trips.
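
A small sketch of that idea, assuming PyMySQL and an invented `items` table; `executemany()` here sends all 100 rows in one statement instead of 100 round trips:

```python
# Batched insert sketch: one multi-row INSERT instead of 100 single-row ones.
# PyMySQL's executemany() rewrites this INSERT ... VALUES form into a single
# multi-row statement.
import pymysql

conn = pymysql.connect(host="db.example.internal", user="app",
                       password="secret", database="appdb")
rows = [(i, f"name-{i}") for i in range(100)]      # illustrative data

with conn.cursor() as cur:
    # Slow version: 100 round trips, one per row.
    # for r in rows:
    #     cur.execute("INSERT INTO items (id, name) VALUES (%s, %s)", r)

    # Faster: a single statement carrying all 100 rows.
    cur.executemany("INSERT INTO items (id, name) VALUES (%s, %s)", rows)
conn.commit()
```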

I like to time each query right in the code as it runs. Then, if I see a performance problem, I know which query to work on first. Or use the slow query log.
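
For example, a tiny timing wrapper along these lines (the 30 ms threshold and the label are arbitrary choices, not anything from a specific library) makes slow queries show up in the application log:

```python
# Per-query timing sketch, usable around any DB-API cursor call; a simple
# in-code complement to MySQL's slow query log.
import time
from contextlib import contextmanager

@contextmanager
def timed_query(label):
    start = time.perf_counter()
    try:
        yield
    finally:
        ms = (time.perf_counter() - start) * 1000
        if ms > 30:                       # arbitrary threshold for "slow"
            print(f"SLOW {label}: {ms:.1f} ms")

# usage:
# with timed_query("load user"):
#     cur.execute("SELECT id, name FROM users WHERE id = %s", (42,))
```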

+4

Source: https://habr.com/ru/post/1246280/

