Is premature optimization in SQL as "evil" as in programming languages?

I am learning SQL at the moment, and I have read that joins and subqueries can potentially be performance killers. I (somewhat) know the theory of algorithmic complexity in procedural programming languages and try to keep it in mind when programming, but I do not know how expensive different SQL queries can be. I am trying to decide whether I should invest time in learning about SQL performance, or just deal with it when my queries turn out to be slow. So the basic question for me is: is premature optimization in SQL as evil as in procedural languages?

As additional information: I work in an environment where high performance is rarely a concern, and the largest tables I have to work with hold about 150,000 rows.

Here is the quote from Donald Knuth that I have in mind when I say "evil":

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

+4
6 answers

I would say that some general awareness of performance is mandatory: it will keep you from writing truly bad queries that could hurt your application (even if you don't have millions of rows in your tables).

It will also help you design your database with its workload in mind: you will have some idea of where to place indexes, for example.

But performance should not be your first goal: first, you need an application that works; then, if necessary, you optimize it (keeping some performance considerations in mind during development will help you build an application that is easier to optimize).

Note: I would not call "keeping performance in mind" "premature optimization" as long as you are not actually "optimizing" but simply "writing correctly"; I would rather call it good practice that helps you write better-quality code ;-)

+3

This is what Knuth means: yes, it is very important to know about SQL optimization, but only when you need it. As you say, "most of the time ... high performance is not a problem."

It is in that critical 3% of the time, when you do need high performance, that it is important to know which rules to break and why.

However, unlike procedural languages, even at 150,000 rows it is important to know a little about how your query is processed. For example, a free-text search will be very slow compared to an exact-match lookup on an indexed column. It is only at the later stages, such as sharding or full denormalization, that most database administrators and developers draw the line.
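As a concrete sketch of that example (using Python's built-in sqlite3 module, with a made-up `users` table): asking the engine for its query plan shows the difference between an indexed exact match and a free-text-style `LIKE` search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")

# Exact match on an indexed column: the planner can use the index.
plan_exact = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?",
    ("ann@example.com",),
).fetchall()

# A leading-wildcard LIKE defeats the index: the planner falls back to a scan.
plan_like = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email LIKE ?",
    ("%example%",),
).fetchall()

print(plan_exact[0][3])  # the detail column mentions the index
print(plan_like[0][3])   # the detail column shows a table scan
```

`EXPLAIN QUERY PLAN` is SQLite-specific, but every major RDBMS has an equivalent (`EXPLAIN`, execution-plan viewers), and the underlying principle is the same.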

+2

I would say that you should make SQL as simple as possible, and only worry about performance when it hits you.

That said.

Keep the standard things in mind as you develop, such as indexes, subselects, using cursors where an ordinary set-based query would do the job, and so on.
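To illustrate the cursor point, here is a minimal sketch with Python's sqlite3 (the `orders` tables are invented for the example): row-by-row updates and a single set-based UPDATE produce the same result, but the set-based form is one statement instead of N.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("orders_cursor", "orders_set"):
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(f"INSERT INTO {table} (amount) VALUES (?)",
                     [(10.0,), (20.0,), (30.0,)])

# Cursor-style: pull every row out and update it one statement at a time.
for oid, amount in conn.execute("SELECT id, amount FROM orders_cursor").fetchall():
    conn.execute("UPDATE orders_cursor SET amount = ? WHERE id = ?",
                 (amount * 1.1, oid))

# Set-based: one statement; the engine does the iteration internally.
conn.execute("UPDATE orders_set SET amount = amount * 1.1")

cursor_total = conn.execute("SELECT SUM(amount) FROM orders_cursor").fetchone()[0]
set_total = conn.execute("SELECT SUM(amount) FROM orders_set").fetchone()[0]
print(cursor_total, set_total)  # identical totals
```

On three rows the difference is invisible; at 150,000 rows, the per-row round trips of the cursor style start to dominate.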

As long as the original design is not done incorrectly, you can optimize away the problem areas later, when necessary.

EDIT

Also remember that the maintainability of your SQL code is very important, and that debugging SQL is a bit harder than ordinary coding.

+1

I would not say that SQL optimization carries as many pitfalls as premature optimization in programming. Designing your schema and queries with performance in mind well ahead of time can help you avoid some painful redesigns later. That said, spending a day eliminating a table scan can be completely wasted effort in the long run if that query is not actually slow, is cached, or is called too rarely to affect your application.

Personally, I review my queries and focus on the worst-performing and most frequently used ones. Careful up-front design eliminates most of the worst.

+1

Knuth says "forget about the 97%", but for a typical web application it is database I/O where 97% of the request time is spent. Here, a little optimization can yield big results.

If this is the kind of application you are writing, I highly recommend learning as much as you can afford about how RDBMSs work. Others have already given you great recommendations, and I would add that I usually go through this list from top to bottom when deciding how to spend my "optimization budget":

  • Schema design. Think long and hard about normalization and access strategies. This will save you many painful hours later.

  • Query readability. Related to #1: sometimes trying to rewrite your queries gives you a better understanding of how the schema should look. It will also help later when you ask others for help.

  • Avoid subqueries in the SELECT list; use joins.

  • If there are slow queries, profile them. Check for missing indexes first, and finally, if a query is still slow, try rewriting it.
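The "subqueries in the SELECT list" advice can be sketched like this (Python's sqlite3; the `customers`/`orders` tables are invented for illustration): a correlated subquery is evaluated once per outer row, while the JOIN form lets the engine aggregate in one pass.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES (1, 1, 5.0), (2, 1, 7.0), (3, 2, 3.0);
""")

# Correlated subquery in the SELECT list: re-evaluated for each customer row.
sub = conn.execute("""
SELECT c.name,
       (SELECT SUM(o.total) FROM orders o WHERE o.customer_id = c.id)
FROM customers c ORDER BY c.id
""").fetchall()

# Equivalent JOIN + GROUP BY: one aggregation pass over the joined rows.
joined = conn.execute("""
SELECT c.name, SUM(o.total)
FROM customers c JOIN orders o ON o.customer_id = c.id
GROUP BY c.id ORDER BY c.id
""").fetchall()

print(sub)     # same rows from both forms
print(joined)
```

One caveat on the rewrite: an inner JOIN drops customers with no orders, whereas the subquery form returns them with NULL; use a LEFT JOIN if you need to preserve those rows.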

Remember also that database performance depends heavily on data distribution and on the number of concurrent queries (because of locking). Even if a query completes in 1 second on your low-powered netbook, it can take 15 seconds on an 8-core server under load. If possible, test your queries under realistic conditions. If you know that concurrency will be high, it is (paradoxically) often better to issue many small queries than one big one.

+1

I agree with everything said here, and I would like to add: make sure your SQL is well encapsulated, so that when you find you need to optimize, there is only one place to change, and the change will be transparent to any code that calls it.

Personally, I like to encapsulate all my SQL in PL/SQL procedures, though some people disagree with this approach. Whatever you do, I recommend that you do not scatter your SQL "inline" throughout your other source code. That always seems to lead to copy-and-paste and quickly becomes hard to maintain. Keep your SQL in one place and reuse it as much as possible.
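The same encapsulation idea applies outside stored procedures too. For example (a hypothetical helper, sketched in Python with sqlite3): the query text lives inside one function, so a later optimization or rewrite is invisible to every caller.

```python
import sqlite3

# All SQL for "orders by customer" lives in this one function; callers never
# see the query text, so it can be rewritten later without touching them.
def orders_for_customer(conn, customer_id):
    return conn.execute(
        "SELECT id, total FROM orders WHERE customer_id = ? ORDER BY id",
        (customer_id,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 5.0), (2, 1, 7.0), (3, 2, 3.0)])

print(orders_for_customer(conn, 1))
```

Whether the single place is a stored procedure, a data-access module, or a repository class matters less than the fact that there is exactly one of it.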

Also, read up on indexes: how they really work, and when you should and should not use them. For many people, the first instinct when they hit a slow query is to index the table to death. That may solve the problem in the short term, but in the long term a heavily indexed table is slow to insert into and update. A few well-chosen indexes are much better than an index on every field. Try reading Refactoring SQL Applications by Stéphane Faroult.
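As a small illustration of "a few well-chosen indexes" (Python's sqlite3; the `events` table and index name are invented): one composite index can serve a multi-column filter that might otherwise tempt you into creating several single-column indexes, each of which would have to be maintained on every write.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "kind TEXT, ts TEXT)"
)
# One composite index instead of separate indexes on user_id and kind.
conn.execute("CREATE INDEX idx_events_user_kind ON events (user_id, kind)")

# The plan shows that both equality filters are satisfied by the one index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT ts FROM events WHERE user_id = ? AND kind = ?",
    (1, "click"),
).fetchall()
print(plan[0][3])
```

Column order in a composite index matters: this one also serves queries filtering on `user_id` alone, but not on `kind` alone.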

Finally, as stated above, a properly normalized database design will spare you 99% of your slow queries. Sometimes denormalization is necessary, but it is important to know the rules before breaking them.

Good luck

0

Source: https://habr.com/ru/post/1301306/

