When are SELECT queries planned?

In PostgreSQL, when are (SELECT) queries planned?

Is it:

  • at statement preparation time, or
  • at the start of processing the SELECT, or
  • something else?

The reason I ask is that there is a Stack Overflow question: "the same query, two different ways, significantly different performance".

Many people seem to think that the query is planned differently because in one case it contains a string literal ( 'foo' ), and in the other case a placeholder ( ? ).

I think this is a red herring, because the query is not planned at statement preparation time but is actually planned at SELECT time.

So, say, I could prepare a statement with a placeholder and then execute the query several times with different bind values, and the query planner would run for each individual bind value.

I suspect that the linked question really comes down to the PostgreSQL data type of the value. For the literal 'foo' the type is known to be a string, but for a placeholder the type cannot be determined, so the value reaches the query planner as some odd type for which it cannot build an efficient plan. In that case the problem is not that the query is planned differently because the value is a placeholder (at statement preparation time) as such, but that the value arrives in the query as a different PostgreSQL type, and it is the type that affects the query planner. To fix this, you would just bind the placeholder with an appropriate explicit type declaration.
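A minimal sketch of what such an explicit type declaration can look like at the SQL level, rather than through a client driver. The table `items`, its column `name`, and the statement names are hypothetical and exist only for this illustration.

```sql
-- Hypothetical table, used only for illustration.
CREATE TEMP TABLE items (id integer PRIMARY KEY, name text);

-- Declare the parameter type in the PREPARE signature ...
PREPARE by_name(text) AS
    SELECT * FROM items WHERE name = $1;

-- ... or cast the placeholder at the point where it is used.
PREPARE by_name_cast AS
    SELECT * FROM items WHERE name = $1::text;

-- Both statements are executed the same way.
EXECUTE by_name('foo');
EXECUTE by_name_cast('foo');

-- Show which parameter types the server recorded for each statement.
SELECT name, parameter_types FROM pg_prepared_statements;
```

How a driver exposes the same declaration (for example as a bind attribute) depends on the driver and is not shown here.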

1 answer

I can't speak about the Perl client interface, but I can shed some light on the PostgreSQL server side.

PostgreSQL has prepared statements and unprepared statements. Unprepared statements are parsed, planned and executed immediately. They also do not support parameter substitution. In a plain psql shell, you can show their query plan like this:

 tmpdb> explain select * from sometable where flag = true; 

On the other hand, there are prepared statements: they are usually (see "EXCEPTION" below) parsed and planned in one step and executed in a second step. They can be executed several times with different parameters, because they support parameter substitution. The equivalent in psql is this:

 tmpdb> prepare foo as select * from sometable where flag = $1;
 tmpdb> explain execute foo(true);

You may see that the plan differs from the plan of the unprepared statement, because planning already took place in the PREPARE phase, as described in the documentation for PREPARE:

When the PREPARE statement is executed, the specified statement is parsed, rewritten, and planned. When an EXECUTE command is subsequently issued, the prepared statement need only be executed. Thus, the parsing, rewriting, and planning stages are only performed once, instead of every time the statement is executed.

This also means that the plan is NOT optimized for the substituted parameters. In the first example an index on flag might be used, because PostgreSQL knows from its statistics that only a handful of rows (say, ten) in a large table have the value true. This reasoning is not possible when PostgreSQL plans a prepared statement. In that case a plan is created that works as well as possible for all possible parameter values. This may rule out the index, because fetching a large part of the table through random access (via the index) is slower than a plain sequential scan. The PREPARE documentation confirms this:

In some situations, the query plan produced for a prepared statement will be inferior to the query plan that would have been chosen if the statement had been submitted and executed normally. This is because when the statement is planned and the planner attempts to determine the optimal query plan, the actual values of any parameters specified in the statement are unavailable. PostgreSQL collects statistics on the distribution of data in the table, and can use constant values in a statement to make guesses about the likely result of executing the statement. Since this data is unavailable when planning prepared statements with parameters, the chosen plan might be suboptimal.
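The difference can be tried out with a small experiment. The following is only a sketch under stated assumptions: the table contents and the index name `sometable_flag_idx` are made up, and the `plan_cache_mode` setting used to force a parameter-independent plan exists only in PostgreSQL 12 and later. Newer servers may otherwise plan the first executions of a prepared statement with the actual values, so the plans you see depend on the server version and its settings.

```sql
-- Hypothetical skewed table: 100000 rows, only 10 of them with flag = true.
CREATE TEMP TABLE sometable AS
    SELECT g AS id, (g <= 10) AS flag
    FROM generate_series(1, 100000) AS g;
CREATE INDEX sometable_flag_idx ON sometable (flag);
ANALYZE sometable;

-- Literal value: the planner can consult the column statistics for "true".
EXPLAIN SELECT * FROM sometable WHERE flag = true;

-- Parameter: ask for the plan that does not depend on the parameter value.
-- Assumption: PostgreSQL 12 or later (plan_cache_mode does not exist before).
SET plan_cache_mode = force_generic_plan;
PREPARE foo(boolean) AS SELECT * FROM sometable WHERE flag = $1;
EXPLAIN EXECUTE foo(true);

-- Clean up the session state.
DEALLOCATE foo;
RESET plan_cache_mode;
```

Comparing the estimated row counts in the two plans shows whether the planner had the actual value available when it made its choice.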

BTW: regarding plan caching for PREPARE, the documentation also has something to say:

Prepared statements only last for the duration of the current database session. When the session ends, the prepared statement is forgotten, so it must be recreated before being used again.

So there is no automatic plan caching either, and no caching or reuse across multiple connections.
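The session scope can be observed directly from SQL. This is a small sketch; the statement name `foo` follows the example above and the prepared query itself is arbitrary.

```sql
-- Prepare a statement in the current session.
PREPARE foo AS SELECT 1;

-- The view lists only statements prepared in THIS session;
-- a second connection running the same query will not see "foo".
SELECT name, statement, prepare_time FROM pg_prepared_statements;

-- A statement can be dropped explicitly before the session ends ...
DEALLOCATE foo;

-- ... or all statements of the session can be dropped at once.
DEALLOCATE ALL;
```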

EXCEPTION. I said "usually". The psql examples shown are not what a client adapter such as Perl DBI really uses. Such adapters speak a specific protocol. There, the term "simple query" corresponds to an "unprepared query" in psql, and the term "extended query" corresponds to a "prepared query", with one exception: there is a distinction between the (one) "unnamed statement" and the (possibly several) "named statements". Regarding named statements, the documentation says:

Named prepared statements can also be created and accessed at the SQL command level, using PREPARE and EXECUTE.

and:

Query planning for named prepared-statement objects occurs when the Parse message is processed.

So in this case planning is done without parameters, as described above for PREPARE. Nothing new.

The exception mentioned is the "unnamed statement". The documentation says:

The unnamed prepared statement is likewise planned during Parse processing if the Parse message defines no parameters. But if there are parameters, query planning occurs every time Bind parameters are supplied. This allows the planner to make use of the actual values of the parameters provided by each Bind message, rather than use generic estimates.

And here is the benefit: although the unnamed statement is "prepared" (that is, it can have parameter substitution), it can also adapt the query plan to the actual parameters.

BTW: the exact handling of the unnamed statement has changed several times in past releases of the PostgreSQL server. If you really want to, you can look up the old documentation.

Rationale: Perl / any client

How a client like Perl uses the protocol is a completely different matter. Some clients, such as the JDBC driver for Java, basically say: even if the programmer uses a prepared statement, the first five (or so) executions are internally mapped to a "simple query" (that is, effectively unprepared); after that the driver switches to a "named statement".

So a client has these choices:

  • Force (re)planning every time by using the "simple query" protocol.
  • Plan once and execute several times by using the "extended query" protocol and a "named statement" (the plan may be bad, because planning is done without parameters).
  • Parse once and plan for each execution (with the current PostgreSQL version) by using the "extended query" protocol and the "unnamed statement", while observing the other conditions (providing some parameters in the "Parse" message).
  • Play completely different tricks, as the JDBC driver does.
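The first three choices have rough analogues that can be tried from a plain SQL session. This is only an approximation and an assumption on my part: it does not exercise the wire protocol itself, the `plan_cache_mode` setting exists only in PostgreSQL 12 and later, and the empty table `sometable` is hypothetical.

```sql
-- Hypothetical table, used only for illustration.
CREATE TEMP TABLE sometable (id integer, flag boolean);

-- Analogue of choice 1: a plain statement with a literal, planned on every run.
SELECT * FROM sometable WHERE flag = true;

-- Analogue of choice 2: plan without knowing the value, execute many times.
SET plan_cache_mode = force_generic_plan;
PREPARE foo(boolean) AS SELECT * FROM sometable WHERE flag = $1;
EXECUTE foo(true);
EXECUTE foo(false);

-- Analogue of choice 3: parse once, but plan each execution with the actual value.
SET plan_cache_mode = force_custom_plan;
EXECUTE foo(true);
EXECUTE foo(false);

-- Clean up the session state.
DEALLOCATE foo;
RESET plan_cache_mode;
```

Prefixing any of the EXECUTE lines with EXPLAIN shows which plan the server would use for that call.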

What Perl currently does: I don't know. But the "red herring" you mention is not all that unlikely.


Source: https://habr.com/ru/post/908425/