Temp Tables and SQL SELECT Performance

Why does using temporary tables with a SELECT statement reduce the number of logical I/O operations? Wouldn't this increase the number of calls to the database rather than decrease it? Is it because the "problem" is partitioned? I would like to know what is going on behind the scenes.


There is no general answer. It depends on how the temporary table is used.

A temp table can reduce I/O by caching rows produced by a complex filter/join that are then used several times later in the batch. The database can then avoid re-reading the base tables each time, when only a subset of the records is needed.
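A minimal sketch of that caching pattern, using Python's built-in sqlite3 module as a stand-in for whatever database is in use. The `customers`/`orders` schema and the `eu_orders` temp table are hypothetical, chosen only for illustration. The filter/join runs once, and two later statements read the small cached result instead of joining the base tables again.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'EU'), (2, 'US'), (3, 'EU');
INSERT INTO orders VALUES (10, 1, 50.0), (11, 2, 20.0), (12, 3, 30.0), (13, 1, 5.0);
""")

# Run the filter/join once and cache the (small) result in a temp table.
cur.execute("""
CREATE TEMP TABLE eu_orders AS
SELECT o.id, o.customer_id, o.amount
FROM orders AS o JOIN customers AS c ON c.id = o.customer_id
WHERE c.region = 'EU'
""")

# Later statements in the batch reuse the cached subset instead of
# re-reading and re-joining the base tables.
total = cur.execute("SELECT SUM(amount) FROM eu_orders").fetchone()[0]
per_customer = cur.execute(
    "SELECT customer_id, COUNT(*) FROM eu_orders "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()
print(total, per_customer)
```

The payoff grows with the number of reuses and with how expensive the original filter/join is; with a single reuse the temp table mostly adds work.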

A temp table can increase I/O by storing rows that are never used later in the query, or by taking up a lot of space in the buffer cache that could be better used for other data.

Creating a temp table only to consume its entire contents once is sometimes slower than inlining that query into the main query, because the query optimizer cannot see behind the temp table, and it (possibly) forces an extra spool of the data instead of letting it stream from the source tables.
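A sketch of the two alternatives for a single-use intermediate result, again with sqlite3 and a hypothetical schema. Both return the same answer; the difference is that the inline CTE version hands the whole query to the planner, which can then stream rows from the base tables, while the temp-table version always writes an intermediate copy first. Whether a given engine actually inlines the CTE depends on the engine and version.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'EU'), (2, 'US'), (3, 'EU');
INSERT INTO orders VALUES (10, 1, 50.0), (11, 2, 20.0), (12, 3, 30.0), (13, 1, 5.0);
""")

# Option 1: materialise the intermediate result, then read it exactly once.
cur.execute("""
CREATE TEMP TABLE eu_orders AS
SELECT o.amount FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id WHERE c.region = 'EU'
""")
via_temp = cur.execute("SELECT SUM(amount) FROM eu_orders").fetchone()[0]

# Option 2: write it inline, so the planner sees the whole query and is free
# to avoid an intermediate copy.
inline = cur.execute("""
WITH eu_orders_cte AS (
    SELECT o.amount FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id WHERE c.region = 'EU'
)
SELECT SUM(amount) FROM eu_orders_cte
""").fetchone()[0]
print(via_temp, inline)
```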


I'm going to assume that by temporary tables you mean a sub-select in a WHERE clause. (This is called a semijoin operation, and you can usually see it in the text execution plan for your query.)

When the query optimizer encounters a sub-select/temp table, it makes some assumptions about what to do with that data. In essence, the optimizer builds an execution plan that performs the join against the subset of rows returned by the sub-select, which reduces the number of rows that have to be read from the other tables. With fewer rows, the query engine reads fewer pages from disk/memory, so fewer I/O operations are required.
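A small sketch of a semijoin written as a sub-select in WHERE, using sqlite3 with a hypothetical `departments`/`employees` schema. It also prints the engine's plan via SQLite's `EXPLAIN QUERY PLAN`; the exact plan text varies between SQLite versions (and other databases have their own EXPLAIN syntax), so only the query result is relied on here.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
INSERT INTO departments VALUES (1, 'EU'), (2, 'US');
INSERT INTO employees VALUES (1, 'Ann', 1), (2, 'Bob', 2), (3, 'Cid', 1);
""")

# Semijoin: keep only employees whose department appears in the sub-select.
semijoin_sql = """
SELECT name FROM employees
WHERE dept_id IN (SELECT id FROM departments WHERE region = 'EU')
ORDER BY name
"""
rows = [r[0] for r in cur.execute(semijoin_sql)]

# Show how the engine plans the sub-select; wording differs by version.
for step in cur.execute("EXPLAIN QUERY PLAN " + semijoin_sql):
    print(step)
print(rows)
```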


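AFAIK, at least in MySQL, temp tables are kept in RAM, which makes a SELECT against them much faster than anything that has to hit the disk (this holds for MySQL's internal in-memory temporary tables only while they stay under the tmp_table_size limit; larger ones are converted to on-disk tables).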


There is a class of problems in which building the result in a collection structure on the database side is much preferable to returning parts of the result to the client, with a round trip for each part.

For example: recursive relationships of arbitrary depth (such as employee/boss chains).
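One way to build such a result entirely on the database side is sketched below with sqlite3 and a hypothetical `staff(id, name, boss_id)` table. Note that this swaps in a recursive CTE (`WITH RECURSIVE`) rather than a temp table filled level by level in a stored procedure; both avoid the client issuing one query per level of the hierarchy.

```python
import sqlite3

# Hypothetical staff table: boss_id points at another row's id (NULL = top).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, boss_id INTEGER);
INSERT INTO staff VALUES
  (1, 'CEO', NULL), (2, 'VP', 1), (3, 'Manager', 2),
  (4, 'Engineer', 3), (5, 'Other', NULL);
""")

# A single statement walks the boss -> report chain to arbitrary depth on the
# server side, instead of one client round trip per level.
reports = cur.execute("""
WITH RECURSIVE chain(id, name, depth) AS (
    SELECT id, name, 0 FROM staff WHERE id = ?
    UNION ALL
    SELECT s.id, s.name, c.depth + 1
    FROM staff AS s JOIN chain AS c ON s.boss_id = c.id
)
SELECT name, depth FROM chain ORDER BY depth
""", (1,)).fetchall()
print(reports)
```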

There is another class of query problems where the data is not stored or indexed in a way that lets the query run efficiently. Pulling the results into a collection structure that can be indexed in its own way reduces the logical I/O for these queries.
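A sketch of that idea with sqlite3: a hypothetical `events` table buries a user id inside a text column, so filtering on it cannot use an index. Extracting the value once into a temp table and indexing that temp table lets later lookups use the index. The `user=N;...` payload format is an assumption made for the example.

```python
import sqlite3

# Hypothetical table whose user id is embedded in free text (no usable index).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT);
INSERT INTO events VALUES
  (1, 'user=7;type=login'), (2, 'user=9;type=login'), (3, 'user=7;type=logout');
""")

# Extract the user id once: the value starts after 'user=' (position 6)
# and runs up to the first ';'.
cur.execute("""
CREATE TEMP TABLE event_users AS
SELECT id AS event_id,
       CAST(substr(payload, 6, instr(payload, ';') - 6) AS INTEGER) AS user_id
FROM events
""")
# Index the temp table the way later queries need it.
cur.execute("CREATE INDEX idx_event_users_user ON event_users(user_id)")

hits = cur.execute(
    "SELECT event_id FROM event_users WHERE user_id = ? ORDER BY event_id", (7,)
).fetchall()
print(hits)
```

The extraction scans the base table once; every later lookup on `user_id` can then go through the index instead of scanning and parsing the text again.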


Source: https://habr.com/ru/post/1276751/

