JOINs vs. while loops

At the company where I just started working, they run PHP against a MySQL relational database. I had always assumed that if I needed to pull information from several different tables, I could just use a simple join, such as ....

SELECT table_1.id, table_2.id FROM table_1 LEFT JOIN table_2 ON table_1.sub_id = table_2.id 

When I got to where I work now, however, this is what I found them doing.

<?php
$query = mysql_query("SELECT sub_id FROM table_1");
while ($rs = mysql_fetch_assoc($query)) {
    $query_2 = mysql_fetch_assoc(mysql_query(
        "SELECT * FROM table_2 WHERE id = '{$rs['sub_id']}'"
    ));
    // blah blah blah more queries
}
?>

When I asked why they do it the second way, they said it runs faster than a join. They manage a database with millions of records spread over different tables, and some of the rows are fairly wide. They said they want to avoid JOINs because a badly performing query can lock a table (or several of them). Another thing to keep in mind is that this database drives a massive report builder that clients can use to create their own reports, and if a client goes crazy and builds a huge report, it can lead to chaos.

I was confused, so I thought I would throw this out to the general programming public. This may be a matter of opinion, but is it really faster to use the while-loop approach (one larger query to pull a lot of rows, followed by many small per-row queries as needed) or to do a join (one larger query that pulls all the necessary data at once)? As long as the indexes are set up correctly, does it even matter? One other thing to keep in mind is that the database in question uses InnoDB.

Thanks!

Update 8/28/14

So, I thought I would post an update on this now that it has been in place for a while. After this discussion, I decided to rebuild the report generator here at work. I don't have final benchmark numbers, but I thought I would share what the outcome was.

I think I overdid it a bit, because I turned the entire report (quite dynamic as far as the data it returns) into one massive join-fest. Most of the joins, if not all, are on a primary key value, so they all run very fast. Say the report had 30 columns of data and pulled 2,000 records: previously, each individual field ran its own query to fetch its data (since that piece of data could live in another table). 30 x 2,000 = 60,000 queries, and even at a modest 0.0003 seconds per query that is another 18 seconds of total query time (which is pretty much what I remember it being). Now that I have rebuilt the query as one massive join on a bunch of primary keys (wherever possible), the same report loads in about 2-3 seconds, and most of that time is spent loading the HTML. Each record that comes back runs between 0 and 4 additional queries depending on the data it needs (it may not need any at all if everything comes back in the joins, which happens in about 75% of cases). So the same 2,000 records generate an additional 0-8,000 queries (much better than 60,000).
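
To make the shape of the rebuild concrete, here is a minimal sketch. The table and column names are made up (the real report builder is far more dynamic), and I'm using the same old mysql_* functions as the question for consistency:

<?php
// Hypothetical tables/columns, just to show the shape of the rebuild:
// one join keyed on primary keys replaces a separate query per field per row.
$sql = "SELECT r.id, r.amount, c.name AS category, u.name AS owner
        FROM records r
        LEFT JOIN categories c ON c.id = r.category_id
        LEFT JOIN users u ON u.id = r.user_id";
$result = mysql_query($sql);
while ($row = mysql_fetch_assoc($result)) {
    // Only the rows that still need data the join could not provide
    // fall back to 1-4 small follow-up queries here.
    // ... build the report row from $row ...
}
?>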

I would say the while loop is useful in some cases, but, as noted in the comments below, benchmarking is what it's all about. In my case the join was the better option, but in other areas of my site a loop works better. In one case I have a report in which a client can pick several categories and pull data only for those categories. I ended up with category_id IN (..., ..., ...) containing 50-500 IDs, and the index would choke and die in my arms while I held it in its final moments. So what I did was split the IDs into groups of 10 and run the same query x/10 times, and my results came back faster than before, because the index copes with 10 IDs far better than with 500. So there I saw a significant improvement in my queries precisely because of the while-loop approach.
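
As a rough sketch of that last trick (the table and column names are placeholders, not my real schema):

<?php
// $categoryIds holds the 50-500 category IDs the client selected.
$categoryIds = array_map('intval', $categoryIds);
$rows = array();
// Run the same query once per group of 10 IDs instead of one giant
// IN (...) list, which the index handled poorly.
foreach (array_chunk($categoryIds, 10) as $chunk) {
    $result = mysql_query(
        "SELECT * FROM report_data WHERE category_id IN (" . implode(',', $chunk) . ")"
    );
    while ($row = mysql_fetch_assoc($result)) {
        $rows[] = $row;
    }
}
?>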

+6
5 answers

If indexes are used correctly, then it is almost always more efficient to use JOINs. The emphasis is added because the best-performing query is not always the same thing as best practice.

However, there really is no one-size-fits-all answer; you must analyze the query with EXPLAIN to make sure the indexes are actually being used, that there is no unnecessary temp-table usage, and so on. In some cases the conditions conspire to produce a query that simply cannot use indexes. In those cases it may be faster to split the query into parts the way you described.
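
For example, using the tables from the question, something like this shows whether the planner actually hits an index (look at the key and rows columns of the output):

<?php
// Run EXPLAIN on the join and dump the plan; "key" shows the index used
// (or NULL), "rows" estimates how many rows MySQL expects to examine.
$result = mysql_query(
    "EXPLAIN SELECT table_1.id, table_2.id
     FROM table_1
     LEFT JOIN table_2 ON table_1.sub_id = table_2.id"
);
while ($row = mysql_fetch_assoc($result)) {
    print_r($row);
}
?>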

If I came across code like that in an existing project, I would question it: examine the queries, think about different ways of writing them, and make sure those options were actually considered and that a scientific, fact-based case was built for or against the practice. Make sure the original developers did their due diligence, because avoiding JOINs superficially points to poor database or query design. In the end, though, results speak loudest: if, after all optimizations and corrections, the JOIN is still slower than running the query in fragments, then the faster solution prevails. Benchmark, and act on the results of the benchmark; there is no situation in software development where you should accept poor performance just to stick to arbitrary rules about what you should or should not do. The method that performs best is the best method.

+4

It is better to run one large query, provided the indexes are in place.

The logic:

  • 1 query = 1 round trip to the database server, which parses and optimizes the query (the optimizer and all) once and then returns the result. N queries mean N round trips to the database, including N optimizer runs and, in the worst case, N times the I/O.
  • MySQL has optimizations that only work on JOINs. Those optimizations cannot kick in if you do the joining yourself in a while loop.

As stated in the other answers, check with EXPLAIN whether anything in your JOIN fails to use an index. You should also check how much memory is given to the InnoDB buffer pool and how much memory MySQL is allowed to use for processing the query; it may simply be a configuration problem that makes the database slower when executing JOINs.
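
If you want to look at those settings, a quick sketch (what values count as "enough" depends entirely on your server and data size):

<?php
// Dump the memory-related server variables mentioned above; an undersized
// InnoDB buffer pool or join buffer can make JOINs slower than they should be.
$result = mysql_query(
    "SHOW VARIABLES WHERE Variable_name IN
     ('innodb_buffer_pool_size', 'join_buffer_size',
      'sort_buffer_size', 'tmp_table_size')"
);
while ($row = mysql_fetch_assoc($result)) {
    echo $row['Variable_name'] . ' = ' . $row['Value'] . "\n";
}
?>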

+2

I would say the answer is: it depends. Normally I would say that a JOIN is the answer and that running many queries in a loop is bad practice, but it depends entirely on what is being done.

Is that true here? Without detailed table structures, index information, use of foreign keys, and so on, we cannot say for sure. The best idea, if you want to check, is to try it and see: take their queries and EXPLAIN them, write your own and EXPLAIN those, and see which is more efficient.
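
A crude but effective way to "try and see" is to time both versions against the same real data, for example:

<?php
// Time the single JOIN against the query-in-a-loop version from the question.
$start = microtime(true);
$result = mysql_query(
    "SELECT table_1.id, table_2.id
     FROM table_1
     LEFT JOIN table_2 ON table_1.sub_id = table_2.id"
);
while ($row = mysql_fetch_assoc($result)) { /* consume rows */ }
echo 'JOIN: ' . (microtime(true) - $start) . " s\n";

$start = microtime(true);
$outer = mysql_query("SELECT sub_id FROM table_1");
while ($rs = mysql_fetch_assoc($outer)) {
    mysql_fetch_assoc(mysql_query(
        "SELECT * FROM table_2 WHERE id = '{$rs['sub_id']}'"
    ));
}
echo 'Loop: ' . (microtime(true) - $start) . " s\n";
?>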

+1

I'm not sure about huge databases, but in my projects I always try to keep the number of queries to a minimum. Every query costs disk access and (if the database is not on the same host) network access, both of which are slow. If that first query returns many rows, you can end up running thousands of queries per page, and that will be slow.

+1

Test to find out the actual answer.

In the given example, it is unlikely that (with equivalent data) a JOIN will use more resources than issuing new queries and performing the same operation yourself (after all, you are still joining the data just as a JOIN would, only from the outside): if that were the case, the engine could simply be rewritten to use that external route and improve performance.

When JOINs do use more resources (indexing problems aside), it is mostly down to how the data comes back row by row: information from the parent table is duplicated on every returned row, even though it is redundant.

This can cause performance problems that splitting the queries can solve (see the sketch after the list below) if:

  • there are many children for one parent AND
  • you get a lot of data from the parent (many columns or large fields)
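
A small sketch of that situation, with hypothetical parent/child tables where the parent has one very wide column:

<?php
// A join would repeat the wide parent column on every child row:
//   SELECT p.big_description, c.* FROM parents p
//   JOIN children c ON c.parent_id = p.id WHERE p.id = 42
// Splitting the queries sends the wide data over the wire only once:
$parent = mysql_fetch_assoc(mysql_query(
    "SELECT big_description FROM parents WHERE id = 42"
));
$children = mysql_query("SELECT * FROM children WHERE parent_id = 42");
while ($child = mysql_fetch_assoc($children)) {
    // combine $parent['big_description'] with each $child here
}
?>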

In my experience, reducing the number of queries almost always benefits performance (I have optimized by combining queries far more often than by splitting them).

Proper use of indexes is, of course, good advice, but at first glance I don't think it accounts for the difference between the two scenarios, since the same indexes (or lack of them) apply in both cases.

+1

Source: https://habr.com/ru/post/897164/

