Query with multiple left joins - column value is incorrect

I have the following database structure, and I'm trying to run a single query that will show classrooms and how many students are in the class, how many rewards are given to the classroom, and how many points are allocated to one class (based on the classroom_id column).

Using the query at the very bottom, I am trying to get the "totalPoints" assigned to each class by summing the points column in the classroom_redeemed_codes table and returning it as a total.

For some reason, the values returned for totalPoints are incorrect. I am doing something wrong, but I'm not sure what...

- UPDATE - Here is the SQLFiddle: http://sqlfiddle.com/#!2/a9f45

My structure:

    CREATE TABLE `organisation_classrooms` (
      `classroom_id` int(11) NOT NULL AUTO_INCREMENT,
      `title` varchar(255) NOT NULL,
      `active` tinyint(1) NOT NULL,
      `organisation_id` int(11) NOT NULL,
      `period` int(1) DEFAULT '0',
      `classroom_bg` int(2) DEFAULT '3',
      `sortby` varchar(6) NOT NULL DEFAULT 'points',
      `sound` int(1) DEFAULT '0',
      PRIMARY KEY (`classroom_id`)
    );

    CREATE TABLE organisation_classrooms_myusers (
      `classroom_id` int(11) NOT NULL,
      `user_id` bigint(11) unsigned NOT NULL
    );

    CREATE TABLE `classroom_redeemed_codes` (
      `redeemed_code_id` int(11) NOT NULL AUTO_INCREMENT,
      `myuser_id` bigint(11) unsigned NOT NULL DEFAULT '0',
      `ssuser_id` bigint(11) NOT NULL DEFAULT '0',
      `classroom_id` int(11) NOT NULL,
      `order_product_id` int(11) NOT NULL DEFAULT '0',
      `order_product_images_id` int(11) NOT NULL DEFAULT '0',
      `date_redeemed` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
      `points` int(11) NOT NULL,
      `type` int(1) NOT NULL DEFAULT '0',
      `notified` int(1) NOT NULL DEFAULT '0',
      `inactive` tinyint(3) NOT NULL,
      PRIMARY KEY (`redeemed_code_id`)
    );

    SELECT t.classroom_id,
           title,
           COALESCE(COUNT(DISTINCT r.redeemed_code_id), 0) AS totalRewards,
           COALESCE(COUNT(DISTINCT ocm.user_id), 0) AS totalStudents,
           COALESCE(SUM(r.points), 0) AS totalPoints
    FROM `organisation_classrooms` `t`
    LEFT OUTER JOIN classroom_redeemed_codes r
      ON (r.classroom_id = t.classroom_id
          AND r.inactive = 0
          AND (r.date_redeemed >= 1393286400 OR r.date_redeemed = 0))
    LEFT OUTER JOIN organisation_classrooms_myusers ocm
      ON (ocm.classroom_id = t.classroom_id)
    WHERE t.organisation_id = 37383
    GROUP BY title
    ORDER BY t.classroom_id ASC
    LIMIT 10

- EDIT -

OOPS! Sometimes I hate SQL... I made a big mistake: I'm trying to count the number of STUDENTS in classroom_redeemed_codes, not in the organisation_classrooms_myusers table. Sorry, I should have picked that up sooner!

    classroom_id | totalUniqueStudents
    -------------+--------------------
    16           | 1
    17           | 2
    46           | 1
    51           | 1
    52           | 1

There are 7 rows in the classroom_redeemed_codes table, but classroom_id 46 has two rows with the same myuser_id (this is the student ID), so it should appear as one unique student.

Does that make sense? Essentially, I am trying to get the number of unique students in the classroom_redeemed_codes table based on the myuser_id column.

For example, classroom_id 46 could have 100 rows in the classroom_redeemed_codes table, but if every row has the same myuser_id, totalUniqueStudents should be 1, not 100.

Let me know if this is unclear ....

- UPDATE - I have the following query, largely borrowed from the user below, which seems to work... (I have a headache.) I will accept the answer again. Sorry for the confusion; I think I simply overthought this a little.

    SELECT crc.classroom_id,
           COUNT(DISTINCT crc.myuser_id) AS users,
           COUNT(DISTINCT crc.redeemed_code_id) AS classRewards,
           SUM(crc.points) AS classPoints,
           t.title
    FROM classroom_redeemed_codes crc
    JOIN organisation_classrooms t
      ON crc.classroom_id = t.classroom_id
     AND t.organisation_id = 37383
    WHERE crc.inactive = 0
      AND (crc.date_redeemed >= 1393286400 OR crc.date_redeemed = 0)
    GROUP BY crc.classroom_id
3 answers

I approached this by first running a pre-query that aggregates your points per classroom, and then left-joining to that. I get more rows in the result set than your sample expects, but I do not have MySQL available to test and confirm directly. However, here is an SQLFiddle of your query. Because your query sums the points while the join to the users table produces a Cartesian result, the points are probably being duplicated. By pre-querying the redeemed codes on their own, you get that value once per classroom and then join to the users.

    SELECT t.classroom_id,
           title,
           COALESCE(r.classRewards, 0) AS totalRewards,
           COALESCE(r.classPoints, 0) AS totalPoints,
           COALESCE(r.uniqStudents, 0) AS totalUniqRedeemStudents,
           COALESCE(COUNT(DISTINCT ocm.user_id), 0) AS totalStudents
    FROM organisation_classrooms t
    LEFT JOIN (
        SELECT crc.classroom_id,
               COUNT(DISTINCT crc.redeemed_code_id) AS classRewards,
               COUNT(DISTINCT crc.myuser_id) AS uniqStudents,
               SUM(crc.points) AS classPoints
        FROM classroom_redeemed_codes crc
        JOIN organisation_classrooms t
          ON crc.classroom_id = t.classroom_id
         AND t.organisation_id = 37383
        WHERE crc.inactive = 0
          AND (crc.date_redeemed >= 1393286400 OR crc.date_redeemed = 0)
        GROUP BY crc.classroom_id
    ) r
      ON t.classroom_id = r.classroom_id
    LEFT OUTER JOIN organisation_classrooms_myusers ocm
      ON t.classroom_id = ocm.classroom_id
    WHERE t.organisation_id = 37383
    GROUP BY title
    ORDER BY t.classroom_id ASC
    LIMIT 10
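The effect of pre-aggregating can be checked on a small data set. The following is only a sketch: it uses Python's built-in sqlite3 module instead of MySQL, keeps just the columns needed for the comparison, drops the inactive and date filters, and borrows the sample rows for classrooms 16, 17 and 46 that appear in a later answer. The variable names naive and fixed are illustrative.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the schema is cut down to the columns used here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organisation_classrooms (classroom_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE organisation_classrooms_myusers (classroom_id INTEGER, user_id INTEGER);
CREATE TABLE classroom_redeemed_codes (
    redeemed_code_id INTEGER PRIMARY KEY, classroom_id INTEGER, points INTEGER);
INSERT INTO organisation_classrooms VALUES (16, 'BLUE'), (17, 'GREEN'), (46, 'PINK');
INSERT INTO organisation_classrooms_myusers VALUES
    (16, 1), (16, 2), (17, 508826), (17, 508835);
INSERT INTO classroom_redeemed_codes VALUES
    (7, 16, 50), (8, 17, 25), (9, 17, 75), (5, 46, 250), (6, 46, 100);
""")

# Both tables joined directly: every code row is repeated once per user of the class.
naive = conn.execute("""
    SELECT t.classroom_id, COALESCE(SUM(r.points), 0)
    FROM organisation_classrooms t
    LEFT JOIN classroom_redeemed_codes r ON r.classroom_id = t.classroom_id
    LEFT JOIN organisation_classrooms_myusers ocm ON ocm.classroom_id = t.classroom_id
    GROUP BY t.classroom_id
    ORDER BY t.classroom_id
""").fetchall()

# Points summed per classroom in a derived table first, then joined to the users.
fixed = conn.execute("""
    SELECT t.classroom_id, COALESCE(r.classPoints, 0), COUNT(DISTINCT ocm.user_id)
    FROM organisation_classrooms t
    LEFT JOIN (
        SELECT classroom_id, SUM(points) AS classPoints
        FROM classroom_redeemed_codes
        GROUP BY classroom_id
    ) r ON r.classroom_id = t.classroom_id
    LEFT JOIN organisation_classrooms_myusers ocm ON ocm.classroom_id = t.classroom_id
    GROUP BY t.classroom_id, r.classPoints
    ORDER BY t.classroom_id
""").fetchall()

print(naive)  # points doubled for the classes that have two users
print(fixed)  # (classroom_id, totalPoints, totalStudents)
```

With this data the direct join doubles the totals for classrooms 16 and 17, which each have two users, while the derived-table version reports 50 and 100.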

You need SUM(r.points) and a subquery in the LEFT OUTER JOIN, as below:

    SELECT t.classroom_id,
           title,
           COALESCE(COUNT(DISTINCT r.redeemed_code_id), 0) AS totalRewards,
           COALESCE(SUM(r.points), 0) AS totalPoints,
           -- T1.cnt repeats on every joined reward row, so take MAX rather than SUM
           COALESCE(MAX(T1.cnt), 0) AS totalStudents
    FROM `organisation_classrooms` `t`
    LEFT OUTER JOIN (
        SELECT classroom_id, COUNT(user_id) cnt
        FROM organisation_classrooms_myusers
        GROUP BY classroom_id
    ) T1
      ON (T1.classroom_id = t.classroom_id)
    LEFT OUTER JOIN classroom_redeemed_codes r
      ON (r.classroom_id = t.classroom_id
          AND r.inactive = 0
          AND (r.date_redeemed >= 1393286400 OR r.date_redeemed = 0))
    WHERE t.organisation_id = 37383
    GROUP BY title
    ORDER BY t.classroom_id ASC
    LIMIT 10

I have simplified your query; there is no need to use COALESCE() with COUNT(), because COUNT() never returns NULL. For SUM() I prefer IFNULL() because it is shorter and more readable. The results below contain only the data for classroom_id #16, #17 and #46, for easier comparison with the example in the question. The actual result sets are larger and contain all the classroom_id values present in the tables. However, their presence is not required to understand how and why it works.
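The difference between COUNT() and SUM() on an empty input is easy to verify. This is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL (both provide IFNULL() and COALESCE()); the table is reduced to two columns and deliberately left empty.

```python
import sqlite3

# Assumption: SQLite is used here in place of MySQL; the table is a cut-down version.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE classroom_redeemed_codes (redeemed_code_id INTEGER, points INTEGER)"
)

# No rows at all: COUNT() yields 0, SUM() yields NULL unless it is wrapped.
row = conn.execute("""
    SELECT COUNT(redeemed_code_id),
           SUM(points),
           IFNULL(SUM(points), 0),
           COALESCE(SUM(points), 0)
    FROM classroom_redeemed_codes
""").fetchone()

print(row)  # NULL arrives in Python as None
```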

    SELECT t.classroom_id,
           t.title,
           COUNT(DISTINCT r.redeemed_code_id) AS totalRewards,
           COUNT(DISTINCT ocm.user_id) AS totalStudents,
           IFNULL(SUM(r.points), 0) AS totalPoints
    FROM `organisation_classrooms` t
    LEFT JOIN `classroom_redeemed_codes` r
      ON r.classroom_id = t.classroom_id
     AND r.inactive = 0
     AND (r.date_redeemed >= 1393286400 OR r.date_redeemed = 0)
    LEFT JOIN `organisation_classrooms_myusers` ocm
      ON ocm.classroom_id = t.classroom_id
    WHERE t.organisation_id = 37383
    GROUP BY t.classroom_id
    ORDER BY t.classroom_id ASC

Let's break it into pieces and put it back together. First, let's see which users are selected:

Query #1

    SELECT t.classroom_id, t.title, ocm.user_id
    FROM `organisation_classrooms` t
    LEFT JOIN `organisation_classrooms_myusers` ocm
      ON ocm.classroom_id = t.classroom_id
    WHERE t.organisation_id = 37383
    ORDER BY t.classroom_id ASC

I removed the classroom_redeemed_codes table and its fields, removed the GROUP BY, and replaced the aggregate function COUNT(DISTINCT ocm.user_id) with ocm.user_id to see which users are selected.

The result shows that this part of the query is correct:

    classroom_id | title | user_id
    -------------+-------+--------
              16 | BLUE  |      2
              16 | BLUE  |      1
              17 | GREEN | 508835
              17 | GREEN | 508826
              46 | PINK  |   NULL

There are 2 users in classroom #16, 2 more in #17, and none in classroom #46. Putting the GROUP BY back will produce the correct values (2, 2, 0) in the totalStudents column.

Now check the join with the classroom_redeemed_codes table:

Query #2

    SELECT t.classroom_id, t.title, r.redeemed_code_id, r.points
    FROM `organisation_classrooms` t
    LEFT JOIN `classroom_redeemed_codes` r
      ON r.classroom_id = t.classroom_id
     AND r.inactive = 0
     AND (r.date_redeemed >= 1393286400 OR r.date_redeemed = 0)
    WHERE t.organisation_id = 37383
    ORDER BY t.classroom_id ASC

Result:

    classroom_id | title | redeemed_code_id | points
    -------------+-------+------------------+-------
              16 | BLUE  |                7 |     50
              17 | GREEN |                8 |     25
              17 | GREEN |                9 |     75
              46 | PINK  |                5 |    250
              46 | PINK  |                6 |    100

Again, grouping by classroom_id will produce (1, 2, 2) in the totalRewards column and (50, 100, 350) in the totalPoints column, which is correct.

The problem starts when you want to combine them into a single query. No matter which type of join you use, for the provided input you will get (2 * 1, 2 * 2, 1 * 2) rows for the classroom_id values 16, 17 and 46 (in that order). The values multiplied in the parentheses are the numbers of rows for each classroom_id in the result sets of the first and the second query above.
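The row multiplication can be confirmed by counting the joined rows per classroom. This sketch again relies on Python's sqlite3 module rather than MySQL, with a cut-down schema, no inactive or date filters, and the sample rows shown in the two result sets above.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; only the join columns and points are kept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organisation_classrooms (classroom_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE organisation_classrooms_myusers (classroom_id INTEGER, user_id INTEGER);
CREATE TABLE classroom_redeemed_codes (
    redeemed_code_id INTEGER PRIMARY KEY, classroom_id INTEGER, points INTEGER);
INSERT INTO organisation_classrooms VALUES (16, 'BLUE'), (17, 'GREEN'), (46, 'PINK');
INSERT INTO organisation_classrooms_myusers VALUES
    (16, 1), (16, 2), (17, 508826), (17, 508835);
INSERT INTO classroom_redeemed_codes VALUES
    (7, 16, 50), (8, 17, 25), (9, 17, 75), (5, 46, 250), (6, 46, 100);
""")

# COUNT(*) counts the joined rows, including rows padded with NULL by the LEFT JOIN.
joined_rows = conn.execute("""
    SELECT t.classroom_id, COUNT(*)
    FROM organisation_classrooms t
    LEFT JOIN classroom_redeemed_codes r ON r.classroom_id = t.classroom_id
    LEFT JOIN organisation_classrooms_myusers ocm ON ocm.classroom_id = t.classroom_id
    GROUP BY t.classroom_id
    ORDER BY t.classroom_id
""").fetchall()

print(joined_rows)  # one (classroom_id, row count) pair per classroom
```

Classroom 46 has no users, but the LEFT JOIN still contributes one NULL row, so its two codes produce 1 * 2 = 2 joined rows.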

Combined

Let's try a query that selects rows before grouping them:

    SELECT t.classroom_id, t.title, r.redeemed_code_id, ocm.user_id, r.points
    FROM `organisation_classrooms` t
    LEFT JOIN `classroom_redeemed_codes` r
      ON r.classroom_id = t.classroom_id
     AND r.inactive = 0
     AND (r.date_redeemed >= 1393286400 OR r.date_redeemed = 0)
    LEFT JOIN `organisation_classrooms_myusers` ocm
      ON ocm.classroom_id = t.classroom_id
    WHERE t.organisation_id = 37383
    ORDER BY t.classroom_id ASC

It returns this result set:

    classroom_id | title | redeemed_code_id | user_id | points
    -------------+-------+------------------+---------+-------
              16 | BLUE  |                7 |       2 |     50
              16 | BLUE  |                7 |       1 |     50   <- *
    -------------+-------+------------------+---------+-------
              17 | GREEN |                8 |  508835 |     25
              17 | GREEN |                8 |  508826 |     25   <- *
              17 | GREEN |                9 |  508835 |     75
              17 | GREEN |                9 |  508826 |     75   <- *
    -------------+-------+------------------+---------+-------
              46 | PINK  |                5 |    NULL |    250
              46 | PINK  |                6 |    NULL |    100

I added horizontal rules to separate the rows that belong to the same group once we add the GROUP BY. This is basically how an SQL query with GROUP BY is executed, regardless of the actual software that implements it.

As you can see, for each classroom it combines all the redeemed codes associated with the classroom with all the users associated with the classroom. If you add more users and redeemed codes for classrooms #16, #17 and #46 to your tables, you will get a much larger result set.

The next step when executing a GROUP BY query is to produce one row from each group above. There are no problems with the classroom_id and title columns; they contain a single value in each group. For redeemed_code_id and user_id your query counts the distinct values, and this works fine. The problem is adding up the points. If you just SUM() them, the points of each redeemed code are added once for every user_id in the group. Using SUM(DISTINCT points) is not correct either, because it ignores duplicate values even when they come from different rows of the classroom_redeemed_codes table.
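The SUM(DISTINCT points) pitfall shows up as soon as two different codes carry the same number of points. The sketch below uses Python's sqlite3 module in place of MySQL and hypothetical sample rows (codes 1 and 2 are both worth 100 points); they are not taken from the question's data.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE classroom_redeemed_codes (
    redeemed_code_id INTEGER PRIMARY KEY, classroom_id INTEGER, points INTEGER);
INSERT INTO classroom_redeemed_codes VALUES (1, 46, 100), (2, 46, 100), (3, 46, 250);
""")

# SUM(DISTINCT points) collapses the two 100-point codes into a single value.
plain_sum, distinct_sum = conn.execute("""
    SELECT SUM(points), SUM(DISTINCT points)
    FROM classroom_redeemed_codes
    WHERE classroom_id = 46
""").fetchone()

print(plain_sum, distinct_sum)
```

Here the true total is 450, while SUM(DISTINCT points) reports 350 because one of the two 100-point codes is dropped.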

You want to add up the points for DISTINCT redeemed_code_id values. In the result set above I marked (with <- *) the rows you do not need.

This is not possible with this query, because when aggregate values are calculated, each column is independent of the others. We need a query that selects the desired rows before grouping them.

Idea

We can try to add the missing columns (with NULL values) to the two simple queries above, UNION ALL them, then select from the combined result and GROUP BY.

First make sure that it selects what we need:

    SELECT t.classroom_id, t.title, NULL AS redeemed_code_id, ocm.user_id, NULL AS points
    FROM `organisation_classrooms` t
    LEFT JOIN `organisation_classrooms_myusers` ocm
      ON ocm.classroom_id = t.classroom_id
    WHERE t.organisation_id = 37383
    UNION ALL
    SELECT t.classroom_id, t.title, r.redeemed_code_id, NULL AS user_id, r.points
    FROM `organisation_classrooms` t
    LEFT JOIN `classroom_redeemed_codes` r
      ON r.classroom_id = t.classroom_id
     AND r.inactive = 0
     AND (r.date_redeemed >= 1393286400 OR r.date_redeemed = 0)
    WHERE t.organisation_id = 37383
    ORDER BY classroom_id

Attention! The ORDER BY applies to the result set of the UNION. If you want to order the rows of an individual SELECT (this does not help, because UNION does not preserve that order), you must enclose that SELECT in parentheses and put the ORDER BY clause inside them.

The result looks great:

    classroom_id | title | redeemed_code_id | user_id | points
    -------------+-------+------------------+---------+-------
              16 | BLUE  |             NULL |       1 |   NULL
              16 | BLUE  |             NULL |       2 |   NULL
              16 | BLUE  |                7 |    NULL |     50
    -------------+-------+------------------+---------+-------
              17 | GREEN |                8 |    NULL |     25
              17 | GREEN |                9 |    NULL |     75
              17 | GREEN |             NULL |  508826 |   NULL
              17 | GREEN |             NULL |  508835 |   NULL
    -------------+-------+------------------+---------+-------
              46 | PINK  |                5 |    NULL |    250
              46 | PINK  |                6 |    NULL |    100
              46 | PINK  |             NULL |    NULL |   NULL

Now we can put parentheses around the query above (after stripping the ORDER BY) and use it inside another query that groups the data by classroom_id, counts the users and redeemed codes, and sums their points.

You will get a query that looks awful and, with your current database schema, crawls once your tables have several hundred rows. That is why I will not write it here.

Attention! Its performance can be improved by adding the missing indexes to your tables, on the columns that appear in the ON, WHERE, ORDER BY and GROUP BY clauses of the query.
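As a rough illustration of what such indexes could look like, the sketch below creates one index per table on the columns used in the joins and filters. The index names and the choice of composite columns are assumptions, not taken from the answer, and the statements are run against SQLite through Python's sqlite3 module; the CREATE INDEX ... ON table (columns) form is also accepted by MySQL.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL, with a cut-down schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organisation_classrooms (
    classroom_id INTEGER PRIMARY KEY, title TEXT, organisation_id INTEGER);
CREATE TABLE organisation_classrooms_myusers (classroom_id INTEGER, user_id INTEGER);
CREATE TABLE classroom_redeemed_codes (
    redeemed_code_id INTEGER PRIMARY KEY, classroom_id INTEGER,
    date_redeemed INTEGER, points INTEGER, inactive INTEGER);

-- Hypothetical index names; the columns are the ones used in ON and WHERE.
CREATE INDEX idx_oc_organisation ON organisation_classrooms (organisation_id);
CREATE INDEX idx_ocm_classroom ON organisation_classrooms_myusers (classroom_id, user_id);
CREATE INDEX idx_crc_classroom
    ON classroom_redeemed_codes (classroom_id, inactive, date_redeemed);
""")

# List the indexes that now exist.
index_names = sorted(
    name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND name LIKE 'idx%'"
    )
)
print(index_names)
```

Whether these particular indexes help depends on the data and on the query plan, so it is worth checking with EXPLAIN on the real MySQL tables.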

This will bring a significant improvement, but I would not rely on it too heavily. For really large tables (hundreds of thousands of rows), it will crawl anyway.

Another idea

We can also add GROUP BY to both Query #1 and Query #2 and UNION ALL them afterwards:

    SELECT t.classroom_id,
           t.title,
           NULL AS totalRewards,
           COUNT(DISTINCT ocm.user_id) AS totalStudents,
           NULL AS totalPoints
    FROM `organisation_classrooms` t
    LEFT JOIN `organisation_classrooms_myusers` ocm
      ON ocm.classroom_id = t.classroom_id
    WHERE t.organisation_id = 37383
    GROUP BY t.classroom_id
    UNION ALL
    SELECT t.classroom_id,
           t.title,
           COUNT(DISTINCT redeemed_code_id) AS totalRewards,
           NULL AS totalStudents,
           SUM(points) AS totalPoints
    FROM `organisation_classrooms` t
    LEFT JOIN `classroom_redeemed_codes` r
      ON r.classroom_id = t.classroom_id
     AND r.inactive = 0
     AND (r.date_redeemed >= 1393286400 OR r.date_redeemed = 0)
    WHERE t.organisation_id = 37383
    GROUP BY t.classroom_id
    ORDER BY classroom_id, totalRewards

This creates a good set of results:

    classroom_id | title | totalRewards | totalStudents | totalPoints
    -------------+-------+--------------+---------------+------------
              16 | BLUE  |         NULL |             2 |        NULL
              16 | BLUE  |            1 |          NULL |          50
              17 | GREEN |         NULL |             2 |        NULL
              17 | GREEN |            2 |          NULL |         100
              46 | PINK  |         NULL |             0 |        NULL
              46 | PINK  |            2 |          NULL |         350

This query can be embedded in another query that groups by classroom_id and SUM()s the total columns above to get the final result. But then again, the final query is big and ugly, and it does not run fast on large tables. And again, that is why I do not write it here.
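For readers who want to see the shape of that wrapping query anyway, here is one possible sketch. It is an assumption about how the embedding could be written, not the answerer's own query: the alias parts is invented, the inactive, date and organisation filters are left out, and it runs on SQLite through Python's sqlite3 module with the sample rows from the result sets above.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; schema and filters are simplified.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organisation_classrooms (classroom_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE organisation_classrooms_myusers (classroom_id INTEGER, user_id INTEGER);
CREATE TABLE classroom_redeemed_codes (
    redeemed_code_id INTEGER PRIMARY KEY, classroom_id INTEGER, points INTEGER);
INSERT INTO organisation_classrooms VALUES (16, 'BLUE'), (17, 'GREEN'), (46, 'PINK');
INSERT INTO organisation_classrooms_myusers VALUES
    (16, 1), (16, 2), (17, 508826), (17, 508835);
INSERT INTO classroom_redeemed_codes VALUES
    (7, 16, 50), (8, 17, 25), (9, 17, 75), (5, 46, 250), (6, 46, 100);
""")

# Outer query: SUM() skips the NULL placeholders, so each pair of rows folds into one.
totals = conn.execute("""
    SELECT classroom_id, title,
           IFNULL(SUM(totalRewards), 0),
           IFNULL(SUM(totalStudents), 0),
           IFNULL(SUM(totalPoints), 0)
    FROM (
        SELECT t.classroom_id, t.title, NULL AS totalRewards,
               COUNT(DISTINCT ocm.user_id) AS totalStudents, NULL AS totalPoints
        FROM organisation_classrooms t
        LEFT JOIN organisation_classrooms_myusers ocm
          ON ocm.classroom_id = t.classroom_id
        GROUP BY t.classroom_id, t.title
        UNION ALL
        SELECT t.classroom_id, t.title, COUNT(DISTINCT r.redeemed_code_id),
               NULL, SUM(r.points)
        FROM organisation_classrooms t
        LEFT JOIN classroom_redeemed_codes r
          ON r.classroom_id = t.classroom_id
        GROUP BY t.classroom_id, t.title
    ) AS parts
    GROUP BY classroom_id, title
    ORDER BY classroom_id
""").fetchall()

print(totals)  # (classroom_id, title, totalRewards, totalStudents, totalPoints)
```

The performance caveats above still apply: the inner queries are executed in full before the outer grouping happens.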

Conclusion

This can be done in one query, but it does not look good, and it does not work well on large tables.

Regarding performance, put EXPLAIN in front of your query, then check the values in the type, key and Extra columns of the result. See the documentation for an explanation of the possible values of these columns, which ones to aim for and which ones to avoid.

Both queries that I built from the two ideas produce join types of range or ALL and have Using filesort in the Extra column (all of these are slow). Moreover, using them as subqueries in bigger queries will not improve the way they are executed.

I recommend that you run the individual SELECT queries from the last code example as two separate queries; they return the odd and the even rows of the result set above. Then combine the results in client code. It will run faster this way.
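Combining the two result sets in client code could look roughly like the sketch below. It is only an illustration: it uses Python with the built-in sqlite3 module instead of MySQL, a cut-down schema without the inactive, date and organisation filters, and hypothetical variable names (students, rewards_rows, summary).

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; schema and filters are simplified.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organisation_classrooms (classroom_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE organisation_classrooms_myusers (classroom_id INTEGER, user_id INTEGER);
CREATE TABLE classroom_redeemed_codes (
    redeemed_code_id INTEGER PRIMARY KEY, classroom_id INTEGER, points INTEGER);
INSERT INTO organisation_classrooms VALUES (16, 'BLUE'), (17, 'GREEN'), (46, 'PINK');
INSERT INTO organisation_classrooms_myusers VALUES
    (16, 1), (16, 2), (17, 508826), (17, 508835);
INSERT INTO classroom_redeemed_codes VALUES
    (7, 16, 50), (8, 17, 25), (9, 17, 75), (5, 46, 250), (6, 46, 100);
""")

# First query: number of students per classroom, kept as {classroom_id: count}.
students = dict(conn.execute("""
    SELECT t.classroom_id, COUNT(DISTINCT ocm.user_id)
    FROM organisation_classrooms t
    LEFT JOIN organisation_classrooms_myusers ocm ON ocm.classroom_id = t.classroom_id
    GROUP BY t.classroom_id
"""))

# Second query: rewards and points per classroom.
rewards_rows = conn.execute("""
    SELECT t.classroom_id, t.title,
           COUNT(DISTINCT r.redeemed_code_id), IFNULL(SUM(r.points), 0)
    FROM organisation_classrooms t
    LEFT JOIN classroom_redeemed_codes r ON r.classroom_id = t.classroom_id
    GROUP BY t.classroom_id, t.title
    ORDER BY t.classroom_id
""").fetchall()

# Merge the two result sets on classroom_id in client code.
summary = [
    {
        "classroom_id": classroom_id,
        "title": title,
        "totalRewards": total_rewards,
        "totalStudents": students.get(classroom_id, 0),
        "totalPoints": total_points,
    }
    for classroom_id, title, total_rewards, total_points in rewards_rows
]

for row in summary:
    print(row)
```

Because each query touches only one of the two detail tables, neither result is inflated by the other table's rows.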


Source: https://habr.com/ru/post/1208810/

