This is what happens. Consider the query in parts and what MySQL processes when it goes.
First you choose from the elements ( select i.item_id, i.item_name, avg(ir.rating) from Items i
):
+---------+-----------+ | item_id | item_name | +---------+-----------+ | 1 | Item 1 | | 2 | Item 2 | | 3 | Item 3 | | 4 | Item 4 | | 5 | Item 5 | +---------+-----------+
Then you join the ratings ( left join ItemRating ir ON ir.item_id = i.item_id
). Note that element 1 appears in two rows after joining, because the way JOIN is defined to work is that it returns one row for each matching join condition (and LEFT basically means "return every row in the first table at least once, even if there are no matching conditions on this line).
+---------+-----------+-----------+------------+ | item_id | item_name | ir.rating | ir.item_id | +---------+-----------+-----------+------------+ | 1 | Item 1 | 9 | 1 | | 1 | Item 1 | 6 | 1 | | 2 | Item 2 | NULL | NULL | | 3 | Item 3 | 10 | 3 | | 4 | Item 4 | NULL | NULL | | 5 | Item 5 | NULL | NULL | +---------+-----------+-----------+------------+
Finally, you are grouped by rating ( group by ir.item_id
). This will return one line for each unique ir.item_id file. There are three unique ir.item_ids (as you can see in the last column): 1
and NULL
and 3
. For each of them, it returns one row and averages the rating.
So, for 1
we have:
+---------+-----------+-----------+------------+ | item_id | item_name | ir.rating | ir.item_id | +---------+-----------+-----------+------------+ | 1 | Item 1 | 9 | 1 | | 1 | Item 1 | 6 | 1 | +---------+-----------+-----------+------------+
Collapse:
+---------+-----------+----------------+------------+ | item_id | item_name | avg(ir.rating) | ir.item_id | +---------+-----------+----------------+------------+ | 1 | Item 1 | 7.5 | 1 | +---------+-----------+----------------+------------+
For NULL
we have:
+---------+-----------+-----------+------------+ | item_id | item_name | ir.rating | ir.item_id | +---------+-----------+-----------+------------+ | 2 | Item 2 | NULL | NULL | | 4 | Item 4 | NULL | NULL | | 5 | Item 5 | NULL | NULL | +---------+-----------+-----------+------------+
Collapse:
+---------+-----------+----------------+------------+ | item_id | item_name | avg(ir.rating) | ir.item_id | +---------+-----------+----------------+------------+ | 2| Item 2 | NULL | NULL | +---------+-----------+----------------+------------+
For 3
we have:
+---------+-----------+-----------+------------+ | item_id | item_name | ir.rating | ir.item_id | +---------+-----------+-----------+------------+ | 3 | Item 3 | 10 | 3 | +---------+-----------+-----------+------------+
Collapse:
+---------+-----------+----------------+------------+ | item_id | item_name | avg(ir.rating) | ir.item_id | +---------+-----------+----------------+------------+ | 3 | Item 3 | 10 | 3 | +---------+-----------+----------------+------------+
The combination of three collapsed results gives:
+---------+-----------+----------------+------------+ | item_id | item_name | avg(ir.rating) | ir.item_id | +---------+-----------+----------------+------------+ | 1 | Item 1 | 7.5 | 1 | | 3 | Item 3 | 10 | 3 | | 2 | Item 2 | NULL | NULL | +---------+-----------+----------------+------------+
What did you get.
One tricky part is the way to collapse NULL strings. Recall that these were zero lines:
+---------+-----------+-----------+------------+ | item_id | item_name | ir.rating | ir.item_id | +---------+-----------+-----------+------------+ | 2 | Item 2 | NULL | NULL | | 4 | Item 4 | NULL | NULL | | 5 | Item 5 | NULL | NULL | +---------+-----------+-----------+------------+
When you execute a group, most database systems will not even let you select columns that are not part of the group. MySQL is an exception. Since you are only grouping on ir.rating, only one of them will allow you to choose, because there is no clear way to collapse three lines in a non-aggressive way. What MySQL does is simply select the first one it encounters and use the values ββin this row as a collapsed value. So, (2,4,5) => (2) and (Clause 2, Clause 4, Clause 5) => Clause 2 and (NULL, NULL, NULL) => NULL. This is why you only see line 2 (in fact, you see three folded lines that look like line 2).
To really see this in action and launch a point at home, consider this query:
select group_concat(i.item_id), group_concat(i.item_name), avg(ir.rating) from Items i left join ItemRating ir ON ir.item_id = i.item_id group by ir.item_id;
This is similar to your original query, except that all three selected columns now have group aggregate functions. I use GROUP_CONCAT
, which simply concatenates strings to form a minimized version (this would be true on other SQL systems besides MySQL). This returns the following:
+-------------------------+---------------------------+----------------+ | group_concat(i.item_id) | group_concat(i.item_name) | avg(ir.rating) | +-------------------------+---------------------------+----------------+ | 2,4,5 | Item 2,Item 4,Item 5 | NULL | | 1,1 | Item 1,Item 1 | 7.5000 | | 3 | Item 3 | 10.0000 | +-------------------------+---------------------------+----------------+