This all comes down to the "logical order" in which an SQL statement is evaluated. Although a DBMS may actually retrieve the data using all sorts of clever strategies, it has to behave according to some predictable logic. So we can think of the different parts of an SQL query as being processed "before" or "after" one another in terms of how that logic behaves.
As it turns out, ORDER BY is the very last step in that logical sequence, so it cannot change the behavior of the "earlier" steps.
If you use GROUP BY, the rows have already been grouped by the time the SELECT is evaluated, let alone the ORDER BY, so you can only refer to the columns that were grouped by, or to "aggregate" values calculated across all the values in a group. (MySQL implements a controversial extension to GROUP BY that lets you mention a column in the SELECT which logically cannot be there, and it will pick the value from an arbitrary row in the group.)
If you use DISTINCT, it is logically processed after the SELECT, but the ORDER BY still comes later. So only once DISTINCT has thrown away the duplicates are the remaining results put into a particular order, and the rows that were thrown away cannot be used to determine that order.
As for how to get the result you want, the key is to find a value to sort by that is still valid after the GROUP BY / DISTINCT has (logically) run. Remember that if you use GROUP BY, any aggregate values are still valid: an aggregate function can look at all the values in a group. This includes MIN() and MAX(), which are ideal for ordering by, because "the lowest number" (MIN) is the same thing as "the first number if I sort them in ascending order", and vice versa for MAX.
So, to order a set of distinct foo_number values by the lowest applicable bar_number for each, you could use this:
SELECT foo_number
FROM some_table
GROUP BY foo_number
ORDER BY MIN(bar_number) ASC
Here is a live demo with some arbitrary data.
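For anyone who would rather try the query locally, here is a minimal sketch using Python's built-in sqlite3 module. The choice of SQLite and the sample rows are my own assumptions for illustration; only the table name, the column names, and the query itself come from the answer above.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (foo_number INTEGER, bar_number INTEGER)")

# Arbitrary sample rows (invented for this sketch).
conn.executemany(
    "INSERT INTO some_table (foo_number, bar_number) VALUES (?, ?)",
    [(1, 30), (1, 10), (2, 5), (2, 40), (3, 20)],
)

# Group first, then order the groups by their lowest bar_number.
rows = conn.execute(
    "SELECT foo_number FROM some_table "
    "GROUP BY foo_number ORDER BY MIN(bar_number) ASC"
).fetchall()

ordered_foo_numbers = [row[0] for row in rows]
print(ordered_foo_numbers)  # [2, 1, 3]
```

With these rows the lowest bar_number per group is 10 for foo_number 1, 5 for foo_number 2, and 20 for foo_number 3, which is why 2 comes first.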
UPDATE: There was some discussion in the comments of why, if an ordering were applied before the grouping / de-duplication, that order would not carry over to the groups. Even if it were applied first, you would still need a strategy for which row is kept in each group: the first or the last.
As an analogy, imagine the original set of rows as a set of playing cards picked from a deck and then sorted by face value, from low to high. Now go through the sorted cards and deal them into a separate pile for each suit. Which card should "represent" each pile?
If you deal the cards face up, the cards showing at the end will be the ones with the highest face value (a "keep last" strategy); if you deal them face down and then flip each pile over, you will reveal the lowest face value (a "keep first" strategy). Both obey the original ordering of the cards, and the instruction to "deal the cards into piles by suit" does not automatically tell the dealer (who represents the DBMS) which strategy was intended.
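The two dealing strategies can be sketched in a few lines of Python. This is only an illustration of the analogy, not of how any DBMS works internally; the hand of cards is invented, and the names keep_first and keep_last are hypothetical.

```python
# Hypothetical hand of cards as (suit, face value) pairs.
cards = [("hearts", 9), ("spades", 4), ("hearts", 2), ("spades", 11), ("clubs", 7)]

# Sort by face value, low to high, before dealing.
sorted_cards = sorted(cards, key=lambda card: card[1])

keep_first = {}  # face down, then flip the pile: the first card dealt is the one shown
keep_last = {}   # face up: the last card dealt covers the earlier ones

for suit, value in sorted_cards:
    keep_first.setdefault(suit, value)  # ignore later cards of the same suit
    keep_last[suit] = value             # later cards overwrite earlier ones

print(keep_first)  # {'hearts': 2, 'spades': 4, 'clubs': 7}
print(keep_last)   # {'hearts': 9, 'spades': 11, 'clubs': 7}
```

Both dictionaries are built from the same sorted input, yet they disagree for every suit that has more than one card, which is the ambiguity described above.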
If the final piles of cards are the groups from a GROUP BY, then MIN() and MAX() represent picking up each pile and looking for the lowest or highest value, regardless of the order the cards are in. But because you can look inside the groups, you can do other things too, such as adding up the total value of each pile (SUM) or counting how many cards it contains (COUNT), which makes GROUP BY much more powerful than an "ordered DISTINCT" could be.
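To make the pile analogy concrete, here is a small sketch, again using Python's sqlite3 module. The cards table, its suit and face_value columns, and the rows are all hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cards (suit TEXT, face_value INTEGER)")

# Invented sample data: one row per card.
conn.executemany(
    "INSERT INTO cards (suit, face_value) VALUES (?, ?)",
    [("hearts", 9), ("spades", 4), ("hearts", 2), ("spades", 11), ("clubs", 7)],
)

# Each aggregate looks inside the pile (group); the piles are then
# ordered by their total value, highest first.
pile_summaries = conn.execute(
    "SELECT suit, MIN(face_value), MAX(face_value), SUM(face_value), COUNT(*) "
    "FROM cards GROUP BY suit ORDER BY SUM(face_value) DESC"
).fetchall()

for summary in pile_summaries:
    print(summary)
# ('spades', 4, 11, 15, 2)
# ('hearts', 2, 9, 11, 2)
# ('clubs', 7, 7, 7, 1)
```

The ordering here depends on a value (the sum) that no single row contains, which a de-duplicated list of suits on its own could not provide.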