Which is better SQL: :group vs. :select => 'DISTINCT'

Suppose the three models form a standard has_many :through association:

class Mailbox < ActiveRecord::Base
  has_many :addresses
  has_many :domains, :through => :addresses
end

class Address < ActiveRecord::Base
  belongs_to :mailbox
  belongs_to :domain
end

class Domain < ActiveRecord::Base
  has_many :addresses
  has_many :mailboxes, :through => :addresses
end

If you want to know, for any given mailbox, in which domains it has addresses, there are two obvious ways to get it:

m = Mailbox.first
# either: SELECT DISTINCT domains.id, domains.name FROM "domains" INNER JOIN 
#         "addresses" ON "domains".id = "addresses".domain_id WHERE 
#         (("addresses".mailbox_id = 1))
m.domains.all(:select => 'DISTINCT domains.id, domains.name')
# or: SELECT domains.id, domains.name FROM "domains" INNER JOIN "addresses" ON
#     "domains".id = "addresses".domain_id WHERE (("addresses".mailbox_id = 1))
#      GROUP BY domains.id, domains.name
m.domains.all(:select => 'domains.id, domains.name', 
  :group => 'domains.id, domains.name')
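To see why both queries return the same rows, here is a plain-Ruby sketch (no database involved) that models the joined rows as an array and applies both strategies. The data and method names (JOINED_ROWS, distinct_rows, grouped_rows) are hypothetical and only mimic what the two SQL statements do; this is not what ActiveRecord runs internally.

```ruby
# Hypothetical result of: domains INNER JOIN addresses WHERE addresses.mailbox_id = 1
# One [id, name] pair per address, so a domain with several addresses
# for this mailbox shows up several times.
JOINED_ROWS = [
  [1, "example.com"],
  [2, "example.org"],
  [1, "example.com"],
  [3, "example.net"],
  [2, "example.org"]
].freeze

# Models SELECT DISTINCT domains.id, domains.name ...
def distinct_rows(rows)
  rows.uniq
end

# Models SELECT domains.id, domains.name ... GROUP BY domains.id, domains.name
# With no aggregate functions, each group collapses to just its key.
def grouped_rows(rows)
  rows.group_by { |id, name| [id, name] }.keys
end

p distinct_rows(JOINED_ROWS)
p grouped_rows(JOINED_ROWS)
```

Both calls print the same three unique domains, which is the sense in which the two queries are interchangeable when nothing is aggregated.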

The problem is that I do not know which solution is better. When I do not specify any other conditions, the PostgreSQL query planner prefers the second solution (it works as expected), but once I add conditions to the queries, it comes down to "Unique" versus "Group":

With "DISTINCT":

 Unique  (cost=16.56..16.57 rows=1 width=150)
   ->  Sort  (cost=16.56..16.56 rows=1 width=150)
         Sort Key: domains.name, domains.id
         ->  Nested Loop  (cost=0.00..16.55 rows=1 width=150)
               ->  Index Scan using index_addresses_on_mailbox_id on addresses  (cost=0.00..8.27 rows=1 width=4)
                     Index Cond: (mailbox_id = 1)
               ->  Index Scan using domains_pkey on domains  (cost=0.00..8.27 rows=1 width=150)
                     Index Cond: (domains.id = addresses.domain_id)
                     Filter: (domains.active AND domains.selfmgmt)
(9 rows)

With "GROUP BY":

 Group  (cost=16.56..16.57 rows=1 width=150)
   ->  Sort  (cost=16.56..16.56 rows=1 width=150)
         Sort Key: domains.name, domains.id
         ->  Nested Loop  (cost=0.00..16.55 rows=1 width=150)
               ->  Index Scan using index_addresses_on_mailbox_id on addresses  (cost=0.00..8.27 rows=1 width=4)
                     Index Cond: (mailbox_id = 1)
               ->  Index Scan using domains_pkey on domains  (cost=0.00..8.27 rows=1 width=150)
                     Index Cond: (domains.id = addresses.domain_id)
                     Filter: (domains.active AND domains.selfmgmt)
(9 rows)
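Apart from the top node, the two plans are identical: both sort on (domains.name, domains.id) and then collapse runs of equal rows. The following rough Ruby model illustrates that; it is an assumption-laden sketch of the idea, not PostgreSQL's executor code, and the names (PLAN_ROWS, sort_node, unique_node, group_node) and data are hypothetical.

```ruby
# Rows as [name, id], mirroring "Sort Key: domains.name, domains.id" (hypothetical data).
PLAN_ROWS = [
  ["example.org", 2],
  ["example.com", 1],
  ["example.org", 2],
  ["example.com", 1]
].freeze

# "Sort" node: order the rows by the sort key.
def sort_node(rows)
  rows.sort
end

# "Unique" node: expects sorted input and keeps one row per run of equal rows.
def unique_node(sorted_rows)
  sorted_rows.chunk_while { |a, b| a == b }.map(&:first)
end

# "Group" node with no aggregates: also expects sorted input, starts a new
# group whenever the key changes, and emits one row per group.
def group_node(sorted_rows)
  sorted_rows.slice_when { |a, b| a != b }.map(&:first)
end

sorted = sort_node(PLAN_ROWS)
p unique_node(sorted)
p group_node(sorted)
```

Since both nodes do a single pass over already-sorted input, it is unsurprising that the planner assigns them the same cost.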

I'm really not sure how to determine the best way to get this data. My instinct tells me to go with "GROUP BY", but I could not find any documentation that settles the question.

Should I use :group or :select => 'DISTINCT'? And does the choice matter on other database systems, such as Oracle, DB2, or MySQL (which I cannot test)?


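In PostgreSQL before 8.4, GROUP BY is generally faster than DISTINCT, because DISTINCT can only be executed by sorting the rows (Sort + Unique), whereas GROUP BY can also use hashing (HashAggregate).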

Starting with 8.4, DISTINCT can use hashing ("HashAggregate") as well, so the difference largely disappears.
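The difference between the two strategies can be sketched in plain Ruby: sort-based de-duplication has to order all rows first, while hash-based de-duplication only needs to remember which keys it has already seen. This is an illustrative model with hypothetical method names (distinct_by_sort, distinct_by_hash), not how PostgreSQL implements either node.

```ruby
require "set"

# Sort-based DISTINCT (Sort + Unique): sort everything, then drop adjacent
# duplicates. The output comes out ordered as a side effect of the sort.
def distinct_by_sort(rows)
  rows.sort.chunk_while { |a, b| a == b }.map(&:first)
end

# Hash-based DISTINCT (the idea behind HashAggregate): keep a hash set of keys
# already emitted; no sort is needed. Set#add? returns nil for a duplicate.
# In a real database the output order of a hash strategy is unspecified;
# Ruby's Set merely happens to preserve insertion order here.
def distinct_by_hash(rows)
  seen = Set.new
  rows.each_with_object([]) do |row, out|
    out << row if seen.add?(row)
  end
end

rows = [[2, "example.org"], [1, "example.com"], [2, "example.org"]]
p distinct_by_sort(rows)
p distinct_by_hash(rows)
```

Both return the same set of rows; the hash variant avoids the O(n log n) sort at the cost of keeping the set of seen keys in memory, which is the trade-off the planner weighs.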


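In SQL terms, a GROUP BY without aggregates and a DISTINCT are equivalent here: both return the set of unique rows, which is why Postgres produces practically identical plans that differ only in the top node, "Group" versus "Unique".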

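GROUP BY is intended for queries with aggregate functions in the SELECT list. Without them, the "Group" and "Unique" nodes do exactly the same work. Once you add COUNT(*), MAX(some_field), etc., the "Group" node also computes those aggregates for each group, which DISTINCT cannot do.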
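Here is a minimal plain-Ruby sketch of what a GROUP BY with aggregates computes per group, assuming hypothetical joined rows of [domain_id, domain_name, address_id] and using MAX(addresses.id) as a stand-in for MAX(some_field). The names AGG_ROWS and group_with_aggregates are illustrative only.

```ruby
# Hypothetical joined rows: [domain_id, domain_name, address_id]
AGG_ROWS = [
  [1, "example.com", 10],
  [2, "example.org", 11],
  [1, "example.com", 12],
  [1, "example.com", 15]
].freeze

# Rough model of:
#   SELECT domains.id, domains.name, COUNT(*), MAX(addresses.id)
#   ... GROUP BY domains.id, domains.name
# DISTINCT alone could only return the unique [id, name] pairs,
# not the per-group count and maximum.
def group_with_aggregates(rows)
  rows.group_by { |id, name, _address_id| [id, name] }.map do |(id, name), group|
    [id, name, group.size, group.map { |row| row[2] }.max]
  end
end

p group_with_aggregates(AGG_ROWS)
```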

A GROUP BY with no aggregates in the SELECT list is effectively a DISTINCT, so writing DISTINCT states the intent more directly.


Source: https://habr.com/ru/post/1717007/

