COUNT (1) + COUNT (DISTINCT ()) is much slower than two queries separately

Request Explanation:

  • Person(indicated PersonID) may or may not have the corresponding Job(indicated JobID).
  • If there is one Job, the binding is saved in the table PersonJob( PersonID<=> JobID).
  • Personwithout being Jobignored.
  • Jobalso has CityID.
  • In each Job.CityIDrequest, he wants to know the total number Person, as well as the number of uniquePerson.HouseID

Query:

SELECT
  Job.CityID, COUNT(1) NumTotal, COUNT(DISTINCT(Person.HouseID)) NumDistinct
FROM
  Job
  INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
  INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)
GROUP BY
  Job.CityID

Statistics:

  • SELECT COUNT(1) FROM PersonJob ~ 600,000
  • SELECT COUNT(1) FROM Person ~ 800,000
  • SELECT COUNT(DISTINCT(Person.HouseID)) FROM Person ~ 10.000
  • SELECT COUNT(1) FROM Job ~ 500
  • MS SQL Server 10.50

Problem:

  • COUNT(1) part of the request at startup works separately in 0.25 seconds.

    SELECT
      Job.CityID, COUNT(1) NumTotal
    FROM
      Job
      INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
      INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)
    GROUP BY
      Job.CityID
    
  • COUNT(DISTINCT(Person.HouseID)) part of the request at startup is separately performed at 0.80 sec.

    SELECT
      Job.CityID, COUNT(DISTINCT(Person.HouseID)) NumDistinct
    FROM
      Job
      INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
      INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)
    GROUP BY
      Job.CityID
    
  • 3.10 - 3 , ?

:

  • , .
  • , COUNT (DISTINCT)
  • :
    • 25% Hash match () ( Job.CityID)
    • 15% Hash match (Inner Join) ( Job.CityID, Person.HouseID)
      • 30% ( Person.PersonID, Person.HouseID)
      • 14% ( PersonJob.PersonID)
  • :
    • 03% Hash match ( ) ( Job.CityID, COUNT(*))
    • 31% - () ( Job.CityID)
    • 29% ( Job.CityID, Person.HouseID)
+4
2

SQL Server 2012 .

rewrite .

WITH T1
     AS (SELECT Job.CityID,
                Person.HouseID
         FROM   Job
                INNER JOIN PersonJob
                        ON ( PersonJob.JobID = Job.JobID )
                INNER JOIN Person
                        ON ( Person.PersonID = PersonJob.PersonID )),
     PartialSums
     AS (SELECT COUNT(*) AS CountStarPartialCount,
                HouseID,
                CityID
         FROM   T1
         GROUP  BY CityID,
                   HouseID)
SELECT CityID,
       SUM(CountStarPartialCount) AS NumTotal,
       COUNT(HouseID)             AS NumDistinct
FROM   PartialSums
GROUP  BY CityID 

SQL Server 2012 . . ?

+4

, , , , DISTINCT. :

SELECT
  Job.CityID, NumTotal.Value, NumDistinct.Value
FROM
  Job
  LEFT JOIN
  (
    SELECT
      Job.CityID, COUNT(1) AS Value
    FROM
      Job
      INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
      INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)
    GROUP BY
      Job.CityID
  ) NumTotal ON (NumTotal.CityID = Job.CityID)
  LEFT JOIN
  (
    SELECT
      Job.CityID, COUNT(DISTINCT Person.HouseID) AS Value
    FROM
      Job
      INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
      INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)
    GROUP BY
      Job.CityID
  ) NumDistinct ON (NumDistinct.CityID = Job.CityID)
GROUP BY
  Job.CityID

0.70 , " " - 0.60 . , LEFT JOIN'inig 5 , " ", 20% , " ", .

+1

Source: https://habr.com/ru/post/1658195/