I have a set of cities that have many-to-many relationships with a set of tags. The user gives me a set of tags (which may contain duplicates!), And I need to return a list of relevant records, sorted by relevance.
Data
Here are some sample data to illustrate the problem:
Cities:
-------------------- | id | city | -------------------- | 1 | Atlanta | | 2 | Baltimore | | 3 | Cleveland | | 4 | Denver | | 5 | Eugene | --------------------
Tags:
------ | id | ------ | 1 | | 2 | | 3 | | 4 | ------
Cities are marked as follows:
Atlanta: 1, 2 Baltimore: 3 Cleveland: 1, 3, 4 Denver: 2, 3 Eugene: 1, 4
... so the CityTags table looks like this:
------------------------ | city_id | tag_id | ------------------------ | 1 | 1 | | 1 | 2 | | 2 | 3 | | 3 | 1 | | 3 | 3 | | 3 | 4 | | 4 | 2 | | 4 | 3 | | 5 | 1 | | 5 | 4 | ------------------------
Example 1
If the user gives me identifier tags: [1, 3, 3, 4], I want to calculate how many matches I have for each of the tags, and return the result, sorted by relevance, for example:
------------------------ | city | matches | ------------------------ | Cleveland | 4 | | Baltimore | 2 | | Eugene | 2 | | Atlanta | 1 | | Denver | 1 | ------------------------
Since Cleveland matched all four tags, it was first followed by Baltimore and Eugene, each of which had two tags, etc.
Example 2
Another example for a good grade. To search [2, 2, 2, 3, 4] we get:
------------------------ | city | matches | ------------------------ | Denver | 4 | | Atlanta | 3 | | Cleveland | 2 | | Baltimore | 1 | | Eugene | 1 | ------------------------
SQL
If I ignore duplicate tags, then this is trivial:
SELECT name,COUNT(name) AS relevance FROM (SELECT name FROM cities,citytags WHERE id=city_id AND tag_id IN (1,3,3,4)) AS matches GROUP BY name ORDER BY relevance DESC;
But that is not what I need. I need to respect duplicates. Can anyone suggest how I can do this?
Solution in Postgresql
Yeah! I need a temporary table. Postgresql allows me to do this using the WITH syntax. Here's the solution:
WITH search(tag) AS (VALUES (1), (3), (3), (4)) SELECT name, COUNT(name) AS relevance FROM cities INNER JOIN citytags ON cities.id=citytags.city_id INNER JOIN search ON citytags.tag_id=search.tag GROUP BY name ORDER BY relevance DESC;
Many thanks to those who answered.