SQL query to search across multiple tags with relevance sorting

I have a set of cities that have many-to-many relationships with a set of tags. The user gives me a set of tags (which may contain duplicates!), And I need to return a list of relevant records, sorted by relevance.

Data

Here are some sample data to illustrate the problem:

Cities:

-------------------- | id | city | -------------------- | 1 | Atlanta | | 2 | Baltimore | | 3 | Cleveland | | 4 | Denver | | 5 | Eugene | -------------------- 

Tags:

 ------ | id | ------ | 1 | | 2 | | 3 | | 4 | ------ 

Cities are marked as follows:

 Atlanta: 1, 2 Baltimore: 3 Cleveland: 1, 3, 4 Denver: 2, 3 Eugene: 1, 4 

... so the CityTags table looks like this:

 ------------------------ | city_id | tag_id | ------------------------ | 1 | 1 | | 1 | 2 | | 2 | 3 | | 3 | 1 | | 3 | 3 | | 3 | 4 | | 4 | 2 | | 4 | 3 | | 5 | 1 | | 5 | 4 | ------------------------ 

Example 1

If the user gives me identifier tags: [1, 3, 3, 4], I want to calculate how many matches I have for each of the tags, and return the result, sorted by relevance, for example:

 ------------------------ | city | matches | ------------------------ | Cleveland | 4 | | Baltimore | 2 | | Eugene | 2 | | Atlanta | 1 | | Denver | 1 | ------------------------ 

Since Cleveland matched all four tags, it was first followed by Baltimore and Eugene, each of which had two tags, etc.

Example 2

Another example for a good grade. To search [2, 2, 2, 3, 4] we get:

 ------------------------ | city | matches | ------------------------ | Denver | 4 | | Atlanta | 3 | | Cleveland | 2 | | Baltimore | 1 | | Eugene | 1 | ------------------------ 

SQL

If I ignore duplicate tags, then this is trivial:

 SELECT name,COUNT(name) AS relevance FROM (SELECT name FROM cities,citytags WHERE id=city_id AND tag_id IN (1,3,3,4)) AS matches GROUP BY name ORDER BY relevance DESC; 

But that is not what I need. I need to respect duplicates. Can anyone suggest how I can do this?

Solution in Postgresql

Yeah! I need a temporary table. Postgresql allows me to do this using the WITH syntax. Here's the solution:

 WITH search(tag) AS (VALUES (1), (3), (3), (4)) SELECT name, COUNT(name) AS relevance FROM cities INNER JOIN citytags ON cities.id=citytags.city_id INNER JOIN search ON citytags.tag_id=search.tag GROUP BY name ORDER BY relevance DESC; 

Many thanks to those who answered.

+6
source share
2 answers

If the user list is on a comma-separated list, you can try turning it into a temp table and joining it. I don't know the syntax for PosteGRE, so here is the idea in MySql:

 create temporary table usertags (tag_id int); insert usertags values (1),(3),(3),(4); SELECT name, COUNT(name) AS relevance FROM cities JOIN citytags on cities.id = citytags.city_id JOIN usertags on citytags.tag_id = usertags.tag_id GROUP BY name ORDER BY relevance DESC; 

Converting a comma-separated list to the above code will be as simple as replacing everyone with ),( using your server-side language, and then including it in the VALUES statement to populate the temp table.

Demo (MySql): http://www.sqlize.com/1qNThhD9tC

+3
source

Paste all the tags into the table and then the JOIN instead of including them in the IN list.

 CREATE TABLE #input ( tag_id INT NOT NULL ) ; INSERT INTO #input SELECT 1 UNION ALL SELECT 3 UNION ALL SELECT 3 UNION ALL SELECT 4 ; SELECT city.name, search.relevance FROM city INNER JOIN ( SELECT city_id, COUNT(*) AS relevance FROM citytags INNER JOIN #input ON #input.tag_id = citytags.tag_id GROUP BY city_id ) AS search ON search.city_id = city.id ORDER BY search.relevance DESC ; 
+1
source

Source: https://habr.com/ru/post/898981/


All Articles