How to efficiently perform a join intersection in SQL?

I have three tables: books, tags, and taggings (a books-to-tags cross-reference table).

books
id | title |      author     
 1 | Blink | Malcolm Gladwell
 2 |  1984 |    George Orwell

taggings
book_id | tag_id
      1 |      1
      1 |      2
      2 |      1
      2 |      3

tags
id | name
 1 | interesting
 2 |  nonfiction
 3 |     fiction

I would like to find all books tagged both “interesting” and “fiction”. The best I've come up with is:

select books.* from books, taggings, tags
 where taggings.book_id = books.id
   and taggings.tag_id  = tags.id
   and tags.name = 'interesting'
intersect
select books.* from books, taggings, tags
 where taggings.book_id = books.id
   and taggings.tag_id  = tags.id
   and tags.name = 'fiction'

This seems to work, but I'm not sure how it will scale, either in rows or in number of tags. That is, what happens as I add hundreds of books, hundreds of tags, and thousands of taggings? What happens when the search becomes “interesting” and “fiction” and “water” and “stone”?

I have an alternative approach in mind if there is no better way to do the query directly in SQL:

  • select all books with the first tag, along with all of the tags on those books
  • remove from the list any book that does not have all of the requested tags
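
The two steps above can be sketched roughly as follows. This is only an illustration, assuming SQLite (via Python's sqlite3 module) as a stand-in database and a non-empty tag list; the helper names make_sample_db and books_with_all_tags are made up for this example.

```python
import sqlite3
from collections import defaultdict


def make_sample_db():
    """Build an in-memory copy of the sample tables from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT);
        CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE taggings (book_id INTEGER, tag_id INTEGER);
        INSERT INTO books VALUES (1, 'Blink', 'Malcolm Gladwell'),
                                 (2, '1984', 'George Orwell');
        INSERT INTO tags VALUES (1, 'interesting'), (2, 'nonfiction'), (3, 'fiction');
        INSERT INTO taggings VALUES (1, 1), (1, 2), (2, 1), (2, 3);
    """)
    return conn


def books_with_all_tags(conn, wanted):
    """Return ids of books carrying every tag in `wanted` (assumed non-empty)."""
    wanted = list(wanted)
    # Step 1: every (book, tag name) pair for books that carry the first tag.
    rows = conn.execute(
        """
        SELECT t_all.book_id, tags.name
        FROM taggings AS t_first
        JOIN tags AS first_tag ON first_tag.id = t_first.tag_id
        JOIN taggings AS t_all ON t_all.book_id = t_first.book_id
        JOIN tags ON tags.id = t_all.tag_id
        WHERE first_tag.name = ?
        """,
        (wanted[0],),
    ).fetchall()
    tags_by_book = defaultdict(set)
    for book_id, name in rows:
        tags_by_book[book_id].add(name)
    # Step 2: keep only the books whose tag set contains all requested tags.
    required = set(wanted)
    return sorted(b for b, names in tags_by_book.items() if required <= names)


if __name__ == "__main__":
    print(books_with_all_tags(make_sample_db(), ["interesting", "fiction"]))  # [2]
```

The filtering happens in the client here, so every tag of every candidate book crosses the wire; that is the cost this approach trades for a simpler query.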
+3
5

, , .

MySQL ( , ), .

( MySQL):

SELECT books.id, books.title, books.author
FROM books
INNER JOIN taggings ON ( taggings.book_id = books.id )
INNER JOIN tags ON ( tags.id = taggings.tag_id )
WHERE tags.name IN ( @tag1, @tag2, @tag3 )
GROUP BY books.id, books.title, books.author
HAVING COUNT(*) = @number_of_tags

:

3 , , _ 3, 3 , .

, , 10 , .
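
As a rough, runnable sketch of the GROUP BY / HAVING query above: the code below assumes SQLite (through Python's sqlite3) instead of MySQL, builds the IN list from however many tags are passed, and compares the count against that number. The helper names are hypothetical, and COUNT(DISTINCT tags.id) is used in place of COUNT(*) as an extra guard against duplicate rows in taggings; that substitution is an assumption of this sketch, not part of the answer.

```python
import sqlite3


def make_sample_db():
    """Build an in-memory copy of the sample tables from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT);
        CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE taggings (book_id INTEGER, tag_id INTEGER);
        INSERT INTO books VALUES (1, 'Blink', 'Malcolm Gladwell'),
                                 (2, '1984', 'George Orwell');
        INSERT INTO tags VALUES (1, 'interesting'), (2, 'nonfiction'), (3, 'fiction');
        INSERT INTO taggings VALUES (1, 1), (1, 2), (2, 1), (2, 3);
    """)
    return conn


def find_books(conn, tag_names):
    """Return books that carry every tag in tag_names, using one grouped query."""
    tag_names = sorted(set(tag_names))  # de-duplicate so the count is meaningful
    placeholders = ", ".join("?" for _ in tag_names)
    sql = f"""
        SELECT books.id, books.title, books.author
        FROM books
        INNER JOIN taggings ON taggings.book_id = books.id
        INNER JOIN tags ON tags.id = taggings.tag_id
        WHERE tags.name IN ({placeholders})
        GROUP BY books.id, books.title, books.author
        HAVING COUNT(DISTINCT tags.id) = ?
        ORDER BY books.id
    """
    # Bind the tag names first, then the number of tags for the HAVING clause.
    return conn.execute(sql, (*tag_names, len(tag_names))).fetchall()


if __name__ == "__main__":
    conn = make_sample_db()
    print(find_books(conn, ["interesting", "fiction"]))  # [(2, '1984', 'George Orwell')]
```

Only the placeholder list and the final count change as tags are added, so the query text stays the same shape for 2 tags or 10.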

+3

" " SQL , .

select * from books, taggings tg1, tags t1, taggings tg2, tags t2 
 where tg1.book_id = books.id
   and tg1.tag_id  = t1.id
   and t1.name = 'interesting'
   and tg2.book_id = books.id
   and tg2.tag_id  = t2.id
   and t2.name = 'fiction'

EDIT: , , . exists :

select * from books
 where exists (select * from taggings, tags
                where tags.name = 'fiction'
                  and taggings.tag_id = tags.id
                  and taggings.book_id = books.id)
   and exists (select * from taggings, tags
                where tags.name = 'interesting'
                  and taggings.tag_id = tags.id
                  and taggings.book_id = books.id)
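
The exists form extends to more tags by adding one subquery per tag. A minimal sketch of generating it, assuming SQLite via Python's sqlite3 and hypothetical helper names (make_sample_db, find_books_exists):

```python
import sqlite3


def make_sample_db():
    """Build an in-memory copy of the sample tables from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT);
        CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE taggings (book_id INTEGER, tag_id INTEGER);
        INSERT INTO books VALUES (1, 'Blink', 'Malcolm Gladwell'),
                                 (2, '1984', 'George Orwell');
        INSERT INTO tags VALUES (1, 'interesting'), (2, 'nonfiction'), (3, 'fiction');
        INSERT INTO taggings VALUES (1, 1), (1, 2), (2, 1), (2, 3);
    """)
    return conn


# One correlated subquery per requested tag; the tag name is bound as a parameter.
EXISTS_CLAUSE = """EXISTS (SELECT 1 FROM taggings
                           JOIN tags ON tags.id = taggings.tag_id
                           WHERE tags.name = ?
                             AND taggings.book_id = books.id)"""


def find_books_exists(conn, tag_names):
    """Return books that carry every tag in tag_names, one EXISTS per tag."""
    tag_names = list(tag_names)
    if not tag_names:
        raise ValueError("at least one tag name is required")
    where = "\n  AND ".join(EXISTS_CLAUSE for _ in tag_names)
    sql = f"SELECT id, title, author FROM books WHERE {where} ORDER BY id"
    return conn.execute(sql, tag_names).fetchall()


if __name__ == "__main__":
    conn = make_sample_db()
    print(find_books_exists(conn, ["interesting", "fiction"]))  # [(2, '1984', 'George Orwell')]
```

Unlike the counting approach, this does not depend on taggings being free of duplicate rows, at the price of a query that grows with the number of tags.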
+1

ALL , , mysql , , .

-- pseudocode: ALL written this way, without a comparison operator, is not standard SQL
select books.* from books, taggings, tags
 where taggings.book_id = books.id
   and taggings.tag_id  = tags.id
   and tags.name ALL('interesting', 'fiction');

, , , , /​​, taggings.tag_id ALL ( 3, 7, 105) - . , , , , 1k-, .

, - . , . , .

+1
with
  tt as
  (
      select id
      from tags
      where name in ('interesting', 'fiction')
  ),
  mm as
  (
      select book_id
      from taggings join tt on taggings.tag_id = tt.id
      group by taggings.book_id having count(*) = 2
  )
select books.*
from books join mm on books.id = mm.book_id

, -, ( , Oracle), ( EXPLAIN PLAN):

  • tags taggings - table-to-index. , .

  • books. , , .
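
One way to try the CTE query above and look at its plan is sketched below. It assumes SQLite through Python's sqlite3, with EXPLAIN QUERY PLAN standing in for Oracle's EXPLAIN PLAN; the two indexes and the helper names are hypothetical choices made for this example, and the plan text differs between databases and versions.

```python
import sqlite3


def make_sample_db():
    """Build an in-memory copy of the sample tables from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT);
        CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE taggings (book_id INTEGER, tag_id INTEGER);
        INSERT INTO books VALUES (1, 'Blink', 'Malcolm Gladwell'),
                                 (2, '1984', 'George Orwell');
        INSERT INTO tags VALUES (1, 'interesting'), (2, 'nonfiction'), (3, 'fiction');
        INSERT INTO taggings VALUES (1, 1), (1, 2), (2, 1), (2, 3);
    """)
    return conn


CTE_QUERY = """
WITH
  tt AS (SELECT id FROM tags WHERE name IN ('interesting', 'fiction')),
  mm AS (
      SELECT book_id
      FROM taggings JOIN tt ON taggings.tag_id = tt.id
      GROUP BY taggings.book_id HAVING COUNT(*) = 2
  )
SELECT books.* FROM books JOIN mm ON books.id = mm.book_id
"""


def run_with_plan(conn):
    """Create the assumed indexes, then return (plan rows, result rows)."""
    conn.executescript("""
        CREATE INDEX IF NOT EXISTS idx_tags_name ON tags (name);
        CREATE INDEX IF NOT EXISTS idx_taggings_tag_book ON taggings (tag_id, book_id);
    """)
    plan = conn.execute("EXPLAIN QUERY PLAN " + CTE_QUERY).fetchall()
    rows = conn.execute(CTE_QUERY).fetchall()
    return plan, rows


if __name__ == "__main__":
    plan, rows = run_with_plan(make_sample_db())
    for step in plan:
        print(step[-1])  # the last column holds the human-readable plan detail
    print(rows)
```

On a four-row table the plan says little about real behavior; the point is only to show where to look once the tables hold realistic volumes.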

+1

Which database? That changes the answer slightly. For example, this works in SQL Server and should be faster because it avoids going to the tags table twice, but it will not work in MySQL before version 8.0 because those versions do not support CTEs:

WITH taggingNames
AS
(
    SELECT tags.name, tags.id AS tag_id, taggings.book_id
    FROM tags
    INNER JOIN taggings ON tags.id = taggings.tag_id
)
SELECT b.*
FROM books b
INNER JOIN (
  SELECT t1.book_id
   FROM taggingNames t1
   INNER JOIN taggingNames t2 ON t2.book_id = t1.book_id AND t2.name = 'fiction'
   WHERE t1.name = 'interesting'
   GROUP BY t1.book_id
 ) ids ON b.id = ids.book_id

Though now that I see it, I like Peter Lang's answer too.

0

Source: https://habr.com/ru/post/1728373/

