In MySQL, how do I join two very large tables that both have columns in the WHERE clause?

I am trying to determine the best general approach for querying two joined tables that each hold a lot of data, where each table has a column in the WHERE clause. Imagine a simple schema with two tables:

posts
 id (int)
 blog_id (int)
 published_date (datetime)
 title (varchar)
 body (text)

posts_tags 
 post_id (int)
 tag_id (int)

With the following indices:

posts: [blog_id, published_date]
posts_tags: [tag_id, post_id]
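
For concreteness, the schema and indexes above could be written as DDL roughly like this. This is only a sketch: the primary key on posts.id, the NOT NULL constraints, the VARCHAR length, the InnoDB engine and the index names idx_blog_published and idx_tag_post are assumptions that the description above does not state.

```sql
-- Sketch of the two tables described above (MySQL).
-- Assumed, not given in the question: primary key, NOT NULL,
-- VARCHAR(255), ENGINE=InnoDB, and both index names.
CREATE TABLE posts (
  id             INT          NOT NULL AUTO_INCREMENT,
  blog_id        INT          NOT NULL,
  published_date DATETIME     NOT NULL,
  title          VARCHAR(255) NOT NULL,
  body           TEXT         NOT NULL,
  PRIMARY KEY (id),
  KEY idx_blog_published (blog_id, published_date)
) ENGINE=InnoDB;

CREATE TABLE posts_tags (
  post_id INT NOT NULL,
  tag_id  INT NOT NULL,
  KEY idx_tag_post (tag_id, post_id)
) ENGINE=InnoDB;
```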

We want to SELECT the 10 most recent posts on a given blog that are tagged "foo". For the sake of this discussion, suppose the blog has 10 million posts and 1 million of them are tagged "foo". What is the most efficient way to query for this data?

A naive approach would be this:

 SELECT 
  id, blog_id, published_date, title, body
 FROM 
  posts p
 INNER JOIN
  posts_tags pt 
  ON pt.post_id = p.id
 WHERE
  p.blog_id = 1
  AND pt.tag_id = 1
 ORDER BY
  p.published_date DESC
 LIMIT 10

MySQL will use our indexes, but it will still end up scanning millions of records. Is there a more efficient way to retrieve this data without denormalizing the schema?


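Presumably, MySQL will use the (blog_id, published_date) index to find the rows with blog_id = 1, ordered by published_date. For each of those rows it then looks for a match in posts_tags, and the index on tag_id, post_id makes that lookup cheap. If 10% of the posts are tagged foo, it should need to check about 100 posts to find 10 matches.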
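
If the optimizer does not pick that plan by itself, one way to ask for it is an index hint combined with STRAIGHT_JOIN, which makes MySQL read the left table first. The sketch below assumes the (blog_id, published_date) index is named idx_blog_published; that name is hypothetical, so substitute the real one. Whether it actually helps depends on the MySQL version and the data distribution.

```sql
-- Sketch: read posts first through the (blog_id, published_date) index,
-- newest first, and probe posts_tags once per post until 10 rows match.
-- idx_blog_published is a hypothetical index name.
SELECT
  p.id, p.blog_id, p.published_date, p.title, p.body
FROM
  posts p FORCE INDEX (idx_blog_published)
STRAIGHT_JOIN
  posts_tags pt
  ON pt.post_id = p.id
WHERE
  p.blog_id = 1
  AND pt.tag_id = 1
ORDER BY
  p.published_date DESC
LIMIT 10;
```

This join order pays off when the tag is common on the blog. For a rare tag, MySQL may have to walk a large part of the blog's posts before it finds 10 matches.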

, , foo . , - , , . 10 , , .

, , 10 , , .

That said, have you actually run the query? What does EXPLAIN show?
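
As a sketch, the plan for the query in the question can be inspected by prefixing it with EXPLAIN. The columns worth reading first are the order of the rows (which table MySQL reads first), key (which index it chose), rows (the estimated rows examined per table) and Extra.

```sql
-- Show the execution plan instead of running the query.
EXPLAIN
SELECT
  p.id, p.blog_id, p.published_date, p.title, p.body
FROM
  posts p
INNER JOIN
  posts_tags pt
  ON pt.post_id = p.id
WHERE
  p.blog_id = 1
  AND pt.tag_id = 1
ORDER BY
  p.published_date DESC
LIMIT 10;
```

"Using temporary; Using filesort" in the Extra column usually means MySQL collects every matching row and sorts them before applying the LIMIT, which is the expensive case for this query.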


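You could try moving the tag_id filter out of the WHERE clause and into the join condition, so that MySQL can apply it while joining: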

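SELECT
  p.id, p.blog_id, p.published_date, p.title, p.body
FROM
  posts p
INNER JOIN
  posts_tags pt
  ON pt.post_id = p.id
  AND pt.tag_id = 1   -- tag filter moved into the join condition
WHERE
  p.blog_id = 1
ORDER BY
  p.published_date DESC
LIMIT 10;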

One option, if a little denormalisation is acceptable, would be something like this:

Table:

create table posts_tags
(
blog_id int unsigned not null, -- denormalise
tag_id smallint unsigned not null,
post_id int unsigned not null,
primary key(blog_id, tag_id, post_id) -- clustered composite PK
)
engine=innodb;

Trigger:

delimiter #

create trigger posts_tags_before_ins_trig before insert on posts_tags
for each row
proc_main:begin

declare b_id int unsigned default 0;

   select blog_id into b_id from posts where post_id = new.post_id;

   set new.blog_id = b_id;

end proc_main #

delimiter ;
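
The trigger only fills in blog_id for rows inserted from now on. If the tag links already exist in an older table, they would have to be copied across once. A minimal sketch, assuming the old two-column link table has been renamed to posts_tags_old (a hypothetical name) and that posts uses post_id as its key, as in this answer:

```sql
-- One-off backfill into the denormalised posts_tags table.
-- posts_tags_old is a hypothetical name for the original link table.
INSERT INTO posts_tags (blog_id, tag_id, post_id)
SELECT
  p.blog_id, old.tag_id, old.post_id
FROM
  posts_tags_old old
INNER JOIN
  posts p
  ON p.post_id = old.post_id;
```

The trigger also fires for each of these rows and sets the same blog_id again, so for a large table it may be worth running the backfill before creating the trigger.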

Stored procedure (this assumes posts.post_id is an auto_increment PK, so a higher post_id means a more recent post):

delimiter ;

drop procedure if exists get_latest_blog_posts_by_tag;

delimiter #

create procedure get_latest_blog_posts_by_tag
(
in p_blog_id int unsigned,
in p_tag_id smallint unsigned
)
proc_main:begin

  select
    p.*
  from
    posts p
  inner join 
  (
    select distinct
      pt.post_id
    from
      posts_tags pt
    where
      pt.blog_id = p_blog_id and pt.tag_id = p_tag_id
    order by
      pt.post_id desc limit 10
  ) rp on p.post_id = rp.post_id
  order by
    p.post_id desc;

end proc_main #

delimiter ;

call get_latest_blog_posts_by_tag(1,1);

, , . , , . , , .

blog_id tag_id ON, . , - , .

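Also, the order of the columns in an index matters. It is like the difference between a list sorted by LastName, FirstName and one sorted by FirstName, LastName.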
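
Applied to this schema, column order decides which lookups a composite index can serve, because MySQL can only seek on a leftmost prefix of the index columns. A small sketch: the index name idx_post_tag and the post id 42 are made up for illustration.

```sql
-- The question's existing index is (tag_id, post_id): tag_id leads,
-- so it can answer "which posts carry tag 1?" with an index range read.
SELECT post_id FROM posts_tags WHERE tag_id = 1;

-- A hypothetical second index with the order reversed: post_id leads,
-- so it can answer "which tags does post 42 carry?".
ALTER TABLE posts_tags ADD INDEX idx_post_tag (post_id, tag_id);
SELECT tag_id FROM posts_tags WHERE post_id = 42;
```

Either index can check a single (post_id, tag_id) pair, which is the lookup the join performs once it is driven from posts.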

It's hard to say deterministically what will work best without experimenting. I usually settle these things through experimentation and benchmarking. Sometimes I find that the results are contrary to what I expected from the documentation, and then I dig into it and discover some subtle behavior or feature that I had not accounted for in that particular situation.
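
One inexpensive way to run that kind of experiment, sketched here for the query in the question, is to reset the session status counters, run a candidate query, and then read the Handler_read counters to see how many rows the storage engine actually touched. FLUSH STATUS needs the RELOAD privilege, so this is better tried on a test server.

```sql
-- Reset the session's status counters.
FLUSH STATUS;

-- Candidate query to measure.
SELECT
  p.id, p.blog_id, p.published_date, p.title, p.body
FROM
  posts p
INNER JOIN
  posts_tags pt
  ON pt.post_id = p.id
WHERE
  p.blog_id = 1
  AND pt.tag_id = 1
ORDER BY
  p.published_date DESC
LIMIT 10;

-- Rows read by index lookup (Handler_read_key), by walking an index
-- (Handler_read_next / Handler_read_prev) and by table scan
-- (Handler_read_rnd_next).
SHOW SESSION STATUS LIKE 'Handler_read%';
```

Comparing these counters across the different rewrites gives a measure of the work done that does not depend on caching or server load.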


Source: https://habr.com/ru/post/1763675/

