In MySQL, how do you join two very large tables that both have columns in the WHERE clause?

Question

I am trying to determine the best general approach for querying two joined tables that each hold a lot of data, where each table has a column in the WHERE clause. Imagine a simple schema with two tables:

posts
 id (int)
 blog_id (int)
 published_date (datetime)
 title (varchar)
 body (text)

posts_tags 
 post_id (int)
 tag_id (int)

With the following indices:

posts: [blog_id, published_date]
posts_tags: [tag_id, post_id]
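
For reference, the schema and indexes described above might be declared roughly as follows. This is only a sketch: the index names, the NOT NULL constraints, the VARCHAR length, the primary key on posts.id, and the InnoDB engine are assumptions that the question does not state.

```sql
-- Sketch of the schema from the question; index names are hypothetical.
CREATE TABLE posts (
  id             INT NOT NULL,
  blog_id        INT NOT NULL,
  published_date DATETIME NOT NULL,
  title          VARCHAR(255) NOT NULL,  -- length is an assumption
  body           TEXT,
  PRIMARY KEY (id),                      -- assumed; not stated in the question
  KEY idx_posts_blog_published (blog_id, published_date)
) ENGINE=InnoDB;

CREATE TABLE posts_tags (
  post_id INT NOT NULL,
  tag_id  INT NOT NULL,
  KEY idx_posts_tags_tag_post (tag_id, post_id)
) ENGINE=InnoDB;
```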

We want to SELECT the 10 most recent posts on a given blog that are tagged "foo". For the sake of this discussion, suppose there are 10 million posts on the blog and 1 million of them are tagged "foo". What is the most efficient way to query for this data?

A naive approach would be this:

 SELECT 
  id, blog_id, published_date, title, body
 FROM 
  posts p
 INNER JOIN
  posts_tags pt 
  ON pt.post_id = p.id
 WHERE
  p.blog_id = 1
  AND pt.tag_id = 1
 ORDER BY
  p.published_date DESC
 LIMIT 10

MySQL will use our indexes, but it will still end up scanning millions of records. Is there a more efficient way to retrieve this data without denormalizing the schema?
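
One way to see how many rows MySQL expects to read is to prefix the query with EXPLAIN. The statement below is the same naive query with the columns qualified by the p alias for clarity; nothing else is changed.

```sql
-- Show the execution plan instead of running the query.
EXPLAIN
SELECT p.id, p.blog_id, p.published_date, p.title, p.body
FROM posts p
INNER JOIN posts_tags pt ON pt.post_id = p.id
WHERE p.blog_id = 1
  AND pt.tag_id = 1
ORDER BY p.published_date DESC
LIMIT 10;
```

In the output, the key column shows which index is chosen for each table, rows is the optimizer's estimate of rows examined, and an Extra value such as "Using temporary; Using filesort" suggests that the matching rows are being sorted before the LIMIT is applied.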

+3

optimization join mysql

Newt Sep 7 '10 at 21:32

4 Answers

Move the tag filter out of the WHERE clause and into the join condition:

FROM 
posts p
INNER JOIN
posts_tags pt 
ON pt.post_id = p.id
    AND pt.tag_id = 1
WHERE
p.blog_id = 1

+3

Brent Baisley Sep 8 '10 at 0:51

One option is to denormalise posts_tags by adding blog_id to it, along these lines:

Table:

create table posts_tags
(
blog_id int unsigned not null, -- denormalise
tag_id smallint unsigned not null,
post_id int unsigned not null,
primary key(blog_id, tag_id, post_id) -- clustered composite PK
)
engine=innodb;

Trigger:

delimiter #

create trigger posts_tags_before_ins_trig before insert on posts_tags
for each row
proc_main:begin

declare b_id int unsigned default 0;

   select blog_id into b_id from posts where post_id = new.post_id;

   set new.blog_id = b_id;

end proc_main #

delimiter ;

Stored procedure (assuming posts.post_id is an auto_increment PK):

delimiter ;

drop procedure if exists get_latest_blog_posts_by_tag;

delimiter #

create procedure get_latest_blog_posts_by_tag
(
in p_blog_id int unsigned,
in p_tag_id smallint unsigned
)
proc_main:begin

  select
    p.*
  from
    posts p
  inner join 
  (
    select distinct
      pt.post_id
    from
      posts_tags pt
    where
      pt.blog_id = p_blog_id and pt.tag_id = p_tag_id
    order by
      pt.post_id desc limit 10
  ) rp on p.post_id = rp.post_id
  order by
    p.post_id desc;

end proc_main #

delimiter ;

call get_latest_blog_posts_by_tag(1,1);

+2

Jon Black Sep 7 '10 at 23:00

One thing to try is putting the blog_id and tag_id conditions in the ON clause.

The order of columns in an index also matters: an index on LastName, FirstName is not the same as one on FirstName, LastName.
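
To illustrate how column order in a composite index matters, here is a small sketch. The people table and the index name are hypothetical and exist only for this example; only the LastName and FirstName column names come from the text above.

```sql
-- Hypothetical table, used only to illustrate column order in an index.
CREATE TABLE people (
  id        INT NOT NULL PRIMARY KEY,
  LastName  VARCHAR(50) NOT NULL,
  FirstName VARCHAR(50) NOT NULL,
  KEY idx_last_first (LastName, FirstName)
) ENGINE=InnoDB;

-- Can seek on idx_last_first: LastName is the leftmost column of the index.
EXPLAIN SELECT id FROM people WHERE LastName = 'Smith';

-- Cannot seek on idx_last_first: FirstName alone is not a leftmost prefix.
-- MySQL may still scan the whole index, but it cannot jump to matching rows.
EXPLAIN SELECT id FROM people WHERE FirstName = 'John';
```

Comparing the type and rows columns of the two plans should show the difference between an index lookup and a full scan.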

It's hard to sit down and say deterministically what will work best without experimenting. I usually work these things out through experimentation and benchmarking. Sometimes I find that the results contradict what I expected from the documentation, and then I dig into it and discover some subtle behavior or feature that I had not accounted for in that particular situation.

0

AaronLS Sep 7 '10 at 22:17

Mark Byers · Accepted Answer · 2010-09-07T22:08:13+0000

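MySQL can use the index on (blog_id, published_date) to find the rows with blog_id = 1, already ordered by published_date. For each of those rows it then checks posts_tags, using the index on tag_id, post_id, to see whether the post has the tag. If 10% of the posts are tagged foo, it should need to examine only about 100 posts on average to find 10 matches.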
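
The plan that reads posts through the (blog_id, published_date) index first and then probes posts_tags for each row can be requested explicitly with STRAIGHT_JOIN, which makes MySQL read the left table before the right one. This is a sketch, not a guaranteed fix: whether the optimizer walks the index in order and stops after 10 matches should be confirmed with EXPLAIN, and the estimate of roughly 100 posts examined assumes the tagged posts are spread evenly over time.

```sql
-- Force posts to be read first, then look up each post in posts_tags.
SELECT p.id, p.blog_id, p.published_date, p.title, p.body
FROM posts p
STRAIGHT_JOIN posts_tags pt ON pt.post_id = p.id
WHERE p.blog_id = 1
  AND pt.tag_id = 1
ORDER BY p.published_date DESC
LIMIT 10;
```

With 1 million of 10 million posts tagged, about 1 post in 10 matches, so finding 10 matches takes on the order of 10 / 0.1 = 100 index entries. If the tagged posts are all old, far more rows would be read before the LIMIT is satisfied.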

What does EXPLAIN show for your query?
