I am trying to determine the best general approach for querying two joined tables that hold a lot of data, where each table contributes a column to the WHERE clause. Imagine a simple schema with two tables:
posts
    id (int)
    blog_id (int)
    published_date (datetime)
    title (varchar)
    body (text)
posts_tags
    post_id (int)
    tag_id (int)
With the following indexes:
    posts: [blog_id, published_date]
    posts_tags: [tag_id, post_id]
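For anyone who wants to reproduce this locally, here is a minimal sketch of the schema and indexes above as DDL, run through Python's built-in sqlite3 module. It assumes SQLite as a stand-in for MySQL (so DATETIME becomes an ISO-8601 TEXT column), and the index names idx_posts_blog_date and idx_posts_tags_tag_post are hypothetical, since the question does not name them. The helper reads each index's key order back with PRAGMA index_info, because column order is what decides which lookups a composite index can serve.

```python
import sqlite3

# SQLite stands in for MySQL here; column types are approximations.
SCHEMA = """
CREATE TABLE posts (
    id             INTEGER PRIMARY KEY,
    blog_id        INTEGER NOT NULL,
    published_date TEXT    NOT NULL,  -- ISO-8601 string in place of MySQL DATETIME
    title          TEXT,
    body           TEXT
);
CREATE TABLE posts_tags (
    post_id INTEGER NOT NULL,
    tag_id  INTEGER NOT NULL
);
-- Composite indexes from the question (names are hypothetical); column order matters.
CREATE INDEX idx_posts_blog_date     ON posts (blog_id, published_date);
CREATE INDEX idx_posts_tags_tag_post ON posts_tags (tag_id, post_id);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create both tables and their composite indexes."""
    conn.executescript(SCHEMA)


def index_columns(conn: sqlite3.Connection, index_name: str) -> list[str]:
    """Return the columns of an index in key order."""
    rows = conn.execute(f"PRAGMA index_info({index_name})").fetchall()
    # index_info rows are (seqno, cid, name); sorting by seqno gives key order.
    return [name for _, _, name in sorted(rows)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    print(index_columns(conn, "idx_posts_blog_date"))      # ['blog_id', 'published_date']
    print(index_columns(conn, "idx_posts_tags_tag_post"))  # ['tag_id', 'post_id']
```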
We want to select the 10 most recent posts on this blog that are tagged "foo" (tag_id 1 in the query below). For the sake of this discussion, suppose the blog has 10 million posts, 1 million of which are tagged "foo". What is the most efficient way to query this data?
A naive approach would be this:
SELECT
p.id, p.blog_id, p.published_date, p.title, p.body
FROM
posts p
INNER JOIN
posts_tags pt
ON pt.post_id = p.id
WHERE
p.blog_id = 1
AND pt.tag_id = 1
ORDER BY
p.published_date DESC
LIMIT 10
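To check what the naive query actually returns, here is a sketch that loads a small hypothetical data set into an in-memory SQLite database (again standing in for MySQL) and runs the same query with bound parameters. The toy data, the helper names build_toy_db and newest_tagged, and the use of tag 1 as "foo" are assumptions for illustration. SQLite's EXPLAIN QUERY PLAN output is printed for reference only: its planner is not MySQL's, so the plan for the real tables should come from MySQL's own EXPLAIN.

```python
import sqlite3
from datetime import datetime, timedelta

# The query from the question, with qualified columns and bound parameters.
NAIVE_QUERY = """
SELECT p.id, p.blog_id, p.published_date, p.title, p.body
FROM posts p
INNER JOIN posts_tags pt ON pt.post_id = p.id
WHERE p.blog_id = ? AND pt.tag_id = ?
ORDER BY p.published_date DESC
LIMIT 10
"""


def build_toy_db() -> sqlite3.Connection:
    """Build a tiny hypothetical data set: 100 posts split across blogs 1 and 2."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE posts (id INTEGER PRIMARY KEY, blog_id INTEGER,
                            published_date TEXT, title TEXT, body TEXT);
        CREATE TABLE posts_tags (post_id INTEGER, tag_id INTEGER);
        CREATE INDEX idx_posts_blog_date ON posts (blog_id, published_date);
        CREATE INDEX idx_posts_tags_tag_post ON posts_tags (tag_id, post_id);
    """)
    start = datetime(2020, 1, 1)
    posts, tags = [], []
    for i in range(1, 101):
        blog_id = 1 if i % 2 else 2  # odd ids belong to blog 1, even ids to blog 2
        # Later ids get later dates, so "most recent" means "highest id" here.
        date = (start + timedelta(days=i)).strftime("%Y-%m-%d %H:%M:%S")
        posts.append((i, blog_id, date, f"post {i}", "..."))
        if i % 3 == 0:
            tags.append((i, 1))  # tag 1 plays the role of "foo"
        tags.append((i, 2))      # an unrelated tag on every post
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?, ?)", posts)
    conn.executemany("INSERT INTO posts_tags VALUES (?, ?)", tags)
    return conn


def newest_tagged(conn: sqlite3.Connection, blog_id: int, tag_id: int) -> list[int]:
    """Return the ids of the 10 newest posts on blog_id that carry tag_id."""
    return [row[0] for row in conn.execute(NAIVE_QUERY, (blog_id, tag_id))]


if __name__ == "__main__":
    conn = build_toy_db()
    print(newest_tagged(conn, 1, 1))
    # → [99, 93, 87, 81, 75, 69, 63, 57, 51, 45]
    for row in conn.execute("EXPLAIN QUERY PLAN " + NAIVE_QUERY, (1, 1)):
        print(row[3])  # the human-readable "detail" column of each plan step
```

On blog 1 the matching posts are the odd multiples of 3, so the query should return the ten highest of those; on the real data the interesting part is not the result but how many rows MySQL touches to produce it.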
MySQL will use our indexes, but it still ends up scanning millions of rows. Is there a more efficient way to get this data without denormalizing the schema?