I am developing an RSS feed reader that uses a Bayesian filter to weed out boring blog posts.
The Stream table should act as a FIFO buffer from which the webapp consumes "records". I use it to store temporary relationships between records, users, and Bayesian filter classifications.
After the user marks a record as read, it is added to the metadata table (so that material the user has already read is not presented again) and deleted from the stream table. Every three minutes a background process populates the Stream table with new entries, i.e. whenever the daemon finds new entries while checking the RSS channels for updates.
Problem: the query below is slow. More importantly, the Stream table only needs about a hundred unread entries per user at a time; capping it would reduce duplication, speed up processing, and give me some flexibility in how I display posts.
The query (takes about 9 seconds on 3,600 rows, without indexes):
insert into stream (entry_id, user_id)
select entries.id, subscriptions_users.user_id
from entries
inner join subscriptions_users
    on subscriptions_users.subscription_id = entries.subscription_id
where subscriptions_users.user_id = 1
  and entries.id not in (select entry_id
                         from metadata
                         where metadata.user_id = 1)
  and entries.id not in (select entry_id
                         from stream
                         where user_id = 1);
Questions: how do I speed up the join through (subscriptions_users) and the two NOT IN subqueries? And how do I keep only about 100 unread entries per user in the Stream table, topping it up with the next batch (via the background daemon) as the user works through them?
Or would a different storage approach (nosql?) be a better fit here?