I am developing an RSS feed reader that uses a Bayesian filter to weed out boring blog posts.
The Stream table should act as a FIFO buffer from which the webapp consumes "records". I use it to store temporary relationships between records, users, and Bayesian filter classifications.
After the user marks a record as read, it is added to the metadata table (so that material the user has already read is not presented again) and deleted from the stream table. Every three minutes a background process populates the Stream table with new entries, i.e. whenever the daemon finds new entries while checking the RSS channels for updates.
Problem: the query below is slow. More importantly, the Stream table only needs about a hundred unread entries per user at a time; capping it would reduce duplication, speed up processing, and give me some flexibility in how I display posts.
The query (takes about 9 seconds on 3,600 rows, without indexes):
insert into stream (entry_id, user_id)
select entries.id, subscriptions_users.user_id
from entries
inner join subscriptions_users
    on subscriptions_users.subscription_id = entries.subscription_id
where subscriptions_users.user_id = 1
  and entries.id not in (select entry_id
                         from metadata
                         where metadata.user_id = 1)
  and entries.id not in (select entry_id
                         from stream
                         where user_id = 1);
Questions: how do I speed up the join through (subscriptions_users) and the two NOT IN subqueries? And how do I keep only about 100 unread entries per user in the Stream table, topping it up with the next batch (via the background daemon) as the user works through them?
Or would a different storage approach (nosql?) be a better fit here?