I have a fairly simple process that periodically pulls RSS feeds and updates articles in a MySQL database.
The articles table currently holds about 130 thousand rows. For each article found, the processor checks whether the article already exists. These queries almost always take 300 milliseconds, and roughly every 10 or 20 attempts they take more than 2 seconds.
SELECT id FROM `articles` WHERE (guid = 'http://example.com/feed.rss') LIMIT 1;
I have an index on the guid column, but whenever a new article is found it is added to the articles table, which invalidates the query cache (right?).
Some of the other fields in the slow query log report 120+ rows.
Of course, on my development machine, these queries take about 0.2 milliseconds.
The server is a virtual host from Engine Yard Solo (EC2) with 1.7 GB of memory and whatever CPU EC2 provides these days.
Any advice is appreciated.
Update
As it turns out, the problem was between the chair and the keyboard.
I had an index on "id", but the query was filtering on "guid".
Adding an index on "guid" reduced the query time to 0.2 ms.
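For anyone hitting the same thing, here is a minimal sketch of how the mismatch could be spotted and fixed. It assumes MySQL, the articles table from the query above, and a guid column of a VARCHAR type (a TEXT column would need a prefix length on the index). The index name index_articles_on_guid is made up for illustration.

```sql
-- List the indexes that actually exist on the table.
SHOW INDEX FROM articles;

-- Ask MySQL how it plans to run the lookup.
-- type = ALL with key = NULL indicates a full table scan.
EXPLAIN SELECT id FROM articles
WHERE guid = 'http://example.com/feed.rss' LIMIT 1;

-- Add the missing index on the column used in the WHERE clause.
-- (The index name is hypothetical; pick whatever fits your conventions.)
CREATE INDEX index_articles_on_guid ON articles (guid);
```

Re-running the EXPLAIN afterwards should list the new index in the key column instead of NULL.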
Thanks to everyone for all the helpful tips!