For context, here are the tables/models of my scraping application (Ruby on Rails with MySQL):
- Scraper (each scraper searches its site for all keywords)
- Keyword (contains the search term and a three-state search status, roughly: not yet searched, being searched, searched)
The system is multi-threaded, so I added the status column to keep several threads from searching for the same term at the same time.
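A status column only works as a lock if claiming a keyword is atomic. A common pattern is a conditional update such as `UPDATE keywords SET status = 'searching' WHERE id = ? AND status = 'pending'`, where a thread owns the keyword only if exactly one row changed (in Rails, `Keyword.where(id: id, status: "pending").update_all(status: "searching")` returns that affected-row count). The sketch below models the idea in plain Ruby so it runs without a database. The `KeywordQueue` class and the `:pending`/`:searching`/`:searched` values are hypothetical, and the `Mutex` only stands in for MySQL applying a single-row UPDATE atomically.

```ruby
# Minimal sketch: claiming keywords atomically through a status column.
# KeywordQueue and the :pending/:searching/:searched values are hypothetical;
# the Mutex stands in for MySQL applying a single-row UPDATE atomically.
class KeywordQueue
  def initialize(terms)
    @status = terms.to_h { |term| [term, :pending] }
    @lock = Mutex.new
  end

  # Equivalent in spirit to:
  #   UPDATE keywords SET status = 'searching'
  #   WHERE term = ? AND status = 'pending'
  # Returns true only for the caller whose update actually changed the row.
  def claim(term)
    @lock.synchronize do
      next false unless @status[term] == :pending
      @status[term] = :searching
      true
    end
  end

  # Marks a claimed keyword as done.
  def finish(term)
    @lock.synchronize { @status[term] = :searched }
  end

  def status(term)
    @lock.synchronize { @status[term] }
  end
end

terms = %w[ruby rails mysql]
queue = KeywordQueue.new(terms)
claimed = Queue.new # thread-safe collection of successful claims

# Four worker threads race over the same keywords; each term is won once.
workers = Array.new(4) do
  Thread.new do
    terms.each do |term|
      next unless queue.claim(term)
      claimed << term
      queue.finish(term)
    end
  end
end
workers.each(&:join)

claimed_terms = []
claimed_terms << claimed.pop until claimed.empty?
```

Because the check and the update happen as one step, two threads can never both see `:pending` for the same term. Note that this single column records the state per keyword, not per scraper.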
At first I had one scraper, and everything worked fine. Now there is a new requirement: several scrapers must run simultaneously.
This means a single status field no longer works for all scrapers. The first option that comes to mind is a many-to-many scraper-keyword relationship (a join table) that tracks which keywords each scraper has already searched.
I currently have about 1 million keywords and about 60-70 sites to search. That means a huge join table, which will slow down finding the next keyword to search.
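To put a number on "huge": if the join table holds one row per scraper per keyword, its worst-case size is simply keywords × sites. A quick check with the figures from the question (assuming one scraper per site):

```ruby
# Worst-case row count for a scraper-keyword join table
# (one row per scraper per keyword), assuming one scraper per site.
keywords = 1_000_000
sites_low, sites_high = 60, 70

rows_low  = keywords * sites_low
rows_high = keywords * sites_high

puts "join table: #{rows_low} to #{rows_high} rows"
# prints "join table: 60000000 to 70000000 rows"
```

So the table would reach tens of millions of rows, compared with about 1 million rows in the keywords table alone.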
I am looking for the best solution that does not hurt performance. I cannot switch to NoSQL because of a restriction from the client.