As the name implies, I have a problem with the implementation of the algorithm of related articles. Let me start by listing the tables from the database:
[articles] id_article id_category name content publish_date is_deleted [categories] id_category id_parent name [tags_to_articles] id_tag id_article [tags] id_tag name [articles_to_authors] id_article id_author [authors] id_author name is_deleted [related_articles] id_article_left id_article_right related_score
Algorithm
Every other table, except for those associated with it, has data in it. Now I want to fill out related_articles with ratings between articles (very important: the table will work as a directed graph, the rating of article A with article B may differ from the rating between B and A, see list). The score is calculated as follows:
- If the two articles in question have the same category, the number (x) is added to the score.
- for each author with whom they have in common, the number (y) is added to the account
- for each tag that they have, a number (z) is added to the score
- if we calculate the score of article A with article B, the difference between now () and publish_date of article B will generate the number (t) that will be subtracted from the score
My first (ineffective) approach
I tried to make a request like this:
SELECT a.id, b.id, a.id_category, a.publish_date, b.id_category, b.publish_date, c.id_tag, e.id_author FROM `articles` a, articles b, tags_to_articles c, tags_to_articles d, articles_to_authors e, articles_to_authors f WHERE a.id_article <> b.id_article AND ( (a.id_article=c.id_article and c.id_tag=d.id_tag and d.id_article=b.id_article) OR (a.id=e.id_article and e.id_author=f.id_author and f.id_article=b.id_article) OR (a.id_category=b.id_category) )
In theory, this will list each element that is worth computing for evaluation. However, this takes too much time and resources.
Is there another way? I am also open to setting up an algorithm or tables if it gets a workable solution. It is also worth noting that counter calculations are performed in cron, of course, I do not expect this to be done on every page request.