I have a database with 2,000,000 messages. When a user receives a message, I need to find the relevant messages in my database based on the words they have in common.
I set up a batch process to index my database:
1. Save all words of all messages, except stop words (a, an, of, for ...).
2. Create a link between each message and the words it contains. I also save the frequency of each word in the message.
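The two indexing steps above amount to building an inverted index: a map from each word to the messages that contain it, with a per-message frequency. Below is a minimal in-memory sketch of that structure in Python. It assumes a simple regex tokenizer and a small hypothetical stop-word list; the names `STOP_WORDS`, `tokenize` and `build_index` are illustrative and not taken from the original code.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; the question only gives a few examples.
STOP_WORDS = {"the", "a", "an", "of", "for"}


def tokenize(text: str) -> Counter:
    """Return word -> frequency for one message, skipping stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def build_index(messages: dict[int, str]) -> dict[str, dict[int, int]]:
    """Inverted index: word -> {message_id: frequency of the word in that message}."""
    index: dict[str, dict[int, int]] = defaultdict(dict)
    for msg_id, text in messages.items():
        for word, freq in tokenize(text).items():
            index[word][msg_id] = freq
    return dict(index)


if __name__ == "__main__":
    msgs = {1: "The price of the flight", 2: "Flight delayed for an hour"}
    idx = build_index(msgs)
    print(idx["flight"])  # {1: 1, 2: 1}
```

With this layout, looking up a word touches only the messages that actually contain it, instead of every stored message.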
Then, when I get a message:
1. I parse its words (the same as the first step of my batch process).
2. I run a query against the database to retrieve messages sorted by the number of matching words.
However, both updating the word base and querying for similar messages are very slow. Updating the word base takes ~1.2111 seconds for a 3000-byte message, and a query for similar messages takes ~9.8 seconds for a message of the same size.
The database setup is already complete, and the code works correctly.
I need a better algorithm for this.
Any ideas?