Detecting Spammers Using MySQL

More and more users are signing up on my site just to send duplicate spam messages to other users. I added server-side code that detects duplicate messages with the following MySQL query:

SELECT count(content) as msgs_sent FROM messages WHERE sender_id = '.$sender_id.' GROUP BY content having count(content) > 10 

The query works well, but spammers now bypass it by changing a few characters in each message. Is there a way to detect this using MySQL, or do I need to look at each group returned from MySQL and then use PHP to determine the percentage of similarity?

Any thoughts or suggestions?

1 answer

Full text

You could implement something similar to the MATCH ... AGAINST example from the MySQL documentation:

 mysql> SELECT id, body, MATCH (title,body) AGAINST
     -> ('Security implications of running MySQL as root') AS score
     -> FROM articles WHERE MATCH (title,body) AGAINST
     -> ('Security implications of running MySQL as root');
 +----+-------------------------------------+-----------------+
 | id | body                                | score           |
 +----+-------------------------------------+-----------------+
 |  4 | 1. Never run mysqld as root. 2. ... | 1.5219271183014 |
 |  6 | When configured properly, MySQL ... | 1.3114095926285 |
 +----+-------------------------------------+-----------------+
 2 rows in set (0.00 sec)

So, for your example, perhaps:

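 SELECT id, MATCH (content) AGAINST ('your string') AS score FROM messages WHERE MATCH (content) AGAINST ('your string') HAVING score > 1;

The score threshold goes in HAVING rather than WHERE because MySQL does not allow a column alias such as score to be referenced in the WHERE clause.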

Note that to use these functions, the content column must have a FULLTEXT index.
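As a rough sketch of how this could fit together for the messages table from the question: the first statement adds the FULLTEXT index, and the second counts, per sender, the messages that resemble a known spam sample. The index name ft_content and the sample string are placeholders I made up, and the threshold of 10 is simply carried over from the original query. FULLTEXT indexes on InnoDB tables require MySQL 5.6 or later; older versions support them only on MyISAM.

```sql
-- Add a FULLTEXT index on the column used in MATCH().
-- "ft_content" is a hypothetical index name.
ALTER TABLE messages ADD FULLTEXT INDEX ft_content (content);

-- Count messages per sender that match a known spam sample.
-- In natural language mode, the WHERE clause keeps only rows
-- whose relevance against the sample is greater than zero.
SELECT sender_id, COUNT(*) AS similar_msgs
FROM messages
WHERE MATCH (content) AGAINST ('text of a known spam message')
GROUP BY sender_id
HAVING COUNT(*) > 10;
```

Because any shared word can make a row match, this will likely need a stricter relevance cutoff tuned against real data before it is used to flag accounts.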

What is score in this example?

This is a relevance value. It is calculated by the process described below:

Every correct word in the collection and in the query is weighted according to its significance in the collection or query. Thus, a word that is present in many documents has a lower weight (and may even have zero weight), because it has lower semantic value in this particular collection. Conversely, if the word is rare, it receives a higher weight. The weights of the words are combined to compute the relevance of the row.

Quoted from the MySQL documentation on full-text search.


Source: https://habr.com/ru/post/1396544/

