When writing your own method, you will need to use a combination of heuristics.
For example, spam comments very often contain two or more URLs.
I would start my filter like this, looping over a dictionary of trigger words and using the number of matches to estimate the spam probability:
function spamProbability($text){
    $probability = 0;
    $text = strtolower($text); // lowercase once so the matching is case-insensitive
    $myDict = array("http","penis","pills","sale","cheapest");
    foreach($myDict as $word){
        // substr_count counts substring hits, so "sale" also matches "wholesale"
        $count = substr_count($text, $word);
        $probability += .2 * $count;
    }
    return $probability;
}
Mind you, this method will produce a lot of false positives, depending on your word set. You could have your site flag comments with a probability > .3 and < .6 for moderation (they still go live immediately), put those > .6 and < .9 into a moderation queue (where they are not displayed until approved), and simply reject anything over 1.
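Those tiers could be sketched roughly as below. The function name `moderationAction` and the action labels are made up for illustration, and since the ranges above leave gaps (exactly .6, and .9 up to 1), this sketch assumes anything from .6 up to and including 1 goes to the queue. Adjust that to taste.

```php
<?php
// Hypothetical helper: maps a spam score to a moderation action.
// The action names and the boundary handling are assumptions.
function moderationAction(float $probability): string
{
    if ($probability > 1.0) {
        return 'reject';   // never stored or shown
    }
    if ($probability >= 0.6) {
        return 'queue';    // hidden until a moderator approves it
    }
    if ($probability >= 0.3) {
        return 'flag';     // goes live, but is marked for review
    }
    return 'publish';      // goes live with no review
}
```

You would pass it the value returned by `spamProbability()` and branch on the result.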
Obviously, all of these threshold values are things you will need to tune, but this should get you started with a fairly simple system. You can add other qualifiers to raise or lower the spam probability, such as checking the ratio of bad words to total words, giving different words different weights, etc.
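As a rough sketch of those two qualifiers, something like the following might work. The name `weightedSpamScore`, the individual weights, and the 0.5 factor on the ratio are all arbitrary placeholders, not tested values.

```php
<?php
// Hypothetical variant: per-word weights plus a bad-word ratio.
// Every number here is an illustrative assumption to be tuned.
function weightedSpamScore(string $text): float
{
    $weights = array("http" => 0.3, "pills" => 0.2, "sale" => 0.1, "cheapest" => 0.2);
    $text = strtolower($text);
    $score = 0.0;
    $hits = 0;
    foreach ($weights as $word => $weight) {
        $count = substr_count($text, $word);
        $hits += $count;
        $score += $weight * $count;
    }
    // Ratio of trigger-word hits to total words, capped at 1 because
    // substring matches can outnumber the words in odd inputs.
    $totalWords = str_word_count($text);
    $ratio = $totalWords > 0 ? min(1.0, $hits / $totalWords) : 0.0;
    return $score + 0.5 * $ratio;
}
```

The ratio term means a short comment made up almost entirely of trigger words scores higher than a long comment that happens to mention one of them.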