Blacklisting words in comments to filter messages

For a website where children enter data, we need to filter out any mischievous or bad words they use when writing comments on the site (it runs on PHP).

The comment is a free-text field, so users can enter anything they want. The solution I can think of is to keep a list of words, for example BLACKLIST: bad, bad, word, woord, craap, craaaap (we can fill it with all the blacklisted words).

Then, when the form is saved, we check the comment against the list and, if any of the words is present, refuse to save the comment.
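A minimal sketch of the blacklist check described above, written in Python for illustration (the site itself runs PHP, where the same logic maps onto `preg_match_all` and `in_array`). The word list is a placeholder built from the question's examples; a real list would be much longer and loaded from a file or database.

```python
import re

# Placeholder blacklist built from the question's examples; a real one would be
# much longer and loaded from a file or database.
BLACKLIST = {"bad", "word", "woord", "craap", "craaaap"}

def is_allowed(comment: str) -> bool:
    """Return False if any whole word of the comment is on the blacklist."""
    words = re.findall(r"[a-z]+", comment.lower())
    return not any(w in BLACKLIST for w in words)

print(is_allowed("What a nice day"))    # True
print(is_allowed("This is craap"))      # False
print(is_allowed("This is craaaaaap"))  # True: a longer spelling slips past the list
```

The last call shows the weakness of an exact-match list: every extra letter produces a word that is simply not on it.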

BUT THE PROBLEM with this method is that users can get around the filter by adding letters to the words, e.g.: shiiiiit
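One partial counter to padded spellings, sketched in Python for illustration under the assumption that blacklist entries are stored with no repeated letters: collapse every run of a repeated character before the lookup. The list contents are hypothetical.

```python
import re

# Hypothetical blacklist, stored in "collapsed" form (no repeated letters).
BLACKLIST = {"crap", "shit"}

def collapse_repeats(text: str) -> str:
    """Squeeze every run of one repeated character down to a single character."""
    return re.sub(r"(.)\1+", r"\1", text)

def is_allowed(comment: str) -> bool:
    """Collapse each word before looking it up, so 'shiiiiit' matches 'shit'."""
    words = re.findall(r"[a-z]+", comment.lower())
    return not any(collapse_repeats(w) in BLACKLIST for w in words)

print(collapse_repeats("shiiiiit"))  # shit
print(is_allowed("oh shiiiiit"))     # False
print(is_allowed("craaaap"))         # False
print(is_allowed("a good book"))     # True
```

This only moves the problem: a blacklisted word that legitimately contains a double letter (say "ass") collapses to an innocent word ("as") and starts blocking it, and the trick does nothing against "sh1t" or "s h i t".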

Let me know what you think is the best way to create a filter for these words.

+1
6 answers

You can never filter every permutation. Perhaps the most appropriate solution is to filter the obvious cases and introduce a "Report Abuse" mechanism so that someone can manually review (and reject) suspicious comments.
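A rough sketch of such a "Report Abuse" mechanism, in Python for illustration only. The threshold, the in-memory storage, and the names `Comment` and `review_queue` are all hypothetical; a real site would keep reports in its database.

```python
from dataclasses import dataclass, field

# Hypothetical: number of distinct users whose reports hide a comment.
REPORT_THRESHOLD = 3

@dataclass
class Comment:
    text: str
    reporters: set = field(default_factory=set)
    hidden: bool = False

    def report(self, user_id: str) -> None:
        """Record at most one report per user; hide the comment at the threshold."""
        self.reporters.add(user_id)
        if len(self.reporters) >= REPORT_THRESHOLD:
            self.hidden = True  # stays hidden until a moderator reviews it

def review_queue(comments):
    """Reported comments for a moderator, most-reported first."""
    reported = [c for c in comments if c.reporters]
    return sorted(reported, key=lambda c: len(c.reporters), reverse=True)

c = Comment("some text")
for user in ["u1", "u1", "u2", "u3"]:
    c.report(user)
print(len(c.reporters), c.hidden)  # 3 True
```

Counting distinct reporters rather than raw reports keeps a single user from hiding a comment alone; the final decision still rests with a human moderator.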

+6

So, are you going to ban shit, shït, shıt, śhit and śhiŧ?

Blacklisting is not a viable solution in the Unicode era. Banning whatever looks excessive, however, is.
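To see why, here is a Python sketch of the usual first defence: Unicode compatibility decomposition (NFKD) followed by dropping combining marks. It uses only the standard `unicodedata` module; the function name is illustrative.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Decompose characters (NFKD) and drop the combining marks left behind."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# The variants from the answer above.
for variant in ["shit", "shït", "shıt", "śhit", "śhiŧ"]:
    print(variant, "->", strip_accents(variant))
```

"ï" and "ś" decompose into a base letter plus a combining mark, so they reduce to plain "shit". "ı" (dotless i) and "ŧ" (t with stroke) have no decomposition in Unicode, so "shıt" and "śhiŧ" still get through, which is exactly the point being made.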

+5

If you have enough time, it is worth reading about the Scunthorpe problem.

Jeff Atwood has a blog post about the futility of obscenity filters.

+4

Thanks to too many php, I found some links that might be the solution for your case:

+1

Use uClassify to train a classifier on bad comments; once the system is well trained, you can flag offensive comments for moderation.
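uClassify is a hosted service and its API is not shown here. As a local stand-in for the same idea, this is a tiny multinomial naive Bayes classifier in Python, a different but related technique, trained on a few hypothetical labelled comments. Anything it scores as offensive is only flagged for moderation, not deleted.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list:
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"ok": Counter(), "offensive": Counter()}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def score(self, text: str, label: str) -> float:
        """log P(label) + sum of log P(word | label), smoothed over the vocabulary."""
        vocab = set(self.word_counts["ok"]) | set(self.word_counts["offensive"])
        total_words = sum(self.word_counts[label].values())
        logp = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            count = self.word_counts[label][word]
            logp += math.log((count + 1) / (total_words + len(vocab)))
        return logp

    def needs_moderation(self, text: str) -> bool:
        return self.score(text, "offensive") > self.score(text, "ok")

# Hypothetical training data; a real model needs many labelled comments per class.
model = NaiveBayes()
model.train("you are stupid and ugly", "offensive")
model.train("shut up you idiot", "offensive")
model.train("i love this story", "ok")
model.train("thanks for the nice game", "ok")

print(model.needs_moderation("you idiot"))              # True
print(model.needs_moderation("i love this nice game"))  # False
```

Both labels must have at least one training example, otherwise the prior `log(0)` fails.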

0

You may also end up filtering a word such as "bass", which, of course, contains one of the disallowed words. For now, a few good moderators seem to be the best solution to this problem.
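A short Python sketch of the "bass" trap, using a hypothetical one-word list: plain substring matching produces false positives, while regex word boundaries (`\b`) avoid them but miss compounds. Neither is correct on its own.

```python
import re

BLACKLIST = ["ass"]  # hypothetical single-entry list, to show the false positive

def substring_match(comment: str) -> bool:
    """Flags any comment containing a listed word anywhere, even inside 'bass'."""
    text = comment.lower()
    return any(bad in text for bad in BLACKLIST)

def word_match(comment: str) -> bool:
    """Flags only whole words, using regex word boundaries."""
    pattern = r"\b(?:" + "|".join(map(re.escape, BLACKLIST)) + r")\b"
    return re.search(pattern, comment, flags=re.IGNORECASE) is not None

print(substring_match("I play the bass"))  # True: false positive
print(word_match("I play the bass"))       # False
print(word_match("what an ass"))           # True
print(word_match("smartass"))              # False: the compound slips through
```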

-1

Source: https://habr.com/ru/post/919557/

