Design pattern for blocking inappropriate content

Last year I worked on a Christmas project that let customers email each other gift requests via a 256-character free-text field. The site matched the text against a very large product database and suggested corresponding offers, but kept the free-text option for customers who could not find the product in question.

One obvious problem was that customers could send fairly explicit requests to unsuspecting recipients, with the company's branding wrapped around the message.

In the end the project did not go ahead, for various reasons; the profanity issue was one of them.

However, I have been thinking about the project again and wondering what kinds of validation could be used here. I am aware of the clbuttic problem (naive filters mangling innocent words like "classic"), which is the standard answer to any question of this nature.

The solutions I considered were:

  • Run it through something like WebPurify
  • Use Mechanical Turk
  • Write a regular-expression pattern that searches for words from a list. A more complete version would also account for plurals and past tenses of each word.
  • Build a weighted array of suspect words and score each message against it; if the score exceeds a threshold, the message fails validation (see the sketch after this list).
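Here is a minimal sketch of the last two options combined, assuming a hand-maintained word list; the words, weights, and threshold are placeholders, not a recommended blocklist:

```php
<?php
// Score a message against a weighted word list. Word-boundary anchors
// avoid the clbuttic problem of matching inside innocent words, and the
// optional suffixes catch simple plurals and past tenses.
$badWords = [
    // stem => weight (illustrative only)
    'darn' => 1,
    'heck' => 2,
];

function profanityScore(string $text, array $badWords): int
{
    $score = 0;
    foreach ($badWords as $stem => $weight) {
        $pattern = '/\b' . preg_quote($stem, '/') . '(s|ed|ing)?\b/i';
        $score  += $weight * preg_match_all($pattern, $text);
    }
    return $score;
}

$threshold = 2; // tune against real submissions
if (profanityScore($_POST['request'] ?? '', $badWords) >= $threshold) {
    // fail validation, or hold the message for review
}
```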

So there are two questions:

  • If a submission fails the check, how do you handle it in the user interface?
  • What are the pros and cons of these solutions, or of any others you can suggest?

NB: answers along the lines of "profanity filters are evil" are beside the point. In this semi-hypothetical situation, the decision of whether to implement a profanity filter is not mine to make; I just have to do the best I can with my programming skills (on the LAMP stack, if possible).

3 answers

Have you thought about Bayesian filtering? Bayesian filters are not just for spam detection; you can train them on all sorts of text-classification tasks. Take a Bayesian filter, collect a bunch of request texts, and start marking them as containing profanity or not. After some time (how much depends on the amount and kind of training data), your filter will be able to separate requests that contain profanity from those that do not.

It is not flawless, but it is much better than plain string matching and trying to work around clbuttic issues. And you have plenty of options for Bayesian filtering in PHP.

Bogofilter

Bogofilter is a standalone Bayesian filter that runs on any unix-y OS. It is designed to filter e-mail, but you can train it on any text. I have used it successfully to implement a custom comment-spam filter for my own site (source). You can interact with bogofilter as with any other command-line application; see the linked source for an example.
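For illustration, here is a sketch of driving bogofilter from PHP. Per the bogofilter manual, -s registers a message as "spam" (here: profane), -n registers it as "ham" (clean), and a plain run classifies via the exit code (0 = spam, 1 = ham, 2 = unsure). The wordlist directory is an assumed path:

```php
<?php
// Pipe text through bogofilter and return its exit code.
function runBogofilter(string $flags, string $text): int
{
    $spec = [0 => ['pipe', 'r'], 1 => ['pipe', 'w'], 2 => ['pipe', 'w']];
    $cmd  = 'bogofilter -d /var/lib/profanity-db ' . $flags;
    $proc = proc_open($cmd, $spec, $pipes);
    fwrite($pipes[0], $text);
    foreach ($pipes as $pipe) {
        fclose($pipe);
    }
    return proc_close($proc);
}

// Training: feed it labelled examples.
runBogofilter('-s', 'an example profane request'); // register as "spam"
runBogofilter('-n', 'a clean christmas request');  // register as "ham"

// Classification: no flags, inspect the exit code.
if (runBogofilter('', $_POST['request'] ?? '') === 0) {
    // treated as profane
}
```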

Roll your own

If you are up for a challenge, you can implement a Bayesian filter entirely from scratch. Here's a decent article on implementing a Bayesian filter in PHP.
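As a compressed illustration of what such an implementation involves (not the article's code), here is a two-class naive Bayes sketch using Laplace smoothing and log-probabilities to avoid underflow; it assumes both classes have received at least one training example:

```php
<?php
// Minimal two-class naive Bayes text classifier.
class NaiveBayes
{
    private $wordCounts = ['profane' => [], 'clean' => []];
    private $docCounts  = ['profane' => 0, 'clean' => 0];
    private $totalWords = ['profane' => 0, 'clean' => 0];
    private $vocab      = [];

    private function tokenize(string $text): array
    {
        preg_match_all("/[a-z']+/i", strtolower($text), $m);
        return $m[0];
    }

    public function train(string $text, string $class): void
    {
        $this->docCounts[$class]++;
        foreach ($this->tokenize($text) as $word) {
            $this->wordCounts[$class][$word] =
                ($this->wordCounts[$class][$word] ?? 0) + 1;
            $this->totalWords[$class]++;
            $this->vocab[$word] = true;
        }
    }

    public function classify(string $text): string
    {
        $totalDocs = array_sum($this->docCounts);
        $vocabSize = count($this->vocab);
        $best      = 'clean';
        $bestScore = -INF;
        foreach (['profane', 'clean'] as $class) {
            // log P(class) + sum over words of log P(word | class),
            // with Laplace (add-one) smoothing for unseen words.
            $score = log($this->docCounts[$class] / $totalDocs);
            foreach ($this->tokenize($text) as $word) {
                $count  = $this->wordCounts[$class][$word] ?? 0;
                $score += log(($count + 1) / ($this->totalWords[$class] + $vocabSize));
            }
            if ($score > $bestScore) {
                $bestScore = $score;
                $best      = $class;
            }
        }
        return $best;
    }
}

$nb = new NaiveBayes();
$nb->train('an example profane request', 'profane');
$nb->train('i would like a red bicycle please', 'clean');
echo $nb->classify('please send a bicycle'); // "clean", given this training
```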

Existing PHP Libraries

(Ab)use an existing email filter

You can take a stock SpamAssassin or DSPAM installation and train it to recognize profanity. Just make sure you turn off the options aimed specifically at e-mail (for example, MIME-block parsing and header checks) and enable only the options related to Bayesian text processing. DSPAM is probably easier to adapt. SpamAssassin has the advantage that you can add your own rules on top of the Bayesian filter, but be sure to disable all the default rules and write your own, because the default rules target spam e-mail.
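To make that concrete, here is a rough sketch of feeding free-text requests to SpamAssassin from PHP. The text is wrapped in a minimal fake mail header so the tools will accept it; the header, the paths, and the idea of training "profane" as "spam" are my assumptions. sa-learn --spam/--ham does the training, and spamassassin -e exits non-zero when a message scores as spam (-L restricts it to local tests):

```php
<?php
// Wrap free text in a minimal fake e-mail so SpamAssassin accepts it.
function asFakeMail(string $text): string
{
    return "From: filter@example.com\r\nSubject: request\r\n\r\n" . $text;
}

// Train: profane text as "spam", clean text as "ham".
function trainRequest(string $text, bool $profane): void
{
    $tmp = tempnam(sys_get_temp_dir(), 'req');
    file_put_contents($tmp, asFakeMail($text));
    shell_exec('sa-learn ' . ($profane ? '--spam' : '--ham') . ' ' . escapeshellarg($tmp));
    unlink($tmp);
}

// Classify: spamassassin -e exits non-zero for "spam" (profane here).
function isProfane(string $text): bool
{
    $spec = [0 => ['pipe', 'r'], 1 => ['pipe', 'w']];
    $proc = proc_open('spamassassin -e -L', $spec, $pipes);
    fwrite($pipes[0], asFakeMail($text));
    fclose($pipes[0]);
    fclose($pipes[1]);
    return proc_close($proc) !== 0;
}
```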


In the past I have used the venerable str_replace approach (a short sketch follows the list below). Here is my rationale:

  • Profane words can be replaced with silly words, preserving the gist of the message while keeping the profanity out
  • When a filtered message was submitted successfully, the user saw the usual success notice, plus a note that sanitisation had taken place (something like: "Your post has been added, pot.")
  • I never had to treat it as a failure: messages went through either uncensored or censored. In your case, you could instead block profane messages outright.
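A minimal sketch of that approach, with a placeholder word map:

```php
<?php
// Replace profane words with silly ones; the map is illustrative only.
$replacements = [
    'darn' => 'flower',
    'heck' => 'pot',
];

$original = $_POST['request'] ?? '';
$clean    = str_ireplace(array_keys($replacements), array_values($replacements), $original);

if ($clean !== $original) {
    // Accept the message, but tell the user it was sanitised.
    $notice = 'Your post has been added, but some words were replaced.';
}
```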

For what it's worth, Apple only recently stopped banning obscene words in its free laser engraving. Perhaps they had a sound justification?


How about using a few string-matching rules, and sending only the matches into a moderation queue?

It sounds like most requests will not use the free-text field at all, so those can pass through safely.

Then only a small percentage should trip your string matches and end up in moderation. Even with a large user base, that should keep moderation time to a minimum. You could even reject obvious profanity, such as the f- or n-word, automatically to shrink the remaining list further.

Make your moderation page easy to use, and highlight the flagged words in each message so it is quick to scan and clear the queue. Adjust the rules as needed if people manage to slip too much garbage through, or if you get too many false positives.
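A sketch of that three-way split (auto-reject, moderate, approve); the patterns and status names are placeholders:

```php
<?php
// Route each request into one of three buckets based on pattern lists.
$autoReject = ['/\bblatantword\b/i'];  // obvious profanity: reject outright
$suspicious = ['/\bmaybeword\b/i'];    // borderline: send to a human

function matchesAny(array $patterns, string $text): bool
{
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $text)) {
            return true;
        }
    }
    return false;
}

$text = $_POST['request'] ?? '';

if (matchesAny($autoReject, $text)) {
    $status = 'rejected';            // never delivered
} elseif (matchesAny($suspicious, $text)) {
    $status = 'pending_moderation';  // held for the moderation page
} else {
    $status = 'approved';            // delivered immediately
}
// Then persist ($text, $status) and deliver only approved messages.
```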

Or combine this strategy with Bayesian filtering, as suggested by @Sander, for example.

Edit: a "Report abuse" button would also help you spot bad material that gets through, but it means keeping sent messages around for a while, and it may not be ideal if the service is very active.


Source: https://habr.com/ru/post/886630/

