Last year I worked on a Christmas project that let customers email each other Christmas requests via a 256-character free-text field. The project searched a very large product database for offers matching the text, but kept the free-text option for customers who couldn't find the product they wanted.
One obvious problem was that a customer could send a fairly explicit request to an unsuspecting recipient, with the company's branding wrapped around it.
In the end the project didn't go ahead, for various reasons; the profanity aspect was one of them.
However, I've been thinking about the project again and wondering what kinds of validation could be used here. I know the clbuttic mistake is the standard answer to any question of this nature.
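For the unfamiliar, the clbuttic mistake is what you get from naive substring replacement. A minimal sketch (illustrative Python, not LAMP code; the replacement list is a made-up example):

```python
def naive_filter(text: str) -> str:
    # Hypothetical one-entry replacement list; real lists are much longer.
    replacements = {"ass": "butt"}
    for bad, clean in replacements.items():
        # Blind substring replacement mangles innocent words too.
        text = text.replace(bad, clean)
    return text

print(naive_filter("a classic Christmas assortment"))
# -> "a clbuttic Christmas buttortment"
```

The filter has no notion of word boundaries, so "classic" becomes "clbuttic" and legitimate text gets mangled.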
The solutions I've considered are:
- Run it through something like WebPurify
- Use MechanicalTurk
- Write a regular expression that searches for words from a list. A more sophisticated version of this would also account for plurals and past tenses of each word.
- Build an array of suspect words with weights and score the text against it. If the score goes above a threshold, the text fails the check.
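The last two options can be combined. The sketch below is illustrative Python rather than PHP, and the word list, weights, and threshold are all invented placeholders, just to show the shape of the approach: word-boundary anchors avoid the clbuttic problem, and optional suffixes catch simple plurals and past tenses.

```python
import re

# Hypothetical word list with weights; a real list would be maintained
# externally. \b anchors match whole words only, and the optional
# (?:ed|s) suffix catches simple plural/past-tense forms.
SUSPECT = {
    r"\bdamn(?:ed|s)?\b": 2,
    r"\bhell\b": 1,
}
THRESHOLD = 2  # assumed cutoff; would need tuning against real traffic

def profanity_score(text: str) -> int:
    # Sum weight * number of matches for each suspect pattern.
    score = 0
    for pattern, weight in SUSPECT.items():
        score += weight * len(re.findall(pattern, text, re.IGNORECASE))
    return score

def passes_check(text: str) -> bool:
    return profanity_score(text) < THRESHOLD
```

Note that `\bhell\b` will not fire on "shell" or "hello", which is exactly the behavior the naive substring approach lacks.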
So, two questions:
- If the text fails the check, how do you handle that in the user interface?
- What are the advantages and disadvantages of these solutions or any others that you can offer?
NB - answers along the lines of "profanity filters are evil" aren't relevant here. In this semi-hypothetical situation, whether to implement a profanity filter wasn't my decision, and I didn't get a choice. I just have to do the best I can with my programming skills (which, ideally, means the LAMP stack).