My site is getting bigger and it is starting to attract a lot of spam through various channels. The site has many different types of UGC (profiles, forums, blog comments, status updates, private messages, etc.). I have various mitigation efforts underway that I hope to deploy blitzkrieg-style to convince spammers that we are not worth targeting. I'm fairly confident in what I'm building functionality-wise, but one missing piece is killing off all the old spam in one go.
Here is what I have:
- Large good/bad corpora (5-figure bad, 6- or 7-figure good). A lot of the spam has very reliable fingerprints, and the fact that I more or less ignored it for 6 months helps :)
- A large, modular Rails site deployed on AWS. It is not a huge-traffic site, but we are running 8 instances with the beginnings of an SOA.
- Ruby, Redis, Resque, MySQL, Varnish, Nginx, Unicorn, Chef, all on Gentoo
My requirements:
- I want it to handle the volume of data well (so I'm wary of a pure-Ruby solution).
- I need to be able to train multiple classifiers for different types of content (e.g. 419 scams vs. botnet spam).
- I would like to be able to add manual factors based on our own detective work (pattern matching, IP reuse, etc.).
- Ultimately, I want to end up with a nice interface that can be used from Ruby. If this requires getting my hands dirty in C or something else, I can handle it, but I'll avoid it if I can.
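
To make the requirements above a bit more concrete, here is a rough sketch of the kind of Ruby-facing interface I have in mind: one naive Bayes model per content type, plus hand-written rules that nudge the score. Everything in it (the `SpamScorer` class, its method names, the rule weights, the example IP) is hypothetical and only illustrates the shape of the API. It is a toy pure-Ruby implementation, so it says nothing about performance at my data volume, and the real classifier could just as well live in another language behind the same interface.

```ruby
# Hypothetical sketch: a multinomial naive Bayes model per content type,
# combined with manual rules that add a fixed weight to the spam log-odds.
class SpamScorer
  Rule = Struct.new(:name, :weight, :check)

  def initialize
    # @counts[content_type][label][token] => number of occurrences
    @counts = Hash.new { |h, k| h[k] = { spam: Hash.new(0), ham: Hash.new(0) } }
    # @docs[content_type][label] => number of training documents
    @docs   = Hash.new { |h, k| h[k] = { spam: 0, ham: 0 } }
    @rules  = []
  end

  # Register a manual factor; the block receives (text, meta) and returns true/false.
  def add_rule(name, weight, &check)
    @rules << Rule.new(name, weight, check)
  end

  # label is :spam or :ham
  def train(content_type, label, text)
    @docs[content_type][label] += 1
    tokenize(text).each { |t| @counts[content_type][label][t] += 1 }
  end

  # Returns the log-odds of spam; positive means "more likely spam".
  def score(content_type, text, meta: {})
    docs   = @docs[content_type]
    counts = @counts[content_type]
    vocab  = (counts[:spam].keys | counts[:ham].keys).size
    totals = { spam: counts[:spam].values.sum, ham: counts[:ham].values.sum }

    # Class prior with add-one smoothing.
    log_odds = Math.log((docs[:spam] + 1.0) / (docs[:ham] + 1.0))

    # Per-token likelihood ratio with add-one smoothing.
    tokenize(text).each do |t|
      p_spam = (counts[:spam][t] + 1.0) / (totals[:spam] + vocab + 1)
      p_ham  = (counts[:ham][t] + 1.0) / (totals[:ham] + vocab + 1)
      log_odds += Math.log(p_spam / p_ham)
    end

    # Manual factors from our own detective work.
    @rules.each { |r| log_odds += r.weight if r.check.call(text, meta) }
    log_odds
  end

  def spam?(content_type, text, meta: {}, threshold: 0.0)
    score(content_type, text, meta: meta) > threshold
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z0-9']+/)
  end
end

scorer = SpamScorer.new
scorer.train(:comment, :spam, "cheap pills buy now")
scorer.train(:comment, :ham,  "thanks for the helpful post")
# Example manual factor: a reused IP (documentation-range address, made up).
scorer.add_rule("known bad IP", 5.0) { |_text, meta| meta[:ip] == "203.0.113.7" }

puts scorer.spam?(:comment, "buy cheap pills")
puts scorer.spam?(:comment, "helpful post", meta: { ip: "203.0.113.7" })
```

Keeping the models keyed by content type means a forum post and a private message can be trained and thresholded independently, and the rules stay plain Ruby blocks so new patterns can be added without retraining.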
I realize this is a long and vague question, but I'm looking first of all for a list of good packages and, secondly, for any thoughts from someone who has built a similar system on how to approach it.