What is a good open source package for creating flexible spam detection on a large Rails site?

Question

My site is getting bigger and is starting to attract a lot of spam through various channels. The site has many different types of UGC (profiles, forums, blog comments, status updates, private messages, etc.). I have various mitigation efforts underway that I hope to deploy blitzkrieg-style to convince spammers that we are not worth the trouble. I am confident in what I am building functionality-wise, but one piece is still missing: killing all the old spam in one go.

Here is what I have:

Large good/bad corpora (5-figure bad, 6- or 7-figure good). A lot of the spam has very reliable fingerprints, and the fact that I more or less ignored it for 6 months helps :)
A large, modular Rails site deployed on AWS. It is not a huge-traffic site, but we are running 8 instances with the beginnings of an SOA.
Ruby, Redis, Resque, MySQL, Varnish, Nginx, Unicorn, Chef, all on Gentoo

My requirements:

I want it to handle this volume of data well (so I am wary of a pure Ruby solution).
I must be able to train multiple classifiers for different types of content (for example, 419 scams versus botnet spam).
I would like to be able to add manual factors based on our own detective work (pattern matching, IP reuse, etc.).
Ultimately, I want to build a nice interface that can be used from Ruby. If this requires getting my hands dirty in C or something else, I can handle it, but I will avoid it if I can.
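
To make the second and third requirements concrete, here is a minimal sketch of the shape I have in mind: one naive Bayes model per content type, with manual rules added on top as log-odds adjustments. The class name `SpamScorer`, its methods, and the rule weights are all hypothetical, and it is pure Ruby for illustration only, so it is not meant to cope with the data volume described above.

```ruby
# Hypothetical sketch: a separate naive Bayes model per content type,
# plus hand-written rules that shift the final log-odds.
class SpamScorer
  def initialize
    # @counts[type][label][token] => number of occurrences seen in training
    @counts = Hash.new { |h, type| h[type] = { spam: Hash.new(0), ham: Hash.new(0) } }
    # @docs[type][label] => number of training documents
    @docs = Hash.new { |h, type| h[type] = { spam: 0, ham: 0 } }
    @rules = [] # manual factors as [weight, predicate] pairs
  end

  # Register a manual factor; weight is added to the log-odds when it matches.
  def add_rule(weight, &predicate)
    @rules << [weight, predicate]
  end

  def train(type, label, text)
    @docs[type][label] += 1
    tokenize(text).each { |token| @counts[type][label][token] += 1 }
  end

  # Returns the log-odds that the text is spam; values above 0 lean spam.
  def score(type, text, meta = {})
    counts = @counts[type]
    docs = @docs[type]
    vocab = (counts[:spam].keys | counts[:ham].keys).size
    totals = { spam: counts[:spam].values.sum, ham: counts[:ham].values.sum }

    # Prior with add-one smoothing so an untrained type does not divide by zero.
    log_odds = Math.log((docs[:spam] + 1.0) / (docs[:ham] + 1.0))

    tokenize(text).each do |token|
      p_spam = (counts[:spam][token] + 1.0) / (totals[:spam] + vocab + 1)
      p_ham = (counts[:ham][token] + 1.0) / (totals[:ham] + vocab + 1)
      log_odds += Math.log(p_spam / p_ham)
    end

    @rules.each { |weight, rule| log_odds += weight if rule.call(text, meta) }
    log_odds
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z0-9']+/)
  end
end

scorer = SpamScorer.new
scorer.train(:comment, :spam, "cheap pills buy now")
scorer.train(:comment, :ham, "thanks for the great post")
# Manual factor from detective work: a known-bad IP (documentation address).
scorer.add_rule(5.0) { |_text, meta| meta[:ip] == "203.0.113.7" }
```

Calling `scorer.score(:comment, "buy cheap pills")` gives a positive value and `scorer.score(:comment, "great post thanks")` a negative one, while passing `ip: "203.0.113.7"` in the metadata pushes the second result above zero.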

I realize this is a long and vague question, but I am looking first for a list of good packages and, second, for any random thoughts on how to approach this from someone who has built a similar system.

+6

linux ruby soa spam bayesian

gtd Jun 03 '11 at 21:37

1 answer

Mori · Accepted Answer · 2011-06-03T21:58:33+0000

We searched for an acceptable open source solution and did not find one.

If you come to the same conclusion and decide to consider proprietary antispam, look at Akismet's paid collaborative spam filtering service. We have had decent results with it on dozens of medium-sized sites. It integrates with Rails through Rack via rackismet.
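
For a sense of what such an integration does under the hood, here is a rough sketch that calls Akismet's `comment-check` REST endpoint directly with Ruby's standard library. This is not rackismet's API; the helper names `akismet_params` and `akismet_spam?` are made up for illustration, and the key, site URL, and IP address are placeholders.

```ruby
require "net/http"
require "uri"

# Akismet's comment-check endpoint; the API key travels in the form body.
AKISMET_URL = URI("https://rest.akismet.com/1.1/comment-check")

# Hypothetical helper: builds the form fields for one piece of user content.
def akismet_params(api_key:, site_url:, user_ip:, user_agent:, content:, type: "comment")
  {
    "api_key" => api_key,
    "blog" => site_url,          # front page URL of the site being protected
    "user_ip" => user_ip,        # IP address of the submitter
    "user_agent" => user_agent,  # submitter's browser user agent
    "comment_type" => type,      # e.g. "comment", "forum-post", "message"
    "comment_content" => content
  }
end

# Hypothetical helper: the endpoint answers with the literal body "true" for spam.
def akismet_spam?(params)
  response = Net::HTTP.post_form(AKISMET_URL, params)
  response.body.strip == "true"
end

params = akismet_params(
  api_key: "YOUR_API_KEY",             # placeholder
  site_url: "https://example.com",     # placeholder
  user_ip: "203.0.113.7",              # documentation address
  user_agent: "Mozilla/5.0",
  content: "Cheap pills, buy now!"
)
# akismet_spam?(params) performs the network call; it needs a valid key.
```

In a busy Rails app you would probably run the check from a background job (Resque, in the stack described in the question) rather than inside the request cycle.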
