Server-side ad filtering

I am working on a web application where I display HTML from other sites. Before displaying the final version, I would like to get rid of the ads.

Any ideas or suggestions on how to do this? It doesn't need to be a super-efficient filtering tool. I thought of porting the filter-matching part of Adblock Plus to Ruby and returning the processed document using Nokogiri.
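To make "porting the filters" a bit more concrete, here is a rough sketch of how one Adblock Plus-style URL filter might be translated into a Ruby Regexp. The names abp_filter_to_regexp and SEPARATOR are made up for illustration, and only a small subset of the syntax is assumed ("||" domain anchor, "|" start/end anchor, "*" wildcard, "^" separator). Filter options, exception rules and element-hiding rules are ignored, so this is not a faithful port.

```ruby
# Simplified sketch: translate one Adblock Plus-style URL filter into a Regexp.
# Assumed subset: "||" domain anchor, "|" start/end anchor, "*" wildcard,
# "^" separator. Options ("$..."), exception rules ("@@") and element-hiding
# rules ("##") are NOT handled.

# "^" stands for a separator: any character other than a letter, digit,
# "_", "-", "." or "%", or the end of the URL.
SEPARATOR = '(?:[^\w\-.%]|\z)'

# Hypothetical helper, not part of any library.
def abp_filter_to_regexp(filter)
  body = filter.dup
  prefix = ''
  suffix = ''

  if body.start_with?('||')
    # Any scheme, then optional subdomains, then the filter text.
    prefix = '\A[a-z][a-z0-9+.\-]*://(?:[^/?#]*\.)?'
    body = body[2..-1]
  elsif body.start_with?('|')
    prefix = '\A'
    body = body[1..-1]
  end

  if body.end_with?('|')
    suffix = '\z'
    body = body[0...-1]
  end

  # Escape everything, then re-enable the two wildcard characters.
  source = Regexp.escape(body)
                 .gsub('\*') { '.*' }
                 .gsub('\^') { SEPARATOR }

  Regexp.new(prefix + source + suffix, Regexp::IGNORECASE)
end

ad_host = abp_filter_to_regexp('||ad.foo.com^')
puts ad_host.match?('http://ad.foo.com?my-ad.gif') # true
puts ad_host.match?('http://bad.foo.com/logo.gif') # false
```

Each resulting Regexp could then be tested against the src and href values of the elements in the document.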

Let's say I use a catch-all "ad" substring filter. This is not an official Adblock Plus filter, but for simplicity I will use it here. The idea would then be to remove every element that has any attribute matching the filter, for example: src="http://ad.foo.com?my-ad.gif", href="http://ad.foo.com", class="annoying-ad", etc.

The Nokogiri command for this filter would be:

 doc.xpath("//*[@*[contains(., 'ad')]]").each { |element| element.remove } 

I applied the filter to this page:

original

And the result:

filtered

Not bad. Note, however, that this catch-all substring filter also removed valid elements such as headers, because they have attributes like id="masthead", which contains "ad".
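One way to reduce false positives like "masthead" might be to match "ad" only as a standalone token instead of as a raw substring. The sketch below is only an illustration: AD_TOKEN and ad_like? are names I made up, and the rule (ad or ads not surrounded by letters or digits) is an assumption, not something taken from Adblock Plus.

```ruby
# Matches "ad" or "ads" only when it is not embedded in a longer word,
# so "annoying-ad" and "ad.foo.com" match but "masthead" does not.
AD_TOKEN = /(?<![[:alnum:]])ads?(?![[:alnum:]])/i

# Hypothetical predicate for a single attribute value.
def ad_like?(value)
  value.match?(AD_TOKEN)
end

%w[masthead annoying-ad http://ad.foo.com?my-ad.gif].each do |value|
  puts "#{value}: #{ad_like?(value)}"
end

# With Nokogiri, the predicate could replace the XPath contains() test:
#   doc.xpath('//*').each do |element|
#     element.remove if element.attribute_nodes.any? { |attr| ad_like?(attr.value) }
#   end
```

The trade-off is that values such as "bannerad" or "adbox" would no longer be caught, so this trades some recall for fewer broken pages.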

So I think this approach is suitable for my case. The question now is which filters to use. Adblock Plus has a huge list of filters, and I don't want to iterate over all of them. I'm going to take the top 10-20 and filter the documents with just those. Is there a list of the most popular ones? If so, I could not find it.

Thanks!


Source: https://habr.com/ru/post/952952/

