BOT / Spider Trap Ideas

I have a client whose domain is being hit hard by what appears to be a DDoS. In the logs it's normal-looking user agents with random IP addresses, but they are flipping through pages too quickly to be human. They also don't appear to be requesting any images. I can't seem to find any pattern, and my suspicion is a fleet of Windows zombies.

The client has had problems with SPAM attacks in the past; they even had to point MX at Postini to keep 6.7 GB/day of junk mail from taking down the server.

I want to set up a BOT trap in a directory disallowed by robots.txt ... I've just never attempted anything like this before, and I'm hoping someone out there has creative ideas for trapping BOTs!

EDIT: I already have plenty of ideas for catching it ... what I'm stuck on is what to do with it once it's trapped.

+5
5 answers

You can set up a PHP script whose URL is explicitly forbidden by robots.txt. In that script, you can pull the source IP address of the suspected bot hitting you (via $_SERVER['REMOTE_ADDR']) and then add that IP address to a database blacklist table.

Then, in your main application, you can check the source IP address, look it up in the blacklist table, and if you find it, serve a 403 page instead. (Perhaps with a message such as "We have detected abuse coming from your IP address; if you feel this is in error, please contact us at ...")

On the upside, you get automatic blacklisting of bad bots. On the downside, it's not terribly efficient, and it can be dangerous. (One person innocently checking out that page out of curiosity could get a large block of users banned.)
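
A minimal sketch of that idea in PHP might look like the following (trap script and application check shown together for brevity; the SQLite file, table layout, and message text are illustrative assumptions, not anything from the answer):

<?php
/* Sketch only: record visitors of the robots.txt-forbidden URL in a
   blacklist table, and reject any listed IP with a 403 in the main app.
   The SQLite file name and table layout are assumptions for illustration. */
$db = new PDO('sqlite:' . __DIR__ . '/blacklist.db');
$db->exec("CREATE TABLE IF NOT EXISTS blacklist (ip TEXT PRIMARY KEY, seen TEXT)");

/* trap script (at the forbidden URL): record the offending IP address */
$stmt = $db->prepare("INSERT OR IGNORE INTO blacklist (ip, seen) VALUES (?, ?)");
$stmt->execute(array($_SERVER['REMOTE_ADDR'], date('c')));

/* main application: look the source IP up and serve a 403 if it is listed */
$check = $db->prepare("SELECT 1 FROM blacklist WHERE ip = ?");
$check->execute(array($_SERVER['REMOTE_ADDR']));
if ($check->fetchColumn()) {
    header("HTTP/1.1 403 Forbidden");
    exit("We have detected abuse coming from your IP address. If you feel this is in error, please contact us.");
}
?>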

Edit: Alternatively (or additionally, I suppose) you can simply add a GeoIP check to your application and reject requests based on the country of origin.
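
A rough sketch of that country check, assuming the PECL geoip extension is installed (the blocked-country codes below are only example values):

<?php
/* Sketch only: reject requests by country of origin, assuming the PECL
   geoip extension is available; the country codes are just examples. */
$blocked = array('CN', 'NG');
$country = @geoip_country_code_by_name($_SERVER['REMOTE_ADDR']);

if ($country !== false && in_array($country, $blocked)) {
    header("HTTP/1.1 403 Forbidden");
    exit("Access from your region is currently blocked. If you feel this is in error, please contact us.");
}
?>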

+6

What you could do is get another box (a kind of sacrificial lamb) that is not on the same pipe as your main host, and have it serve a page which redirects to itself (but with a randomized page name in the URL). This can get the bot stuck in an infinite loop, tying up CPU and bandwidth on the sacrificial lamb rather than on your main box.
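
A bare-bones sketch of such a self-redirecting page in PHP (the /trap/ path is an illustrative assumption; the script would live on the sacrificial box, not the main host):

<?php
/* Tarpit sketch for the sacrificial box: every request is answered with a
   redirect to a new, randomly named URL on the same host, so a crawler
   that follows redirects loops here indefinitely. */
$random = bin2hex(random_bytes(8));   /* random "page name" */
header("HTTP/1.1 302 Found");
header("Location: /trap/" . $random . ".html");
exit;
?>

On the sacrificial box you would also need a rewrite rule so that every /trap/*.html URL maps back to this script; otherwise the redirect chain would end in a 404 instead of a loop.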

+1

I tend to think this problem is better solved with network security than with coding, but I see the logic in your approach/question.

There are a number of questions and discussions about this over on Server Fault that may be worth investigating.

https://serverfault.com/search?q=block+bots

+1

Well, I have to say I'm a little disappointed - I was hoping for some creative ideas. I did find the ideal solution here: http://www.kloth.net/internet/bottrap.php

<html>
<head><title> </title></head>
<body>
<p>There is nothing here to see. So what are you doing here?</p>
<p><a href="http://your.domain.tld/">Go home.</a></p>
<?php
  /* whitelist: end processing and exit */
  if (preg_match("/10\.22\.33\.44/", $_SERVER['REMOTE_ADDR'])) { exit; }
  if (preg_match("/Super Tool/", $_SERVER['HTTP_USER_AGENT'])) { exit; }
  /* end of whitelist */

  $badbot = 0;
  /* scan the blacklist.dat file for addresses of SPAM robots
     to prevent filling it up with duplicates */
  $filename = "../blacklist.dat";
  $fp = fopen($filename, "r") or die("Error opening file ... <br>\n");
  while ($line = fgets($fp, 255)) {
    $u  = explode(" ", $line);
    $u0 = $u[0];
    if (preg_match("/$u0/", $_SERVER['REMOTE_ADDR'])) { $badbot++; }
  }
  fclose($fp);

  if ($badbot == 0) { /* we just saw a new bad bot not yet listed! */
    /* send a mail to the hostmaster */
    $tmestamp = time();
    $datum    = date("Ymd (D) H:i:s", $tmestamp);
    $from     = "badbot-watch@domain.tld";
    $to       = "hostmaster@domain.tld";
    $subject  = "domain-tld alert: bad robot";
    $msg  = "A bad robot hit {$_SERVER['REQUEST_URI']} $datum \n";
    $msg .= "address is {$_SERVER['REMOTE_ADDR']}, agent is {$_SERVER['HTTP_USER_AGENT']}\n";
    mail($to, $subject, $msg, "From: $from");

    /* append bad bot address data to the blacklist log file: */
    $fp = fopen($filename, 'a+');
    fwrite($fp, "{$_SERVER['REMOTE_ADDR']} - - [$datum] \"{$_SERVER['REQUEST_METHOD']} {$_SERVER['REQUEST_URI']} {$_SERVER['SERVER_PROTOCOL']}\" {$_SERVER['HTTP_REFERER']} {$_SERVER['HTTP_USER_AGENT']}\n");
    fclose($fp);
  }
?>
</body>
</html>

Then, to protect your pages, drop <?php include($_SERVER['DOCUMENT_ROOT'] . "/blacklist.php"); ?> into the first line of each page. blacklist.php contains:

<?php
  $badbot = 0;
  /* look for the IP address in the blacklist file */
  $filename = "../blacklist.dat";
  $fp = fopen($filename, "r") or die("Error opening file ... <br>\n");
  while ($line = fgets($fp, 255)) {
    $u  = explode(" ", $line);
    $u0 = $u[0];
    if (preg_match("/$u0/", $_SERVER['REMOTE_ADDR'])) { $badbot++; }
  }
  fclose($fp);

  if ($badbot > 0) { /* this is a bad bot, reject it */
    sleep(12);
    print("<html><head>\n");
    print("<title>Site unavailable, sorry</title>\n");
    print("</head><body>\n");
    print("<center><h1>Welcome ...</h1></center>\n");
    print("<p><center>Unfortunately, due to abuse, this site is temporarily unavailable ...</center></p>\n");
    print("<p><center>If you feel this is in error, send a mail to the hostmaster at this site;<br> if you are an anti-social ill-behaving SPAM-bot, then just go away.</center></p>\n");
    print("</body></html>\n");
    exit;
  }
?>

I plan to take Scott Chamberlain's advice and play it safe by adding a CAPTCHA to the script. If the user answers correctly, the script will simply die() or redirect back to the root site. Just for fun, I'm throwing the trap into a directory called /admin/ and adding Disallow: /admin/ to the robots.txt file.
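
A minimal sketch of such a CAPTCHA gate might look like the following (a hypothetical captcha.php; the simple arithmetic question stands in for a real CAPTCHA and is not part of the kloth.net code):

<?php
/* Hypothetical captcha.php: a stand-in for the CAPTCHA gate mentioned above.
   A correct answer sends the visitor back to the site root; anything else
   just shows the form again. */
session_start();

if (isset($_POST['answer'], $_SESSION['expected'])
    && trim($_POST['answer']) === $_SESSION['expected']) {
    header("Location: /");   /* looks human: send them back to the root site */
    exit;
}

$a = rand(1, 9);
$b = rand(1, 9);
$_SESSION['expected'] = (string)($a + $b);
?>
<html><body>
<form method="post">
  <p>To continue, what is <?php echo $a; ?> + <?php echo $b; ?> ?</p>
  <input type="text" name="answer">
  <input type="submit" value="Submit">
</form>
</body></html>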

EDIT: Also, I am redirecting bots that ignore the rules to this page: http://www.seastory.us/bot_this.htm

+1

You could first look at where the IPs are coming from. My guess is that they are all coming from one country, such as China or Nigeria, in which case you could set something up in htaccess to disallow all IPs from those countries. As for actually building a bot trap, I have no idea.

0
