Well, I have to say that I am disappointed; I was hoping for some creative ideas. I found the perfect solution here: http://www.kloth.net/internet/bottrap.php
<html>
<head><title> </title></head>
<body>
<p>There is nothing here to see. So what are you doing here?</p>
<p><a href="http://your.domain.tld/">Go home.</a></p>
<?php
// Skip trusted visitors: our own address and our own tool.
if (preg_match("/10\.22\.33\.44/", $_SERVER['REMOTE_ADDR'])) { exit; }
if (preg_match("/Super Tool/", $_SERVER['HTTP_USER_AGENT'] ?? '')) { exit; }

// Check whether this address is already on the blacklist.
$badbot = 0;
$filename = "../blacklist.dat";
$fp = fopen($filename, "r") or die("Error opening file ... <br>\n");
while ($line = fgets($fp, 255)) {
    $u = explode(" ", $line);
    // Compare exactly; used as a regex, the dots would match any character.
    if ($u[0] === $_SERVER['REMOTE_ADDR']) { $badbot++; }
}
fclose($fp);

if ($badbot == 0) {
    // New offender: mail the hostmaster and append a log-style entry.
    $datum = date("Ymd (D) H:i:s", time());
    $from = "badbot-watch@domain.tld";
    $to = "hostmaster@domain.tld";
    $subject = "domain-tld alert: bad robot";
    // These headers are optional, so fall back to "-" when they are missing.
    $agent = $_SERVER['HTTP_USER_AGENT'] ?? '-';
    $referer = $_SERVER['HTTP_REFERER'] ?? '-';
    $msg = "A bad robot hit {$_SERVER['REQUEST_URI']} $datum \n";
    $msg .= "address is {$_SERVER['REMOTE_ADDR']}, agent is $agent\n";
    mail($to, $subject, $msg, "From: $from");
    $fp = fopen($filename, 'a+');
    fwrite($fp, "{$_SERVER['REMOTE_ADDR']} - - [$datum] \"{$_SERVER['REQUEST_METHOD']} {$_SERVER['REQUEST_URI']} {$_SERVER['SERVER_PROTOCOL']}\" $referer $agent\n");
    fclose($fp);
}
?>
</body>
</html>
Then, to protect the pages, put

<?php include($_SERVER['DOCUMENT_ROOT'] . "/blacklist.php"); ?>

in the first line of each page. blacklist.php contains:
<?php
// Count blacklist entries whose first field is the visitor's address.
$badbot = 0;
$filename = "../blacklist.dat";
$fp = fopen($filename, "r") or die("Error opening file ... <br>\n");
while ($line = fgets($fp, 255)) {
    $u = explode(" ", $line);
    // Compare exactly; used as a regex, the dots would match any character.
    if ($u[0] === $_SERVER['REMOTE_ADDR']) { $badbot++; }
}
fclose($fp);

if ($badbot > 0) {
    // Stall the bot for a while, then show the block page and stop.
    sleep(12);
    print("<html><head>\n");
    print("<title>Site unavailable, sorry</title>\n");
    print("</head><body>\n");
    print("<center><h1>Welcome ...</h1></center>\n");
    print("<p><center>Unfortunately, due to abuse, this site is temporarily not available ...</center></p>\n");
    print("<p><center>If you feel this is in error, send a mail to the hostmaster at this site,<br> if you are an anti-social ill-behaving SPAM-bot, then just go away.</center></p>\n");
    print("</body></html>\n");
    exit;
}
?>
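The trap page and blacklist.php both walk blacklist.dat with the same loop, so the lookup could live in one shared helper. Here is a minimal sketch, assuming the file format written by the trap above (the IP address is the first space-separated field of each line). The name is_blacklisted is hypothetical and not part of the original script. It compares addresses exactly, so an entry for 1.2.3.4 cannot also match 11.2.3.45.

```php
<?php
// Hypothetical helper (not in the original script): returns true when $ip
// is the first space-separated field of any blacklist line.
function is_blacklisted(string $ip, array $lines): bool
{
    foreach ($lines as $line) {
        $fields = explode(' ', trim($line));
        // Exact comparison; blank lines yield an empty first field and are skipped.
        if ($fields[0] !== '' && $fields[0] === $ip) {
            return true;
        }
    }
    return false;
}

// Possible use in blacklist.php; file() returns false if the file is unreadable.
// $lines = file('../blacklist.dat', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: [];
// if (is_blacklisted($_SERVER['REMOTE_ADDR'], $lines)) { /* show the block page */ }
```

Taking the lines as an array rather than a file name keeps the function easy to check without touching the disk.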
I plan to take Scott Chamberlain's advice and be safe: I will put a CAPTCHA on the trap script. If the user answers correctly, the script will simply die or redirect back to the site root. Just for fun, I drop the trap into a directory called /admin/ and add Disallow: /admin/ to the robots.txt file.
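The CAPTCHA idea could be sketched roughly like this. Everything here is an assumption made for illustration: the function names, the arithmetic question standing in for a real CAPTCHA, and the choice to blacklist on a wrong answer. What to do with a visitor who never answers is left to the caller.

```php
<?php
// Sketch only: names and the arithmetic challenge are assumptions,
// not part of the original trap script.

// Build a trivial arithmetic challenge; a real CAPTCHA could be swapped in.
function make_challenge(int $a, int $b): array
{
    return ['question' => "What is $a + $b?", 'answer' => (string) ($a + $b)];
}

// Decide what the trap should do with a visitor:
//   'ask'       - no answer submitted yet, show the question
//   'redirect'  - correct answer, send the visitor back to the site root
//   'blacklist' - wrong answer, record the address as the trap does today
function trap_decision(string $expected, ?string $given): string
{
    if ($given === null) {
        return 'ask';
    }
    return trim($given) === $expected ? 'redirect' : 'blacklist';
}

// Possible use inside the trap page (the expected answer kept in the session):
// if (trap_decision($_SESSION['answer'], $_POST['answer'] ?? null) === 'redirect') {
//     header('Location: /');
//     exit;
// }
```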
EDIT: Also, I redirect bots that ignore the rules to this page: http://www.seastory.us/bot_this.htm