How can I detect a crawler / spider using PHP?

I am currently working on a project where I need to track every visit made by a crawler.
I know that I should use HTTP_USER_AGENT, but I'm not sure how to write the code for this. I also know that the user agent can be changed very easily, so I would like to know whether any additional checks can be added to guard against spoofing.

Sample code of what I'm trying to do.

    <?php
    // The User-Agent header can be missing, so fall back to an empty string.
    $user_agent = $_SERVER['HTTP_USER_AGENT'] ?? '';

    if (strpos($user_agent, 'Google') !== false) {
        echo "Googlebot is here";
    }
    ?>

thanks

1 answer

According to Google's guidance on verifying Googlebot:

You can verify that a bot accessing your server really is Googlebot (or another Google user agent) by running a reverse DNS lookup, checking that the name is in the googlebot.com domain, and then running a forward DNS lookup using that googlebot.com name. This is useful if you are concerned that spammers or other troublemakers are accessing your site while claiming to be Googlebot.

For instance:

host 66.249.66.1
1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.

host crawl-66-249-66-1.googlebot.com
crawl-66-249-66-1.googlebot.com has address 66.249.66.1

Google does not publish a public list of IP addresses for webmasters to whitelist. This is because these IP address ranges can change, which causes problems for any webmasters who have hardcoded them. The best way to identify Googlebot hits is to use the user agent (Googlebot).

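You can do a reverse DNS lookup and then confirm the result with a forward lookup:

    function validateGoogleBotIP($ip) {
        // Reverse lookup, e.g. "crawl-66-249-66-1.googlebot.com".
        // gethostbyaddr() returns the unmodified IP on failure, or false on malformed input.
        $hostname = gethostbyaddr($ip);
        if (!$hostname || !preg_match('/\.googlebot\.com$/i', $hostname)) {
            return false;
        }
        // Forward lookup: the name must resolve back to the same address,
        // because whoever controls an IP can set its PTR record to any name.
        // Note that gethostbyname() only returns IPv4 addresses.
        return gethostbyname($hostname) === $ip;
    }

    $user_agent = $_SERVER['HTTP_USER_AGENT'] ?? '';

    if (strpos($user_agent, 'Google') !== false) {
        if (validateGoogleBotIP($_SERVER['REMOTE_ADDR'])) {
            echo 'It is ACTUALLY google';
        } else {
            echo 'Someone\'s faking it!';
        }
    } else {
        echo 'Nothing to do with Google';
    }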
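
Since the question is about monitoring crawlers in general, the same reverse-then-forward DNS check can be extended to more than one crawler. The following is only a rough sketch: the `CRAWLER_DOMAINS` table, `hostMatchesSuffix()` and `classifyVisitor()` are hypothetical names introduced here, and the hostname suffixes listed are assumptions that should be checked against each crawler operator's own documentation. The two lookup functions are passed in as parameters so the logic can be exercised without real DNS queries.

```php
<?php
// Hypothetical table: User-Agent token => hostname suffixes that the
// operator's crawlers are assumed to resolve to. Verify before relying on it.
const CRAWLER_DOMAINS = [
    'Googlebot' => ['.googlebot.com', '.google.com'],
    'bingbot'   => ['.search.msn.com'],
];

// True if $hostname ends with one of the suffixes (case-insensitive).
// The leading dot in each suffix rejects names like "evilgooglebot.com".
function hostMatchesSuffix(string $hostname, array $suffixes): bool
{
    $hostname = strtolower(rtrim($hostname, '.'));
    foreach ($suffixes as $suffix) {
        if (substr($hostname, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}

// Returns "verified:<token>", "spoofed:<token>" or "none".
// $reverse and $forward default to PHP's gethostbyaddr()/gethostbyname().
function classifyVisitor(
    string $userAgent,
    string $ip,
    ?callable $reverse = null,
    ?callable $forward = null
): string {
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbyname';

    foreach (CRAWLER_DOMAINS as $token => $suffixes) {
        if (stripos($userAgent, $token) === false) {
            continue; // this crawler is not claimed in the User-Agent
        }
        $hostname = $reverse($ip);
        $ok = is_string($hostname)
            && hostMatchesSuffix($hostname, $suffixes)
            && $forward($hostname) === $ip; // forward-confirm the PTR name
        return ($ok ? 'verified:' : 'spoofed:') . $token;
    }
    return 'none';
}
```

In a request handler this could be called as `classifyVisitor($_SERVER['HTTP_USER_AGENT'] ?? '', $_SERVER['REMOTE_ADDR'])` and the result written to a log. DNS lookups are slow, so the check is only performed when the user agent claims to be a known crawler.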

Source: https://habr.com/ru/post/958075/

