Detecting spiders or browsers with cookies enabled

Many spiders and crawlers visit our news site. We depend on a GeoIP service to identify each visitor's physical location and serve location-related content, so we built a module with a function module_init() that sends the visitor's IP to MaxMind and sets a cookie with the location information. To avoid making a request on every page view, we first check whether the cookie is already set, and only if it is not do we query MaxMind and set the cookie. This works fine for regular clients, but breaks down when a spider crawls the site: the spider does not keep the cookie, so every page view triggers a MaxMind request, and that gets expensive. We are looking for a way to identify crawlers, or alternatively (if it is simpler) legitimate browsers with cookies enabled, so that we only call MaxMind when it is actually useful.
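For context, a minimal sketch of the flow described above (the hook name, the cookie name and the lookup helper are placeholders for illustration, not the asker's actual module code):

<?php
// Hypothetical module_init() flow: only query MaxMind when the location
// cookie has not been set by a previous response.
function mymodule_init() {
  if (isset($_COOKIE['geo_location'])) {
    return; // Location already known for this client; skip the lookup.
  }
  $ip = $_SERVER['REMOTE_ADDR'];
  $location = mymodule_maxmind_lookup($ip); // placeholder for the real MaxMind call
  // Clients that keep cookies skip the lookup on later requests;
  // spiders that drop the cookie hit MaxMind on every page view.
  setcookie('geo_location', $location, time() + 30 * 24 * 3600, '/');
}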

+3
3 answers

You can't tell from a single request whether the client has cookies enabled, so I wouldn't try to detect that at all. Instead, trigger the lookup from a small client-side script: real browsers will execute the script, while most crawlers never will. Cache each result in your db keyed by IP address, so even a "dumb" client that drops cookies only costs you one MaxMind request, and crawlers that never run the script cost you nothing.
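A rough sketch of the per-IP caching part of that idea (the table name, columns and the MaxMind helper are made-up placeholders):

<?php
// Return the location for an IP, querying MaxMind at most once per address.
// Results are cached in a local table, so repeat visits (and crawlers that
// ignore cookies) do not generate new MaxMind requests.
function get_location_for_ip(PDO $db, $ip) {
  $stmt = $db->prepare('SELECT location FROM ip_location_cache WHERE ip = ?');
  $stmt->execute(array($ip));
  $location = $stmt->fetchColumn();
  if ($location === false) {
    $location = mymodule_maxmind_lookup($ip); // placeholder for the real MaxMind call
    $ins = $db->prepare('INSERT INTO ip_location_cache (ip, location) VALUES (?, ?)');
    $ins->execute(array($ip, $location));
  }
  return $location;
}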

Another option:

Check the IP address.

The big search engines publish the IP ranges their crawlers come from, and those ranges are fairly stable. Keep a list of known crawler ranges, match the visitor's IP against it, and skip the MaxMind call for anything that matches.

The list will never be complete, but it covers the crawlers that generate the bulk of the traffic.
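A small sketch of such an IP-range check (the range shown is only an example of a published Googlebot range; verify against the search engines' current lists before relying on it):

<?php
// Return TRUE if $ip falls inside the CIDR range $cidr (e.g. '66.249.64.0/19').
function ip_in_range($ip, $cidr) {
  list($subnet, $bits) = explode('/', $cidr);
  $mask = -1 << (32 - (int) $bits);
  return (ip2long($ip) & $mask) === (ip2long($subnet) & $mask);
}

// Example range believed to be used by Googlebot -- check the published lists.
$crawler_ranges = array('66.249.64.0/19');
$is_crawler = false;
foreach ($crawler_ranges as $range) {
  if (ip_in_range($_SERVER['REMOTE_ADDR'], $range)) {
    $is_crawler = true;
    break;
  }
}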

I'm not 100% sure about this, but I'd guess that 99% of legit spiders identify themselves through the user agent string, e.g.:

$_SERVER['HTTP_USER_AGENT'] = 'Googlebot', 'Yammybot', 'Openbot', 'Yahoo'... etc.

The problem is bots that fake a regular browser user agent, IE6 or whatever -- those you won't catch this way.

The point is that you will never get to 100%, so decide what error rate you can live with. Catching the spiders that announce themselves already removes the vast majority of the wasted lookups; if the remaining 1% still hit MaxMind, so be it, lol.
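A runnable form of that user-agent check (the list mirrors the names above plus a couple of other common crawlers; extend it as needed):

<?php
// Return TRUE if the user agent looks like a well-known, self-identifying crawler.
// Bots that spoof a browser user agent will NOT be caught by this check.
function is_known_bot($user_agent) {
  $bots = array('googlebot', 'yammybot', 'openbot', 'yahoo', 'slurp', 'bingbot', 'msnbot');
  foreach ($bots as $bot) {
    if (stripos($user_agent, $bot) !== false) {
      return true;
    }
  }
  return false;
}

// Usage: skip the cookie check and the MaxMind lookup entirely for known bots.
if (!is_known_bot($_SERVER['HTTP_USER_AGENT'])) {
  // ... cookie check / MaxMind lookup goes here ...
}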

+3

, , ?

+1

Web browsers (both legitimate and vile) can be detected using the ATL web browser API at www.atlbl.com

0

Source: https://habr.com/ru/post/1760806/

