How does Googlebot know that the web server is not cloaking when it requests the ?_escaped_fragment_= URL?

Regarding the Google AJAX crawling specification: the server returns one thing (namely, a heavy JavaScript application) for the #! URL and something else (namely, the "HTML snapshot" of the page) to Googlebot when #! is replaced by ?_escaped_fragment_=. To me that looks like cloaking. So how can Googlebot be convinced that the server is returning bona fide equivalents for the #! URL and the ?_escaped_fragment_= URL? Yet this is exactly what the AJAX crawling specification tells webmasters to do. Am I missing something? How can Googlebot be sure that the server returns the same content in both cases?
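For context, here is a minimal sketch of the server-side arrangement the question describes, written with Flask; the file names and directory layout are hypothetical. Ordinary browsers keep the #! fragment on the client and get the JavaScript application, while a request carrying ?_escaped_fragment_= gets a pre-rendered HTML snapshot.

```python
# Minimal sketch of the AJAX crawling scheme on the server side (Flask).
# File names ("app.html", "snapshots/...") are hypothetical.
from flask import Flask, request, send_file

app = Flask(__name__)

@app.route("/")
def index():
    fragment = request.args.get("_escaped_fragment_")
    if fragment is not None:
        # Googlebot rewrote "#!state" into "?_escaped_fragment_=state",
        # so serve the pre-rendered snapshot for that application state.
        # (A real implementation must sanitize `fragment` before using it.)
        return send_file("snapshots/%s.html" % (fragment or "home"))
    # Ordinary browsers never send the #! fragment to the server;
    # they receive the JS-heavy application and render it client-side.
    return send_file("app.html")

if __name__ == "__main__":
    app.run()
```

The question, in other words, is how Googlebot can tell that the two branches above really produce equivalent content.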

1 answer

The crawler does not know. But it never knows, even for sites that return plain old HTML: it is very easy to write code that cloaks a site based on the HTTP headers crawlers send or on their known IP addresses.
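To illustrate how little effort such cloaking takes, here is a hypothetical sketch of a server branching on request metadata. The user-agent test and the IP range are illustrative only, not Google's actual detection criteria.

```python
# Sketch of header/IP-based cloaking -- the point being that a crawler can
# never fully trust what it is served. Values shown are illustrative.
import ipaddress

# One publicly documented Googlebot range, used here purely as an example.
GOOGLEBOT_NET = ipaddress.ip_network("66.249.64.0/19")

def looks_like_googlebot(user_agent: str, remote_ip: str) -> bool:
    if "Googlebot" in (user_agent or ""):
        return True
    try:
        return ipaddress.ip_address(remote_ip) in GOOGLEBOT_NET
    except ValueError:
        return False

def pick_response(user_agent: str, remote_ip: str) -> str:
    # A cloaking site would branch here: keyword-stuffed HTML for the
    # crawler, something entirely different for real visitors.
    if looks_like_googlebot(user_agent, remote_ip):
        return "content_for_crawlers.html"
    return "content_for_humans.html"
```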

See this related question: How does Google know you're Cloaking?

Most of this seems to be conjecture, but there appear to be various checks in place, ranging from fetching the page with ordinary browser headers (rather than crawler headers) to having an actual person look at the page.
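As a purely speculative illustration of that kind of spot check: fetch the same URL once with crawler-like headers and once with ordinary browser headers, and flag it if the responses differ. The user-agent strings and the interpretation of a mismatch are assumptions for illustration only; nothing here reflects Google's real checks.

```python
# Hypothetical spot check: compare the response served to crawler-like
# headers against the response served to browser-like headers.
import hashlib
import requests

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"

def fingerprint(url: str, user_agent: str) -> str:
    resp = requests.get(url, headers={"User-Agent": user_agent}, timeout=10)
    return hashlib.sha256(resp.content).hexdigest()

def possibly_cloaked(url: str) -> bool:
    # Identical bytes are a strong "not cloaked" signal; differing bytes only
    # warrant a closer look, since dynamic pages legitimately vary per request.
    return fingerprint(url, GOOGLEBOT_UA) != fingerprint(url, BROWSER_UA)
```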

Continuing the speculation: it would certainly not be beyond Google's engineers to write a form of crawler that actually retrieves what the user sees; after all, they have their own browser that does exactly that. It would be prohibitively CPU-expensive to do this for every page, but it probably makes sense as a random spot check.
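A hedged sketch of what such a random check might look like, assuming a headless browser (Playwright here), a simplified #! to ?_escaped_fragment_= rewrite, and a made-up text-similarity threshold; none of this is anything Google has documented.

```python
# Hypothetical check: render the #! URL in a headless browser and compare
# the visible text against the HTML snapshot served for the escaped form.
import difflib
import re
import requests
from playwright.sync_api import sync_playwright

def visible_text(html: str) -> str:
    # Crude tag/script stripping, good enough for a rough similarity estimate.
    html = re.sub(r"<script.*?</script>", " ", html, flags=re.S | re.I)
    return re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", html)).strip()

def snapshot_matches_rendered_page(hashbang_url: str, threshold: float = 0.8) -> bool:
    # Simplified rewrite; the real spec also URL-encodes the fragment and
    # handles existing query strings.
    escaped_url = hashbang_url.replace("#!", "?_escaped_fragment_=")
    snapshot = requests.get(escaped_url, timeout=10).text
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(hashbang_url)
        rendered = page.content()
        browser.close()
    ratio = difflib.SequenceMatcher(
        None, visible_text(snapshot), visible_text(rendered)
    ).ratio()
    return ratio >= threshold
```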


Source: https://habr.com/ru/post/1387507/
