Canonical link as a way to deal with scrapers?

Let's say several external sites scrape your content and republish it as their own. Let's also say that you maintain a single, constant URL for each piece of content, so that duplicate content (on your own site) is never a problem.

Is there any SEO value in including a canonical link in your page's <head> anyway, so that when your site is scraped, the canonical hint gets carried over into whatever site is stealing your content (provided they scrape the raw HTML rather than consuming it via RSS, etc.)?
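For reference, the canonical link in question is a single tag in the page's <head>. A minimal sketch, where example.com and the path are placeholders for your own domain and article URL:

```html
<!-- Hypothetical example: example.com stands in for your own site's
     canonical domain; the path is a placeholder article URL. -->
<link rel="canonical" href="https://example.com/articles/my-original-post" />
```

If a scraper copies the raw HTML verbatim, this tag comes along with it and points search engines back at the original URL.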

I've heard conflicting things about how cross-site canonical links behave, ranging from "they are ignored" to "the behavior is undefined" to "it can't hurt" to "that's exactly what canonical is for". My impression was that canonical was a good way to handle intra-site aliases, but not necessarily cross-site ones.

1 answer

I can't answer your question directly.

You (or someone at your company) should contact the parties that are syndicating your content without permission and try to get them to do it with permission, and you should clarify your policy on unauthorized syndication. This is of course a business decision, and your business development people and IP lawyers will probably need to get involved.

If they persist, and you absolutely need to make them stop, you can start serving junk to their robots. Detecting their robots can be nontrivial, since they are likely to spoof a "real" user-agent header and rotate IP addresses (most scrapers seem to run on EC2 these days). But if you succeed, their websites will fill up with trash.
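A minimal sketch of the idea, with loudly labeled assumptions: the network range below is a documentation placeholder, not a real EC2 range (in practice you would load AWS's published ip-ranges.json), and the "claims to be a browser" check is deliberately crude:

```python
import ipaddress

# Hypothetical placeholder network (TEST-NET-3 from RFC 5737).
# In practice you would populate this list from AWS's published
# ip-ranges.json and any other ranges you have seen scraping you.
SUSPECT_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]

def looks_like_scraper(remote_ip: str, user_agent: str) -> bool:
    """Crude heuristic: a request from a suspect (e.g. cloud) network
    whose user-agent nonetheless claims to be an ordinary browser."""
    ip = ipaddress.ip_address(remote_ip)
    in_suspect_net = any(ip in net for net in SUSPECT_NETWORKS)
    claims_browser = "Mozilla" in user_agent
    return in_suspect_net and claims_browser

def body_for(remote_ip: str, user_agent: str, real_html: str) -> str:
    """Serve junk to suspected scrapers, the real page to everyone else."""
    if looks_like_scraper(remote_ip, user_agent):
        return "<html><body>lorem ipsum junk</body></html>"
    return real_html
```

Real detection would combine many more signals (request rate, header fingerprints, TLS fingerprints), but the serving logic stays this simple: one branch point between the real page and the junk.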

Once their sites are full of garbage (or worse), you can contact them again and ask whether they would like to stop their unpleasant behavior.


Source: https://habr.com/ru/post/1305383/

