Should I consider a robots.txt file when a URL is redirected to another domain?

I want to crawl some sites that are hosted on Medium under custom domains (e.g. https://uber-developers.news/).

These sites always redirect to medium.com and then back to the original site. The problem is that the intermediate medium.com URL in that redirect chain is disallowed by Medium's robots.txt.

Here is the redirection path:

1. https://uber-developers.news/
2. https://medium.com/m/global-identity?redirectUrl=https://uber-developers.news/
3. back to https://uber-developers.news/

The problem is the second URL above, https://medium.com/m/global-identity?redirectUrl=https://uber-developers.news/, which is prohibited by robots.txt:

https://medium.com/robots.txt

User-Agent: *
Disallow: /m/
Disallow: /me/
Disallow: /@me$
Disallow: /@me/
Disallow: /*/*/edit
Allow: /_/
Allow: /_/api/users/*/meta
Allow: /_/api/users/*/profile/stream
Allow: /_/api/posts/*/responses
Allow: /_/api/posts/*/responsesStream
Allow: /_/api/posts/*/related
Sitemap: https://medium.com/sitemap/sitemap.xml
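
For reference, Disallow: /m/ is a plain path prefix, so it matches the global-identity URL regardless of the query string. You can test a URL against these rules with Python's standard urllib.robotparser (a small sketch; note the stdlib parser implements only prefix matching, not the * and $ wildcard extensions, which is enough for the /m/ rule):

from urllib.robotparser import RobotFileParser

# The Disallow rules quoted above, parsed offline so the check is deterministic.
RULES = """\
User-Agent: *
Disallow: /m/
Disallow: /me/
Disallow: /@me$
Disallow: /@me/
Disallow: /*/*/edit
"""

rp = RobotFileParser()
rp.parse(RULES.splitlines())

blocked = "https://medium.com/m/global-identity?redirectUrl=https://uber-developers.news/"
print(rp.can_fetch("*", blocked))                          # False: path starts with /m/
print(rp.can_fetch("*", "https://medium.com/@uber/post"))  # True: no rule matches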

So should my crawler respect robots.txt for the intermediate URL of a redirect, or only for the final destination?



robots.txt is a convention, not an enforcement mechanism: it tells well-behaved crawlers which paths the site owner does not want fetched, and honoring it is the crawler's responsibility. Medium has deliberately disallowed /m/, so a polite crawler should treat that hop of the redirect chain as off-limits.

Technically you can still fetch the URL anyway (with cURL or any other HTTP client); nothing physically prevents it. But if you ignore the rules, Medium may throttle or block your crawler.
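
For example, cURL follows the whole chain regardless of robots.txt (-L follows redirects, -I shows only the response headers of each hop):

curl -IL https://uber-developers.news/

If you want to crawl politely, check robots.txt for each host before each hop, since every redirect can land on a different domain with its own rules. Below is a minimal sketch using the requests library together with urllib.robotparser; the helper names (robots_for, polite_get) and the my-crawler user agent string are placeholders of mine, not an established API:

import urllib.parse
from urllib.robotparser import RobotFileParser

import requests

_ROBOTS_CACHE = {}

def robots_for(url):
    """Fetch and cache the robots.txt parser for the host serving `url`."""
    parts = urllib.parse.urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}"
    if base not in _ROBOTS_CACHE:
        rp = RobotFileParser()
        rp.set_url(base + "/robots.txt")
        rp.read()  # downloads and parses the file
        _ROBOTS_CACHE[base] = rp
    return _ROBOTS_CACHE[base]

def polite_get(url, user_agent="my-crawler", max_hops=10):
    """Follow redirects manually, re-checking robots.txt on every hop,
    because each hop may land on a different host with its own rules."""
    for _ in range(max_hops):
        if not robots_for(url).can_fetch(user_agent, url):
            raise PermissionError(f"robots.txt disallows {url}")
        resp = requests.get(url, allow_redirects=False,
                            headers={"User-Agent": user_agent})
        if resp.is_redirect:
            # Location may be relative; resolve it against the current URL.
            url = urllib.parse.urljoin(url, resp.headers["Location"])
            continue
        return resp
    raise RuntimeError(f"too many redirects starting from {url}")

With Medium's rules above, polite_get("https://uber-developers.news/") would raise PermissionError on the second hop, which is exactly the situation described in the question.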


Source: https://habr.com/ru/post/1688565/
