Should I consider a robots.txt file when a URL is redirected to another domain?

I want to crawl some sites that are hosted on Medium under custom domains (e.g. https://uber-developers.news/).

These sites always redirect to medium.com and then back to the original site. The problem is that the intermediate medium.com URL in that redirect chain is disallowed by Medium's robots.txt.

Here is the redirection path:

1. https://uber-developers.news/
2. https://medium.com/m/global-identity?redirectUrl=https://uber-developers.news/
3. back to https://uber-developers.news/

The problem is the second URL above, https://medium.com/m/global-identity?redirectUrl=https://uber-developers.news/, which is prohibited by robots.txt:

https://medium.com/robots.txt

User-Agent: *
Disallow: /m/
Disallow: /me/
Disallow: /@me$
Disallow: /@me/
Disallow: /*/*/edit
Allow: /_/
Allow: /_/api/users/*/meta
Allow: /_/api/users/*/profile/stream
Allow: /_/api/posts/*/responses
Allow: /_/api/posts/*/responsesStream
Allow: /_/api/posts/*/related
Sitemap: https://medium.com/sitemap/sitemap.xml
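
For reference, Disallow: /m/ is a plain path prefix, so it matches the global-identity URL regardless of the query string. You can test a URL against these rules with Python's standard urllib.robotparser (a small sketch; note the stdlib parser implements only prefix matching, not the * and $ wildcard extensions, which is enough for the /m/ rule):

from urllib.robotparser import RobotFileParser

# The Disallow rules quoted above, parsed offline so the check is deterministic.
RULES = """\
User-Agent: *
Disallow: /m/
Disallow: /me/
Disallow: /@me$
Disallow: /@me/
Disallow: /*/*/edit
"""

rp = RobotFileParser()
rp.parse(RULES.splitlines())

blocked = "https://medium.com/m/global-identity?redirectUrl=https://uber-developers.news/"
print(rp.can_fetch("*", blocked))                          # False: path starts with /m/
print(rp.can_fetch("*", "https://medium.com/@uber/post"))  # True: no rule matches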

So should my crawler respect robots.txt for the intermediate URL of a redirect, or only for the final destination?



robots.txt is a convention, not an enforcement mechanism: it tells well-behaved crawlers which paths the site owner does not want fetched, and honoring it is the crawler's responsibility. Medium has deliberately disallowed /m/, so a polite crawler should treat that hop of the redirect chain as off-limits.

Technically you can still fetch the URL anyway (with cURL or any other HTTP client); nothing physically prevents it. But if you ignore the rules, Medium may throttle or block your crawler.
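
For example, cURL follows the whole chain regardless of robots.txt (-L follows redirects, -I shows only the response headers of each hop):

curl -IL https://uber-developers.news/

If you want to crawl politely, check robots.txt for each host before each hop, since every redirect can land on a different domain with its own rules. Below is a minimal sketch using the requests library together with urllib.robotparser; the helper names (robots_for, polite_get) and the my-crawler user agent string are placeholders of mine, not an established API:

import urllib.parse
from urllib.robotparser import RobotFileParser

import requests

_ROBOTS_CACHE = {}

def robots_for(url):
    """Fetch and cache the robots.txt parser for the host serving `url`."""
    parts = urllib.parse.urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}"
    if base not in _ROBOTS_CACHE:
        rp = RobotFileParser()
        rp.set_url(base + "/robots.txt")
        rp.read()  # downloads and parses the file
        _ROBOTS_CACHE[base] = rp
    return _ROBOTS_CACHE[base]

def polite_get(url, user_agent="my-crawler", max_hops=10):
    """Follow redirects manually, re-checking robots.txt on every hop,
    because each hop may land on a different host with its own rules."""
    for _ in range(max_hops):
        if not robots_for(url).can_fetch(user_agent, url):
            raise PermissionError(f"robots.txt disallows {url}")
        resp = requests.get(url, allow_redirects=False,
                            headers={"User-Agent": user_agent})
        if resp.is_redirect:
            # Location may be relative; resolve it against the current URL.
            url = urllib.parse.urljoin(url, resp.headers["Location"])
            continue
        return resp
    raise RuntimeError(f"too many redirects starting from {url}")

With Medium's rules above, polite_get("https://uber-developers.news/") would raise PermissionError on the second hop, which is exactly the situation described in the question.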


Source: https://habr.com/ru/post/1688565/
