I want to crawl some sites that are hosted on Medium under a custom domain (e.g. https://uber-developers.news/).
Requests to these sites are always redirected to medium.com and then back to the original site. The problem is that the redirect URL on medium.com is disallowed by Medium's robots.txt.
Here is the redirection path.
https://uber-developers.news/
https://medium.com/m/global-identity?redirectUrl=https://uber-developers.news/
https://uber-developers.news/?gi=e0f8caa9844c
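To make the chain above easier to read, here is a minimal sketch that picks each hop apart with the Python standard library. The variable names are illustrative only, and the `gi` parameter is simply read back as it appears in the third URL, without assuming what Medium uses it for.

```python
from urllib.parse import urlsplit, parse_qs

# The three hops of the redirect chain, copied from the list above.
chain = [
    "https://uber-developers.news/",
    "https://medium.com/m/global-identity?redirectUrl=https://uber-developers.news/",
    "https://uber-developers.news/?gi=e0f8caa9844c",
]

# Print host, path and query parameters of every hop.
for url in chain:
    parts = urlsplit(url)
    print(parts.netloc, parts.path, parse_qs(parts.query))

# The middle hop lives on medium.com under /m/ and carries the
# original URL in its redirectUrl query parameter.
middle = urlsplit(chain[1])
return_to = parse_qs(middle.query)["redirectUrl"][0]
```

Only the middle hop is on medium.com; the first and last hops are on the custom domain.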
The problem is the second URL above, https://medium.com/m/global-identity?redirectUrl=https://uber-developers.news/, which is prohibited by robots.txt:
https://medium.com/robots.txt
User-Agent: *
Disallow: /m/
Disallow: /me/
Disallow: /@me$
Disallow: /@me/
Disallow: /*/*/edit
Allow: /_/
Allow: /_/api/users/*/meta
Allow: /_/api/users/*/profile/stream
Allow: /_/api/posts/*/responses
Allow: /_/api/posts/*/responsesStream
Allow: /_/api/posts/*/related
Sitemap: https://medium.com/sitemap/sitemap.xml
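As a quick check of why the middle hop is rejected, the rules quoted above can be fed to Python's standard `urllib.robotparser`. This is only a sketch: the rules are pasted in as a string rather than fetched, and the standard-library parser does plain prefix matching (it does not interpret the `*` and `$` wildcards inside paths), which does not matter for the `/m/` rule being tested here.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from https://medium.com/robots.txt as quoted in the question.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /m/
Disallow: /me/
Disallow: /@me$
Disallow: /@me/
Disallow: /*/*/edit
Allow: /_/
Allow: /_/api/users/*/meta
Allow: /_/api/users/*/profile/stream
Allow: /_/api/posts/*/responses
Allow: /_/api/posts/*/responsesStream
Allow: /_/api/posts/*/related
Sitemap: https://medium.com/sitemap/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The intermediate redirect URL from the chain in the question.
blocked = "https://medium.com/m/global-identity?redirectUrl=https://uber-developers.news/"

print(parser.can_fetch("*", blocked))                # False: path starts with /m/
print(parser.can_fetch("*", "https://medium.com/"))  # True: no rule matches "/"
```

Any crawler that honors these rules will stop at the `/m/global-identity` hop and never reach the final URL on the custom domain.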
Is there a way to ignore robots.txt for a specific URL?
Source: https://habr.com/ru/post/1688565/