How can I crowdsource my web crawl?

My web application needs to download content from a URL supplied by the user. Currently this request goes through my server, which is inefficient and could get my server's IP address blocked.

Is it possible to load the contents of a URL directly on the client? The same-origin policy seems to prevent using AJAX or an iframe to load and reuse this content.
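To make the restriction concrete, here is a minimal sketch of the comparison the same-origin policy is based on: two URLs share an origin only if scheme, host, and port all match. The helper names `origin` and `same_origin` are hypothetical, and this is a simplified illustration, not how any particular browser implements the check.

```python
from urllib.parse import urlsplit

# Default ports, so "https://example.com" and "https://example.com:443" compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(url_a, url_b):
    """True only if both URLs share scheme, host, and port."""
    return origin(url_a) == origin(url_b)
```

Under this rule a page served from my own domain and an arbitrary user-supplied URL almost never share an origin, which is why the script on my page cannot read the response.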

Any ideas? For example, is there a way to load and reuse the contents of a URL through Flash?

+3
2 answers

You can use Tor to mask your requests, but if you need to go to such lengths to crawl a website, maybe you shouldn't be doing it?

In addition, with your approach the iframe request will include your page's URL as the referrer, which makes these requests fairly easy to identify on the server ...
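As a rough illustration of how easy that identification is, a site operator could filter incoming requests by the `Referer` header. This is only a sketch: the function name `flag_by_referer` and the dictionary shape used for a request are assumptions made for the example, not any real server's log format.

```python
from urllib.parse import urlsplit


def flag_by_referer(requests, suspect_host):
    """Return the requests whose Referer header points at suspect_host.

    Each request is assumed to be a dict with an optional "headers" dict.
    """
    flagged = []
    for req in requests:
        referer = req.get("headers", {}).get("Referer", "")
        # urlsplit("") has hostname None, so requests without a Referer are skipped.
        if urlsplit(referer).hostname == suspect_host:
            flagged.append(req)
    return flagged
```

Every visitor's browser would send the same referring page, so all the crowdsourced requests end up in one bucket even though they come from different IP addresses.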

+1

If this is a specific website, I recommend talking to the site's operators rather than trying to crawl it anonymously.

0

Source: https://habr.com/ru/post/1720962/

