How can I crowdsource my web crawl?

Question

My web application needs to download content from a user-supplied URL. Currently this request goes through my server, which is inefficient and risks getting my server's IP address blocked.

Is it possible to have the client load the contents of the URL directly? The same-origin policy seems to prevent using AJAX or an iframe to load and reuse this content.
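For illustration, the origin comparison behind this restriction can be sketched roughly as follows. This is a simplified model rather than actual browser code: two URLs share an origin only when scheme, host, and port all match. The helper names `origin_of` and `same_origin` and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit

# Default ports, so that http://example.com and http://example.com:80 compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str) -> tuple:
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Fall back to the scheme's default port when none is given explicitly.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(url_a: str, url_b: str) -> bool:
    """True only when scheme, host, and port all match."""
    return origin_of(url_a) == origin_of(url_b)
```

Under this model, a page served from `http://my-app.example/` and a user-supplied URL on any other host would compare as different origins, which is the case where the browser withholds the response from the page's script.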

Any ideas? For example, is there a way to load and reuse the contents of a URL through Flash?

+3

ajax same-origin-policy iframe crowdsourcing

hoju Oct 24 '09 at 11:17

2 answers

Paul Dixon · Answer 1 · 2009-10-24T11:22:43+0000

You could use Tor to mask your requests, but if you need to go to such lengths to crawl a website, maybe you shouldn't be doing it?

In addition, with your approach the iframe request will include your page's URL as the referrer, which makes these requests fairly easy to identify unambiguously on the server ...
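To illustrate the point about the referrer, here is a minimal sketch of how a site operator might flag such requests. It assumes the server exposes request headers as a plain dictionary; `is_from_app` and the host `my-app.example` are hypothetical names used only for this example.

```python
from urllib.parse import urlsplit


def is_from_app(headers: dict, app_host: str) -> bool:
    """Return True if the Referer header points at the given host."""
    # HTTP header names are case-insensitive, so normalise the keys first.
    normalised = {name.lower(): value for name, value in headers.items()}
    # The header is spelled "Referer" in HTTP; a missing header yields "".
    referer = normalised.get("referer", "")
    host = urlsplit(referer).hostname or ""
    return host == app_host.lower()
```

A call such as `is_from_app({"Referer": "http://my-app.example/crawl"}, "my-app.example")` would match, so every visitor's browser would effectively announce which application triggered the request.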

Martin v. Löwis · Answer 2 · 2009-10-24T11:22:38+0000

If this is one specific website, I recommend talking to its operators rather than trying to crawl it anonymously.
