Crawler MP3 Link

Question


I was looking for a good way to implement this. I am working on a simple website crawler that will go through a specific set of websites and save all MP3 links to a database. I don't want to download the files, just scan the links, index them and make them searchable. So far I have been successful with some sites, but others use URL redirects and similar tricks that confuse the crawler.

Any ideas? How does beemp3.com index all of these links?

Thanks
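The index-and-search part described above could be sketched with Python's built-in sqlite3 module. The table layout, column names and function names are assumptions for illustration, not a recommended schema. Using the URL as the primary key means a link found on several pages (or reached via several redirects that end at the same URL) is only indexed once.

```python
import sqlite3


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a hypothetical mp3_links table and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS mp3_links (
               url TEXT PRIMARY KEY,  -- final URL after redirects
               source_page TEXT,      -- page the link was found on
               title TEXT             -- e.g. link text or file name
           )"""
    )
    return conn


def add_link(conn: sqlite3.Connection, url: str, source_page: str, title: str) -> None:
    # INSERT OR IGNORE skips URLs that are already in the index.
    conn.execute(
        "INSERT OR IGNORE INTO mp3_links (url, source_page, title) VALUES (?, ?, ?)",
        (url, source_page, title),
    )
    conn.commit()


def search(conn: sqlite3.Connection, term: str) -> list[str]:
    # Plain substring match; SQLite's LIKE is case-insensitive for ASCII.
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT url FROM mp3_links WHERE title LIKE ? OR url LIKE ? ORDER BY url",
        (pattern, pattern),
    )
    return [row[0] for row in rows]
```

Note that `%` and `_` in the search term act as LIKE wildcards here; for real full-text search, SQLite's FTS5 extension would be the natural next step.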


web crawler mp3

John stewart Jul 14 '09 at 16:15


3 answers

Federico klez culloca · Answer 1 · 2009-07-14T16:58:39+0000

You can request just the HTTP headers for each link (a HEAD request) and check the MIME type in the Content-Type header. If it is audio/mpeg, keep it as an MP3 link.
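A minimal sketch of that check using only the Python standard library. It sends a HEAD request, lets urllib follow any 3xx redirects (which also helps with the redirecting sites mentioned in the question), and inspects the Content-Type of the final response. The function names and the extra non-standard MIME types accepted besides audio/mpeg are assumptions for illustration.

```python
import urllib.request
from typing import Optional

# Assumption: besides the standard "audio/mpeg", some servers send
# non-standard MP3 types. This set is illustrative, not exhaustive.
MP3_MIME_TYPES = {"audio/mpeg", "audio/mp3", "audio/x-mp3", "audio/mpeg3"}


def is_mp3_content_type(content_type: Optional[str]) -> bool:
    """Return True if a Content-Type header value looks like MP3 audio."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=..." and normalise the case.
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in MP3_MIME_TYPES


def probe_link(url: str, timeout: float = 10.0) -> Optional[str]:
    """HEAD the URL and return the final URL if it serves MP3 audio.

    urlopen follows HTTP redirects automatically, so geturl() is the URL
    the redirect chain ends at. Returns None for non-MP3 responses and on
    network or HTTP errors.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            if is_mp3_content_type(response.headers.get("Content-Type")):
                return response.geturl()
    except OSError:
        # URLError and HTTPError are both subclasses of OSError.
        return None
    return None
```

Some servers reject HEAD (for example with 405) or report a generic type such as application/octet-stream; for those, a GET request or a fallback on the .mp3 extension may be needed.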

Daniel F. Thornton · Answer 2 · 2009-07-14T17:02:40+0000

You can also use a Google query like the following, replacing QUERY_TEXT with what you are searching for:

QUERY_TEXT intitle:"index.of" "parent directory" "size" "last modified" "description" [snd] (mp4|mp3|avi) -inurl:(jsp|php|html|aspx|htm|cf|shtml|lyrics|mp3s|mp3|index) -gallery -intitle:"last modified" -intitle:(intitle|mp3)
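Pages matched by such a query are plain "Index of" directory listings, so collecting their MP3 links takes little more than an HTML parser. Below is a standard-library sketch; the function and class names are hypothetical, and matching on the .mp3 path extension is only a heuristic that a Content-Type check can back up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class Mp3LinkExtractor(HTMLParser):
    """Collect absolute URLs of <a href> targets whose path ends in .mp3."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.mp3_links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative hrefs such as "song.mp3" against the listing URL.
        absolute = urljoin(self.base_url, href)
        # Check only the path, so a query string cannot hide the extension.
        if urlparse(absolute).path.lower().endswith(".mp3"):
            self.mp3_links.append(absolute)


def extract_mp3_links(html: str, base_url: str) -> list[str]:
    """Return the .mp3 links of a directory listing, in page order."""
    parser = Mp3LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.mp3_links
```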

hannson · Answer 3 · 2009-07-23T16:28:19+0000

If you are using Python: take a look at Scrapy (a crawling framework written in Python) and the Django framework. IIRC, Scrapy follows the DRY principle (as Django does).
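Scrapy takes care of scheduling, redirects and link following for you. To show roughly what that involves, here is a standard-library sketch of the same idea: a breadth-first crawl limited to a whitelist of hosts that records every link ending in .mp3. All names are hypothetical. The page fetcher is a parameter so the crawl logic can be tested without a network, and links are resolved against the URL the fetcher ended up at after redirects, which matters for sites that redirect.

```python
import urllib.request
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Iterable, Optional
from urllib.parse import urldefrag, urljoin, urlparse

# Hypothetical fetcher signature: url -> (final_url, html) or None.
Fetcher = Callable[[str], Optional[tuple[str, str]]]


class LinkCollector(HTMLParser):
    """Gather the href of every <a> tag as an absolute URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") if tag == "a" else None
        if href:
            # Resolve relative links and drop "#fragment" parts.
            self.links.append(urldefrag(urljoin(self.base_url, href))[0])


def urllib_fetch(url: str) -> Optional[tuple[str, str]]:
    """Example fetcher: urlopen follows redirects; geturl() is the final URL."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            if response.headers.get_content_type() != "text/html":
                return None  # only HTML pages are parsed for further links
            charset = response.headers.get_content_charset() or "utf-8"
            html = response.read().decode(charset, errors="replace")
            return response.geturl(), html
    except (OSError, LookupError):  # network/HTTP errors, unknown charset
        return None


def crawl_for_mp3s(start_urls: Iterable[str], allowed_hosts: set[str],
                   fetch: Fetcher = urllib_fetch, max_pages: int = 100) -> set[str]:
    """Breadth-first crawl of allowed_hosts, returning every .mp3 URL seen."""
    queue = deque(start_urls)
    seen = set(queue)
    mp3_links: set[str] = set()
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        fetched += 1
        result = fetch(url)
        if result is None:
            continue
        final_url, html = result
        collector = LinkCollector(final_url)  # base = URL after redirects
        collector.feed(html)
        collector.close()
        for link in collector.links:
            parsed = urlparse(link)
            if parsed.path.lower().endswith(".mp3"):
                # Recorded even if hosted elsewhere, but never crawled.
                mp3_links.add(link)
            elif (parsed.scheme in ("http", "https")
                  and parsed.hostname in allowed_hosts
                  and link not in seen):
                seen.add(link)
                queue.append(link)
    return mp3_links
```

For example, `crawl_for_mp3s(["http://example.com/"], {"example.com"})` would crawl up to 100 pages of that (placeholder) host using `urllib_fetch`. Each result could then be checked with a HEAD request before being stored.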

Perhaps you could edit your question to add information about your crawler: is it written from scratch, is it some kind of turnkey solution, etc.?
