How to get a directory listing from a remote server?

If I have a directory on a remote web server that allows directory browsing, how can I list all the files in it from my other web server? I know I can use urllib2.urlopen to fetch individual files, but how would I get a list of all the files in that remote directory?

+3
2 answers

If the web server has directory browsing enabled, it will return an HTML document with links to all the files. You can parse that HTML document and extract the links; this gives you the list of files.

You can use the HTMLParser class to pull out the elements you're interested in. Something like this will work:

from HTMLParser import HTMLParser
import urllib

class AnchorParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples, not a dict,
        # so iterate over it directly
        if tag == 'a':
            for key, value in attrs:
                if key == 'href':
                    print value

parser = AnchorParser()
data = urllib.urlopen('http://somewhere').read()
parser.feed(data)
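
If you want the files themselves rather than just their names, you can collect the hrefs instead of printing them and then fetch each one in turn. A rough sketch along the same lines (Python 2, like the code above; the listing URL is a placeholder, and the filters assume a typical Apache-style index page with "?C=..." sort links and trailing slashes on subdirectories):

from HTMLParser import HTMLParser
import urllib
import urlparse

class LinkCollector(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for key, value in attrs:
                if key == 'href':
                    self.links.append(value)

base = 'http://somewhere/dir/'  # placeholder, as in the answer above
parser = LinkCollector()
parser.feed(urllib.urlopen(base).read())

for link in parser.links:
    # skip sort links, absolute links such as the parent directory,
    # and subdirectories
    if link.startswith('?') or link.startswith('/') or link.endswith('/'):
        continue
    urllib.urlretrieve(urlparse.urljoin(base, link), link)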
+6

Why not use curl or wget to fetch the page recursively, limited to one level deep? That saves you the trouble of writing a script.

e.g. something like

wget -H -r --level=1 -k -p www.yourpage/dir
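
Here -r turns on recursion, --level=1 keeps it to one level deep, -k rewrites links for local viewing, -p fetches page requisites, and -H lets wget span to other hosts. If all you need is the list of file names rather than the files themselves, a rough alternative is to dump the index page and pull the hrefs out of it (assumes a standard Apache-style listing; adjust the URL to your directory):

wget -qO- http://www.yourpage/dir/ | grep -o 'href="[^"]*"'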
+2
