How to get a directory listing from a remote server?

If I have a directory on a remote web server that allows directory browsing, how can I list all the files in it from my other web server? I know I can use urllib2.urlopen to fetch individual files, but how would I get a list of all the files in that remote directory?

+3
2 answers

If the web server has directory browsing enabled, it will return an HTML document with links to all the files. You can parse that HTML document and extract the links; this gives you the list of files.

You can use the HTMLParser class to pull out the elements you're interested in. Something like this will work:

from HTMLParser import HTMLParser
import urllib

class AnchorParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples, not a dict,
        # so iterate over it directly
        if tag == 'a':
            for key, value in attrs:
                if key == 'href':
                    print value

parser = AnchorParser()
data = urllib.urlopen('http://somewhere').read()
parser.feed(data)
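
If you want the files themselves rather than just their names, you can collect the hrefs instead of printing them and then fetch each one in turn. A rough sketch along the same lines (Python 2, like the code above; the listing URL is a placeholder, and the filters assume a typical Apache-style index page with "?C=..." sort links and trailing slashes on subdirectories):

from HTMLParser import HTMLParser
import urllib
import urlparse

class LinkCollector(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for key, value in attrs:
                if key == 'href':
                    self.links.append(value)

base = 'http://somewhere/dir/'  # placeholder, as in the answer above
parser = LinkCollector()
parser.feed(urllib.urlopen(base).read())

for link in parser.links:
    # skip sort links, absolute links such as the parent directory,
    # and subdirectories
    if link.startswith('?') or link.startswith('/') or link.endswith('/'):
        continue
    urllib.urlretrieve(urlparse.urljoin(base, link), link)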
+6

Why not use curl or wget to fetch the page recursively, limited to one level deep? That saves you the trouble of writing a script.

e.g. something like

wget -H -r --level=1 -k -p www.yourpage/dir
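
Here -r turns on recursion, --level=1 keeps it to one level deep, -k rewrites links for local viewing, -p fetches page requisites, and -H lets wget span to other hosts. If all you need is the list of file names rather than the files themselves, a rough alternative is to dump the index page and pull the hrefs out of it (assumes a standard Apache-style listing; adjust the URL to your directory):

wget -qO- http://www.yourpage/dir/ | grep -o 'href="[^"]*"'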
+2
