Downloading files from an HTTP server in Python

Using urllib2, we can get an HTTP response from a web server. If the server simply serves a list of files, we can parse that listing and download the files one by one. However, I'm not sure what the easiest, most Pythonic way to parse the listing is.

Once I have the full HTTP response for a file server's directory listing via urllib2's urlopen() method, how can I cleanly download each file?

7 answers

urllib2 is fine for fetching the list of files. For downloading large amounts of binary data, PycURL (http://pycurl.sourceforge.net/) is the best choice. This works for my IIS-based file server:

import re
import urllib2
import pycurl

url = "http://server.domain/"
path = "path/"
# IIS lists each file as <A HREF="/path/name">name</A>; capture the link text
pattern = '<A HREF="/%s.*?">(.*?)</A>' % path

response = urllib2.urlopen(url + path).read()

for filename in re.findall(pattern, response):
    # Binary mode so the file is saved byte-for-byte
    with open(filename, "wb") as fp:
        curl = pycurl.Curl()
        curl.setopt(pycurl.URL, url + path + filename)
        curl.setopt(pycurl.WRITEDATA, fp)  # pycurl writes the body straight into fp
        curl.perform()
        curl.close()
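The regex above depends on IIS emitting uppercase `<A HREF=...>` tags with server-absolute paths. A slightly more tolerant Python 3 sketch matches case-insensitively, skips directory entries, and resolves each href against the listing URL with `urllib.parse.urljoin`, so the caller gets full download URLs. It assumes, as the answer does, that the link text is the file name; `listing_links` is a hypothetical helper name.

```python
import re
from urllib.parse import urljoin

# Matches <a href="...">text</a> in any letter case; captures href and link text.
# Assumes simple anchors like those in an IIS directory listing (hypothetical pattern).
LINK_RE = re.compile(r'<a\s+href="([^"]+)"\s*>(.*?)</a>', re.IGNORECASE | re.DOTALL)


def listing_links(html, base_url):
    """Return (absolute_url, link_text) pairs for every file link in the listing.

    Hrefs ending in "/" (sub-directories and the parent link) are skipped.
    """
    links = []
    for href, text in LINK_RE.findall(html):
        if href.endswith("/"):
            continue  # a directory, not a file
        links.append((urljoin(base_url, href), text.strip()))
    return links
```

Each returned URL can then be handed to PycURL or urllib for the actual download.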

Use urllib.urlretrieve (in Python 3.x: urllib.request.urlretrieve):

import urllib
urllib.urlretrieve('http://site.com/', filename='filez.txt')
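In Python 3 the call lives in `urllib.request`, and `urlretrieve` also accepts an optional `reporthook(block_num, block_size, total_size)` callback. A minimal sketch with a console progress indicator; `fetch` and `progress` are hypothetical helper names, and the percentage display is only an illustration:

```python
from urllib.request import urlretrieve


def progress(block_num, block_size, total_size):
    """reporthook: urlretrieve calls this once at the start and after each block."""
    if total_size > 0:  # total_size is -1 when the server sends no Content-Length
        done = min(block_num * block_size, total_size)
        print(f"\r{done * 100 // total_size}%", end="", flush=True)


def fetch(url, filename):
    """Python 3 equivalent of urllib.urlretrieve(url, filename=...)."""
    saved_path, headers = urlretrieve(url, filename=filename, reporthook=progress)
    print()  # finish the progress line
    return saved_path, headers
```

For example, `fetch('http://site.com/', 'filez.txt')` mirrors the one-liner above and returns the local path plus the response headers.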

:)

Or do it by hand (also with urllib):

import urllib

def download(url):
    webFile = urllib.urlopen(url)
    # Name the local file after the last URL segment; 'wb' keeps binary data intact
    localFile = open(url.split('/')[-1], 'wb')
    localFile.write(webFile.read())
    webFile.close()
    localFile.close()
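The function above reads the whole response into memory before writing it. A Python 3 sketch that streams the body to disk in chunks with `shutil.copyfileobj`, writes in binary mode, and takes the file name from the URL path only (so query strings and percent-escapes don't leak into it); `download_to` and its `dest_dir` parameter are hypothetical names:

```python
import os
import posixpath
import shutil
from urllib.parse import unquote, urlsplit
from urllib.request import urlopen


def download_to(url, dest_dir="."):
    """Stream url to dest_dir/<last path segment> and return the local path."""
    # Use only the path part of the URL, with %20 etc. decoded.
    name = posixpath.basename(unquote(urlsplit(url).path))
    if not name:
        raise ValueError(f"URL has no file name: {url}")
    target = os.path.join(dest_dir, name)
    with urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in fixed-size chunks
    return target
```

Streaming keeps memory use flat regardless of file size, which matters for the large binary downloads mentioned in the first answer.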

lxml can parse the HTML listing for you.

If the server exposes a text file listing one URL per line:

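import os
import urllib2
import urlparse

response = urllib2.urlopen('http://server.com/file.txt')
# One URL per line; splitlines() handles \r\n, and blank lines are skipped
urls = [line.strip() for line in response.read().splitlines() if line.strip()]

for url in urls:
    print 'Downloading ' + url

    response = urllib2.urlopen(url)

    # Save under the last path segment, not the full URL (open() can't use a URL)
    handle = open(os.path.basename(urlparse.urlparse(url).path), 'wb')
    handle.write(response.read())
    handle.close()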
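In Python 3, `response.read()` returns bytes, so the list must be decoded before it is split. A minimal sketch of the list-reading step; the helper names and the UTF-8 default are assumptions:

```python
from urllib.request import urlopen


def read_url_list(text):
    """Return the non-blank lines of a plain-text URL list, whitespace-stripped.

    str.splitlines() already copes with LF, CRLF and CR line endings, so no
    manual removal of carriage returns is needed. Blank lines are dropped so
    they never reach urlopen() (with split('\\n'), a trailing newline would
    otherwise yield an empty string).
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def fetch_url_list(list_url, encoding="utf-8"):
    """Download a URL list and parse it; the UTF-8 default is an assumption."""
    with urlopen(list_url) as response:
        return read_url_list(response.read().decode(encoding))
```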

BeautifulSoup (an HTML/XML parser) can extract the file links from the listing page, and pycURL can download them.
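BeautifulSoup is a third-party package (with it, `soup.find_all("a", href=True)` returns every link). As a dependency-free stand-in, here is a sketch using the standard library's `html.parser`, which lower-cases tag and attribute names so uppercase IIS markup is handled too. `LinkCollector` and `file_links` are hypothetical names, and treating hrefs that end in "/" as directories is an assumption about the listing format:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lower-cased, so <A HREF=...> matches too
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def file_links(html, base_url):
    """Return absolute URLs of links that do not end in '/' (i.e. not directories)."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return [link for link in parser.links if not link.endswith("/")]
```

A real parser avoids the brittleness of a regex when attribute order, quoting, or letter case varies between servers.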

Alternatively, urllib.urlretrieve saves a URL directly to a local file, much like wget.

This is not the right way, although it works:

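fPointer = open(picName, 'wb')
# pycurl passes each chunk of the response body to fPointer.write
self.curl.setopt(self.curl.WRITEFUNCTION, fPointer.write)
self.curl.perform()
fPointer.close()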


The correct way:

urllib.urlretrieve(link, picName)

Source: https://habr.com/ru/post/1783310/