Problems with your script as it stands:
- The url has a trailing /, which, when requested, returns an invalid page rather than the list of files you want to download.
- The CSS selector in soup.select(...) selects a div with a webpartid attribute, which does not exist anywhere in the linked document.
- You join the URL and quote it, even though the links are listed on the page as absolute URLs and do not need to be quoted.
- The try:...except: block hides errors that occur while trying to download a file. Using an except block without a specific exception is bad practice and should be avoided.
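To illustrate the last point, here is a minimal sketch of catching a specific exception instead of using a bare except (the function name and URL are placeholders of mine, not from the original script):

```python
import urllib.error
import urllib.request

def download(url, filename):
    """Download url to filename, handling only HTTP errors."""
    try:
        urllib.request.urlretrieve(url, filename)
    except urllib.error.HTTPError as e:
        # Only HTTP errors (403, 404, ...) are handled here;
        # any other bug in the code still surfaces normally.
        print(f"Failed to download {url}: {e}")
```

This way a typo or logic error elsewhere in the loop raises normally instead of being silently swallowed.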
A modified version of your code that will find the correct files and try to download them looks like this:
from bs4 import BeautifulSoup
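(The original code block survives only as its import line here. The following is a hedged reconstruction of what such a fixed version might look like; the page URL and the CSS selector are placeholders I introduced, not the actual ones from the question, so substitute the real values.)

```python
import urllib.request
from bs4 import BeautifulSoup

def scrape_file_links(html, selector="a[href$='.pdf']"):
    """Return the href of every link matched by the CSS selector.
    The default selector is a placeholder -- adapt it to the real page."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.select(selector)]

def download_all(page_url):
    html = urllib.request.urlopen(page_url).read()
    for file_url in scrape_file_links(html):
        # Links are absolute URLs, so no urljoin/quote is needed.
        filename = file_url.rsplit("/", 1)[-1]
        print("Downloading", file_url)
        urllib.request.urlretrieve(file_url, filename)

if __name__ == "__main__":
    # Placeholder URL: note there is no trailing slash.
    download_all("http://example.com/downloads")
```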
However, if you run this, you will notice that an urllib.error.HTTPError: HTTP Error 403: Forbidden exception is raised, even though the file can be downloaded in the browser. At first I thought it was a referrer check (to prevent hotlinking); however, if you look at the request in your browser (for example, in the Chrome developer tools), you will notice that the initial http:// request is blocked there too, and Chrome then retries the https:// request for the same file.
In other words, the request must go over HTTPS to work (even though the URLs on the page say http:). To fix this, you will need to rewrite http: to https: before using the URL for the request. The following code correctly rewrites the URLs and downloads the files. I also added a variable to specify the output folder, which is joined onto the file name using os.path.join:
import os
from bs4 import BeautifulSoup
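(Again, only the import lines of this block survive. Below is a hedged sketch of the described fix, combining the http:-to-https: rewrite with an output folder; the page URL, selector, and folder name are placeholder assumptions of mine.)

```python
import os
import urllib.request
from bs4 import BeautifulSoup

OUT_DIR = "downloads"  # placeholder output folder, joined onto each file name


def to_https(url):
    """Rewrite http: to https:, since the server rejects plain-http requests."""
    if url.startswith("http:"):
        return "https:" + url[len("http:"):]
    return url


def download_all(page_url, selector="a[href$='.pdf']"):
    # page_url and selector are placeholders -- substitute the real ones.
    os.makedirs(OUT_DIR, exist_ok=True)
    html = urllib.request.urlopen(to_https(page_url)).read()
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.select(selector):
        file_url = to_https(a["href"])
        target = os.path.join(OUT_DIR, file_url.rsplit("/", 1)[-1])
        print("Downloading", file_url, "->", target)
        urllib.request.urlretrieve(file_url, target)
```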