Selenium pdf automatic download not working

I am new to Selenium and am writing a scraper that automatically downloads PDF files from this site.

Below is my code:

    from selenium import webdriver

    fp = webdriver.FirefoxProfile()
    fp.set_preference("browser.download.folderList", 2)
    fp.set_preference("browser.download.manager.showWhenStarting", False)
    fp.set_preference("browser.download.dir", "/home/jill/Downloads/Dinamalar")
    fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf")

    browser = webdriver.Firefox(firefox_profile=fp)
    browser.get("http://epaper.dinamalar.com/PUBLICATIONS/DM/MADHURAI/2015/05/26/PagePrint//26_05_2015_001_b2b69fda315301809dda359a6d3d9689.pdf")
    webobj = browser.find_element_by_id("download").click()

I followed the steps in the Selenium documentation and in this link, but I do not understand why the download dialog is still shown every time.

Also, is there a way to specify something like "application/all" so that every file type is downloaded automatically (for crawling)?

3 answers

Disable Firefox's built-in PDF.js viewer and then open the URL; the PDF file will be downloaded automatically. Code:

    from selenium import webdriver

    fp = webdriver.FirefoxProfile()
    fp.set_preference("browser.download.folderList", 2)
    fp.set_preference("browser.download.manager.showWhenStarting", False)
    fp.set_preference("browser.download.dir", "/home/jill/Downloads/Dinamalar")
    fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf,application/x-pdf")
    fp.set_preference("pdfjs.disabled", True)  # <-- KEY PART: turn off the built-in PDF viewer

    browser = webdriver.Firefox(firefox_profile=fp)
    # With PDF.js disabled, opening the PDF URL saves the file instead of displaying it
    browser.get("http://epaper.dinamalar.com/PUBLICATIONS/DM/MADHURAI/2015/05/26/PagePrint//26_05_2015_001_b2b69fda315301809dda359a6d3d9689.pdf")
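If you need these settings in several scripts, one option is to keep the preferences in a single dict and apply them to whatever object has a set_preference method. This is only a sketch: pdf_download_prefs, apply_prefs and make_driver are hypothetical helper names. The make_driver part assumes a recent Selenium 4 release, where I believe the firefox_profile argument and the find_element_by_* methods were removed, and preferences are set on Options instead.

```python
# Hypothetical helpers: collect the Firefox preferences from the answer above
# so they can be applied to a FirefoxProfile or a Selenium 4 Options object.


def pdf_download_prefs(download_dir, mime_types="application/pdf,application/x-pdf"):
    """Return the preferences that make Firefox save PDFs without asking."""
    return {
        "browser.download.folderList": 2,  # 2 = save to browser.download.dir
        "browser.download.manager.showWhenStarting": False,
        "browser.download.dir": download_dir,
        "browser.helperApps.neverAsk.saveToDisk": mime_types,
        "pdfjs.disabled": True,  # a real boolean, not the string "true"
    }


def apply_prefs(target, prefs):
    """Call target.set_preference(name, value) for every entry and return target."""
    for name, value in prefs.items():
        target.set_preference(name, value)
    return target


def make_driver(download_dir):
    """Sketch for Selenium 4, where preferences are set on Options."""
    # Imported lazily so the helpers above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = apply_prefs(Options(), pdf_download_prefs(download_dir))
    return webdriver.Firefox(options=options)
```

With older Selenium versions you would call apply_prefs(webdriver.FirefoxProfile(), pdf_download_prefs(...)) instead, and pass the profile as in the answer.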

Update (full code that worked for me):

    from selenium import webdriver

    mime_types = "application/pdf,application/vnd.adobe.xfdf,application/vnd.fdf,application/vnd.adobe.xdp+xml"

    fp = webdriver.FirefoxProfile()
    fp.set_preference("browser.download.folderList", 2)
    fp.set_preference("browser.download.manager.showWhenStarting", False)
    fp.set_preference("browser.download.dir", "/home/aafanasiev/Downloads")
    fp.set_preference("browser.helperApps.neverAsk.saveToDisk", mime_types)
    fp.set_preference("plugin.disable_full_page_plugin_for_types", mime_types)
    fp.set_preference("pdfjs.disabled", True)

    browser = webdriver.Firefox(firefox_profile=fp)
    browser.get("http://epaper.dinamalar.com/")
    # Click the "save PDF" link inside the <li id="liSavePdf"> element
    webobj_get_link = browser.find_element_by_id("liSavePdf")
    webobj_get_object = webobj_get_link.find_element_by_tag_name("a")
    webobj_get_object.click()
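On the question's "application/all" idea: as far as I know, browser.helperApps.neverAsk.saveToDisk does not accept a wildcard; it takes a comma-separated list of exact MIME types, as mime_types does above. A workaround that is not from the answers here is to generate that list from Python's mimetypes module. This is a sketch; all_known_mime_types is a hypothetical helper, and a server that sends an unusual Content-Type (for example application/octet-stream) still needs that type added explicitly.

```python
import mimetypes


def all_known_mime_types(extra=()):
    """Return a comma-separated list of every MIME type Python's mimetypes knows,
    plus any extra types, for use in browser.helperApps.neverAsk.saveToDisk."""
    mimetypes.init()  # also reads system files such as /etc/mime.types if present
    types = set(mimetypes.types_map.values()) | set(extra)
    return ",".join(sorted(types))
```

Usage would then be fp.set_preference("browser.helperApps.neverAsk.saveToDisk", all_known_mime_types(extra=["application/x-pdf", "application/octet-stream"])).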

Since you did not post the page's HTML, I assume that this line

    webobj = browser.find_element_by_id("download").click()

actually triggers an onclick event, and you are not handling it properly. In other words, you have nowhere to save this .pdf file. I have very little Python experience, but one solution might be to use an HTTP web client library that lets you download files directly, something like C#'s WebClient.DownloadFile(String, String) method. Used correctly, it lets you skip Selenium entirely for this step.

Perhaps something like this post would be a good start.
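A rough Python counterpart of WebClient.DownloadFile can be written with the standard library alone. This is a sketch, not code from the answer: download_file is a hypothetical helper that streams the response body of a URL into a local file.

```python
import shutil
import urllib.request


def download_file(url, dest_path, timeout=30):
    """Save the body of url to dest_path, roughly like C#'s WebClient.DownloadFile."""
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks instead of reading it all at once
    return dest_path
```

For the PDF in the question this would be download_file("http://epaper.dinamalar.com/PUBLICATIONS/DM/MADHURAI/2015/05/26/PagePrint//26_05_2015_001_b2b69fda315301809dda359a6d3d9689.pdf", "page1.pdf"). It only works if the server serves the file without a browser session; if it requires one, you would also have to pass along the cookies from driver.get_cookies(), which this sketch does not do.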


I tested the following code and successfully downloaded your PDF file on Windows 7:

    import os

    from selenium import webdriver

    # Example location; browser.download.dir needs an absolute path
    download_location = os.path.join(os.getcwd(), "downloads")
    os.makedirs(download_location, exist_ok=True)

    fp = webdriver.FirefoxProfile()
    fp.set_preference("browser.download.folderList", 2)
    fp.set_preference("browser.download.manager.showWhenStarting", False)
    fp.set_preference("browser.download.dir", download_location)
    fp.set_preference("plugin.disable_full_page_plugin_for_types", "application/pdf")
    fp.set_preference("pdfjs.disabled", True)
    fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf")

    driver = webdriver.Firefox(fp)
    driver.implicitly_wait(10)
    driver.maximize_window()
    driver.get("http://epaper.dinamalar.com/")
    # The save-PDF button is an <img> inside the link in <li id="liSavePdf">
    element = driver.find_element_by_css_selector("li#liSavePdf>a>img")
    element.click()
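One thing the snippets above do not handle is waiting for the download to finish: element.click() returns immediately, and quitting the driver too early can leave a half-written file. A minimal sketch, assuming Firefox's usual behavior of writing an in-progress download to a companion *.part file; wait_for_download is a hypothetical helper that polls the download folder.

```python
import glob
import os
import time


def wait_for_download(download_dir, pattern="*.pdf", timeout=60, poll=0.5):
    """Wait until a file matching pattern exists in download_dir and no Firefox
    partial download (*.part) remains; return the most recently modified match."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        done = glob.glob(os.path.join(download_dir, pattern))
        partial = glob.glob(os.path.join(download_dir, "*.part"))
        if done and not partial:
            return max(done, key=os.path.getmtime)
        time.sleep(poll)
    raise TimeoutError(f"no finished download in {download_dir} after {timeout}s")
```

In the answer's code this would be called as path = wait_for_download(download_location) right after element.click() and before driver.quit().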

Source: https://habr.com/ru/post/987910/
