Using R to “click” a download button on a web page

I am trying to scrape data from this web page: http://volcano.si.edu/search_eruption.cfm. There are two drop-down boxes that ask for data filters. I don’t need filtered data, so I leave them blank and continue to the next page by clicking “Search Eruptions”.

However, I noticed that the resulting table includes only a small subset of columns (5 in total) compared with the 24 columns it should have. All 24 columns are there, though, if you click the "Load Results in Excel" button and open the downloaded file. This is what I need.

So it looks like this has turned from a scraping exercise (using httr and rvest) into something more complicated. I am stumped about how to actually “click” the “Load Results in Excel” button using R. I assume I will have to use RSelenium, but here is my code trying to use httr with POST, in case any of you kind people can find an easier way. I also tried gdata, data.table, XML, etc., to no avail, which may simply be the result of user error.

It may also be useful to know that right-clicking the download button does not reveal a URL.

    url <- "http://volcano.si.edu/search_eruption_results.cfm"
    searchcriteria <- list(
      eruption_category = "",
      country = ""
    )
    mydata <- POST(url, body = "searchcriteria")

Using the Inspector in my browser, I could see that the two filters are named "eruption_category" and "country"; both are left empty, since I do not need any filtered data.

Finally, it seems that the code above takes me to the page with the 5-column table. However, I still could not scrape this table using rvest with the code below (using SelectorGadget to select just one column). In the end, this part does not matter, because, as I said above, I need all 24 columns, not just these 5. But if you find any errors in what I did below, I would be grateful.

    Eruptions <- mydata %>%
      read_html() %>%
      html_nodes(".td8") %>%
      html_text()
    Eruptions

Thanks for any help you can provide.

1 answer

Just mimic the POST request the page makes:

    library(httr)
    library(rvest)
    library(purrr)
    library(dplyr)

    POST("http://volcano.si.edu/search_eruption_results.cfm",
         body = list(bp = "", `eruption_category[]` = "", `country[]` = "", polygon = "", cp = "1"),
         encode = "form") -> res

    content(res, as="parsed") %>%
      html_nodes("div.DivTableSearch") %>%
      html_nodes("div.tr") %>%
      map(html_children) %>%
      map(html_text) %>%
      map(as.list) %>%
      map_df(setNames, c("volcano_name", "subregion", "eruption_type", "start_date", "max_vei", "X1")) %>%
      select(-X1)

    ## # A tibble: 750 × 5
    ##    volcano_name subregion            eruption_type      start_date
    ##    <chr>        <chr>                <chr>              <chr>
    ## 1  Chirinkotan  Kuril Islands        Confirmed Eruption 2016 Nov 29
    ## 2  Zhupanovsky  Kamchatka Peninsula  Confirmed Eruption 2016 Nov 20
    ## 3  Kerinci      Sumatra              Confirmed Eruption 2016 Nov 15
    ## 4  Langila      New Britain          Confirmed Eruption 2016 Nov 3
    ## 5  Cleveland    Aleutian Islands     Confirmed Eruption 2016 Oct 24
    ## 6  Ebeko        Kuril Islands        Confirmed Eruption 2016 Oct 20
    ## 7  Ulawun       New Britain          Confirmed Eruption 2016 Oct 11
    ## 8  Karymsky     Kamchatka Peninsula  Confirmed Eruption 2016 Oct 5
    ## 9  Ubinas       Peru                 Confirmed Eruption 2016 Oct 2
    ## 10 Rinjani      Lesser Sunda Islands Confirmed Eruption 2016 Sep 27
    ## # ... with 740 more rows, and 1 more variables: max_vei <chr>
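One slip in the question's code is worth pointing out: `body = "searchcriteria"` sends the literal string "searchcriteria" rather than the list of that name, and the field names in the request above carry a `[]` suffix. A minimal sketch of the same request with the list passed directly is below. `search_criteria` and `fetch_eruptions()` are hypothetical names chosen for illustration, not part of httr, and the call itself is left commented out because it needs network access.

```r
# Form fields as used in the POST above; empty strings mean "no filter".
search_criteria <- list(
  bp = "",
  `eruption_category[]` = "",
  `country[]` = "",
  polygon = "",
  cp = "1"
)

# Hypothetical helper (not part of httr): pass the list itself,
# not its name in quotes.
fetch_eruptions <- function(criteria = search_criteria) {
  res <- httr::POST(
    "http://volcano.si.edu/search_eruption_results.cfm",
    body = criteria,
    encode = "form"
  )
  httr::stop_for_status(res)  # raise an R error on an HTTP error status
  res
}

# res <- fetch_eruptions()  # requires network access
```

The returned response can then go through the same `content()` and `html_nodes()` pipeline as before.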

I assumed the "Excel" part could be inferred, but if not:

    POST("http://volcano.si.edu/search_eruption_excel.cfm",
         body = list(`eruption_category[]` = "", `country[]` = ""),
         encode = "form",
         write_disk("eruptions.xls")) -> res
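Files that web applications serve with an .xls extension are not always true Excel binaries; some are HTML or XML tables under an Excel name. I have not checked which kind this site returns, so a cautious sketch would peek at the first bytes of the download before choosing a reader. `guess_download_type()` is a hypothetical helper written for this answer, not a library function.

```r
# Hypothetical helper: classify a downloaded file by its leading bytes.
guess_download_type <- function(path) {
  bytes <- readBin(path, what = "raw", n = 64)

  # Signature of the OLE2 container used by legacy binary .xls files.
  ole_magic <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))
  if (length(bytes) >= 8 && identical(bytes[1:8], ole_magic)) {
    return("xls")
  }

  # Ignore whitespace bytes (tab, LF, CR, space) and check whether
  # the first remaining byte is "<", which suggests HTML or XML.
  whitespace <- as.raw(c(0x09, 0x0A, 0x0D, 0x20))
  visible <- bytes[!(bytes %in% whitespace)]
  if (length(visible) > 0 && visible[1] == as.raw(0x3C)) {
    return("markup")
  }

  "unknown"
}

# Possible usage once eruptions.xls has been written to disk:
# kind <- guess_download_type("eruptions.xls")
# if (kind == "xls") {
#   eruptions <- readxl::read_excel("eruptions.xls")
# } else if (kind == "markup") {
#   tables <- rvest::html_table(xml2::read_html("eruptions.xls"))
# }
```

If the result is "unknown", opening the file in a text editor is probably the quickest way to see what was actually returned.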

Source: https://habr.com/ru/post/1263939/

