I am trying to automatically download documents for oil and gas wells from the Colorado Oil and Gas Conservation Commission (COGCC) using the "rvest" and "downloader" packages in R.
Link to the table (form) that lists the documents for a particular well:
http://ogccweblink.state.co.us/results.aspx?id=12337064
`id=12337064` is the unique identifier for the well.
Each document listed on that page can be downloaded by clicking its link, for example:
http://ogccweblink.state.co.us/DownloadDocument.aspx?DocumentId=3172781
`DocumentId=3172781` is the unique identifier of the uploaded document, in this case an .xlsm file. Other file formats on the document page include PDF and .xls.
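Both URLs follow a fixed pattern, so they can be built from the IDs alone. A minimal sketch; `results_url()` and `document_url()` are hypothetical helper names, not functions from rvest or downloader:

```r
# Hypothetical helpers that build COGCC URLs from the IDs described above.
base_url <- "http://ogccweblink.state.co.us/"

# Results page (list of documents) for one well, e.g. id = 12337064
results_url <- function(well_id) {
  paste0(base_url, "results.aspx?id=", well_id)
}

# Direct download link for one document, e.g. DocumentId = 3172781
document_url <- function(doc_id) {
  paste0(base_url, "DownloadDocument.aspx?DocumentId=", doc_id)
}

results_url(12337064)
document_url(3172781)
```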
So far I have written code that downloads any document for any well, but only from the first page of results. Most wells have documents spread over several pages, and I cannot download documents from any page other than page 1 (all of the document pages have a similar URL).
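```r
library(rvest)       # read_html(), html_nodes(), html_attr()
library(downloader)  # download()

# Results page listing the documents for one well
page <- read_html("http://ogccweblink.state.co.us/results.aspx?id=12337064")

# Link in row 24, column 4 of the results table
link <- html_nodes(page, "tr:nth-child(24) td:nth-child(4) a")
href <- html_attr(link[[1]], "href")

# Keep only the digits of the link, i.e. the DocumentId
DocId <- gsub("[^0-9]", "", href)
DocId
#> [1] "3172781"

# Build the download URL (note the "?" before the query string)
linkDocId <- paste0("http://ogccweblink.state.co.us/DownloadDocument.aspx?DocumentId=", DocId)
download(linkDocId, "DIRECTIONAL DATA", mode = "wb")
#> trying URL 'http://ogccweblink.state.co.us/DownloadDocument.aspx?DocumentId=3172781'
#> Content type 'application/octet-stream' length 33800 bytes (33 KB)
#> downloaded 33 KB
```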
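Instead of targeting a single row with `tr:nth-child(24)`, every document linked from the current page could be collected in one pass by matching `DocumentId=` in the page source. A minimal base-R sketch, assuming the download links contain `DocumentId=<digits>` as in the example URL; `extract_doc_ids()` is a hypothetical helper, and the second link in the sample HTML is invented for illustration:

```r
# Return the unique DocumentId values linked from a page's HTML source.
extract_doc_ids <- function(html_text) {
  hits <- regmatches(html_text, gregexpr("DocumentId=[0-9]+", html_text))[[1]]
  unique(sub("DocumentId=", "", hits, fixed = TRUE))
}

# Small stand-in for the real page source (second link is made up)
sample_html <- paste(
  '<a href="DownloadDocument.aspx?DocumentId=3172781">DIRECTIONAL DATA</a>',
  '<a href="DownloadDocument.aspx?DocumentId=3172782">OTHER DOCUMENT</a>'
)
extract_doc_ids(sample_html)
```

With the live page, the input could be `paste(readLines(url, warn = FALSE), collapse = "\n")`, and each returned ID passed to `download()` as in the code above. This still only sees the documents on page 1.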
Does anyone know how I can change my code to download documents from the other pages?
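Because `results.aspx` is an ASP.NET page, one plausible explanation (not verified against the live site) is that the page-number links are not ordinary URLs but `javascript:__doPostBack('<target>','<argument>')` calls that re-submit the page's form, which would explain why every page has a similar URL. If so, another page could be requested by POSTing the page's hidden fields (such as `__VIEWSTATE` and `__EVENTVALIDATION`) back to the same URL with `__EVENTTARGET` and `__EVENTARGUMENT` taken from the pager link. A minimal sketch of the offline steps, assuming that pager format; `parse_postback()` and `build_postback_body()` are hypothetical helpers, and the control name `ctl00$grdResults` is invented:

```r
# Hypothetical helpers for paging through an ASP.NET results grid.

# Split a pager href such as
#   javascript:__doPostBack('ctl00$grdResults','Page$2')
# into its event target and argument; returns NULL for other links.
parse_postback <- function(href) {
  pattern <- "__doPostBack\\('([^']*)','([^']*)'\\)"
  m <- regmatches(href, regexec(pattern, href))[[1]]
  if (length(m) != 3) return(NULL)
  list(target = m[2], argument = m[3])
}

# Combine the page's hidden <input> fields with the postback
# target/argument into a form body for a POST request.
build_postback_body <- function(hidden, target, argument) {
  body <- hidden
  body[["__EVENTTARGET"]] <- target
  body[["__EVENTARGUMENT"]] <- argument
  body
}

# Offline example with placeholder values
pb <- parse_postback("javascript:__doPostBack('ctl00$grdResults','Page$2')")
hidden <- list(`__VIEWSTATE` = "PLACEHOLDER", `__EVENTVALIDATION` = "PLACEHOLDER")
body <- build_postback_body(hidden, pb$target, pb$argument)
str(body)

# Against the live site (needs httr, rvest and network access):
# url    <- "http://ogccweblink.state.co.us/results.aspx?id=12337064"
# first  <- httr::GET(url)
# page   <- rvest::read_html(httr::content(first, as = "text"))
# inputs <- rvest::html_nodes(page, "input[type=hidden]")
# vals   <- rvest::html_attr(inputs, "value")
# vals[is.na(vals)] <- ""
# hidden <- as.list(setNames(vals, rvest::html_attr(inputs, "name")))
# resp   <- httr::POST(url, encode = "form",
#                      body = build_postback_body(hidden, pb$target, pb$argument))
# page2  <- rvest::read_html(httr::content(resp, as = "text"))
```

The real target and argument would come from the pager links on the live page, and the site may also expect the session cookie set by the first request, which is why the sketch issues both requests through httr rather than mixing it with `read_html(url)`.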
Many thanks!
Em