Download all files from a folder on a website

My question is: in R, how can I download all the files from a folder on a website? I know how to download them one by one, but not all at once. For example:

http://www2.census.gov/geo/docs/maps-data/data/rel/t00t10/

2 answers

I tested this on a small subset (3) of the 56 files on the page, and it works nicely.

## your base url
url <- "http://www2.census.gov/geo/docs/maps-data/data/rel/t00t10/"
## query the url to get all the file names ending in '.zip'
zips <- XML::getHTMLLinks(
    url, 
    xpQuery = "//a/@href['.zip'=substring(., string-length(.) - 3)]"
)
## create a new directory 'myzips' to hold the downloads
dir.create("myzips")
## save the current directory path for later
wd <- getwd()
## change working directory for the download
setwd("myzips")
## create all the new files (optional: download.file() will create them anyway)
file.create(zips)
## download them all
lapply(paste0(url, zips), function(x) download.file(x, basename(x)))
## reset working directory to original
setwd(wd)

Now all the zip files are in the myzips directory, ready for further processing. As an alternative to lapply(), you can also use a for() loop:

## download them all
for(u in paste0(url, zips)) download.file(u, basename(u))

And, of course, setting quiet = TRUE may be desirable, since we are downloading 56 files.
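For instance, a small sketch pulling these points together (the per-archive exdir layout is my own illustration, not from the answer, and it assumes you stayed in the original working directory where myzips was created): download quietly into myzips without the setwd() round-trip, then unpack each archive for further processing.

## quiet download straight into 'myzips'; no setwd() needed
for (u in paste0(url, zips)) {
  download.file(u, file.path("myzips", basename(u)), quiet = TRUE)
}
## unpack each archive into its own subfolder, named after the file
for (z in zips) {
  unzip(file.path("myzips", z),
        exdir = file.path("myzips", tools::file_path_sans_ext(z)))
}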


A slightly different approach.

library(rvest)
library(httr)
library(pbapply)
library(stringi)

URL <- "http://www2.census.gov/geo/docs/maps-data/data/rel/t00t10/"

## parse the page, then grab the href of every link whose name starts
## with 'TAB', keeping only those that end in '.zip'
pg <- read_html(URL)
zips <- grep("zip$", html_attr(html_nodes(pg, "a[href^='TAB']"), "href"), value=TRUE)

## download each file; %s+% (from stringi) pastes the URL together and
## write_disk() streams each response straight to a file
invisible(pbsapply(zips, function(zip_file) {
  GET(URL %s+% zip_file, write_disk(zip_file))
}))

"" (write_disk ).



Source: https://habr.com/ru/post/1616440/

