Web scraper in R with loop from data.frame

library(rvest)

df <- data.frame(Links = c("Qmobile_Noir-M6", "Qmobile_Noir-A1", "Qmobile_Noir-E8"))

for(i in 1:3) {
  webpage <- read_html(paste0("https://www.whatmobile.com.pk/", df$Links[i]))
  data <- webpage %>%
    html_nodes(".specs") %>%
    .[[1]] %>% 
    html_table(fill = TRUE)
}

I want the loop to work for all 3 values in df$Links, but it only keeps the last one. The downloaded data should also be identifiable by model (maybe a new column with the model name).

2 answers

The problem is how you structure the for loop: data is overwritten on every iteration, so only the last result is left. It is much easier not to use a loop in the first place, since R has excellent support for iterating over lists with functions such as lapply and purrr::map. One version of how you could structure your data:

library(tidyverse)
library(rvest)

base_url <- "https://www.whatmobile.com.pk/"

# tibble() replaces the deprecated data_frame()
models <- tibble(model = c("Qmobile_Noir-M6", "Qmobile_Noir-A1", "Qmobile_Noir-E8"),
                 link = paste0(base_url, model),
                 page = map(link, read_html))

model_specs <- models %>% 
    mutate(node = map(page, html_node, '.specs'),
           specs = map(node, html_table, header = TRUE, fill = TRUE),
           specs = map(specs, set_names, c('var1', 'var2', 'val1', 'val2'))) %>% 
    select(model, specs) %>% 
    unnest(specs)

model_specs
#> # A tibble: 119 x 5
#>              model      var1       var2
#>              <chr>     <chr>      <chr>
#>  1 Qmobile_Noir-M6     Build         OS
#>  2 Qmobile_Noir-M6     Build Dimensions
#>  3 Qmobile_Noir-M6     Build     Weight
#>  4 Qmobile_Noir-M6     Build        SIM
#>  5 Qmobile_Noir-M6     Build     Colors
#>  6 Qmobile_Noir-M6 Frequency    2G Band
#>  7 Qmobile_Noir-M6 Frequency    3G Band
#>  8 Qmobile_Noir-M6 Frequency    4G Band
#>  9 Qmobile_Noir-M6 Processor        CPU
#> 10 Qmobile_Noir-M6 Processor    Chipset
#> # ... with 109 more rows, and 2 more variables: val1 <chr>, val2 <chr>

The data is still pretty dirty, but at least it's all there.
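
The answer also mentions lapply, so here is a minimal base-R sketch of the same idea. To keep it runnable offline it parses two small inline HTML snippets with placeholder values instead of the live pages. The helper name extract_specs and the snippet contents are assumptions for illustration, not part of the original answer.

```r
library(rvest)

# Stand-in pages with placeholder values so the sketch runs offline.
# In real use each element would be
# read_html(paste0("https://www.whatmobile.com.pk/", model)).
snippet <- '<table class="specs"><tr><td>group</td><td>field</td><td>value</td></tr></table>'
pages <- list(
  "Qmobile_Noir-M6" = read_html(snippet),
  "Qmobile_Noir-A1" = read_html(snippet)
)

# Hypothetical helper: pull the first .specs table out of one parsed page.
extract_specs <- function(page) {
  as.data.frame(html_table(html_node(page, ".specs"), fill = TRUE))
}

# lapply returns one data frame per page and keeps the list names,
# so nothing is overwritten between iterations.
specs_list <- lapply(pages, extract_specs)
```

Because the list is named by model, specs_list[["Qmobile_Noir-M6"]] retrieves the table for that model.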


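The loop overwrites data on each iteration, so only the last table survives. Store each result in a list instead, for example: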

final_table <- list()

# Store each model's table in its own list slot instead of overwriting `data`
for (i in seq_along(df$Links)) {
  webpage <- read_html(paste0("https://www.whatmobile.com.pk/", df$Links[i]))
  data <- webpage %>%
    html_nodes(".specs") %>%
    .[[1]] %>%
    html_table(fill = TRUE)

  final_table[[i]] <- data.frame(data, stringsAsFactors = FALSE)
}
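
The loop leaves one data frame per model in final_table, while the question also asks for a column naming the model. One possible way to get that is sketched below. It uses made-up stand-in tables so it runs without network access, and it assumes every scraped table has the same columns; the names tagged and combined are hypothetical.

```r
# Stand-ins for df and the final_table list built by the loop (made-up values).
df <- data.frame(Links = c("Qmobile_Noir-M6", "Qmobile_Noir-A1"),
                 stringsAsFactors = FALSE)
final_table <- list(
  data.frame(X1 = "Build", X2 = "OS", stringsAsFactors = FALSE),
  data.frame(X1 = c("Build", "Build"), X2 = c("OS", "Weight"),
             stringsAsFactors = FALSE)
)

# Pair each table with its model name and prepend it as a column.
tagged <- Map(
  function(tbl, model) cbind(model = model, tbl, stringsAsFactors = FALSE),
  final_table, df$Links
)

# Stack the tagged tables into one data frame (requires matching columns).
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL
```

If the tables do not share the same columns, rbind will fail, and a more tolerant combiner such as dplyr::bind_rows may be the better fit.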


Source: https://habr.com/ru/post/1616307/