How to get Google search results

I used the following code:

 library(XML)
 library(RCurl)

 getGoogleURL <- function(search.term, domain = '.co.uk', quotes = TRUE) {
   search.term <- gsub(' ', '%20', search.term)
   if(quotes) search.term <- paste('%22', search.term, '%22', sep = '')
   getGoogleURL <- paste('http://www.google', domain, '/search?q=', search.term, sep = '')
 }

 getGoogleLinks <- function(google.url) {
   doc <- getURL(google.url, httpheader = c("User-Agent" = "R (2.10.0)"))
   html <- htmlTreeParse(doc, useInternalNodes = TRUE, error = function(...){})
   nodes <- getNodeSet(html, "//a[@href][@class='l']")
   return(sapply(nodes, function(x) x <- xmlAttrs(x)[[1]]))
 }

 search.term <- "cran"
 quotes <- "FALSE"
 search.url <- getGoogleURL(search.term = search.term, quotes = quotes)
 links <- getGoogleLinks(search.url)
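For reference, this is the URL the code above assembles for this example. The snippet below is a standalone sketch of just the string handling in getGoogleURL(), with quotes disabled, so it can be checked without any network access:

```r
# Rebuilds the URL construction from getGoogleURL() above, with
# quotes = FALSE, so it can be verified without hitting Google.
search.term <- gsub(' ', '%20', "cran")
url <- paste('http://www.google', '.co.uk', '/search?q=', search.term, sep = '')
# url is now "http://www.google.co.uk/search?q=cran"
```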

I would like to extract all the links returned by my search, but I get the following result:

 > links
 list()

How can I retrieve the links? I would also like to get the headlines and summaries of the Google results; how can I get those? And finally, is there a way to get the links that appear in the ChillingEffects.org results?

2 answers

If you look at the html variable, you will see that the links to the search results are all nested within the <h3 class="r"> tags.

Try changing the getGoogleLinks function to:

 getGoogleLinks <- function(google.url) {
   doc <- getURL(google.url, httpheader = c("User-Agent" = "R (2.10.0)"))
   html <- htmlTreeParse(doc, useInternalNodes = TRUE, error = function(...){})
   nodes <- getNodeSet(html, "//h3[@class='r']//a")
   return(sapply(nodes, function(x) x <- xmlAttrs(x)[["href"]]))
 }
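The same XPath approach can pull the headlines and summaries the question asks about. Google's markup changes frequently, so the class names used here (h3.r for titles, span.st for snippets) are assumptions based on the layout this answer targets; the sketch below exercises them against a small inline HTML sample rather than a live results page:

```r
library(XML)

# A tiny sample in the markup this answer targets: titles inside
# <h3 class="r">, summaries inside <span class="st">. These class
# names are assumptions; Google's live markup changes frequently.
sample.html <- '<html><body>
  <h3 class="r"><a href="http://cran.r-project.org/">The Comprehensive R Archive Network</a></h3>
  <span class="st">CRAN is a network of ftp and web servers.</span>
</body></html>'

html <- htmlTreeParse(sample.html, asText = TRUE, useInternalNodes = TRUE)

# xmlValue() extracts the visible text of each matched node.
titles   <- sapply(getNodeSet(html, "//h3[@class='r']//a"), xmlValue)
snippets <- sapply(getNodeSet(html, "//span[@class='st']"), xmlValue)
```

Against a real results page you would parse the document fetched with getURL() instead of the inline sample.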

I created this function to read in a list of company names and then fetch the top search result for each. It should get you started; you can adjust it as needed.

 # Libraries. URLencode() comes from base R's utils package,
 # so only rvest needs to be loaded.
 library(rvest)

 # Load data.
 d <- read.csv("P:\\needWebsites.csv")
 companies <- as.character(d$Company.Name)

 # Function for getting the top website for a search term.
 getWebsite <- function(name) {
   url <- URLencode(paste0("https://www.google.com/search?q=", name))
   page <- read_html(url)
   # Get all nodes of type cite. You can change this to grab other node types.
   results <- page %>%
     html_nodes("cite") %>%
     html_text()
   result <- results[1]
   return(as.character(result))  # Drop the [1] above if you want to see them all.
 }

 # Apply the function to the list of company names.
 websites <- data.frame(Website = sapply(companies, getWebsite))
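One detail worth knowing: URLencode() ships with base R (in the utils package), so it needs no extra library call, and by default it percent-encodes characters such as spaces that are not allowed in URLs. A minimal standalone check:

```r
# URLencode() is part of base R's utils package; no extra package needed.
# With the default reserved = FALSE it encodes spaces but leaves
# URL-structural characters like :, /, ? and = intact.
url <- URLencode(paste0("https://www.google.com/search?q=", "R Project"))
# url is now "https://www.google.com/search?q=R%20Project"
```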

Source: https://habr.com/ru/post/1232720/
