No output for parsing google news content

Question

With the code below, I want to get the headline and URL of each Google News result.

It worked in the past, but I do not know why it has stopped working.

Has Google changed its CSS structure, or is it something else?

Thanks.

public static void main(String[] args) throws UnsupportedEncodingException, IOException {
    String google = "http://www.google.com/search?q=";
    String search = "stackoverflow";
    String charset = "UTF-8";
    String news = "&tbm=nws";
    // Change this to your company name and bot homepage!
    String userAgent = "ExampleBot 1.0 (+http://example.com/bot)";

    Elements links = Jsoup.connect(google + URLEncoder.encode(search, charset) + news)
            .userAgent(userAgent)
            .get()
            .select(".g>.r>.a");

    for (Element link : links) {
        String title = link.text();
        // Google returns URLs in format "http://www.google.com/url?q=<url>&sa=U&ei=<someKey>".
        String url = link.absUrl("href");
        url = URLDecoder.decode(url.substring(url.indexOf('=') + 1, url.indexOf('&')), "UTF-8");

        if (!url.startsWith("http")) {
            continue; // Ads/news/etc.
        }

        System.out.println("Title: " + title);
        System.out.println("URL: " + url);
    }
}

java google-search parsing google-search-api jsoup

evabb Jan 11 '17 at 4:51

2 answers

Programmers block · Answer 1 · 2017-01-14T05:04:02+0000

If the question is "How do I get the code working again?", it is hard for anyone to know what the old page looked like unless they kept a copy.

I broke your select call up like this, and it worked for me.

  String string = google + URLEncoder.encode(search, charset) + news;
  Document document = Jsoup.connect(string).userAgent(userAgent).get();
  Elements links = document.select(".r>a");

The current page source looks like this:

  <div class="g"> <table> <tbody> <tr> <td valign="top" style="width:516px"><h3 class="r"><a href="/url?q=https://www.bleepingcomputer.com/news/security/marlboro-ransomware-defeated-in-one-day/&amp;sa=U&amp;ved=0ahUKEwis77iq7cDRAhXI7IMKHUAoDs0QqQIIFCgAMAE&amp;usg=AFQjCNFFx-sJdU814auBfquRYSsct2c8WA">Marlboro Ransomware Defeated in One Day</a></h3>

Results:

Title: Marlboro Ransomware Defeated in One Day
URL: https://www.bleepingcomputer.com/news/security/marlboro-ransomware-defeated-in-one-day/

Title: Stack Overflow puts a new spin on resumes for developers
URL: https://techcrunch.com/2016/10/11/stack-overflow-puts-a-new-spin-on-resumes-for-developers/
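To show how the href in the page source becomes the URL in the results, here is a minimal sketch that pulls the q parameter out of a Google redirect link using only the standard library. The class and method names (GoogleRedirect, extractTarget) are made up for illustration, and the sketch assumes the href has already had its HTML entities decoded (&amp; to &), which jsoup does when you read an attribute.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class GoogleRedirect {

    // Returns the decoded value of the "q" query parameter, or null if the
    // href has no query string or no "q" parameter.
    static String extractTarget(String href) {
        int queryStart = href.indexOf('?');
        if (queryStart < 0) {
            return null;
        }
        for (String pair : href.substring(queryStart + 1).split("&")) {
            if (pair.startsWith("q=")) {
                try {
                    return URLDecoder.decode(pair.substring(2), "UTF-8");
                } catch (UnsupportedEncodingException e) {
                    // UTF-8 is guaranteed to be supported by every JVM.
                    throw new IllegalStateException(e);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The href from the page source above, with &amp; already decoded to &.
        String href = "/url?q=https://www.bleepingcomputer.com/news/security/"
                + "marlboro-ransomware-defeated-in-one-day/&sa=U"
                + "&ved=0ahUKEwis77iq7cDRAhXI7IMKHUAoDs0QqQIIFCgAMAE"
                + "&usg=AFQjCNFFx-sJdU814auBfquRYSsct2c8WA";
        System.out.println(extractTarget(href));
    }
}
```

Looking the parameter up by name, rather than taking everything between the first '=' and the first '&', should keep working if Google reorders the query parameters. It still assumes that any '&' inside the target URL arrives percent-encoded.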

Edit: time range. These URL parameters look awful.
Add the suffix &tbs=cdr%3A1%2Ccd_min%3A5%2F30%2F2016%2Ccd_max%3A6%2F30%2F2016

Here %3A, %2C and %2F are the URL-encoded forms of ":", "," and "/". The part "min%3A5%2F30%2F2016" contains your minimum date, 5/30/2016: min%3A + (month of the year) + %2F + (day of the month) + %2F + year. Likewise, "max%3A6%2F30%2F2016" contains your maximum date, 6/30/2016: max%3A + (month of the year) + %2F + (day of the month) + %2F + year.

Here is the full search URL for Mindy Kaling between 05/30/2016 and 06/30/2016: https://www.google.com/search?tbm=nws&q=mindy%20kaling&tbs=cdr%3A1%2Ccd_min%3A5%2F30%2F2016%2Ccd_max%3A6%2F30%2F2016
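Rather than assembling the percent-encoded string by hand, you could build the readable form and let URLEncoder do the escaping. This is only a sketch: the class and method names (NewsDateRange, tbsSuffix) are hypothetical, and the cdr/cd_min/cd_max layout is taken from the URL above, not from any documented Google API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class NewsDateRange {

    // Builds the "&tbs=..." suffix for a custom date range.
    // Dates are written month/day/year, matching the URL in the answer.
    static String tbsSuffix(int minMonth, int minDay, int minYear,
                            int maxMonth, int maxDay, int maxYear) {
        String raw = "cdr:1"
                + ",cd_min:" + minMonth + "/" + minDay + "/" + minYear
                + ",cd_max:" + maxMonth + "/" + maxDay + "/" + maxYear;
        try {
            // Encodes ':' as %3A, ',' as %2C and '/' as %2F; '_' is left as is.
            return "&tbs=" + URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be supported by every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tbsSuffix(5, 30, 2016, 6, 30, 2016));
        // prints &tbs=cdr%3A1%2Ccd_min%3A5%2F30%2F2016%2Ccd_max%3A6%2F30%2F2016
    }
}
```

The result can be appended to the search URL in the same way as the news suffix in the question's code.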

Pavan kumar · Answer 2 · 2017-01-20T12:45:53+0000

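The following worked for me. Note the selector ".g .r>a": it finds the elements with class g, then every element with class r anywhere inside them, then the a tags that are direct children of those. In the question's ".g>.r>.a", the trailing ".a" matches elements with class a rather than a tags, and in the current markup the element with class r is not a direct child of the element with class g.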

 Elements links = Jsoup.connect(google + URLEncoder.encode(search, charset) + news)
         .userAgent(userAgent)
         .get()
         .select(".g .r>a");

From the documentation:

.class: find elements by class name
ancestor child: child elements that descend from ancestor, e.g. .body p finds p elements anywhere under a block with class "body"
parent > child: child elements that descend directly from parent, e.g. div.content > p finds p elements; and body > * finds the direct children of the body tag

Although this solution works, I would not recommend relying on it for anything beyond learning or temporary use. Any product feature built on it may break whenever Google changes how it renders the page.
