How to get a web page's content before visiting it

How can I get the description or content of a web page for a given URL, similar to the brief description Google shows for each result link? I want to do this in my JSP page.

Thanks in advance!

+3
1 answer

Idea: Open the URL as a stream, then parse the HTML and read the string in the description meta tag.

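Fetch the URL:

// Requires java.net.URL, java.io.BufferedReader and java.io.InputStreamReader.
// Both the URL constructor and openStream() can throw IOException.
URL url = new URL("http://www.url-to-be-parsed.com/page.html");
BufferedReader in = new BufferedReader(
        new InputStreamReader(url.openStream()));
// Close the reader with in.close() once the page has been read.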

You will need to customize the above code depending on what your HTML parser library requires (stream, lines, etc.).
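If the parser you pick expects the whole page as a single string, one possible adaptation is to drain the reader line by line. This is only a sketch: the class name `PageReader` and the method `readAll` are made up for illustration, and it assumes the platform default charset is acceptable for the page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of any library.
public class PageReader {
    // Reads every line from the given reader and joins them with '\n'.
    // The reader is closed before the method returns.
    public static String readAll(Reader source) {
        StringBuilder html = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers are not forced to declare IOException.
            throw new UncheckedIOException(e);
        }
        return html.toString();
    }
}
```

With the snippet above, this could be called as `PageReader.readAll(new InputStreamReader(url.openStream()))`.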

HTML tags to parse:

<meta name="description" content="This is a place where webmasters can put a description about this web page" />

You may also be interested in capturing the title of this page:

<title>This is the title of the page!</title>
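For a quick experiment without any extra library, both tags can be pulled out of the page source with regular expressions. This is a rough sketch and not the parser-based approach: the class `MetaExtractor` is hypothetical, it assumes the page source is already in a String, and it only matches a meta tag written with `name` before `content` and a double-quoted `content` value. Real pages vary, so an HTML parser is the more robust choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
public class MetaExtractor {
    // Matches <meta name="description" content="..."> with name before content.
    private static final Pattern DESCRIPTION = Pattern.compile(
            "<meta\\s+name=[\"']description[\"']\\s+content=\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);

    // Matches the text between <title> and </title>, even across line breaks.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the description text, or null if no matching meta tag is found.
    public static String description(String html) {
        Matcher m = DESCRIPTION.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    // Returns the page title, or null if there is no <title> element.
    public static String title(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }
}
```

Both methods return null when the tag is missing, so the caller can fall back to something else, such as the URL itself.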

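Edit: It is better to parse the HTML with an HTML parser library, for example HTML Parser (org.htmlparser).

With HTML Parser:

  • Use a HasAttributeFilter to select tags with name="description"
  • Cast the resulting Node ---> MetaTag
  • Read the content attribute with MetaTag.getAttribute()

Code: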

import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.tags.MetaTag;

public class HTMLParserTest {
    public static void main(String... args) {
        Parser parser = new Parser();
        // <meta name="description" content="Some text about the site." />
        HasAttributeFilter filter = new HasAttributeFilter("name", "description");
        try {
            parser.setResource("http://www.youtube.com");
            NodeList list = parser.parse(filter);
            Node node = list.elementAt(0);

            if (node instanceof MetaTag) {
                MetaTag meta = (MetaTag) node;
                String description = meta.getAttribute("content");

                System.out.println(description);
                // Prints: "YouTube is a place to discover, watch, upload and share videos."
            }

        } catch (ParserException e) {
            e.printStackTrace();
        }
    }

}

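Considerations:

If this is done in the JSP each time the page is loaded, every request pays for a network call to the URL. If the page shows many links and the descriptions are fetched on the fly, the slowdown can be considerable, because n URLs mean n sequential network requests. It may be better to fetch and store the descriptions ahead of time rather than doing this on the fly in the JSP.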
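Because every lookup costs a network round trip, one option for a page that lists many links is to keep already fetched descriptions in memory and reuse them. The following is a minimal sketch under stated assumptions: `DescriptionCache` is a made-up class, the `fetcher` function stands in for whatever code actually downloads and parses a page, and entries never expire, which a real application would likely want to change.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper, not part of any library.
public class DescriptionCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> fetcher;

    // fetcher maps a URL to its description, e.g. by downloading and parsing the page.
    public DescriptionCache(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    // Returns the cached description, calling the fetcher only on the first request for a URL.
    // If the fetcher returns null, nothing is stored and null is returned.
    public String get(String url) {
        return cache.computeIfAbsent(url, fetcher);
    }
}
```

A single instance could be shared across requests, for example as an application-scoped attribute, so that repeated page loads do not refetch the same URLs.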

+4

Source: https://habr.com/ru/post/1752391/

