Get links on a website

How can I get the links on a web page without loading it? Basically, the user enters a URL and I want to download all the links available on that page. How can I achieve this?

5 answers

Here is an example in Java that prints every link on the page as a small HTML document:

import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;

import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class Main {
  public static void main(String[] args) throws Exception {
    URL url = new URL(args[0]);
    // try-with-resources closes the stream once parsing is finished.
    try (InputStream in = url.openStream();
        Reader reader = new InputStreamReader(in)) {
      System.out.println("<HTML><HEAD><TITLE>Links for " + args[0] + "</TITLE>");
      System.out.println("<BASE HREF=\"" + args[0] + "\"></HEAD>");
      System.out.println("<BODY>");
      // Pass true for ignoreCharSet; with false, a <meta> charset declaration
      // in the page makes the parser throw ChangedCharSetException.
      new ParserDelegator().parse(reader, new LinkPage(), true);
      System.out.println("</BODY></HTML>");
    }
  }
}

class LinkPage extends HTMLEditorKit.ParserCallback {

  @Override
  public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
    // Only anchors that actually carry an href are links.
    Object href = a.getAttribute(HTML.Attribute.HREF);
    if (t == HTML.Tag.A && href != null) {
      System.out.println("<A HREF=\"" + href + "\">" + href + "</A><BR>");
    }
  }

}

You will need to load the page on your server and then find the links, preferably by loading the document into an HTML/XML parser and walking the resulting DOM. The server can then send the links to the client.
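As a rough illustration of walking a DOM for links, here is a minimal sketch using the JDK's built-in XML parser. It assumes the page is well-formed XHTML, which many real pages are not; the class name `DomLinks` and the method `hrefs` are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of any answer on this page.
class DomLinks {

  // Parses well-formed XHTML and returns the href of every <a> element,
  // in document order. Anchors without an href are skipped.
  static List<String> hrefs(String xhtml) throws Exception {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
    NodeList anchors = doc.getElementsByTagName("a");
    List<String> links = new ArrayList<>();
    for (int i = 0; i < anchors.getLength(); i++) {
      Element a = (Element) anchors.item(i);
      if (a.hasAttribute("href")) {
        links.add(a.getAttribute("href"));
      }
    }
    return links;
  }

  public static void main(String[] args) throws Exception {
    // Assumption: the markup is already in memory and is valid XML.
    String page = "<html><body><a href=\"/one\">One</a>"
        + "<p><a href=\"http://example.com/two\">Two</a></p></body></html>";
    System.out.println(hrefs(page)); // prints [/one, http://example.com/two]
  }
}
```

On the server, the returned list is what would be sent back to the client. For pages that are not valid XML, a tolerant HTML parser is the safer choice.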



Parse the page and collect its <a> tags.

For this you can use an XML parser such as JDom or Sax in Java, or the DOM in JavaScript.
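For the SAX route, a minimal sketch could look like the following. It uses the SAX parser bundled with the JDK and, like any XML parser, assumes the page is well-formed XHTML; the class name `SaxLinks` and the method `hrefs` are invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not part of any answer on this page.
class SaxLinks {

  // Streams through well-formed XHTML and collects the href of each <a> element.
  static List<String> hrefs(String xhtml) throws Exception {
    List<String> links = new ArrayList<>();
    DefaultHandler handler = new DefaultHandler() {
      @Override
      public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // The default parser is not namespace-aware, so match on qName.
        if ("a".equalsIgnoreCase(qName)) {
          String href = attrs.getValue("href");
          if (href != null) {
            links.add(href);
          }
        }
      }
    };
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xhtml)), handler);
    return links;
  }

  public static void main(String[] args) throws Exception {
    // Assumption: the markup is already in memory and is valid XML.
    String page = "<html><body><a href=\"/one\">One</a><a name=\"top\">Top</a></body></html>";
    System.out.println(hrefs(page));
  }
}
```

SAX never builds a tree in memory, so it suits large pages where only the links are needed.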


Use URLConnection to fetch the page.
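A minimal sketch of downloading a page with `java.net.URLConnection` might look like this. The class name `PageFetcher`, the method `fetch`, the 10-second timeouts, and the UTF-8 decoding are all assumptions made for the example; a real page may declare a different charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not part of any answer on this page.
class PageFetcher {

  // Downloads the resource at the given URL and returns its body as text.
  static String fetch(URL url) throws IOException {
    URLConnection conn = url.openConnection();
    conn.setConnectTimeout(10_000); // assumed limits, in milliseconds
    conn.setReadTimeout(10_000);
    StringBuilder body = new StringBuilder();
    // Assumption: the page is UTF-8 encoded.
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        body.append(line).append('\n');
      }
    }
    return body.toString();
  }

  public static void main(String[] args) throws IOException {
    System.out.println(fetch(new URL(args[0])));
  }
}
```

The returned text still has to be handed to a parser to pull out the links.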

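import java.util.List;

// Prints every link found on the page at the given URL.
// extractLinks(String) is not shown in this answer; it is assumed to return
// the link targets found on that page.
public void extract_link(String site)
{
    try {
        List<String> links = extractLinks(site);
        for (String link : links) {
            System.out.println(link);
        }
    } catch (Exception e) {
        // Report which page failed rather than printing only the exception.
        System.err.println("Could not extract links from " + site + ": " + e);
    }
}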

This is a simple function that prints all the links on a page. If you also want the links found on the linked pages, call it recursively, but set a depth limit that fits your needs.
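The answer does not show `extractLinks`, so here is one possible sketch of it together with a depth-limited recursive crawl. The class name `LinkCrawler`, the `crawl` method, and the use of the Swing HTML parser are assumptions for illustration, not the answerer's actual code.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical example class, not part of any answer on this page.
class LinkCrawler {

  // Assumed implementation: returns every <a href> on the page at `site`,
  // resolved against the page's own URL.
  static List<String> extractLinks(String site) throws IOException {
    URL base = new URL(site);
    List<String> links = new ArrayList<>();
    // Uses the platform default charset to decode the page.
    try (Reader reader = new InputStreamReader(base.openStream())) {
      new ParserDelegator().parse(reader, new HTMLEditorKit.ParserCallback() {
        @Override
        public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
          Object href = a.getAttribute(HTML.Attribute.HREF);
          if (t == HTML.Tag.A && href != null) {
            try {
              links.add(new URL(base, href.toString()).toString());
            } catch (IOException e) {
              // Skip hrefs that do not form a valid URL (e.g. javascript:).
            }
          }
        }
      }, true);
    }
    return links;
  }

  // Prints `site`, then follows its links until `depth` reaches zero.
  // `seen` prevents visiting the same URL twice.
  static void crawl(String site, int depth, Set<String> seen) {
    if (!seen.add(site)) {
      return;
    }
    System.out.println(site);
    if (depth == 0) {
      return;
    }
    try {
      for (String link : extractLinks(site)) {
        crawl(link, depth - 1, seen);
      }
    } catch (IOException e) {
      System.err.println("Skipping " + site + ": " + e);
    }
  }

  public static void main(String[] args) {
    // Depth 1: print the start page and the pages it links to.
    crawl(args[0], 1, new HashSet<>());
  }
}
```

The `seen` set matters as much as the depth limit: without it, pages that link to each other would be fetched repeatedly.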


Source: https://habr.com/ru/post/1768267/

