How can I write a Java regular expression that captures the contents of a <script> tag?

I'm trying to integrate analytics into my GWT application. To do this, I call a service that returns an HTML string that needs to be parsed and evaluated.

I need a regular expression that finds script tags and captures either 1) the body of the tag, or 2) the contents of the "src" attribute. I want to eval both of them in JavaScript. I'm fine with assuming that if the "src" attribute exists, the body can be ignored.

Thanks,

Matt

+3
6 answers

Something like this should do what you want:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class ScriptRegexTest {
        public static void main(String[] args) {
            final String srcOne = "<html>\r\n<head>\r\n<script src=\"http://test.com/some.js\"/>\r\n</head></html>";
            final String srcTwo = "<html>\r\n<head>\r\n<script src=\"http://test.com/some.js\"></script>\r\n</head></html>";
            final String tag = "<html>\r\n<head>\r\n<script>\r\nfunction() {\r\n\talert('hi');\r\n}\r\n</script>\r\n</head></html>";
            final String tagAndSrc = "<html>\r\n<head>\r\n<script src=\"http://test.com/some.js\">\r\nfunction() {\r\n\talert('hi');\r\n}\r\n</script>\r\n</head></html>";
            final String[] tests = new String[] {srcOne, srcTwo, tag, tagAndSrc, srcOne + srcTwo, tag + srcOne + tagAndSrc};

            // Group 1 captures the src attribute value; group 2 captures an inline body.
            // The body alternative stops at the first '<', so bodies containing '<' are not captured.
            final String regex = "<script(?:[^>]*src=['\"]([^'\"]*)['\"][^>]*>|[^>]*>([^<]*)</script>)";
            final Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
            for (int testNumber = 0; testNumber < tests.length; ++testNumber) {
                final String test = tests[testNumber];
                final Matcher matcher = pattern.matcher(test);
                System.out.println("--------------------------------");
                System.out.println("TEST " + testNumber + ": " + test);
                while (matcher.find()) {
                    System.out.println("GROUP 1: " + matcher.group(1));
                    System.out.println("GROUP 2: " + matcher.group(2));
                }
                System.out.println("--------------------------------");
                System.out.println();
            }
        }
    }
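
If you only need the rule from the question (use src when it is present, otherwise the body), the loop above can be wrapped in a small helper. This is a sketch, not a tested library: ScriptExtractor, extract and ScriptRef are hypothetical names, and it inherits the pattern's limitation that bodies containing '<' are skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
public class ScriptExtractor {
    // Same pattern as in the answer: group 1 = src value, group 2 = inline body.
    private static final Pattern SCRIPT = Pattern.compile(
            "<script(?:[^>]*src=['\"]([^'\"]*)['\"][^>]*>|[^>]*>([^<]*)</script>)",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Hypothetical holder: exactly one of src and body is non-null. */
    public static final class ScriptRef {
        public final String src;
        public final String body;

        ScriptRef(String src, String body) {
            this.src = src;
            this.body = body;
        }

        @Override
        public String toString() {
            return src != null ? "src=" + src : "body=" + body;
        }
    }

    /** Returns one ScriptRef per script tag found, preferring src over the body. */
    public static List<ScriptRef> extract(String html) {
        List<ScriptRef> result = new ArrayList<>();
        Matcher m = SCRIPT.matcher(html);
        while (m.find()) {
            if (m.group(1) != null) {
                // src attribute present: any inline body is ignored, as the question allows.
                result.add(new ScriptRef(m.group(1), null));
            } else {
                result.add(new ScriptRef(null, m.group(2)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<script src=\"http://test.com/some.js\">ignored</script>"
                + "<script>alert('hi');</script>";
        // prints "src=http://test.com/some.js", then "body=alert('hi');"
        for (ScriptRef ref : extract(html)) {
            System.out.println(ref);
        }
    }
}
```

Because the src alternative is tried first, a tag with both src and a body yields only the src, which matches the assumption in the question.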

That said, you'd be better off using a real HTML parser such as Tag Soup.

+1

Why not use the DOM instead? You can get the BODY element and read its contents directly, for example:

function test(){
    var body = document.getElementsByTagName("body")[0];
    alert(body.innerHTML);
}
+6

Something like this:

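// "\\b" in a Java string literal is the regex word boundary ("\b" would be a backspace character).
// [^>]* rather than [^>]+ so a bare <script> also matches; DOTALL lets (.*?) span several lines.
String scriptPattern = "<script\\b([^>]*)>(.*?)</script>";
Pattern scriptRegex = Pattern.compile(scriptPattern, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);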


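$1 holds the script tag's attributes and $2 its body. If the tag has a src attribute, its value is somewhere inside $1, so you still need to extract it from there.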
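
Since the src value ends up inside $1, a second, smaller pattern can pull it out. A minimal sketch, assuming the tag pattern is written with the word boundary escaped as "\\b", with [^>]* so attribute-less tags match, and with DOTALL for multi-line bodies; ScriptAttrs, firstScript and SRC_ATTR are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
public class ScriptAttrs {
    // $1 = everything between "<script" and ">", $2 = the body (DOTALL so it may span lines).
    static final Pattern SCRIPT = Pattern.compile(
            "<script\\b([^>]*)>(.*?)</script>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Second pass over $1 only: a single- or double-quoted src value; \1 reuses the opening quote.
    static final Pattern SRC_ATTR = Pattern.compile(
            "\\bsrc\\s*=\\s*(['\"])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    /** Returns the src URL if the first script tag has one, else its body; null if there is no script tag. */
    public static String firstScript(String html) {
        Matcher tag = SCRIPT.matcher(html);
        if (!tag.find()) {
            return null;
        }
        Matcher src = SRC_ATTR.matcher(tag.group(1));
        return src.find() ? src.group(2) : tag.group(2);
    }

    public static void main(String[] args) {
        System.out.println(firstScript("<script type=\"text/javascript\" src='a.js'></script>"));
        System.out.println(firstScript("<script>\nalert('hi');\n</script>"));
    }
}
```

Splitting the work into two patterns keeps each one simple; note that SRC_ATTR would also match inside an attribute such as data-src, which is one more case a real parser handles better.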

+1

To capture only the body, something like

<script[^>]*?>(.*?)</script>

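This breaks if an attribute inside the script tag contains a ">" character. Note the non-greedy *?: it makes each match stop at the nearest </script> instead of running on to the last one in the string.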
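
To see what the non-greedy *? buys you, here is a minimal comparison of the greedy and lazy forms on a string with two script tags (GreedyVsLazy and bodies are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
public class GreedyVsLazy {
    /** Collects group 1 of every match of regex in html. */
    public static List<String> bodies(String regex, String html) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.DOTALL).matcher(html);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<script>a();</script><p>x</p><script>b();</script>";
        // Greedy: one match running from the first <script> to the LAST </script>.
        // prints [a();</script><p>x</p><script>b();]
        System.out.println(bodies("<script[^>]*?>(.*)</script>", html));
        // Lazy: two matches, each stopping at the nearest </script>.
        // prints [a();, b();]
        System.out.println(bodies("<script[^>]*?>(.*?)</script>", html));
    }
}
```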

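To capture src as well:

<script\b(?:[^>]*?src="([^"]*)")?[^>]*>(.*?)</script>

Group 1 is the src value (null when the tag has no src attribute) and group 2 is the body. The lazy [^>]*? sits inside the optional group so that src is found wherever it appears in the tag. Compile with DOTALL so the body can span lines.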
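
The placement of the optional src group is easy to get wrong. The sketch below (OptionalSrcGroup and firstMatch are hypothetical names) compares a pattern whose optional src group follows a lazy [^>]*? with a variant that moves the lazy prefix inside the group. In the first form the engine tries the empty option right after "<script", the rest of the pattern then succeeds, and src is never captured when any whitespace or attribute precedes it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
public class OptionalSrcGroup {
    // Optional src group AFTER a lazy [^>]*?: groups are 2 = src value, 3 = body.
    public static final String AS_WRITTEN = "<script[^>]*?(src=\"([^\"]*)\")?[^>]*?>(.*?)</script>";
    // Lazy prefix INSIDE the optional group, so the engine keeps scanning the tag
    // for src= before giving up on the group: 1 = src value, 2 = body.
    public static final String REARRANGED = "<script\\b(?:[^>]*?src=\"([^\"]*)\")?[^>]*>(.*?)</script>";

    /** Returns {src, body} from the first script tag, or null if none matches. */
    public static String[] firstMatch(String regex, int srcGroup, int bodyGroup, String html) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.DOTALL).matcher(html);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(srcGroup), m.group(bodyGroup)};
    }

    public static void main(String[] args) {
        String html = "<script type=\"text/javascript\" src=\"some.js\"></script>";
        // prints null: the empty option already lets the whole match succeed.
        System.out.println(firstMatch(AS_WRITTEN, 2, 3, html)[0]);
        // prints some.js
        System.out.println(firstMatch(REARRANGED, 1, 2, html)[0]);
    }
}
```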

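In general, though, regular expressions are a poor tool for parsing HTML/XML/SGML; a real parser copes with cases like these far better.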

0

<script>(.*?)</script>|<script src="([^"]*)">.*?</script>

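This assumes that:

  • src is the only attribute of the tag.
  • The tag is written without extra whitespace, e.g. no '< script' or ' >'.

Compile it with DOTALL so that . also matches line breaks inside the script body.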
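
A quick check of why DOTALL matters for this pattern, written here with non-greedy quantifiers so that several tags in one string stay separate (DotAllDemo and body are hypothetical names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
public class DotAllDemo {
    // Alternation from the answer: group 1 = inline body, group 2 = src value.
    static final String REGEX = "<script>(.*?)</script>|<script src=\"([^\"]*)\">.*?</script>";

    /** Returns group 1 (the inline body) of the first match, or null if nothing matches. */
    public static String body(int flags, String html) {
        Matcher m = Pattern.compile(REGEX, flags).matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<script>\nalert('hi');\n</script>";
        // Without DOTALL, '.' stops at '\n', so neither alternative matches: prints null.
        System.out.println(body(0, html));
        // With DOTALL the body, newlines included, is captured.
        System.out.println(body(Pattern.DOTALL, html));
    }
}
```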

0

It turns out the Java regex API isn't available in GWT client code, so I ended up using JSNI instead:

public static native void evalJS(Element e) /*-{
    var scripts = e.getElementsByTagName("script");

    for (var i = 0; i < scripts.length; i++) {
        // if src, eval it, otherwise eval the body
        if (scripts[i].hasAttribute("src")) {
            // Silently fails here: eval() runs its argument as JavaScript source,
            // and the src value is a URL, not code, so the external script is never fetched.
            eval(scripts[i].getAttribute("src"));
        } else {
            // Inline body: evaluating its text works.
            eval(scripts[i].innerHTML);
        }
    }
}-*/;

See also this thread on the Google Web Toolkit group:

http://groups.google.com/group/Google-Web-Toolkit/browse_thread/thread/ac2589369ddec8a3

0

Source: https://habr.com/ru/post/1703123/

