How to check if an HTML document contains script tags that are not empty using regex

I am trying to check whether an HTML document contains script tags that are not empty, using regular expressions. The regular expression must match any script tag whose content is anything other than spaces or line breaks.

I tried

<script\b[^>]*>[^.+$]</script>

but this regex only finds script tags with one space.
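For reference, the character class `[^.+$]` in that attempt matches exactly one character that is not `.`, `+` or `$`, because those symbols lose their special meaning inside brackets. The following is a minimal Java sketch of that behaviour; the class name `RegexAttemptDemo`, the helper `found` and the sample strings are made up for illustration.

```java
import java.util.regex.Pattern;

public class RegexAttemptDemo {
    // The pattern from the question, unchanged (backslash doubled for the Java string).
    static final Pattern ATTEMPT = Pattern.compile("<script\\b[^>]*>[^.+$]</script>");

    // True if the pattern matches anywhere in the given HTML text.
    static boolean found(String html) {
        return ATTEMPT.matcher(html).find();
    }

    public static void main(String[] args) {
        System.out.println(found("<script> </script>"));        // body is one space
        System.out.println(found("<script>x</script>"));        // body is one letter
        System.out.println(found("<script>alert(1)</script>")); // body is longer than one character
        System.out.println(found("<script></script>"));         // body is empty
    }
}
```

Only the first two inputs match, since their body is a single character outside the excluded set. Longer bodies and empty bodies are both rejected.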

+3
5 answers

Don't parse HTML with regexen! Use an HTML parser instead. For example, if you are working in JavaScript with the DOM, you could do this:

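// Collect every script element whose inline text is more than whitespace.
var scripts     = document.getElementsByTagName('script');
var numScripts  = scripts.length;
var textScripts = [];
for (var i = 0; i < numScripts; ++i) {
  // trim() so that scripts containing only spaces or line breaks count as empty
  if (scripts[i].text.trim() !== '') textScripts.push(scripts[i]);
}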

If you have the HTML as a string, parse it first, find the script elements, and apply the same check.


Edit 1: It turns out you are using Java. I am not familiar with HTML parsers for Java, but the same approach applies.

+7

Regex is the wrong tool for this. Use an HTML parser. Jsoup is a good one.

Here is an example:

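import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Inside a method that declares "throws IOException":
// fetch and parse the page, giving up after 3000 ms.
URL url = new URL("http://stackoverflow.com/questions/2993515");
Document document = Jsoup.parse(url, 3000);

// Print the content of every script element that is not empty or whitespace-only.
Elements scripts = document.select("script");
for (Element script : scripts) {
    String data = script.data();
    if (!data.trim().isEmpty()) {
        System.out.println(data);
    }
}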

Jsoup parses real-world HTML and offers a jQuery-like API for selecting elements.

+4

script, , script, , script , .

, , .

, script.

+2

Use TagSoup or another Java DOM parser to find out.

Under no circumstances should regular expressions be used to parse HTML.
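One way to see why: a pattern has no notion of document structure, so it cannot tell a real script element from text that merely looks like one. The Java sketch below is an illustration only. The pattern `NON_EMPTY_SCRIPT`, the class name and the sample markup are assumptions, not something taken from the answers above. It shows a plausible "non-empty script" regex reporting a script that sits inside an HTML comment.

```java
import java.util.regex.Pattern;

public class RegexFalsePositiveDemo {
    // Hypothetical pattern: a script tag whose body is not just whitespace.
    // The negative lookahead rejects bodies that consist only of whitespace.
    static final Pattern NON_EMPTY_SCRIPT = Pattern.compile(
            "<script\\b[^>]*>(?!\\s*</script>)[\\s\\S]*?</script>",
            Pattern.CASE_INSENSITIVE);

    // True if the regex believes the text contains a non-empty script tag.
    static boolean regexSaysNonEmptyScript(String html) {
        return NON_EMPTY_SCRIPT.matcher(html).find();
    }

    public static void main(String[] args) {
        // The script here is commented out, so the page has no script element at all.
        String commentedOut = "<p>hi</p><!-- <script>alert(1)</script> -->";
        System.out.println(regexSaysNonEmptyScript(commentedOut)); // prints "true"
    }
}
```

The regex handles the simple cases, yet it still reports the commented-out script. A parser would treat that text as a comment and not as an element.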

0

Source: https://habr.com/ru/post/1748931/

