RegExp not working when reading an HTML file

First of all, I know how most RegExp questions go, and this is not one of those "please write my code" questions.

My confusion is that my RegExp works on regexr and in Chrome dev tools when run against document.body.textContent, but not on the HTML file after I read it in io.js.

io.js is version 1.5.1, running on Windows 8.

Why does it work in both of those places, but not in io.js? Am I failing to account for something io.js does when reading files?

My RegExp should match " @{each ___->___} text and line breaks @{/each} " as in the link below, but instead it returns null.

Here is what I am trying to use: http://regexr.com/3aldk

RegExp:

/@\{each ([a-zA-Z0-9->.]*)\}([\s\S]*)@\{\/each}/g

JS (example):

    fs.readFile('view.html', {encoding:'utf8'}, function(error, html) {
        console.log(html.match(myRegExp)); // null
    });

HTML:

    <!doctype html>
    <html>
    <head>
        <title>@{title}</title>
    </head>
    <body>
        <h1>@{foo.bar}</h1>
        <p>
            Lorem ipsum dolor sit amet, @{foo.baz.hoo}
        </p>
        @{each people->person}
        <div>
            <b>@{person.name}:</b> @{person.age}
        </div>
        @{/each}
    </body>
    </html>

Am I missing something obvious, like a character that is present on the back end but not once the file is served?

1 answer

The problem here is the thin line between specification and implementation.

The ECMAScript 5.1 specification states that:

A - character can be treated literally or it can denote a range. It is treated literally if it is the first or last character of ClassRanges, the beginning or end limit of a range specification, or immediately follows a range specification.

Regular-Expressions.info notes that:

Hyphens at other positions in character classes where they can't form a range may be interpreted as literals or as errors. Regex flavors are quite inconsistent about this.

Conclusions:

Safe ways to include the hyphen-minus character (-) in a character class:

  • escaping it (for example, [a-zA-Z0-9\->.] )
  • putting it first in the class (for example, [-.>a-zA-Z0-9] )
    • exception: in a negated class, it comes second, immediately after the ^ (for example, [^-.>a-zA-Z0-9] )
  • placing it last in the class (for example, [a-zA-Z0-9.>-] )

General coding guidelines suggest putting your ranges first and ending the character class with the hyphen, which avoids ambiguity and is easier to read.
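As a quick check, the three placements listed above can be compared side by side. This is only a minimal sketch: the variable names and the sample string (borrowed from the question's template) are illustrative, and the patterns are anchored with ^ and $ just to make the test strict.

```javascript
// Three ways to write the class so the hyphen is always a literal.
const escaped = /^[a-zA-Z0-9\->.]*$/;  // hyphen escaped
const first   = /^[-.>a-zA-Z0-9]*$/;   // hyphen first
const last    = /^[a-zA-Z0-9.>-]*$/;   // hyphen last

// Illustrative input: the expression inside the question's @{each ...} tag.
const sample = 'people->person';

const results = [escaped, first, last].map(function (re) {
  return re.test(sample);
});

console.log(results); // [ true, true, true ]
```

All three accept the same strings; the difference is only in how clearly the pattern signals that the hyphen is meant literally.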


To summarize, your RegExp should become:

 /@\{each ([a-zA-Z0-9>.-]*)\}([\s\S]*)@\{\/each}/g 
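A minimal sketch of the corrected pattern in use follows. The html string is a shortened, hypothetical stand-in for the contents of view.html, and the variable names are illustrative. Note that String.prototype.match with the g flag returns only whole matches, so exec is used here to reach the capture groups.

```javascript
// Corrected pattern from the answer; the hyphen now sits last in the class.
const eachBlock = /@\{each ([a-zA-Z0-9>.-]*)\}([\s\S]*)@\{\/each}/g;

// Hypothetical stand-in for the file contents read from view.html.
const html = [
  '<body>',
  '@{each people->person}',
  '<div><b>@{person.name}:</b> @{person.age}</div>',
  '@{/each}',
  '</body>'
].join('\n');

// exec returns the full match plus the capture groups for the first hit.
const m = eachBlock.exec(html);

console.log(m[1]);        // people->person
console.log(m[2].trim()); // <div><b>@{person.name}:</b> @{person.age}</div>
```

Because ([\s\S]*) is greedy, a file with several each blocks would be captured as one span from the first opening tag to the last closing tag; a lazy quantifier ([\s\S]*?) may be the better fit in that case.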

As an additional tip:

You can also rewrite [\s\S] (any whitespace or any non-whitespace character) as [^] (not nothing),

which leaves you with the following RegExp:

 /@\{each ([a-zA-Z0-9>.-]*)\}([^]*)@\{\/each}/g 

JavaScript ... treats [^] as a negated empty character class that matches any single character. - source
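The behaviour of [^] is easy to confirm with a short sketch (variable names are illustrative). It also shows why a plain dot is not a substitute here: without the s flag, the dot does not match line breaks.

```javascript
const anyChar    = /[^]/;     // negated empty class: any single character
const spaceOrNot = /[\s\S]/;  // whitespace or non-whitespace: any single character
const dot        = /./;       // without the s flag, skips line terminators

console.log(anyChar.test('\n'));    // true
console.log(spaceOrNot.test('\n')); // true
console.log(dot.test('\n'));        // false
```

As the quote says, this reading of [^] is JavaScript's; if the pattern may be reused in another regex flavor, [\s\S] is the more portable spelling.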


Source: https://habr.com/ru/post/984107/
