What does the characterEncoding regex in Sizzle mean?

I am reading the Sizzle source code and got confused by the regular expression it uses for character encoding. In the source, characterEncoding is defined as follows:

characterEncoding = "(?:\\\\.|[\\w-]|[^\\x00-\\xa0])+" 

It seems to be trying to match \\. or \w- or ^\x00-\xa0. I know that [\w-] means \ or w or -, and I also know that [^\x00-\xa0] means anything outside the range \x00-\xa0. Can someone tell me what \\. and \x00-\xa0 mean?

Thanks.


I think I have figured it out. characterEncoding is a string, so if we assign it as shown below:

 characterEncoding = "(?:\\\\.|[\\w-]|[^\\x00-\\xa0])+" 

then the value of characterEncoding is:

 (?:\\.|[\w-]|[^\x00-\xa0])+ 
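A quick way to check this is to print the string and build a regex from it. This is only a sketch: the ^ and $ anchors and the name identifier are added for the demonstration and are not part of the Sizzle source.

```javascript
// The string literal exactly as it is written in the question.
var characterEncoding = "(?:\\\\.|[\\w-]|[^\\x00-\\xa0])+";

// String-literal escaping turns each pair of backslashes into one,
// so this prints: (?:\\.|[\w-]|[^\x00-\xa0])+
console.log(characterEncoding);

// Hypothetical anchored regex, used here only to test whole strings.
var identifier = new RegExp("^" + characterEncoding + "$");

console.log(identifier.test("my-class")); // true
console.log(identifier.test("my class")); // false: the space (\x20) lies inside \x00-\xa0
```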

So, if I build a regex like above, it means:

 [\w-]         // a Latin letter, a digit, an underscore '_' or a hyphen '-'
 [^\x00-\xa0]  // ISO 10646 characters U+00A1 and higher
 \\.           // '\' and '.'

So my remaining question is: when does the \\. pattern actually match?

2 answers

The variable would be better named css3Identifier or something similar.

Rewrite [\w-]|[^\x00-\xa0] in an equivalent form that maps more directly onto the specification:

 [a-zA-Z0-9_-]|[\u00A1-\uFFFF] 

Keeping in mind that A1 (hexadecimal) is 161 in decimal, that _ is the underscore, and that - is the hyphen, read this:

In CSS3, identifiers (including element names, classes, and IDs in selectors (see [SELECT] [or is this still true])) can contain only the characters [A-Za-z0-9] and ISO 10646 characters 161 and higher, plus the hyphen (-) and the underscore (_)

The "and higher" part is covered by the -\uFFFF upper bound.
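The claimed equivalence can be checked by brute force over every UTF-16 code unit. This is a minimal sketch, assuming a plain regex without the i or u flags; the variable names are made up for the illustration.

```javascript
// The two alternatives from the Sizzle pattern, anchored to a single character.
var original = /^(?:[\w-]|[^\x00-\xa0])$/;
// The rewritten form that mirrors the wording of the specification.
var rewritten = /^(?:[a-zA-Z0-9_-]|[\u00A1-\uFFFF])$/;

// Count the code units on which the two patterns disagree.
var mismatches = 0;
for (var code = 0; code <= 0xFFFF; code++) {
  var ch = String.fromCharCode(code);
  if (original.test(ch) !== rewritten.test(ch)) {
    mismatches++;
  }
}

console.log(mismatches); // 0
```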


"\\\\." matches a backslash followed by any single character. For example, in \7B it matches \7, and the B is then matched by the middle alternative, [\w-]. It also matches the two-character sequences \n, \r, \t, etc.
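To see which alternative consumes which part of the input, the three alternatives can be run as a global regex. This is a sketch only: the anchors and the names identifier and pieces are assumptions added for the demonstration. Note that "\\7B" in JavaScript source is the three-character text \7B.

```javascript
var characterEncoding = "(?:\\\\.|[\\w-]|[^\\x00-\\xa0])+";
var identifier = new RegExp("^" + characterEncoding + "$");

// A dot is not a valid identifier character on its own...
console.log(identifier.test("a.b"));   // false
// ...but a backslash-dot pair is consumed by the first alternative.
console.log(identifier.test("a\\.b")); // true

// Split \7B into the pieces matched by each alternative.
var pieces = "\\7B".match(/\\.|[\w-]|[^\x00-\xa0]/g);
console.log(pieces); // [ '\\7', 'B' ]
```

The pattern only recognizes the escape; it does not decode it. As far as the regex is concerned, \7 is just "a backslash plus one character".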


This is simply the regular expression for a valid CSS identifier, as used in class names, tag names, and attribute names. The link to the specification is also given in a comment in the source code. Here are the rules, including the use of backslashes, which may answer your question:

4.1. Characters and case

The following rules always hold:

  • All CSS style sheets are case-insensitive, except for parts that are not under the control of CSS. For example, the case-sensitivity of values of the HTML attributes "id" and "class", of font names, and of URIs lies outside the scope of this specification. Note in particular that element names are case-insensitive in HTML, but case-sensitive in XML.

  • In CSS3, identifiers (including element names, classes, and IDs in selectors (see [SELECT] [or is this still true])) can contain only the characters [A-Za-z0-9] and ISO 10646 characters 161 and higher, plus the hyphen (-) and the underscore (_); they cannot start with a digit, or with a hyphen followed by a digit. They can also contain escaped characters and any ISO 10646 character as a numeric code (see next item). For instance, the identifier "B&W?" may be written as "B\&W\?" or "B\26 W\3F". (See [UNICODE310] and [ISO10646].)

  • In CSS3, a backslash (\) character indicates three types of character escapes.

    First, inside a string (see [CSS3VAL]), a backslash followed by a newline is ignored (i.e., the string is deemed not to contain either the backslash or the newline).

    Second, it cancels the meaning of special CSS characters. Any character (except a hexadecimal digit) can be escaped with a backslash to remove its special meaning. For example, "\"" is a string consisting of one double quote. Style sheet preprocessors must not remove these backslashes from a style sheet, since that would change the style sheet's meaning.

    Third, backslash escapes allow authors to refer to characters they cannot easily put in a style sheet. In this case, the backslash is followed by at most six hexadecimal digits (0..9A..F), which stand for the ISO 10646 ([ISO10646]) character with that number. If a digit or letter follows the hexadecimal number, the end of the number needs to be made clear. There are two ways to do that:

    • with a space (or other whitespace character): "\26 B" ("&B"). In this case, user agents should treat a CR/LF pair (13/10) as a single whitespace character.
    • by providing exactly 6 hexadecimal digits: "\000026B" ("&B")

    In fact, these two methods may be combined. Only one whitespace character is ignored after a hexadecimal escape. Note that this means that a "real" space after the escape sequence must itself either be escaped or doubled.

  • Backslash escapes are always considered to be part of an identifier or a string (i.e., "\7B" is not punctuation, even though "{" is, and "\32" is allowed at the start of a class name, even though "2" is not).

http://www.w3.org/TR/css3-syntax/#characters
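The second and third kinds of escape described above can be sketched as a small decoder. unescapeCss is a hypothetical helper written for this answer, not a function from Sizzle or from any CSS library, and it deliberately ignores the first kind (backslash followed by a newline inside a string).

```javascript
// Hypothetical helper: decode CSS backslash escapes in an identifier.
function unescapeCss(text) {
  // Either: backslash + 1 to 6 hex digits + at most one whitespace (CR/LF counts as one),
  // or:     backslash + any single character other than a newline.
  var escape = /\\([0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?|\\([^\r\n\f])/g;
  return text.replace(escape, function (match, hex, literal) {
    if (hex === undefined) {
      return literal; // escaped special character: keep the character itself
    }
    var codePoint = parseInt(hex, 16);
    // Assumption of this sketch: out-of-range values become U+FFFD.
    if (codePoint === 0 || codePoint > 0x10FFFF) {
      return "\uFFFD";
    }
    return String.fromCodePoint(codePoint);
  });
}

console.log(unescapeCss("B\\&W\\?"));    // B&W?
console.log(unescapeCss("B\\26 W\\3F")); // B&W?
console.log(unescapeCss("\\000026B"));   // &B
console.log(unescapeCss("\\7B"));        // {
```

Because the hex branch is tried first and is greedy, \7B decodes to "{" rather than to "7" followed by "B", which matches the last rule quoted above.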


Source: https://habr.com/ru/post/1490324/

