What is a regular expression that properly splits an SVG 'd' attribute into tokens?

I am trying to split the d attribute of a path tag in an SVG file into tokens.

This is relatively easy:

 d = "M 2 -12 C 5 15 21 19 27 -2 C 17 12 -3 40 5 7"
 tokens = d.split(/[\s,]/)

But this is also a valid d attribute:

 d = "M2-12C5,15,21,19,27-2C17,12-3,40,5,7" 

The tricky parts are that letters and numbers are no longer separated, and negative numbers use only the minus sign as a separator. How can I create a regex that handles this?

The rules look like this:

  • split wherever there is a space or a comma
  • split numbers from letters (and keep the "-" with the number)

I know I can use lookaround, for example:

 tokens = pathdef.split(/(?<=\d)(?=\D)|(?<=\D)(?=\d)/) 

I am having trouble forming a single regular expression that also splits on minus signs and keeps the minus sign with the number.

The path above should be tokenized as follows:

 [ 'M', '2', '-12', 'C', '5', '15', '21', '19', '27', '-2', 'C', '17', '12', '-3', '40', '5', '7' ] 
2 answers

Brief

Unfortunately, JavaScript does not allow lookbehinds (support only arrived with ES2018, so older engines reject them), so your options are quite limited. The regular expression in the Other regex engines section below will not work for you, although it will work in some other regular expression engines.

Other regex engines

Note: The regular expression in this section (Other regex engines) will not work in JavaScript engines that lack lookbehind support. See the JavaScript solution in the Code section.

I think this is what you were trying to achieve with your original regular expression:

 [, ]|(?<![, ])(?=-|(?<=[a-z])\d|(?<=\d)[a-z])

This regular expression splits on a comma or a space, or at any position (not preceded by a comma or space) that is followed by -, or where a letter precedes a digit, or where a digit precedes a letter.
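
As a rough illustration of how that pattern behaves, here is a minimal sketch that assumes a runtime with lookbehind support (for JavaScript, that means an ES2018-capable engine). The helper name splitPath is hypothetical, and the i flag is an assumption added so that [a-z] also covers uppercase commands.

```javascript
// Pattern from the text above; the i flag is an assumption so that
// [a-z] also matches uppercase path commands such as M and C.
var splitter = /[, ]|(?<![, ])(?=-|(?<=[a-z])\d|(?<=\d)[a-z])/i;

// Hypothetical helper: split a d attribute into tokens.
function splitPath(d) {
  // Adjacent separators (for example ", ") would produce empty strings, so drop them.
  return d.split(splitter).filter(function (t) {
    return t !== "";
  });
}

console.log(splitPath("M2-12C5,15,21,19,27-2C17,12-3,40,5,7"));
console.log(splitPath("M 2 -12 C 5 15 21 19 27 -2 C 17 12 -3 40 5 7"));
```

Both calls should yield the same token list, because the zero-width alternatives only fire where no comma or space already separates the tokens.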


Code

 var a = [
   "M 2 -12 C 5 15 21 19 27 -2 C 17 12 -3 40 5 7",
   "M2-12C5,15,21,19,27-2C17,12-3,40,5,7"
 ];
 // Match an optionally negative number (integer or decimal) or a single letter
 var r = /-?(?:\d*\.)?\d+|[a-z]/gi;
 a.forEach(function(s){
   console.log(s.match(r));
 });

Description

  • -?(?:\d*\.)?\d+|[a-z] Matches either of the following
    • -?(?:\d*\.)?\d+
      • -? Matches - literally zero or one time
      • (?:\d*\.)? Matches the following zero or one time
        • \d* Matches any number of digits
        • \. Matches a literal dot
      • \d+ Matches one or more digits
    • [a-z] Matches any character in the range a to z (any lowercase letter; since the i modifier is used, this also matches the uppercase letters)
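
To see the breakdown above applied to decimal values, here is a small sketch. The function name tokenizePath and the sample path string are made up for illustration, and converting numeric tokens with parseFloat is an extra step that the question did not ask for.

```javascript
// Hypothetical helper: tokenize a d attribute and convert numeric tokens to numbers.
function tokenizePath(d) {
  var tokenPattern = /-?(?:\d*\.)?\d+|[a-z]/gi;
  // match returns null when nothing matches, so fall back to an empty array.
  var tokens = d.match(tokenPattern) || [];
  return tokens.map(function (t) {
    // Letters stay as command strings; everything else is parsed as a number.
    return /[a-z]/i.test(t) ? t : parseFloat(t);
  });
}

// Illustrative input with a leading-dot decimal and a negative decimal.
console.log(tokenizePath("M.5-1.25L10,0.75z"));
```

Matching tokens directly, rather than splitting on separators, avoids lookbehind entirely, which is why this approach also works in older JavaScript engines.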

I added (?:\d*\.)? because (as far as I know) you can have decimal numbers in the SVG d attribute.

Note: Changed the numeric part of the regular expression from \d+(?:\.\d+)? to (?:\d*\.)?\d+ to catch numbers that don't have an integer part, such as .5, as suggested by @Thomas in the comments.


You can go for

 -?\d+|[A-Z]

See the demo at regex101.com.


Here, instead of splitting, you can simply match the tokens:
 matches = "M 2 -12 C 5 15 21 19 27 -2 C 17 12 -3 40 5 7".match(/-?\d+|[A-Z]/g)
 // matches holds the different tokens
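
As a quick check, the sketch below runs this simpler pattern on both forms of the path from the question. The variable names are illustrative. Note that, as written, the pattern only covers integers and uppercase commands, which the last line demonstrates with a made-up input.

```javascript
var simple = /-?\d+|[A-Z]/g;

// Both path strings from the question.
var spaced = "M 2 -12 C 5 15 21 19 27 -2 C 17 12 -3 40 5 7".match(simple);
var compact = "M2-12C5,15,21,19,27-2C17,12-3,40,5,7".match(simple);

// Compare the two token lists element by element via their JSON form.
console.log(JSON.stringify(spaced) === JSON.stringify(compact));

// Illustrative limitation: a lowercase command is skipped and a decimal is cut in two.
var limited = "m0.5".match(simple);
console.log(limited);
```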

Source: https://habr.com/ru/post/1274052/

