How to match '+abc' but not '++abc' without lookbehind?

In a sentence similar to:

Lorem ipsum +dolor ++sit amet.

I would like to match +dolor, but not ++sit. I can do this with lookbehind, but since JavaScript does not support it, I'm struggling to build a pattern for it.

So far I have tried:

 (?:\+(.+?))(?=[\s\.!\!]) - but it matches both words
 (?:\+{1}(.+?))(?=[\s\.!\!]) - the same here, both words are matched

and, to my surprise, with this one:

 (?=\s)(?:\+(.+?))(?=[\s\.!\!]) 

nothing matches. I thought I could trick it by putting \s, or later also ^, in front of the + sign, but it doesn't seem to work.


EDIT - background information:

This is not necessarily part of the question, but sometimes it's good to know the background, so here is a short explanation to clarify some of your questions/comments:

  • any word in any order can be marked with either a + or ++
  • each word and its marking will be replaced by <span> later
  • cases like lorem+ipsum are considered invalid, because that would be like splitting a word (ro+om) or writing two words together as one (myroom), so they have to be fixed anyway (the pattern may match them, and that is not an error); however, the pattern should at least handle normal cases like the example above
  • I use a lookahead, for example (?=[\s\.!\!]), so that I can match words in any language, not only \w characters
+6
5 answers

One way would be to match one extra character and ignore it (by putting the appropriate part of the match in the capture group):

 (?:^|[^+])(\+[^\s+.!]+) 

However, this breaks if potential matches can be directly adjacent to each other, because the character before the second match has already been consumed by the first one.

Test it live at regex101.com.

Explanation:

 (?:          # Match (but don't capture)
   ^          # the position at the start of the string
 |            # or
   [^+]       # any character except +.
 )            # End of group
 (            # Match (and capture in group 1)
   \+         # a + character
   [^\s+.!]+  # one or more characters except +, ., ! or whitespace.
 )            # End of group
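
As a rough sketch of how the capture group might be consumed in JavaScript (the helper name `findSinglePlusWords` is hypothetical and not part of the answer), one could loop over the matches and keep only group 1. The second call illustrates the adjacency caveat mentioned above:

```javascript
// Hypothetical helper: collect capture group 1 from every match of the
// pattern shown in this answer.
function findSinglePlusWords(text) {
  const re = /(?:^|[^+])(\+[^\s+.!]+)/g;
  const words = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    // m[0] may include one extra leading character; m[1] does not.
    words.push(m[1]);
  }
  return words;
}

console.log(findSinglePlusWords('Lorem ipsum +dolor ++sit amet.')); // [ '+dolor' ]

// Directly adjacent candidates: "+bar" is missed, because the "o" in front
// of it was already consumed by the match for "+foo".
console.log(findSinglePlusWords('+foo+bar')); // [ '+foo' ]
```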
+3
 \+\+|(\+\S+) 

Take the content from capture group 1. The regular expression uses the trick described in this answer.

Demo on regex101

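 var re = /\+\+|(\+\S+)/g;
 var str = 'Lorem ipsum +dolor ++sit ame';
 var m;
 var o = [];
 while ((m = re.exec(str)) != null) {
   // Guard against an infinite loop on zero-length matches
   if (m.index === re.lastIndex) {
     re.lastIndex++;
   }
   // Group 1 is only set when the second alternative matched
   if (m[1] != null) {
     o.push(m[1]);
   }
 }
 // o is now ['+dolor']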

If the input can contain something like +++donor, use:

 \+\++|(\+\S+) 
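
Since the question mentions replacing each marked word with a <span> later, here is a minimal sketch of how the same alternation trick might be used with a replace callback. The helper name `wrapSinglePlus` and the exact markup are assumptions for illustration only:

```javascript
// Hypothetical helper: wrap single-plus words in <span> and leave runs of
// two or more "+" untouched.
function wrapSinglePlus(text) {
  // The first alternative swallows "++", "+++", ... so the capturing
  // alternative never starts inside such a run.
  return text.replace(/\+\++|(\+\S+)/g, (match, word) =>
    // `word` is undefined when the first alternative matched.
    word === undefined ? match : `<span>${word.slice(1)}</span>`
  );
}

console.log(wrapSinglePlus('Lorem ipsum +dolor ++sit +++amet'));
// Lorem ipsum <span>dolor</span> ++sit +++amet
```

Note that \S+ also takes any punctuation directly attached to the word, so a stricter character class (as in the other answers) may be preferable for real text.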
+3

The following regular expression seems to work for me:

 var re = / (\+[a-zA-Z0-9]+)/ // Note the space after the '/' 

Demo

https://www.regex101.com/r/uQ3wE7/1

+1

I think this is what you need.

 (?:^|\s)(\+[^+\s.!]*)(?=[\s.!]) 
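
A quick way to try this pattern, sketched with `String.prototype.matchAll` (the helper name `markedWords` is hypothetical). Because the character class is negated rather than based on \w, non-ASCII words are picked up as well:

```javascript
// Pattern from this answer, with the g flag so matchAll can be used.
const re = /(?:^|\s)(\+[^+\s.!]*)(?=[\s.!])/g;

// Hypothetical helper: return capture group 1 of every match.
function markedWords(text) {
  return Array.from(text.matchAll(re), (m) => m[1]);
}

console.log(markedWords('Lorem ipsum +dolor ++sit amet.')); // [ '+dolor' ]
console.log(markedWords('+größe ++klein +день.'));          // [ '+größe', '+день' ]

// The lookahead needs an actual character after the word, so a marked
// word at the very end of the input is not matched:
console.log(markedWords('+a +b'));                          // [ '+a' ]
```

If words at the end of the input should match too, one possible adjustment (an assumption, not part of the answer) is to allow the end of the string in the lookahead, as in (?=[\s.!]|$).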
+1

Just try the following regex:

 (^|\s)\+\w+ 
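
A small sketch of how this pattern behaves (the helper name `matches` is hypothetical). Two things are worth noting: the full match includes the leading whitespace, so it is trimmed here, and \w in JavaScript only covers ASCII letters, digits and underscore, so it does not meet the "any language" requirement from the question:

```javascript
const re = /(^|\s)\+\w+/g;

// Hypothetical helper: return all full matches with the leading
// whitespace character (if any) trimmed off.
const matches = (text) => (text.match(re) || []).map((s) => s.trim());

console.log(matches('Lorem ipsum +dolor ++sit amet.')); // [ '+dolor' ]

// \w does not match Cyrillic letters, so this marked word is skipped.
console.log(matches('+день')); // []
```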
0

Source: https://habr.com/ru/post/980985/

