R: split only when the special condition of the regular expression does not match

How would you split after each and/ERT, but only when the word that follows it does not contain "/V", in:

 text <- c("faulty and/ERT something/VBN and/ERT else/VHGB and/ERT as/VVFIN and/ERT not else/VHGB propositions one and/ERT two/CDF and/ERT three/ABC")

 # my try - doesn't work
 > strsplit(text, "(?<=and/ERT)\\s(?!./V.)", perl=TRUE)
                                     ^^^^
 # Expected return
 [[1]]
 [1] "faulty and/ERT something/VBN and/ERT else/VHGB and/ERT as/VVFIN and/ERT"
 [2] "not else/VHGB propositions one and/ERT"
 [3] "two/CDF and/ERT"
 [4] "three/ABC"
3 answers

Actually, you need to approach this differently:

 (?<=and/ERT)\\s(?!\\S+/V)
                   ^^^^

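You need to use \\S+ here, because .* would prevent the split whenever /V appears anywhere later in the string, even two words ahead.

\\S+ matches one or more non-whitespace characters, so the lookahead only inspects the word immediately after the space.

Finally, the trailing period after /V can safely be dropped.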
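
To make the difference between \\S+ and .* concrete, here is a minimal sketch that runs both lookaheads on the string from the question. The variable names parts_word and parts_any are only illustrative.

```r
text <- c("faulty and/ERT something/VBN and/ERT else/VHGB and/ERT as/VVFIN and/ERT not else/VHGB propositions one and/ERT two/CDF and/ERT three/ABC")

# Split on the space after and/ERT unless the very next word contains /V.
parts_word <- strsplit(text, "(?<=and/ERT)\\s(?!\\S+/V)", perl = TRUE)[[1]]
print(parts_word)

# With .* the lookahead scans the whole rest of the string,
# so any later /V blocks the split, not just one in the next word.
parts_any <- strsplit(text, "(?<=and/ERT)\\s(?!.*/V)", perl = TRUE)[[1]]
print(parts_any)
```

The first call should return the four pieces listed in the question. The second returns fewer pieces, because the split before "not" is suppressed by the /V that occurs further along in "else/VHGB".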



Actually, you made only a small mistake, but it was enough to break everything:

 (?<=and/ERT)\\s(?![^\\s/]+/V)
                   ^^^^^^^^
 match one or more characters that are not whitespace or a forward slash /

By the way, the dot . after /V is not needed.

Edit: I made some changes based on @smerny's comment and your edit.
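
As a rough illustration of what excluding the slash changes, the sketch below compares [^\\s/]+ with \\S+ on a made-up input in which one token contains two slashes. The string tricky and the variable names are hypothetical and not part of the question.

```r
# Hypothetical input: the token after the first and/ERT has two slashes.
tricky <- "a and/ERT x/y/VB b and/ERT c/VB"

# \\S+ may run across the first slash, so "x/y" + "/V" counts as a tag and blocks the split.
by_nonspace <- strsplit(tricky, "(?<=and/ERT)\\s(?!\\S+/V)", perl = TRUE)[[1]]

# [^\\s/]+ stops at the first slash, so only "x" is consumed and "/y" is not "/V".
by_noslash <- strsplit(tricky, "(?<=and/ERT)\\s(?![^\\s/]+/V)", perl = TRUE)[[1]]

print(by_nonspace)
print(by_noslash)
```

On the string from the question both patterns give the same result. They differ only when a token contains more than one slash, as in this made-up case, where the stricter class splits before "x/y/VB" and the \\S+ version does not split at all.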


Try the following:

 (?<=and/ERT)\\s(?![a-zA-Z]+/V) 

The problem was that your /V was preceded and followed by exactly one arbitrary character (.), whereas your example has more than one character between the space and /V.

[a-zA-Z]+/V ensures that the only thing between the space and /V is a single word made up of letters. I believe that this is your requirement, based on your description and the examples given.
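
The following sketch, run on the string from the question, contrasts the original attempt with the [a-zA-Z]+ version. The variable names attempt and fixed are only illustrative.

```r
text <- c("faulty and/ERT something/VBN and/ERT else/VHGB and/ERT as/VVFIN and/ERT not else/VHGB propositions one and/ERT two/CDF and/ERT three/ABC")

# Original attempt: ./V. only rejects a one-character word before /V,
# so the lookahead never blocks a split in this text.
attempt <- strsplit(text, "(?<=and/ERT)\\s(?!./V.)", perl = TRUE)[[1]]
print(length(attempt))

# Letters-only word before /V: the split is skipped when the next word is tagged /V...
fixed <- strsplit(text, "(?<=and/ERT)\\s(?![a-zA-Z]+/V)", perl = TRUE)[[1]]
print(fixed)
```

Note that [a-zA-Z]+ assumes the word before the slash contains ASCII letters only. A word with digits, hyphens or accented characters would not be recognised as preceding /V, and the split would happen there.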



Source: https://habr.com/ru/post/953474/