Splitting strings on an escaped delimiter in R

I need to read a file in R in which a variable number of columns are separated by |. However, a | that is preceded by a \ should not be treated as a delimiter.

At first I thought that something like strsplit(x, "[^\\][|]") would work, but the problem is that the character before each pipe is "consumed" as part of the match:

 > strsplit("word1|word2|word3\\|aha!|word4", "[^\\][|]")
 [[1]]
 [1] "word"        "word"        "word3\\|aha" "word4"

Can anyone suggest a way to do this? Ideally, it should be vectorized because the files in question are very large.

2 answers

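I believe this works. It uses the regex from Anirudh's downvoted answer (I'm not sure why it was downvoted; the answer doesn't work as written, but the regex itself is correct). Note that lookbehind needs perl=TRUE, since R's default regex engine does not support it: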

 strsplit(x, "(?<!\\\\)[|]", perl=TRUE)
 ## > strsplit(x, "(?<!\\\\)[|]", perl=TRUE)
 ## [[1]]
 ## [1] "word1"        "word2"        "word3\\|aha!" "word4"
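
Since the question asks for something vectorized: strsplit() already accepts a whole character vector, so the same call can be applied to every line at once. The following is only a minimal sketch; the `lines` vector is made-up sample data (in practice it would presumably come from something like readLines()), and the final unescaping step is an assumption about what you want to do with the `\|` sequences afterwards.

```r
# Hypothetical sample input; each element stands for one line of the file.
lines <- c("a|b|c",
           "word1|word2|word3\\|aha!|word4",
           "single")

# Split every line on pipes that are NOT preceded by a backslash.
# strsplit() is vectorized over its first argument and returns a list
# with one character vector of fields per line.
fields <- strsplit(lines, "(?<!\\\\)[|]", perl = TRUE)

# Optional (assumed) clean-up: turn the escaped "\|" back into a plain "|".
fields <- lapply(fields, function(f) gsub("\\|", "|", f, fixed = TRUE))

lengths(fields)
## [1] 3 4 1
fields[[2]]
## [1] "word1"      "word2"      "word3|aha!" "word4"
```

Because the number of columns varies, the result is kept as a list rather than forced into a matrix or data frame.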

You need to use a zero-width assertion (a negative lookbehind):

 (?<!\\\\)[|] 
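
To illustrate what "zero-width" means here, this small sketch extracts what each pattern actually matches in the example string from the question. The variable names are just for illustration. The character-class version matches two characters (the pipe plus the character before it), while the lookbehind version matches only the pipe itself, which is why nothing gets eaten when splitting.

```r
x <- "word1|word2|word3\\|aha!|word4"

# Pattern from the question: a non-backslash character followed by a pipe.
# The preceding character is part of the match, so strsplit() removes it.
consuming <- regmatches(x, gregexpr("[^\\][|]", x))[[1]]
consuming
## [1] "1|" "2|" "!|"

# Negative lookbehind: only checks that the previous character is not a
# backslash; the match itself is just the pipe.
zero_width <- regmatches(x, gregexpr("(?<!\\\\)[|]", x, perl = TRUE))[[1]]
zero_width
## [1] "|" "|" "|"
```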

Source: https://habr.com/ru/post/1487482/

