I want to split the elements of a character vector into sentences. There is more than one pattern that marks a split point ( "and/ERT" , "/$" ). There are also exceptions to these patterns, where no split should happen ( :/$. , and/ERT then , ./$. Smiley ).
My attempt: match the cases in which there should be a split, insert an unusual pattern ( "^&*" ) at that position, and then call strsplit on exactly that pattern.
Problem: I do not know how to handle the exceptions correctly. In the exception cases, the unusual pattern ( "^&*" ) has to be removed again and the original text restored before running strsplit .
code:
    text <- c("This are faulty propositions one and/ERT two ,/$, which I want to split ./$. There are cases where I explicitly want and/ERT some where I don't want to split ./$. For example :/$. when there is an and/ERT then I don't want to split ./$. This is also one case where I dont't want to split ./$. Smiley !/$. Thank you ./$!",
              "This are the same faulty propositions one and/ERT two ,/$, which I want to split ./$. There are cases where I explicitly want and/ERT some where I don't want to split ./$. For example :/$. when there is an and/ERT then I don't want to split ./$. This is also one case where I dont't want to split ./$. Smiley !/$. Thank you ./$!",
              "Like above the same faulty propositions one and/ERT two ,/$, which I want to split ./$. There are cases where I explicitly want and/ERT some where I don't want to split ./$. For example :/$. when there is an and/ERT then I don't want to split ./$. This is also one case where I dont't want to split ./$. Smiley !/$. Thank you ./$!")

    patternSplit <- c("and/ERT", "/\\$")
    # The class of split cases is much larger than in this example,
    # so it is not possible to address them explicitly.
    patternSplit <- paste("(", paste(patternSplit, collapse = "|"), ")", sep = "")

    exceptionsSplit <- c("\\:/\\$\\.", "and/ERT then", "\\./\\$\\. Smiley")
    exceptionsSplit <- paste("(", paste(exceptionsSplit, collapse = "|"), ")", sep = "")

    # Without exceptions, this works. Unfortunately it splits "*/$*" into
    # "*" and "/$*". It would be convenient to avoid this; see the "ideal"
    # split below.
    textsplitted <- strsplit(gsub(patternSplit, "^&*\\1", text), "^&*", fixed = TRUE)

    # Ideal split:
    > textsplitted
    [[1]]
    [1] "This are faulty propositions one and/ERT"
    [2] "two ,/$,"
    [3] "which I want to split ./$."
    [4] "There are cases where I explicitly want and/ERT"
    [5] "some where I don't want to split ./$."
    [6] "For example :/$. when there is an and/ERT then I don't want to split ./$."
    [7] "This is also one case where I dont't want to split ./$. Smiley !/$."
    [8] "Thank you ./$!"

    [[2]]
    [1] "This are the same faulty propositions one and/ERT"
    [2] "two ,/$,"
    #...

    # This attempt doesn't work!
    text <- gsub(patternSplit, "^&*\\1", text)
    # The replacement string below is only a placeholder for what I want.
    text <- gsub(exceptionsSplit, "[original text without \"^&*\"]", text)
    textsplitted <- strsplit(text, "^&*", fixed = TRUE)
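
One possible way to get the exceptions out of the way, sketched here under some assumptions and not as a definitive answer: hide each exception behind a placeholder first, insert the "^&*" pattern, split, and restore the exceptions afterwards. The helper name `split_with_exceptions`, the placeholder format `@@EXC1@@`, and the shortened example string are all made up for illustration. The sketch assumes the exceptions can be given as literal strings (not regular expressions), that neither the placeholders nor "^&*" occur in the text, and that a split should happen after the whole whitespace-delimited token containing a split pattern, as in the ideal output.

```r
# Hypothetical helper, not part of the original post.
split_with_exceptions <- function(text, split_regex, exceptions, marker = "^&*") {
  # One placeholder per exception; assumed not to occur in the text.
  placeholders <- sprintf("@@EXC%d@@", seq_along(exceptions))

  # 1. Hide every literal exception behind its placeholder.
  for (i in seq_along(exceptions)) {
    text <- gsub(exceptions[i], placeholders[i], text, fixed = TRUE)
  }

  # 2. Replace the whitespace after each token that contains a split
  #    pattern with the marker, so "*/$*" tokens stay in one piece.
  token_regex <- paste0("(\\S*(?:", split_regex, ")\\S*)\\s+")
  text <- gsub(token_regex, paste0("\\1", marker), text, perl = TRUE)

  # 3. Split at the marker.
  pieces <- strsplit(text, marker, fixed = TRUE)

  # 4. Put the original exception text back into every piece.
  lapply(pieces, function(p) {
    for (i in seq_along(exceptions)) {
      p <- gsub(placeholders[i], exceptions[i], p, fixed = TRUE)
    }
    p
  })
}

# Shortened example string in the style of the question.
x <- "one and/ERT two ./$. For example :/$. an and/ERT then no split ./$. Smiley !/$. Bye ./$!"

result <- split_with_exceptions(
  x,
  split_regex = "and/ERT|/\\$",
  exceptions  = c(":/$.", "and/ERT then", "./$. Smiley")
)
print(result[[1]])
```

For the shortened example this gives four pieces, with the three exceptions left intact inside the third one. If the exceptions really need to be regular expressions, the protection step would have to record the matched text (for instance with `gregexpr` and `regmatches`) instead of using fixed strings.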