Replace characters after grouping characters

I have a large CSV with a text column that has a maximum width of 200. In almost all cases the data is fine. In some cases, however, the data is too long or not filled out properly. I would like to use a regular expression to find the last instance of a particular pair of characters and then delete everything after it.

For example, data:

df <- data.frame(ID = c("1","2","3"),
             text = c("A|explain what a is|12.2|Y|explain Y|2.36|",
                 "A|explain what a is|15.2|E|explain E|10.2|E|explain E but run out hal",
                 "D|explain what d is|0.48|Z|explain z but number 5 is present|"))

My particular character pair is any digit followed by |

This would mean that line 1 is fine, line 2 would have everything after "10.2" deleted, and line 3 would have everything after "0.48" deleted.

I tried this regex:

df[,2] <- sub("([^0-9]+[^|]*$)", "", df[,2])



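Use sub and capture everything up to and including the last number in a group: (.*) greedily matches the leading text, and the number may contain an optional decimal point (\\.?). The group must be followed by |. Everything that matches is then replaced with the captured group (\\1).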

sub('^(.*[0-9]+\\.?[0-9]+)\\|.*$', '\\1', df$text)
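
Note that the pattern above also drops the "|" that follows the last number, and because of `[0-9]+\\.?[0-9]+` it needs a number with at least two digits. If the delimiter should stay and a single digit should count, a variant along these lines may be closer to what the question describes. This is only a sketch: it assumes the trailing "|" is meant to be kept, and the name `trimmed` is used purely for illustration.

```r
# Sample data from the question.
df <- data.frame(
  ID = c("1", "2", "3"),
  text = c("A|explain what a is|12.2|Y|explain Y|2.36|",
           "A|explain what a is|15.2|E|explain E|10.2|E|explain E but run out hal",
           "D|explain what d is|0.48|Z|explain z but number 5 is present|"),
  stringsAsFactors = FALSE
)

# The greedy (.*) backtracks to the LAST digit immediately followed by "|".
# Keeping "\\|" inside the group preserves the delimiter (an assumption about
# the desired output); a single digit before "|" is enough to match.
trimmed <- sub("^(.*[0-9]\\|).*$", "\\1", df$text)
print(trimmed)
```

With the sample data, row 1 is left unchanged, row 2 is cut after "10.2|", and row 3 is cut after "0.48|". The "5" in row 3 is ignored because it is not directly followed by "|". Strings that contain no digit followed by "|" are returned as they are, since sub leaves non-matching input untouched.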

Source: https://habr.com/ru/post/1619200/

