I have a large CSV with a text column whose maximum width is 200. In almost all cases the data is fine, but in some cases the text is too long or not filled out properly. I would like to use a regular expression to find the last instance of a particular character pair and then delete everything after it.
For example, with this data:
df <- data.frame(ID = c("1","2","3"),
text = c("A|explain what a is|12.2|Y|explain Y|2.36|",
"A|explain what a is|15.2|E|explain E|10.2|E|explain E but run out hal",
"D|explain what d is|0.48|Z|explain z but number 5 is present|"))
My particular character pair is any digit followed by |
This would mean that line 1 is fine as it is, line 2 would have everything after "10.2|" deleted, and line 3 would have everything after "0.48|" deleted.
I tried this regex:
df[,2] <- sub("([^0-9]+[^|]*$)", "", df[,2])
This does not give the result I want. How should the regex be changed? I have been trying it out in regexer.
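To make the target concrete, here is a minimal sketch of the output I am after, together with one pattern that seems to produce it on the sample rows. It assumes the delimiter is a literal | and that the trailing | after the last number should be kept; the names `expected` and `trimmed` are only for illustration. The idea is that a greedy `.*` runs as far as it can and then backtracks to the last digit followed by |, so everything captured up to that point is kept and the rest is dropped.

```r
# Sample data from the question
df <- data.frame(ID = c("1", "2", "3"),
                 text = c("A|explain what a is|12.2|Y|explain Y|2.36|",
                          "A|explain what a is|15.2|E|explain E|10.2|E|explain E but run out hal",
                          "D|explain what d is|0.48|Z|explain z but number 5 is present|"),
                 stringsAsFactors = FALSE)

# Hand-written target output (assumption: the | after the last number is kept)
expected <- c("A|explain what a is|12.2|Y|explain Y|2.36|",
              "A|explain what a is|15.2|E|explain E|10.2|",
              "D|explain what d is|0.48|")

# Greedy capture up to the last "digit followed by |", then discard the rest.
# "\\|" is an escaped pipe, so it matches the literal character, not alternation.
trimmed <- sub("^(.*[0-9]\\|).*$", "\\1", df$text)

# Check the sketch against the hand-written target
print(identical(trimmed, expected))
```

Rows that contain no digit followed by | are returned unchanged by `sub()`, since the pattern does not match them at all; whether that is the right behaviour for such rows is a separate decision.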