At least one way:
gsub(".*?:\\s*(.*?)\\.", "\\1, ", sentence)
[1] "avobenzone, octocrylene, octyl salicylate, water, glycerin, edta, "
Note the ? after .* in the pattern. It makes the match non-greedy. Without the ?, .* matches as much as possible.
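To see the difference between the greedy and non-greedy forms, here is a small sketch. The value of sentence is an assumption, pieced together from the example strings in this answer; the original input is not shown.

```r
# Assumed input, reconstructed from the example strings in this answer
sentence <- "active ingredients: avobenzone, octocrylene, octyl salicylate. inactive ingredients: water, glycerin, edta."

# Non-greedy: .*? stops at the FIRST colon
lazy <- sub(".*?:", "", sentence)

# Greedy: .* runs on to the LAST colon
greedy <- sub(".*:", "", sentence)

print(lazy)
# [1] " avobenzone, octocrylene, octyl salicylate. inactive ingredients: water, glycerin, edta."
print(greedy)
# [1] " water, glycerin, edta."
```

With the greedy version the first ingredient list is swallowed along with its label, which is why the ? matters here.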
Addition
The idea behind this is to replace everything except the part you want with nothing. You said that you wanted to stop at punctuation marks, but you obviously did not want to stop at commas, so I interpreted the problem as finding the parts of the string between a colon and a period.

In my expression, .*?: matches everything up to the first colon. I inserted \\s* to strip out any spaces that may follow the colon. We want everything after that up to the next period, which is expressed by .*?\\. BUT we want to keep that part (without the period), so I put the .*? in parentheses to make it a "capture group". Since it is in parens, everything between the colon and the period is stored in a backreference named \1 (but you must type \\1 to get the string \1). I also added ", " (a comma and a space) after \\1 in the replacement to separate it from what comes next.

SO it will take
active ingredients: avobenzone, octocrylene, octyl salicylate.
and replace it with
avobenzone, octocrylene, octyl salicylate,
Since I used gsub (global substitution), it will then carry on and try to do the same with the rest of the string, replacing
inactive ingredients: water, glycerin, edta.
with
water, glycerin, edta,

Sorry about the ugly trailing ", ".
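If the trailing ", " is a nuisance, one possible cleanup is a second substitution anchored at the end of the string. This is only a sketch: the value of sentence is assumed from the example strings above, and the extra sub() call is an addition, not part of the original one-liner.

```r
# Assumed input, reconstructed from the example strings in this answer
sentence <- "active ingredients: avobenzone, octocrylene, octyl salicylate. inactive ingredients: water, glycerin, edta."

# The substitution from the answer: keep only the text between each colon and period
result <- gsub(".*?:\\s*(.*?)\\.", "\\1, ", sentence)

# Extra step (not in the original): drop the final comma and any whitespace after it
cleaned <- sub(",\\s*$", "", result)

print(cleaned)
# [1] "avobenzone, octocrylene, octyl salicylate, water, glycerin, edta"
```

From there, strsplit(cleaned, ",\\s*")[[1]] would give the ingredients as a character vector, if that is the form you need.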