My software lets users prepare files with regular expressions. I am adding a default library of generic expressions that can be reused to prepare various formats. One common task is to remove CRLF line breaks in some parts of a file but not in others. For example, this:
<TU>Lorem
Ipsum</TU>
<SOURCE>This is a sentence
that should not contain
any line break.
</SOURCE>
It should become:
<TU>Lorem
Ipsum</TU>
<SOURCE>This is a sentence that should not contain any line break.
</SOURCE>
I have a regex that does the job nicely:
(?(?<=<SOURCE>(?:(?!</?SOURCE>).)*)(\r\n))
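For comparison, here is a minimal Python sketch of a single-regex variant. It is not the pattern above (Python's `re` rejects variable-length lookbehind, so that pattern cannot run there as written). The check is turned around: the pattern starts with a literal CRLF, so the lookahead is only tested where a CRLF actually occurs, rather than testing a lookbehind at every position in the file. It can still rescan long stretches of text, so treat it as a likely improvement, not a guarantee. Assumptions: each CRLF is replaced with a single space, the CRLF directly before `</SOURCE>` is kept to match the sample output, SOURCE elements are not nested, and `join_source_lines` is a hypothetical name. The tempered-dot and lookahead syntax should carry over to .NET with `RegexOptions.Singleline`.

```python
import re

# A CRLF is inside a SOURCE element when the next <SOURCE> or </SOURCE> tag after it
# is a closing tag. (?!</SOURCE>) leaves alone the CRLF that directly precedes
# </SOURCE>, so the closing tag stays on its own line (assumption, per the sample).
# DOTALL lets the tempered dot run across line breaks.
CRLF_IN_SOURCE = re.compile(
    r"\r\n(?!</SOURCE>)(?=(?:(?!</?SOURCE>).)*</SOURCE>)",
    re.DOTALL,
)


def join_source_lines(text: str) -> str:
    """Hypothetical helper: replace each CRLF inside <SOURCE>...</SOURCE> with one space."""
    return CRLF_IN_SOURCE.sub(" ", text)
```

Applied to the sample above (with CRLF line endings), the `<TU>` element is untouched and the `<SOURCE>` text becomes a single line.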
The problem is that it is expensive to evaluate: on files larger than 500 KB it can take 30 seconds or more, and that is with the regex compiled (uncompiled, it is much slower).
This is not a big problem, but I wonder whether there is a more efficient way to achieve the same result with regular expressions.
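One approach that is often much faster, sketched here in Python: let the regex find each `<SOURCE>...</SOURCE>` element with a lazy match, and remove the CRLFs inside a replacement callback using a plain string replace. The file is then scanned roughly once, instead of a lookbehind possibly scanning back to the nearest SOURCE tag at every position. Assumptions: each CRLF becomes a single space, the CRLF right before `</SOURCE>` is kept to match the sample output, SOURCE elements are not nested, and `remove_crlf_in_source` is a hypothetical name. In .NET, the same shape should map onto `Regex.Replace` with a `MatchEvaluator`.

```python
import re

# Find each SOURCE element. The lazy .*? stops at the first </SOURCE>;
# DOTALL lets it span line breaks.
SOURCE_ELEMENT = re.compile(r"(<SOURCE>)(.*?)(</SOURCE>)", re.DOTALL)


def _join_lines(m: re.Match) -> str:
    body = m.group(2)
    # Keep one trailing CRLF so </SOURCE> stays on its own line (assumption, per the sample).
    trailing = "\r\n" if body.endswith("\r\n") else ""
    core = body[: len(body) - len(trailing)]
    # Plain string replacement inside the element: no per-position regex work.
    return m.group(1) + core.replace("\r\n", " ") + trailing + m.group(3)


def remove_crlf_in_source(text: str) -> str:
    """Hypothetical helper: join the lines of every <SOURCE>...</SOURCE> element with spaces."""
    return SOURCE_ELEMENT.sub(_join_lines, text)
```

Text outside SOURCE elements, such as the `<TU>` lines in the sample, passes through unchanged.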
Thanks in advance for your suggestions.