Can regex be used for trimming?

Let's say I have three lines:

    the quick brown fox
    the brown fox
    the quick brown quick fox

Is it possible to use a regular expression to trim everything on every line except the word quick?

The end result should look like this:

    quick

    quickquick
4 answers

The specifics depend on the language you are using, but here are a few general approaches for doing this with regular expressions (code examples in Python):

  • Find all matches of your target string, and then join the matches into a single string:

     >>> import re
     >>> s = 'the quick brown quick fox'
     >>> ''.join(re.findall('quick', s))
     'quickquick'
  • Create a regex to match everything except your target string, and then replace each match with an empty string (this is usually much more complicated than other alternatives):

     >>> re.sub('(?!quick|(?<=q)uick|(?<=qu)ick|(?<=qui)ck|(?<=quic)k).', '', s)
     'quickquick'
  • Use a capture group to match everything up to the next occurrence of the target string, and then replace the whole match with just the target string:

     >>> re.sub('.*?(quick|$)', r'\1', s)
     'quickquick'

If your string contains several lines, as in your example, you can either split it on line breaks first or adapt the solutions to preserve line breaks, for example:

    >>> s = '''the quick brown fox
    ... the brown fox
    ... the quick brown quick fox'''
    >>> print(''.join(re.findall('quick|[\r\n]', s)))
    quick

    quickquick
    >>> print(re.sub('.*?(quick|$)', r'\1', s, flags=re.MULTILINE))
    quick

    quickquick
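
The approaches above can also be wrapped in a small helper that works line by line. This is only a sketch: the function name `keep_only` is hypothetical, and it assumes the target should be matched literally (hence `re.escape`) and that lines are separated by `\n`.

```python
import re


def keep_only(text, target):
    """Drop everything except literal occurrences of target, line by line."""
    pattern = re.compile(re.escape(target))
    # Handle each line separately so that the line breaks survive
    return '\n'.join(''.join(pattern.findall(line)) for line in text.split('\n'))


sample = 'the quick brown fox\nthe brown fox\nthe quick brown quick fox'
result = keep_only(sample, 'quick')
print(result)
```

Escaping the target means that a search word containing regex metacharacters, such as a dot, is still treated as plain text.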

A regular expression by itself does not manipulate strings; it only describes a pattern to match. Depending on the tool you use, however, you can usually perform string replacements based on regular expressions. For example, from a Bash terminal you can use sed, and in PHP you can use preg_replace().
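
To illustrate the distinction in Python, the language used in the other examples here: the pattern only locates text, and a separate substitution call does the editing. A minimal sketch, with illustrative variable names and an arbitrary replacement word:

```python
import re

line = 'the quick brown quick fox'
pattern = re.compile('quick')

# Matching only reports where the pattern occurs; the string is unchanged
positions = [m.start() for m in pattern.finditer(line)]

# Substitution is a separate operation built on top of matching
replaced = pattern.sub('slow', line)

print(positions)
print(replaced)
```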


Yes, regular expressions can do this by using lookahead and lookbehind constructs.

For example, here is a Python program that uses a negative lookahead expression:

    import re

    s = '''the quick brown fox
    the brown fox
    the quick brown quick fox'''
    # (?!quick) is a negative lookahead: a match may not begin where 'quick' begins
    rx = re.compile('(?!quick).*')
    print(rx.findall(s))

Output:

 ['the quick brown fox', '', 'the brown fox', '', 'the quick brown quick fox', ''] 
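
A negative lookahead on its own is not enough to strip everything except the word, because it only guards the position where quick starts, not the letters inside it. The sketch below (variable names are illustrative) contrasts a lookahead-only pattern with a pattern that also uses lookbehinds:

```python
import re

s = 'the quick brown quick fox'

# Lookahead only: matching stops before each 'quick' but resumes at 'u',
# so the rest of the word is deleted and only the leading 'q' survives
lookahead_only = re.sub(r'(?:(?!quick).)+', '', s)

# Adding lookbehinds protects every letter of the word, not just the first
with_lookbehind = re.sub(
    r'(?!quick|(?<=q)uick|(?<=qu)ick|(?<=qui)ck|(?<=quic)k).', '', s)

print(lookahead_only)
print(with_lookbehind)
```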

Use sed for this:

 sed -r 's/(quick|)./\1/g' file.txt 
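
For comparison with the Python examples, a similar alternation trick can be written with `re.sub`. This is a sketch rather than an exact port of the sed command: it assumes Python 3.5 or later, where a group that did not take part in the match is replaced by an empty string.

```python
import re

text = 'the quick brown fox\nthe brown fox\nthe quick brown quick fox'

# Either capture the word, or consume one other character and drop it;
# '.' does not match '\n', so line breaks are left in place
trimmed = re.sub(r'(quick)|.', r'\1', text)

print(trimmed)
```

Because the word is tried before the single-character alternative, an occurrence at the very end of a line is kept as well.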

Source: https://habr.com/ru/post/1441603/

