Matching and Removing Perl Regular Expressions

Question

I have a line starting with //#... that runs up to the newline character. I worked out a regular expression for it, which is ..#([^\n]*) .

My question is: how do I remove this line from a file when it matches that pattern?

+4

regex perl

Azlam Sep 17 '08 at 6:00

9 answers

To filter out all lines in a file that match a specific regular expression:

 perl -n -i.orig -e 'print unless /^#/' file1 file2 file3

".orig" after the -i switch backs up the file with the specified extension (.orig). You can skip it if you don't need a backup (just use -i).

The -n switch causes perl to execute your code (-e '...') for each line in the file. The line is stored in $_ (which is also the default argument for many operations, in this case both print and the regular expression match).

Finally, the argument to the -e switch says: "Print the line unless it has a # character at the beginning."

P.S. There is also a -p switch, which behaves like -n except that the lines are always printed (useful for search and replace).

+2

kixx Sep 17 '08 at 7:40

As others have pointed out, if the ultimate goal is to remove lines starting with //# , for performance reasons you are probably better off using grep or sed :

 grep -v '^\/\/#' filename.txt > filename.stripped.txt
 sed '/^\/\/#/d' filename.txt > filename.stripped.txt

or

 sed -i '/^\/\/#/d' filename.txt

if you prefer in-place editing.

Note that in perl your regex will be

 m{^//#}

which matches two slashes followed by # at the beginning of the line.

Note that you avoid "backslashitis" by using the m{pattern} match operator instead of the more familiar /pattern/ . Get used to this syntax early, as it is an easy way to avoid excessive escaping. You can write m{^//#} just as well as m%^//#% or m#^//\## , depending on what you want to match. Strive for clarity: regular expressions are hard enough to decipher without a prickly forest of avoidable backslashes. Seriously, m/^\/\/#/ looks like a toothy alligator with a filling, or a tiny ASCII picture of the Alps.

One problem you may run into in your script is that the whole file has been slurped into a single string, newlines and all. To guard against this case, use the /m (multi-line) modifier on the regular expression:

 m{^//#}m

This allows ^ to match at the beginning of the string and after each newline. You might think there is a way to strip out the lines matching m{^//#.*$} using the /g , /m and /s regex modifiers, for the case where you have slurped the file into a string and do not want to make a copy of it (which raises the question of why it was slurped into a string in the first place). It should be possible, but it's late, and I do not see the answer. However, one "simple" way to do this is:

 my $cooked = join qq{\n}, (grep { ! m{^//#} } (split m{\n}, $raw));

although this creates a copy instead of editing the original $raw string in place, and the result no longer ends with a trailing newline.
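
The regex-only variant that the answer leaves open can be sketched roughly as follows. This is only an illustration under stated assumptions: the sample text in $raw is invented, and it assumes the file really has been read into one string.

```perl
use strict;
use warnings;

# Invented sample: the whole file held in one string.
my $raw = "keep 1\n//# drop me\nkeep 2\n  //# indented, kept\n//# drop me too\n";

# /m lets ^ match after every newline; /g repeats the substitution.
# Without /s the dot never matches a newline, so .* stops at the end of
# the line; \n? then removes the line terminator as well (optional, in
# case the last line has none).
(my $cooked = $raw) =~ s{^//#.*\n?}{}mg;

print $cooked;
```

Applying the same substitution directly to $raw (that is, $raw =~ s{^//#.*\n?}{}mg) would modify the string in place instead of producing a copy.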

+2

arclight Sep 17 '08 at 9:15

You do not need perl for this.

 sed '/^\/\/#/d' inputfile > outputfile

I <3 sed.

+1

Aeon Sep 17 '08 at 7:26

Read the file line by line and write only those lines that do not match the regular expression to a new file. You cannot simply delete a line from a file.

0

EricSchaefer Sep 17 '08 at 6:06

Does it start at the beginning of a line, or can it appear anywhere? If the former, plain old s/old/new/ is what you want. If the latter, I will have to figure it out. I suspect backreferences could be used somehow.

0

baudtack Sep 17 '08 at 6:07

I do not think your regular expression is correct.

First, you need to start with ^ , or else it will match this pattern anywhere on the line.

Second, .. must be \/\/ , otherwise it will match any two characters.

^\/\/#[^\n]* is probably what you want.

Then do what EricSchaefer says and read line by line, only writing lines that do not match.
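
A minimal sketch of that read-and-copy approach, using the anchored pattern from this answer. The sub name strip_marked_lines, the file names and the sample content are all made up for the example, and it writes to throwaway files in a temporary directory:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Copy $in_path to $out_path, skipping every line that starts with //#.
# Returns the number of lines that were dropped.
sub strip_marked_lines {
    my ($in_path, $out_path) = @_;
    open my $in,  '<', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    my $dropped = 0;
    while (my $line = <$in>) {
        if ($line =~ m{^//#}) { $dropped++; next; }
        print {$out} $line;
    }
    close $in;
    close $out or die "Cannot close $out_path: $!";
    return $dropped;
}

# Demo on throwaway files (invented content).
my $dir = tempdir(CLEANUP => 1);
my ($src, $dst) = ("$dir/input.txt", "$dir/output.txt");
open my $fh, '>', $src or die "Cannot write $src: $!";
print {$fh} "int x;\n//# generated marker\nint y;\n";
close $fh or die "Cannot close $src: $!";

my $dropped = strip_marked_lines($src, $dst);
print "dropped $dropped line(s)\n";
```

The input file is left untouched; replacing it with the filtered copy is a separate step.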

0

bmb Sep 17 '08 at 6:18

Try the following:

 perl -ne 'print unless m{^//#}' input.txt > output.txt

If you use Windows, you need double quotes instead of single quotes.

You can do the same with grep:

 grep -v -e '^//#' input.txt > output.txt

0

Pat Sep 17 '08 at 7:09

Iterate over each line in the file and skip the line if it matches the pattern:

 use FileHandle;

 # Open the file for reading; die with the OS error on failure.
 my $fh = FileHandle->new('filename', 'r')
     or die "Failed to open file - $!";

 while (my $line = $fh->getline) {
     next if $line =~ m{^//#};   # skip lines starting with //#
     print $line;
 }
 close $fh;

This will print all lines from the file except those starting with '//#'.

0

David Precious Sep 17 '08 at 7:11

Aristotle Pagaltzis · Accepted Answer · 2008-09-17T07:56:04+0000

Your regular expression is poorly chosen in several respects:

1. Instead of matching two slashes specifically, you use .. to match two characters that can be anything at all, presumably because you don't know how to match slashes when you are also using them as delimiters. (Actually, dots match almost anything; see also item 3.)
   In a slash-delimited regex literal, // , you can match slashes simply by protecting them with backslashes, e.g. /\/\// . However, the nicer option is to use the longer form of the regex literal, m// , where you can choose the delimiter, e.g. m!! . Since you are then using something other than slashes as delimiters, you can write slashes without escaping them: m!//! . See perldoc perlop .
2. It is not anchored to the beginning of the line, so it will match anywhere. Use the ^ start-of-line assertion.
3. You wrote [^\n] to match "any character except a newline" when there is a much simpler way to write that, which is just the . character. It does exactly that: it matches any character except a newline.
4. You use parentheses to group part of the match, but the group is not quantified (you are not saying it may match any number of times other than exactly once) and you are not interested in capturing it. So the parentheses are superfluous.

Altogether that makes it m!^//#.*! . But putting an unanchored .* (or anything with a * quantifier) at the end of a regular expression is pointless, since it can never change whether a string matches or not: * is happy to match nothing at all.

So you get m!^//#! .
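
The difference between the question's pattern and the simplified one can be checked on a few sample lines. A small sketch; the sample strings are invented for illustration:

```perl
use strict;
use warnings;

# Invented sample lines; only the first one should be removed.
my @lines = (
    "//# marker comment",
    "// ordinary comment",
    "code(); //# marker not at start",
    "ab#looks similar",
);

# The pattern from the question: unanchored, dots instead of slashes.
my @loose = grep { /..#([^\n]*)/ } @lines;

# The spelled-out fix and its shortened form select the same lines.
my @full  = grep { m!^//#.*! } @lines;
my @short = grep { m!^//#! }   @lines;

print "loose: $_\n" for @loose;
print "short: $_\n" for @short;
```

Here the question's pattern also hits the third and fourth sample lines, while both anchored patterns select only the first.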

As for how to remove the line from the file: as everyone else has explained, read it line by line and print all the lines you want to keep to another file. If you are not doing this within a larger program, use perl's command-line switches:

 perl -ni.bak -e'print unless m!^//#!' somefile.txt

Here the -n switch makes perl put a loop around the code you provide, which reads all the files you pass on the command line in sequence. The -i switch (for "in-place") tells it to collect the output from your script and overwrite the original contents of each file with it. The .bak argument to the -i switch tells perl to keep a backup of the original file in a file named after the original file with .bak appended. For all of these bits, see perldoc perlrun .

If you want to do this within a larger program, the easiest way to do it safely is to open the file twice: once for reading and, separately, a second time for writing using IO::AtomicFile . IO::AtomicFile will replace the original file only if it is successfully closed.
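
The same write-then-replace idea can be sketched with core modules only. Note that this is a stand-in built on File::Temp and rename, not IO::AtomicFile's own API, and the sub name remove_marked_lines_in_place, the file name and the sample content are hypothetical:

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile tempdir);

# Hypothetical helper: rewrite $path without the lines starting with //#.
# The original is replaced only after the new content is fully written.
sub remove_marked_lines_in_place {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";

    # Temp file in the same directory, so rename() stays on one filesystem.
    my ($out, $tmp) = tempfile(DIR => dirname($path), UNLINK => 0);

    while (my $line = <$in>) {
        print {$out} $line unless $line =~ m!^//#!;
    }
    close $in;

    # On any failure, discard the temp file and leave the original alone.
    close $out         or do { unlink $tmp; die "Cannot finish $tmp: $!" };
    rename $tmp, $path or do { unlink $tmp; die "Cannot replace $path: $!" };
    return;
}

# Demo on a throwaway file (invented content).
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/somefile.txt";
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} "first\n//# remove me\nlast\n";
close $fh or die "Cannot close $file: $!";

remove_marked_lines_in_place($file);
```

One caveat of this sketch: the replacement file gets the permissions File::Temp assigns to new files rather than those of the original, so a real program may want to copy the mode across before the rename.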
