Removing a newline character from a string in Perl

I have a line read from a text file on Ubuntu Linux, and I'm trying to remove the newline from its end.

I have tried everything I could find. With s/\n|\r/-/ (used to check whether any newline is found in the line), the replacement happens, but the output still moves to the next line when it is printed. Moreover, when I use chomp or chop, the line is completely deleted. I have not found another solution. How can I fix this problem?

    use strict;
    use warnings;
    use v5.12;
    use utf8;
    use encoding "utf-8";

    open(MYINPUTFILE, "<:encoding(UTF-8)", "file.txt");

    my @strings;
    my @fileNames;
    my @erroredFileNames;
    my $delimiter;
    my $extensions;
    my $id;
    my $surname;
    my $name;

    while (<MYINPUTFILE>) {
        my ($line)  = $_;
        my ($line2) = $_;
        if ($line !~ /^(((\X|[^\W_ ])+)(.docx)(\n|\r))/g) {
            #chop($line2);
            $line2 =~ s/^\n+//;
            print $line2 . " WRONG FORMAT!\n";
        }
        else {
            #print "INSERTED:".$13."\n";
            my($id) = $13;
            my($name) = $2;
            print $name . "\t" . $id . "\n";
            unshift(@fileNames, $line2);
            unshift(@strings, $line2 =~ /[^\W_]+/g);
        }
    }
    close(MYINPUTFILE);
+6
5 answers

The correct way to remove Unicode linebreak sequences, including CRLF pairs, is to use the \R regex metacharacter, introduced in version 5.10.

The use encoding pragma is strongly deprecated. You should either use the use open pragma, or put the encoding in the mode argument of 3-arg open, or use binmode.

    use v5.10;                     # minimal Perl version for \R support
    use utf8;                      # source is in UTF-8
    use warnings qw(FATAL utf8);   # encoding errors raise exceptions
    use open qw(:utf8 :std);       # default open mode, `backticks`, and std{in,out,err} are in UTF-8

    while (<>) {
        s/\R\z//;
        ...
    }
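As a minimal sketch of what \R covers, the following applies s/\R\z// to the three common line-ending styles and to a string with no ending at all. The sample strings and the helper name strip_eol are made up for illustration.

```perl
use strict;
use warnings;
use v5.10;    # \R needs Perl 5.10 or later

# Hypothetical helper: remove one trailing linebreak sequence, if any.
sub strip_eol {
    my ($text) = @_;
    $text =~ s/\R\z//;
    return $text;
}

# Unix (LF), Windows (CRLF), and old Mac (CR) endings, plus no ending at all.
for my $sample ("name.docx\n", "name.docx\r\n", "name.docx\r", "name.docx") {
    my $clean = strip_eol($sample);
    printf "%d chars -> %d chars\n", length $sample, length $clean;
}
```

Anchoring with \z rather than $ means only a linebreak at the very end of the string is removed.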
+16

You have probably run into line endings from a Windows file. For example, a string such as "foo bar\n" will actually be "foo bar\r\n". When you use chomp on Ubuntu, you remove whatever is contained in the $/ variable, which will be "\n". That leaves "foo bar\r".

This is a subtle but very common mistake. For example, if you print "foo bar\r" and add a newline, you will not notice the error:

    my $var = "foo bar\r\n";
    chomp $var;
    print "$var\n";   # Remove and put back newline

But when you concatenate the string with another string, you overwrite the first part of the output, because \r moves the output cursor back to the beginning of the line. For instance:

 print "$var: WRONG\n"; 

This will effectively be "foo bar\r: WRONG\n", and the text after \r will overwrite the first part:

    foo bar\r      # \r resets position
    : WRONG\n      # Second line prints and overwrites

This is more obvious when the first line is longer than the second. For example, try the following:

 perl -we 'print "foo bar\rbaz\n"' 

And you will get the result:

 baz bar 

The solution is to remove the bad line endings. You can do this with the dos2unix utility or directly in Perl with:

 $line =~ s/[\r\n]+$//; 
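One way to confirm that a stray \r is the culprit is to dump the string with its escapes visible. This sketch uses the core Data::Dumper module with its Useqq option; the sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Useqq = 1;    # show control characters as escapes such as \r
$Data::Dumper::Terse = 1;    # print the bare value without "$VAR1 = "

my $line = "foo bar\r\n";    # made-up line with a Windows-style ending

chomp(my $chomped = $line);  # with the default $/ only "\n" is removed
print "after chomp: ", Dumper($chomped);

(my $stripped = $line) =~ s/[\r\n]+$//;    # removes "\r" and "\n" alike
print "after s///:  ", Dumper($stripped);
```

If the first dump still shows a trailing \r, the file most likely has Windows line endings.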

Also, keep in mind that the rest of your code is rather horrible. What do you think $13 contains, for example? It would be the string captured by the 13th capture group in the previous regular expression. I am sure that the value will always be undefined, because you do not have 13 capture groups.

You declare two sets of $id and $name: one at the top of the script and one inside the loop. This is very bad practice, IMO. Just declare variables within the scope where they are needed, and never bunch all of your declarations at the top of your script unless you explicitly want them to be global to the file.

Why use $line and $line2 when they have the same value? Just use $line.

And seriously, what is up with this:

 if ($line !~ /^(((\X|[^\W_ ])+)(.docx)(\n|\r))/g) { 

This looks like an attempt at obfuscation, no offense. Three nested negations and a bunch of unnecessary parentheses?

First of all, since this is an if-else, just swap the branches and invert the match (use =~ instead of !~). Secondly, the double negation in [^\W_] is rather confusing. If you only need ASCII letters and digits, why not just use [A-Za-z0-9]? You can split this up to simplify parsing:

    if ($line =~ /^(.+)(\.docx)\s*$/) {
        my $pre = $1;
        my $ext = $2;
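Putting these suggestions together, a loop along the following lines might work. This is only a sketch: the sub name parse_docx_name and the sample lines are hypothetical, and it assumes each valid line is a file name ending in .docx.

```perl
use strict;
use warnings;
use v5.10;

# Hypothetical helper: returns (base name, extension) for a "*.docx" line,
# or an empty list when the line does not match.
sub parse_docx_name {
    my ($line) = @_;
    $line =~ s/\R\z//;    # drop an LF, CRLF, or CR ending
    return $line =~ /^(.+)(\.docx)\s*$/ ? ($1, $2) : ();
}

for my $line ("report_2012.docx\r\n", "notes.txt\n") {    # made-up input
    if (my ($pre, $ext) = parse_docx_name($line)) {
        print "$pre\t$ext\n";
    }
    else {
        (my $shown = $line) =~ s/\R\z//;
        print "$shown WRONG FORMAT!\n";
    }
}
```

Declaring $pre and $ext inside the if condition keeps them scoped to the branch that uses them.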
+7

You can strip the line endings with:

 $line =~ s/[\n\r]//g; 

If you do this, you will need to modify the regular expression in the if so that it no longer looks for them. I also don't think you want /g in your if. You really shouldn't need $line2.

I wouldn’t do that either:

 print $line2." WRONG FORMAT!\n"; 

You can do

 print "$line2 WRONG FORMAT!\n"; 

... instead of this. In addition, print accepts a list, so instead of concatenating strings, you can just use commas.

+5

You can do something like:

 $line =~ tr/\n//d;

But really chomp should work:

    while (<filehandle>) {
        chomp;
        ...
    }

Also, s/\n|\r// replaces only the first occurrence of \r or \n. If you want to replace all occurrences, you need the global modifier at the end: s/\r|\n//g.

Note: if you are including \r for Windows, keep in mind that Windows usually ends its lines with \r\n, so you will want to remove both (for example, s/(?:\r\n|\n)//). Of course, the expression above with the global modifier (s/\r|\n//g) will take care of this anyway.
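The difference the /g modifier makes on a Windows-style ending can be checked with a short sketch (the sample string and variable names are made up):

```perl
use strict;
use warnings;

my $line = "foo bar\r\n";             # made-up CRLF-terminated line

(my $once = $line) =~ s/\r|\n//;      # removes only the first match, the "\r"
(my $all  = $line) =~ s/\r|\n//g;     # removes every "\r" and "\n"

printf "without /g: %d chars left\n", length $once;
printf "with /g:    %d chars left\n", length $all;
```

Note that the /g form also removes any \r or \n in the middle of the string, not only at the end.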

+3
 $variable = join('',split(/\n/,$variable)) 
+1

Source: https://habr.com/ru/post/910990/

