You have probably run into a Windows line ending in the file. A line such as "foo bar\n" is actually "foo bar\r\n". When you use chomp on Ubuntu, it removes whatever the $/ variable contains, which is "\n". That leaves you with "foo bar\r".
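To see this for yourself, here is a minimal sketch. It assumes the default value of $/ and uses a hard-coded string in place of a line read from your file:

```perl
use strict;
use warnings;

# Stand-in for a line read from a file with Windows (CRLF) line endings.
my $line = "foo bar\r\n";

# With the default $/ ("\n"), chomp strips only the trailing newline.
chomp $line;

# The carriage return is still the last character of the string.
my $has_cr = $line =~ /\r\z/ ? "yes" : "no";
print "length: ", length($line), "\n";   # 8: "foo bar" plus "\r"
print "ends with CR: $has_cr\n";
```

The length is 8 rather than 7 because the \r survives the chomp.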
This is a subtle but very common mistake. For example, if you print "foo bar\r" followed by a newline, you will not notice anything wrong:
my $var = "foo bar\r\n";
chomp $var;
print "$var\n";
But when you append more text on the same line, it overwrites the beginning of the line, because \r moves the cursor back to the start of the line. For instance:
print "$var: WRONG\n";
This effectively prints "foo bar\r: WRONG\n", and the text after the \r is written over the first part:
foo bar\r     # \r resets position
: WRONG\n     # Second part prints and overwrites
This is more obvious when the first line is longer than the second. For example, try the following:
perl -we 'print "foo bar\rbaz\n"'
And you will get the result:
baz bar
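Since the terminal hides the \r, it helps to print the string with its control characters escaped. This is only a sketch of one way to do it, using the core Data::Dumper module with its Useqq option; the sample string is made up:

```perl
use strict;
use warnings;
use Data::Dumper;

# Useqq prints control characters as escapes such as \r;
# Terse drops the leading "$VAR1 = ".
$Data::Dumper::Useqq = 1;
$Data::Dumper::Terse = 1;

my $var = "foo bar\r\n";
chomp $var;

# The dumped form shows the carriage return that the terminal hides.
my $dumped = Dumper($var);
print $dumped;
```

If the dumped string ends in \r, the file has Windows line endings.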
The solution is to remove the offending line endings. You can do this with the dos2unix utility, or directly in Perl with:
$line =~ s/[\r\n]+$//;
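As a rough sketch of how that substitution behaves on different inputs, here it is wrapped in a small helper. The name strip_eol and the sample strings are mine, not from your script:

```perl
use strict;
use warnings;

# Hypothetical helper: remove any trailing mix of \r and \n.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/[\r\n]+$//;
    return $line;
}

# Unix ending, Windows ending, and no line ending at all.
for my $raw ("unix line\n", "windows line\r\n", "no newline") {
    my $clean = strip_eol($raw);
    print "[$clean]\n";
}
```

Unlike chomp, this does not depend on $/, so it gives the same result whichever platform wrote the file.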
Also, keep in mind that the rest of your code is rather rough. What do you think $13 contains, for example? It would be the string captured by the 13th capture group of the last successful regex match. I am sure the value will always be undefined, because you do not have 13 capture groups.
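A quick way to convince yourself, as a sketch with a made-up file name and a regex that has only two capture groups:

```perl
use strict;
use warnings;

my $line   = "report.docx";   # made-up file name
my $status = "no match";

if ($line =~ /^(.+)(\.docx)$/) {
    # Only two capture groups, so only $1 and $2 are set.
    $status = defined $13 ? "defined" : "undefined";
    print "\$1 = $1, \$2 = $2\n";
    print "\$13 is $status\n";
}
```

Interpolating $13 directly into a string would instead trigger an "uninitialized value" warning under use warnings.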
You declare two sets of $id and $name: one outside the loop and one at the top. This is very bad practice, IMO. Declare variables in the smallest scope that needs them, and never bunch all of your declarations at the top of the script unless you explicitly want them to be global to the file.
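Something along these lines is what I mean. It is only a sketch: the input lines and the split-based parsing are invented for illustration, since your real parsing differs:

```perl
use strict;
use warnings;

my @lines = ("1 alice", "2 bob");   # made-up input
my @seen;

for my $line (@lines) {
    # $id and $name are declared here, so they exist only inside one
    # iteration and cannot carry a stale value over from the previous line.
    my ($id, $name) = split ' ', $line, 2;
    push @seen, "$id=$name";
}

print join(", ", @seen), "\n";
```

With the declarations inside the loop there is exactly one $id and one $name in play at any time, so there is nothing to confuse them with.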
Why use $line and $line2 when they hold the same value? Just use $line.
And seriously, what is going on with this:
if ($line !~ /^(((\X|[^\W_ ])+)(.docx)(\n|\r))/g) {
This looks like an attempt at obfuscation, no offense. Three nested negations and a bunch of unnecessary parentheses?
First of all, since this is an if-else, just swap the branches and use a positive match instead of a negated one. Secondly, [^\W_] is a double negation, which is rather confusing. Why not just use [A-Za-z0-9]? You can split the match up to make it easier to parse:
if ($line =~ /^(.+)(\.docx)\s*$/) {
    my $pre = $1;
    my $ext = $2;