Perl one-liner runs 30 times faster with double quotes than with single quotes

We need to change some strings in binary files to lowercase (from mixed case, uppercase, whatever). The strings in question are references to other files. This is part of an upgrade in which we are also moving from Windows to Linux servers, so case suddenly matters. We wrote a script that runs a Perl one-liner in a loop to do this. We have a directory of about 300 files (about 150 MB in total), so a fair amount of data, but not huge.

The following loop takes about 6 minutes to complete:

    for file_ref in `ls -1F $forms6_convert_dir/ | grep -v "/" | sed 's/\(.*\)\..*/\1/'`
    do
        (( updated++ ))
        write_line "Converting case of string: $file_ref "
        perl -i -pe "s{(?i)$file_ref}{$file_ref}g" $forms6_convert_dir/*
    done

whereas the following loop, which differs only in using single quotes around the Perl code, takes more than 3 hours!

    for file_ref in `ls -1F $forms6_convert_dir/ | grep -v "/" | sed 's/\(.*\)\..*/\1/'`
    do
        (( updated++ ))
        write_line "Converting case of string: $file_ref "
        perl -i -pe 's{(?i)$file_ref}{$file_ref}g' $forms6_convert_dir/*
    done

Can someone explain why? Is it simply that, in the single-quoted version, $file_ref stays as the literal text $file_ref instead of being replaced with its value? If so, what does that version actually replace? We want to replace every occurrence of each filename, whatever its case, with the lowercase name. If we run strings on the files before and after and grep for the filenames, both versions appear to have made the same changes. However, if we run diff on the files produced by the two loops (diff firstloop/file1 secondloop/file1), it reports that they differ.

This is run from a bash script on Linux.

3 answers

As stated in other answers, the shell does not expand variables inside single quotes, so the second version runs the literal Perl substitution s{(?i)$file_ref}{$file_ref}g on every line of every file.
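To see exactly what perl receives in each case, one can store the two program strings and print them instead of running them. A minimal sketch; `form_main` is a made-up stand-in for one value of the loop variable:

```shell
#!/usr/bin/env bash
# Hypothetical example value standing in for one loop iteration.
file_ref="form_main"

# Double quotes: bash expands $file_ref before perl starts.
double_quoted="s{(?i)$file_ref}{$file_ref}g"

# Single quotes: bash passes the characters through untouched, so perl
# receives the literal text $file_ref.
single_quoted='s{(?i)$file_ref}{$file_ref}g'

printf 'double: %s\n' "$double_quoted"
printf 'single: %s\n' "$single_quoted"
```

Only the double-quoted string contains the actual filename; the single-quoted one reaches perl with `$file_ref` still in it.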

As you suggested in a comment, if $ were the end-of-line metacharacter, $file_ref would never match anything: $ matches only before the newline at the end of a line, so the next character would have to be a newline, not f. In fact, Perl does not interpret this $ as a metacharacter at all; it interprets it as the start of an interpolated variable.

In Perl, the variable $file_ref is undef, which is treated as an empty string when interpolated. So you are really running s{(?i)}{}g, which says to replace the empty string with the empty string, for every occurrence, case-insensitively. There is an empty string between every pair of characters, plus one at the beginning and one at the end of each line. Perl finds each one and replaces it with an empty string. It is a no-op, but an expensive one, hence the 3-hour run time.

You must be mistaken about both versions making the same changes. As I just explained, the single-quoted version is just an expensive no-op; it does not change the contents of the files at all (it simply writes a new copy of each file). The files you ran it on must already have been converted to lowercase.
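One way to check this directly is to run the single-quoted one-liner on a scratch copy of a file and compare the bytes with `cmp`. A minimal sketch; `is_noop` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# is_noop FILE: applies the single-quoted one-liner to a scratch copy of
# FILE and prints "unchanged" if the bytes are identical afterwards.
is_noop() {
    local tmp
    tmp=$(mktemp) || return 2
    cp "$1" "$tmp"
    # Same program text the single-quoted loop passed to perl.
    perl -i -pe 's{(?i)$file_ref}{$file_ref}g' "$tmp"
    if cmp -s "$1" "$tmp"; then
        echo "unchanged"
    else
        echo "changed"
    fi
    rm -f "$tmp"
}
```

Called on any original (unconverted) file, it should print `unchanged`, which is consistent with diff reporting differences against the output of the double-quoted loop.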

The shell does not perform variable substitution inside single quotes, so the second one is a different program.

With double quotes, you are using the shell variable; with single quotes, Perl tries to use a Perl variable of that name.

You might want to write the whole thing in Perl or in Bash to speed it up. Both languages can read files and match patterns. In Perl you can lowercase with the built-in lc function, and in Bash 4 you can use ${file,,}.

Source: https://habr.com/ru/post/894383/

