Any suggestions for improving (optimizing) an existing string replacement in Perl code?

Perl 5.8

I am looking for improvements to some fairly simple string substitutions in an existing Perl script.
The purpose of the code is clear, and the code works.

For a given string, replace each occurrence of the TAB, LF, or CR character with a single space, and replace each occurrence of the double quote with two double quotation marks. Here is a snippet from existing code:


    # replace all tab, newline and return characters with single space
    $val01 =~ s/[\t\n\r]/ /g;
    $val02 =~ s/[\t\n\r]/ /g;
    $val03 =~ s/[\t\n\r]/ /g;

    # escape all double quote characters by replacing with two double quotes
    $val01 =~ s/"/""/g;
    $val02 =~ s/"/""/g;
    $val03 =~ s/"/""/g;

Question: Is there a better way to perform these string manipulations?

By a "better way," I mean performing them more efficiently: avoiding regular expressions where possible (perhaps using tr/// to replace the tab, newline, and carriage return characters), or perhaps using qr// to avoid recompiling.

NOTE. I considered moving the string manipulation operations into a subroutine to reduce the repetition of the regular expressions.

NOTE. This code works; it really is not broken. I just want to know if a more suitable coding convention exists.

NOTE. These operations are performed in a loop with a large number (> 10000) of iterations.

NOTE. This script is currently running under perl v5.8.8. (The script has require 5.6.0, but that can be changed to require 5.8.8. Installing a later version of Perl is currently not an option on the production server.)

    > perl -v
    This is perl, v5.8.8 built for sun4-solaris-thread-multi
    (with 33 registered patches, see perl -V for more detail)
+4
4 answers

Your existing solution looks good to me.

As for avoiding recompilation, you do not need to worry about it. Perl regular expressions are compiled only once, if at all, unless they contain interpolated variables, and yours do not.

For completeness, I should mention that even if interpolated expressions are present, you can tell Perl to compile the regular expression once, just by setting the /o flag.

    $var =~ s/foo/bar/;    # compiles once
    $var =~ s/$foo/bar/;   # compiles each time
    $var =~ s/$foo/bar/o;  # compiles once, using the value $foo has
                           # the first time the expression is evaluated
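
Since the question also mentions qr//, here is a minimal sketch of how a precompiled pattern could be used if the pattern really did come from a variable. The `$class` variable and the sample data are made up for illustration; the original code has no interpolation, so none of this is needed there.

```perl
use strict;
use warnings;

# Hypothetical: the character class arrives in a variable, e.g. from config.
my $class = '\t\n\r';

# qr// compiles the pattern once, at this point, and stores it in $ws.
my $ws = qr/[$class]/;

my @vals = ("a\tb", "c\r\nd", "plain");   # made-up sample data
for my $val (@vals) {
    # The precompiled pattern is the whole pattern here, so it is reused as-is.
    $val =~ s/$ws/ /g;
}

print join('|', @vals), "\n";   # prints: a b|c  d|plain
```

Unlike /o, a qr// object can be rebuilt later if the underlying value changes, which is usually the safer choice.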
+3

TMTOWTDI

You can use tr, index, substr, or split as alternatives. But you must benchmark them to determine which method is best on your particular system.
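
One way to take such measurements is the core Benchmark module. The following is only a sketch: the sample string and the sub names `with_regex` and `with_tr` are invented for illustration, and the numbers will only mean something if the input resembles your real data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample input; substitute a value typical of your real data.
my $sample = "col\tone\r\nsays \"hi\"\t" x 20;

# Each sub works on a copy so every run sees the same input.
sub with_regex {
    my $s = $sample;
    $s =~ s/[\t\n\r]/ /g;
    $s =~ s/"/""/g;
    return $s;
}

sub with_tr {
    my $s = $sample;
    $s =~ tr/\t\n\r/ /;    # one-for-one character mapping
    $s =~ s/"/""/g;        # tr/// cannot lengthen a string, so s/// stays
    return $s;
}

# Sanity check: both variants must produce the same result.
die "variants disagree\n" unless with_regex() eq with_tr();

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    regex => \&with_regex,
    tr    => \&with_tr,
});
```

The table printed by cmpthese shows the rate of each variant and the relative difference between them on the machine where it runs.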

+2

You may be optimizing prematurely. Have you tried using a profiler such as Devel::NYTProf to find out where your program spends most of its time?

+2

I would expect tr/// to be (slightly) faster than s/// for your first regex. Of course, how much faster depends on factors about your program and your environment that I do not know. Profiling and benchmarking will answer this question.
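
For reference, a minimal sketch of what the tr/// version of the first substitution might look like. The sample value is made up; only the first step changes, because doubling a quote lengthens the string and tr/// can only map one character to one character.

```perl
use strict;
use warnings;

my $val = "one\ttwo\r\nthree \"four\"";   # made-up sample value

# tr/// maps characters one-for-one. The replacement list is shorter than
# the search list, so its last character (the space) is reused for \n and \r.
# It returns the number of characters it matched.
my $changed = ($val =~ tr/\t\n\r/ /);

# The quote doubling remains a substitution.
$val =~ s/"/""/g;

print "$changed characters replaced\n";   # prints: 3 characters replaced
print "$val\n";                           # prints: one two  three ""four""
```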

But if you are interested in any improvement to your code, can I suggest a fix for maintainability? You are doing the same substitution (or set of substitutions) across three variables. This means that when you change this substitution, you need to change it three times - and doing the same thing three times is always dangerous :)

You might consider refactoring the code to look something like this:

    foreach ($val01, $val02, $val03) {
        s/[\t\n\r]/ /g;
        s/"/""/g;
    }

In addition, it would be nicer to keep these values in an array rather than in three similarly named variables.

    foreach (@vals) {
        s/[\t\n\r]/ /g;
        s/"/""/g;
    }
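
If you go the subroutine route mentioned in the question, a sketch could look like the following. The name `clean_fields` and the sample values are hypothetical, and note that the extra call and the copying add some overhead per iteration, so this is a maintainability change rather than a speed-up.

```perl
use strict;
use warnings;

# Hypothetical helper: returns cleaned copies and leaves its arguments alone.
sub clean_fields {
    my @out = @_;              # copy, so the caller's values are not modified
    for (@out) {
        s/[\t\n\r]/ /g;        # tab, LF, CR -> single space
        s/"/""/g;              # " -> ""
    }
    return @out;
}

my ($val01, $val02, $val03) = ("a\tb", "say \"x\"", "line\r\n");  # sample data
($val01, $val02, $val03) = clean_fields($val01, $val02, $val03);

print "[$val01] [$val02] [$val03]\n";   # prints: [a b] [say ""x""] [line  ]
```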
+2

Source: https://habr.com/ru/post/1348188/

