Regex: how to remove extra spaces between strings in Perl

I am working on a program that accepts user input for two file names. Unfortunately, the program can easily break if the user does not follow the specified input format. I want to write code that improves its fault tolerance against these types of errors. You will understand when you see my code:

    # Ask the user for the filename of the qseq file and barcode.txt file
    print "Please enter the name of the qseq file and the barcode file separated by a comma:";
    # user should enter filenames like this: sample1.qseq, barcode.txt

    # remove the newline from the qseq filename
    chomp ($filenames = <STDIN>);

    # an empty array
    my @filenames;

    # remove the ',' and put the files into an array separated by spaces; indexes the files
    push @filename, join(' ', split(',', $filenames))

    # the qseq file
    my $qseq_filename = shift @filenames;
    # the barcode file.
    my barcode = shift @filenames;

Obviously, this code can fail if the user enters the wrong type of file name (a .tab file instead of .txt, or .seq instead of .qseq). I need code that validates the input to check that the user has entered the appropriate file types.

Another mistake that can break the code is the user entering too many spaces before a file name. For example: sample1.qseq, (imagine 6 spaces here) barcode.txt (note the numerous spaces after the comma)

Another example: (imagine 6 spaces here) sample1.qseq, barcode.txt (this time, pay attention to the number of spaces before the first file name)

I also need lines of code that remove the extra spaces so that the program does not break. I think the user input should have the following format: sample1.qseq, barcode.txt. The input has to be in this format so that I can correctly index the file names in the array and shift them off later.

Thanks! Any help or suggestions are greatly appreciated.

+6
5 answers

The standard way to solve this problem is to use command-line options rather than collecting input from STDIN. Getopt::Long ships with Perl and is serviceable:

    use strict;
    use warnings FATAL => 'all';
    use Getopt::Long qw(GetOptions);

    my %opt;
    GetOptions(\%opt, 'qseq=s', 'barcode=s') or die;

    die <<"USAGE" unless exists $opt{qseq} and $opt{qseq} =~ /^sample\d[.]qseq$/ and exists $opt{barcode} and $opt{barcode} =~ /^barcode.*\.txt$/;
    Usage:
        $0 --qseq sample1.qseq --barcode barcode.txt
        $0 -q sample1.qseq -b barcode.txt
    USAGE

    printf "q==<%s> b==<%s>\n", $opt{qseq}, $opt{barcode};

The shell will deal with any extraneous spaces; try it and see. You still need to validate the file names, so I put together a regex check in the example. Use Pod::Usage for a more convenient way to print helpful documentation for users who are likely to get the invocation wrong.

There are dozens of more advanced Getopt modules on CPAN.
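One way to try this without a shell is Getopt::Long's GetOptionsFromArray, which parses an array you supply instead of @ARGV. The following is only a minimal sketch: parse_args is a hypothetical helper, script.pl is a placeholder name, and the array stands in for what the shell would pass after splitting the command line on whitespace.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse an argument list and return the two file names,
# or an empty list if parsing fails or an option is missing.
sub parse_args {
    my @args = @_;
    my ($qseq, $barcode);
    GetOptionsFromArray(\@args, 'qseq=s' => \$qseq, 'barcode=s' => \$barcode)
        or return;
    return unless defined $qseq and defined $barcode;
    return ($qseq, $barcode);
}

# The shell turns
#   script.pl --qseq     sample1.qseq    --barcode   barcode.txt
# into these four arguments, with the extra spaces already gone.
my ($qseq, $barcode) = parse_args(qw(--qseq sample1.qseq --barcode barcode.txt));
printf "q==<%s> b==<%s>\n", $qseq, $barcode;
```

Keeping the parsing in a function that takes a list makes it easy to check the behavior with different argument lists before wiring it up to @ARGV.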

+8

First put use strict; at the top of your code and declare your variables.

Secondly, this:

    # remove the ',' and put the files into an array separated by spaces; indexes the files
    push @filename, join(' ', split(',', $filenames))

is not going to do what you want. split() takes a string and turns it into a list. join() takes a list of elements and returns a string, so this line pushes one space-joined string (and onto @filename, not @filenames). You just want to split:

 my @filenames = split(',', $filenames); 

This will create the array you expect.

This function will safely trim whitespace from the beginning and end of a string:

    sub trim {
        my $string = shift;
        $string =~ s/^\s+//;
        $string =~ s/\s+$//;
        return $string;
    }

Call it like this:

 my $file = trim(shift @filenames); 
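Putting the two pieces together, here is a minimal sketch that splits the user's line on commas and trims each piece. parse_filenames is a hypothetical helper name, and the sample input is made up to mimic the extra spaces described in the question.

```perl
use strict;
use warnings;

# Trim leading and trailing whitespace from a string.
sub trim {
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

# Hypothetical helper: split a comma-separated line into trimmed file names.
sub parse_filenames {
    my ($line) = @_;
    return map { trim($_) } split /,/, $line;
}

my ($qseq_filename, $barcode) = parse_filenames("   sample1.qseq,      barcode.txt\n");
print "qseq=<$qseq_filename> barcode=<$barcode>\n";
```

Because \s also matches the trailing newline, trim() removes it as well, so a separate chomp is not needed here.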

Depending on your script, it might be easier to pass the strings as command-line arguments. You can access them through the @ARGV array, but I prefer to use Getopt::Long:

    use strict;
    use Getopt::Long;
    Getopt::Long::Configure("bundling");

    my ($qseq_filename, $barcode);
    GetOptions (
        'q|qseq=s' => \$qseq_filename,
        'b|bar=s'  => \$barcode,
    );

Then you can call it like:

 ./script.pl -q sample1.qseq -b barcode.txt 

And the variables will be populated correctly without having to worry about trimming whitespace.

+4

You need to trim the spaces before processing the file name data in your routine. You can check the file extension with another regular expression, as is well described in "Is there a regular expression in Perl to search for a file extension?". If it is the actual file type that matters to you, then it might be more appropriate to check for that instead, with File::LibMagicType.
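As a rough sketch of the extension check described here, the following tests each name against an anchored regular expression. has_extension is a hypothetical helper, and the expected extensions (.qseq and .txt) are taken from the question. Note that this only looks at the name, not at the file's actual contents.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $filename ends with ".$ext" and has
# at least one character before the dot. The match is case-sensitive.
sub has_extension {
    my ($filename, $ext) = @_;
    return $filename =~ /.\.\Q$ext\E\z/ ? 1 : 0;
}

for my $check (['sample1.qseq', 'qseq'], ['sample1.seq', 'qseq'], ['barcode.tab', 'txt']) {
    my ($filename, $ext) = @$check;
    printf "%-14s %s\n", $filename,
        has_extension($filename, $ext) ? "ok" : "expected .$ext";
}
```

The \Q...\E pair quotes the extension so that any regex metacharacters in it are treated literally, and \z anchors the match at the very end of the string.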

+2

While I think your design is a little questionable, would the following work?

    my @fileNames = split(',', $filenames);
    foreach my $fileName (@fileNames) {
        # reject any name that contains whitespace
        if ($fileName =~ /\s/) {
            print STDERR "Invalid filename.";
            exit -1;
        }
    }
    my ($qseq, $barcode) = @fileNames;
+1

And here is another way to do this using a regex (if you are reading input from STDIN):

    # read a line from STDIN
    my $filenames = <STDIN>;

    # parse the line with a regex or die with an error message
    my ($qseq_filename, $barcode) = $filenames =~ /^\s*(\S.*?)\s*,\s*(\S.*?)\s*$/
        or die "invalid input '$filenames'";
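To see how that pattern behaves on the kinds of input from the question, here is a small sketch that wraps it in a function. parse_line is a hypothetical name and the sample lines are made up; the regex itself is the one used above.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the regex: in list context it returns the
# two captured file names, or an empty list if the line does not match.
sub parse_line {
    my ($line) = @_;
    return $line =~ /^\s*(\S.*?)\s*,\s*(\S.*?)\s*$/;
}

my @inputs = (
    "sample1.qseq,      barcode.txt\n",   # extra spaces after the comma
    "      sample1.qseq, barcode.txt\n",  # extra spaces before the first name
    "sample1.qseq\n",                     # missing comma and second name
);

for my $input (@inputs) {
    my ($qseq_filename, $barcode) = parse_line($input);
    if (defined $barcode) {
        print "qseq=<$qseq_filename> barcode=<$barcode>\n";
    } else {
        print "no match\n";
    }
}
```

The non-greedy (\S.*?) groups stop as soon as the rest of the pattern can match, which is what keeps surrounding whitespace out of the captured names.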
+1

Source: https://habr.com/ru/post/917667/