In Perl, how can I correctly parse tab / space delimited files with quoted strings?

I need to parse tab- and space-delimited files that contain many columns in Perl. Some of the values are long strings enclosed in double quotes. These strings can contain any characters, such as tabs, spaces, or anything else.

When I try to parse them using the split function, it splits these quoted strings as well. How can I make Perl understand that everything inside "" is a single column entry?

A simple example is

12 345546.67677 "Hello World!!!" -567.55656 0.5465767 "Hello_Again; " 
3 answers

Use the Text::CSV module, which handles the edge cases of quoted fields for you. It allows you to set the separator:

 my $csv = Text::CSV->new({sep_char => "\t"}); 
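To make the one-liner above more concrete, here is a minimal sketch of parsing a single tab-delimited record with Text::CSV. The sample record and the `binary => 1` option are assumptions added for illustration and are not part of the original answer. Note that `sep_char` takes a single character, so this approach does not collapse runs of spaces into one delimiter.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 permits arbitrary characters inside quoted fields
my $csv = Text::CSV->new({ sep_char => "\t", binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# Hypothetical tab-delimited record; the quoted field itself contains a tab
my $line = qq{12\t345546.67677\t"Hello\tWorld!!!"\t-567.55656};

my @fields;
if ( $csv->parse($line) ) {
    @fields = $csv->fields;          # quotes are removed, embedded tab is kept
    print "[$_]\n" for @fields;
}
else {
    warn "Parse failed: " . $csv->error_diag . "\n";
}
```

For whole files, `$csv->getline($fh)` reads and parses one record at a time, which also copes with quoted fields that span several lines.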

Note that you say the files are tab / space delimited. If the delimiters are mixed and/or you need to treat consecutive whitespace as a single delimiter, using Text::ParseWords might be easier:

 #!/usr/bin/perl

 use Text::ParseWords qw( quotewords );
 use YAML;

 while ( my $line = <DATA> ) {
     print Dump [ quotewords('\s+', 0, $line) ];
 }

 __DATA__
 12 345546.67677 "Hello World!!!" -567.55656 0.5465767 "Hello_Again; "

Output:

 ---
 - 12
 - 345546.67677
 - Hello World!!!
 - -567.55656
 - 0.5465767
 - 'Hello_Again; '
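As a variation on the Text::ParseWords approach, the sketch below wraps `quotewords` in a small helper that also strips the newline and surrounding whitespace before splitting. The helper name `parse_row` and the sample record are hypothetical and only serve as illustration; Text::ParseWords itself ships with Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw( quotewords );

# Hypothetical helper: split one line on runs of whitespace,
# keeping quoted strings together and dropping the quotes.
sub parse_row {
    my ($line) = @_;
    chomp $line;                 # so the trailing newline is not seen as a delimiter
    $line =~ s/^\s+|\s+$//g;     # trim leading and trailing whitespace
    return quotewords( '\s+', 0, $line );
}

# Sample record mixing tabs and spaces; the quoted field contains a tab
my @row = parse_row(qq{12\t345546.67677 "Hello\tWorld!!!"  -567.55656\n});

print scalar(@row), " columns\n";
print "[$_]\n" for @row;
```

Keep in mind that Text::ParseWords also treats single quotes as quoting characters and backslashes as escapes, so data containing apostrophes or backslashes may need Text::CSV instead.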

Source: https://habr.com/ru/post/1332907/

