Sort CSV based on specific column?

Question

Sort CSV based on specific column?

I'm sure I did this in the past, and there is something small that I forget, but how can I sort the CSV file in a specific column? I'm interested in answers with and without third-party Perl modules. Mostly methods without, since I do not always have access to install additional modules.

Sample data:

  name, 25, female
 name, 24, male
 name, 27, female
 name, 21, male

Desired end result after sorting by second numeric column:

  name, 21, male
 name, 24, male
 name, 25, female
 name, 27, female

+3

sorting perl

ckarl787 Nov 19 '10 at 1:43

source share

6 answers

Alan Haggai Alavi · Answer 1 · 2010-11-19T02:08:32+0000

Since CSV is a rather complicated format, it is better to use a module that does this work for us.

The following is an example of using the Text :: CSV module:

#!/usr/bin/env perl use strict; use warnings; use constant AGE => 1; use Text::CSV; my $csv = Text::CSV->new(); my @rows; while ( my $row_ref = $csv->getline( \*DATA ) ) { push @rows, $row_ref; } @rows = sort { $a->[AGE] <=> $b->[AGE] } @rows; for my $row_ref (@rows) { $csv->combine(@$row_ref); print $csv->string(), "\n"; } __DATA__ name,25,female name,24,male name,27,female name,21,male

Simon whitaker · Answer 2 · 2010-11-19T08:34:15+0000

In the spirit of the fact that there is always another way to do this, keep in mind that a simple old GNU look might be enough.

 $ sort -t, -k2 -n unsorted.txt name,21,male name,24,male name,25,female name,27,female

If the command line arguments are:

 -t, # use comma as the record separator -k2 # sort on the second key (record) in the line -n # sort using numerical comparison (like using <=> instead of cmp in perl)

If you need a Perl solution, wrap it in qx (); -)

Sinan Ünür · Answer 3 · 2010-11-19T04:44:04+0000

There is also DBD :: CSV :

 #!/usr/bin/perl use strict; use warnings; use DBI; my $dbh = DBI->connect('dbi:CSV:', undef, undef, { RaiseError => 1, f_ext => '.csv', csv_tables => { test => { col_names => [qw' name age sex '] } }, }); my $sth = $dbh->prepare(q{ SELECT name, age, sex FROM test ORDER BY age }); $sth->execute; while ( my @row = $sth->fetchrow_array ) { print join(',' => @row), "\n"; } $sth->finish; $dbh->disconnect;

Output:

  name, 21, male
 name, 24, male
 name, 25, female
 name, 27, female

Dave cross · Answer 4 · 2010-11-19T09:21:34+0000

The original poster did not request third-party modules (that I can not say anything from CPAN). Although this is a limitation that will terribly limit your ability to write good modern Perl code, in this case you can use the (core) Text :: ParseWords module instead of the (non-main) text :: CSV. So, borrowing to a large extent from the example of Alan, we get:

 #!/usr/bin/env perl use strict; use warnings; use Text::ParseWords; my @rows; while (<DATA>) { push @rows, [ parse_line(',', 0, $_) ]; } @rows = sort { $a->[1] <=> $b->[1] } @rows; foreach (@rows) { print join ',', @$_; } __DATA__ name,25,female name,24,male name,27,female name,21,male

Jotn · Answer 5 · 2010-11-19T01:48:46+0000

When you provide your own comparison code, you can sort anything. Just extract the element you want with a regular expression, or perhaps a split in this case, and then compare it. If you have many elements, I will analyze the data in the list of lists, and then the comparison code can access it without parsing. This would eliminate the parsing of the same line many times compared to other lines.

speedarius · Answer 6 · 2010-11-19T05:24:10+0000

I would do something like this:

 #!/usr/bin/perl use warnings; use strict; my @rows = map { chomp; [split /[,\s]+/, $_] } <DATA>; #read each row into an array my @sorted = sort { $a->[1] <=> $b->[1] } @rows; # sort the rows (numerically) by second column for (@sorted) { print join(', ', @$_) . "\n"; # print them out as CSV } __DATA__ name,25,female name,24,male name,27,female name,21,male

Sort CSV based on specific column?

More articles: