Reading in a CSV file in Perl

Question


I have read files in Perl before, but not when the values I need from the CSV file are spread across different lines. I assume I need to build an array mixed with hash keys, but I'm out of my league here.

Basically, my CSV file has the following columns: branch, job, timePeriod, periodType, day1Value, day2Value, day3Value, day4Value, day5Value, day6Value, and day7Value.

The day*Value columns hold the periodType value for each day of the week, respectively.

Example -

    East,Banker,9AM-12PM,Overtime,4.25,0,0,1.25,1.5,1.5,0,0
    West,Electrician,12PM-5PM,Regular,4.25,0,0,-1.25,-1.5,-1.5,0,0
    North,Janitor,5PM-12AM,Variance,-4.25,0,0,-1.25,-1.5,-1.5,0,0
    South,Manager,12A-9AM,Overtime,77.75,14.75,10,10,10,10,10,10

Etc.

I need to output a file that takes this data and keys on the branch, job, and time period. The output should list every periodType value for one particular day, rather than one periodType value for all seven days.

For instance -

  South, Manager, 12A-9AM, 77.75,14.75,16

In the line above, the last three values are the day1Value for each of the three periodTypes (Overtime, Regular, and Variance).

As you can see, my problem is that I don't know how to load the data into memory so that I can pull values from different lines and output them together. I have only ever parsed single, self-contained lines before.

+4

perl csv

user1107055 Dec 20 '11 at 3:37


2 answers

Jonathan Leffler · Answer 1 · 2011-12-20T03:45:18+0000

Unless you like pain, use Text::CSV and its relatives Text::CSV_XS and Text::CSV_PP.
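As a minimal sketch of what reading with Text::CSV might look like: the column names and the in-memory sample row below are assumptions chosen for illustration, and a real script would open the actual file instead.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical sample data held in a string so the sketch is self-contained.
my $data = "East,Banker,9AM-12PM,Overtime,4.25,0,0,1.25,1.5,1.5,0\n";

# Column names are assumptions for illustration; they are not read from the file.
my @columns = qw(branch job period p_type day1 day2 day3 day4 day5 day6 day7);

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag();
$csv->column_names(@columns);

# Open a read-only filehandle on the string instead of a real file.
open my $in, '<', \$data or die "Cannot open in-memory data: $!";

my @rows;
while (my $row = $csv->getline_hr($in)) {
    push @rows, $row;    # each $row is a hash reference keyed by column name
}
close $in;

printf "%s / %s / %s: day1 %s = %s\n",
    @{ $rows[0] }{qw(branch job period p_type day1)};
```

If the real file has spaces after the commas, the allow_whitespace option of Text::CSV may be worth a look.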

However, that may be the easier part of this problem. Once you have read a line and verified that it is complete, you need to add its information to hashes with the appropriate keys. You will probably also need to get thoroughly familiar with references.

You could create a hash %BranchData keyed by the branch. Each element of that hash would be a reference to a hash keyed by the job; each element of that would be a reference to a hash keyed by the time period; and each element of that would be a reference to an array indexed by day number (using indexes 1..7; this wastes a little space, but the chances of getting it right are much greater; do not mess with $[ though!). And each element of the array would be a reference to a hash keyed by the three period types. Ouch!

If everything works well, a prototypical assignment might look something like this:

 $BranchData{$row{branch}}->{$row{job}}->{$row{period}}->[1]->{$row{p_type}} += $row{day1};

You would iterate over the elements 1..7 and 'day1' .. 'day7'; there is a little design clean-up to do there.
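A minimal sketch of that loop, assuming %row has already been filled from one parsed line (the values here are made up for illustration), could build the 'dayN' key from the day number:

```perl
use strict;
use warnings;

my %BranchData;

# Hypothetical row, as if already parsed from one CSV line.
my %row = (
    branch => 'East', job => 'Banker', period => '9AM-12PM', p_type => 'Overtime',
    day1 => 4.25, day2 => 0, day3 => 0, day4 => 1.25,
    day5 => 1.5,  day6 => 1.5, day7 => 0,
);

# Derive the key name "dayN" from the day number instead of
# writing out seven separate assignments.
for my $day (1 .. 7) {
    $BranchData{ $row{branch} }{ $row{job} }{ $row{period} }[$day]{ $row{p_type} }
        += $row{"day$day"};
}

my $day4_overtime = $BranchData{East}{Banker}{'9AM-12PM'}[4]{Overtime};
print "Day 4 overtime: $day4_overtime\n";
```

Index 0 of the array is simply left unused, which matches the 1..7 indexing described above.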

You need to worry about proper initialization (or maybe not; Perl will do it for you through autovivification). I've assumed that the row is returned as a plain hash (not a hash reference), with keys for the branch, job, period, period type (p_type), and each day ('day1' .. 'day7').
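To illustrate the point about initialization: Perl creates intermediate hashes and arrays on demand, and += on an element that does not exist yet does not raise an "uninitialized" warning. A small sketch with made-up keys and values:

```perl
use strict;
use warnings;

my %data;

# No level of %data exists yet; this single statement creates them all.
$data{South}{Manager}{'12A-9AM'}[1]{Overtime} += 77.75;

# A later row for the same branch/job/period adds another period type.
$data{South}{Manager}{'12A-9AM'}[1]{Regular} += 14.75;

my $day1 = $data{South}{Manager}{'12A-9AM'}[1];
print join(', ', map { "$_=$day1->{$_}" } sort keys %$day1), "\n";
```

Note that a period type that never appears (Variance here) simply has no key, so a report has to either pre-seed zeros or supply a default when printing.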

If you know in advance which day you need, you can avoid accumulating all the days. However, it may make more general reporting simpler if you always read and accumulate all the data, and then let the printing code deal with whichever subset of the data needs to be reported.

It was an intriguing enough problem that I hacked together this code. I doubt it is optimal, but it does work.
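As a sketch of the printing side, assuming %BranchData has already been accumulated in the nested layout described above (the two entries below are hand-built, hypothetical data), picking out a single day might look like this:

```perl
use strict;
use warnings;

# Hand-built stand-in for the accumulated data; values are hypothetical.
my %BranchData = (
    South => { Manager => { '12A-9AM' => [
        undef,
        { Overtime => 77.75, Regular => 14.75, Variance => 16 },
    ] } },
    East => { Banker => { '9AM-12PM' => [
        undef,
        { Overtime => 4.25, Regular => 0, Variance => 0 },
    ] } },
);

my $day   = 1;                                # the day to report
my @types = qw(Overtime Regular Variance);    # output column order

my @lines;
for my $branch (sort keys %BranchData) {
    for my $job (sort keys %{ $BranchData{$branch} }) {
        for my $period (sort keys %{ $BranchData{$branch}{$job} }) {
            # Skip periods that have no data for the requested day.
            my $totals = $BranchData{$branch}{$job}{$period}[$day] or next;
            # Period types never seen for this day are reported as 0.
            push @lines, join ',', $branch, $job, $period,
                map { $totals->{$_} // 0 } @types;
        }
    }
}
print "$_\n" for @lines;
```

Changing $day to another value in 1..7 reports that day instead, without touching the accumulation code.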

    #!/usr/bin/env perl
    #
    # SO 8570488

    use strict;
    use warnings;
    use Text::CSV;
    use Data::Dumper;
    use constant debug => 0;

    my $file = "input.csv";
    my $csv  = Text::CSV->new({ binary => 1, eol => $/ })
        or die "Cannot use CSV: ".Text::CSV->error_diag();
    my @headings = qw( branch job period p_type day1 day2 day3 day4 day5 day6 day7 );
    my @days     = qw( day0 day1 day2 day3 day4 day5 day6 day7 );
    my %BranchData;

    open my $in, '<', $file or die "Unable to open $file for reading ($!)";

    $csv->column_names(@headings);
    while (my $row = $csv->getline_hr($in))
    {
        print Dumper($row) if debug;
        my %r = %$row;  # Not for efficiency; for notational compactness

        $BranchData{$r{branch}} = { } if !defined $BranchData{$r{branch}};
        my $branch = $BranchData{$r{branch}};
        $branch->{$r{job}} = { } if !defined $branch->{$r{job}};
        my $job = $branch->{$r{job}};
        $job->{$r{period}} = [ ] if !defined $job->{$r{period}};
        my $period = $job->{$r{period}};
        for my $day (1..7)
        {
            # Assume that Overtime, Regular and Variance are the only types
            # Otherwise, you need yet another level of checking whether elements exist...
            $period->[$day] = { Overtime => 0, Regular => 0, Variance => 0} if !defined $period->[$day];
            $period->[$day]->{$r{p_type}} += $r{$days[$day]};
        }
    }

    print Dumper(\%BranchData);

Given your sample data, the output is:

    $VAR1 = {
      'West' => {
        'Electrician' => {
          '12PM-5PM' => [
            undef,
            { 'Regular' => '4.25', 'Overtime' => 0, 'Variance' => 0 },
            { 'Regular' => 0, 'Overtime' => 0, 'Variance' => 0 },
            { 'Regular' => 0, 'Overtime' => 0, 'Variance' => 0 },
            { 'Regular' => '-1.25', 'Overtime' => 0, 'Variance' => 0 },
            { 'Regular' => '-1.5', 'Overtime' => 0, 'Variance' => 0 },
            { 'Regular' => '-1.5', 'Overtime' => 0, 'Variance' => 0 },
            { 'Regular' => 0, 'Overtime' => 0, 'Variance' => 0 }
          ]
        }
      },
      'South' => {
        'Manager' => {
          '12A-9AM' => [
            undef,
            { 'Regular' => 0, 'Overtime' => '77.75', 'Variance' => 0 },
            { 'Regular' => 0, 'Overtime' => '14.75', 'Variance' => 0 },
            { 'Regular' => 0, 'Overtime' => 10, 'Variance' => 0 },
            { 'Regular' => 0, 'Overtime' => 10, 'Variance' => 0 },
            { 'Regular' => 0, 'Overtime' => 10, 'Variance' => 0 },
            { 'Regular' => 0, 'Overtime' => 10, 'Variance' => 0 },
            { 'Regular' => 0, 'Overtime' => 10, 'Variance' => 0 }
          ]
        }
      },
      'North' => {
        'Janitor' => {
          '5PM-12AM' => [
            undef,
            { 'Regular' => 0, 'Overtime' => 0, 'Variance' => '-4.25' },
            { 'Regular' => 0, 'Overtime' => 0, 'Variance' => 0 },
            { 'Regular' => 0, 'Overtime' => 0, 'Variance' => 0 },
            { 'Regular' => 0, 'Overtime' => 0, 'Variance' => '-1.25' },
            { 'Regular' => 0, 'Overtime' => 0, 'Variance' => '-1.5' },
            { 'Regular' => 0, 'Overtime' => 0, 'Variance' => '-1.5' },
            { 'Regular' => 0, 'Overtime' => 0, 'Variance' => 0 }
          ]
        }
      },
      'East' => {
        'Banker' => {
          '9AM-12PM' => [
            undef,
            { 'Regular' => 0, 'Overtime' => '4.25', 'Variance' => 0 },
            { 'Regular' => 0, 'Overtime' => 0, 'Variance' => 0 },
            { 'Regular' => 0, 'Overtime' => 0, 'Variance' => 0 },
            { 'Regular' => 0, 'Overtime' => '1.25', 'Variance' => 0 },
            { 'Regular' => 0, 'Overtime' => '1.5', 'Variance' => 0 },
            { 'Regular' => 0, 'Overtime' => '1.5', 'Variance' => 0 },
            { 'Regular' => 0, 'Overtime' => 0, 'Variance' => 0 }
          ]
        }
      }
    };

Have fun with it!

user554546 · Answer 2 · 2011-12-20T03:49:50+0000

I have no firsthand experience with it, but you could use DBD::CSV and then issue the relatively simple SQL query needed to compute the aggregation you want.

If you insist on doing it the hard way, you can loop through the file and collect your data in the following hash of hash references:

    (
      "branch1,job1,timeperiod1" =>
        {
          "overtime" => "overtimeday1value1",
          "regular"  => "regulartimeday1value1",
          "variance" => "variancetimeday1value1"
        },
      "branch2,job2,timeperiod2" =>
        {
          "overtime" => "overtimeday1value2",
          "regular"  => "regulartimeday1value2",
          "variance" => "variancetimeday1value2"
        },
      #etc
    );

and then just loop through the relevant keys. However, this approach relies on consistent key formatting (for example, "East,Banker,9AM-12PM" is not the same as "East, Banker, 9AM-12PM"), so you will need to check for (and enforce) consistent formatting when building the hash above.
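One way to enforce that consistency is to trim every field before joining it into the key. The sketch below is only an assumption about how that might be done: the helper name make_key and the sample rows are hypothetical, and only the day 1 value is accumulated.

```perl
use strict;
use warnings;

# Hypothetical helper: strip surrounding whitespace from each field, then join.
sub make_key {
    my @fields = @_;
    s/^\s+|\s+$//g for @fields;
    return join ',', @fields;
}

my %totals;

# Made-up rows: branch, job, period, period type, day-1 value.
# The second row has stray spaces to show that both map to the same key.
my @rows = (
    [ 'East',  'Banker',  '9AM-12PM', 'Overtime', 4.25 ],
    [ 'East ', ' Banker', '9AM-12PM', 'Regular',  1.5  ],
);

for my $r (@rows) {
    my ($branch, $job, $period, $type, $day1) = @$r;
    $totals{ make_key($branch, $job, $period) }{ lc $type } += $day1;
}

for my $key (sort keys %totals) {
    my $t = $totals{$key};
    # Period types with no data are printed as 0.
    print join(',', $key, map { $t->{$_} // 0 } qw(overtime regular variance)), "\n";
}
```

Because both rows normalize to the same key, they end up in a single entry rather than two near-duplicates.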
