After I wrote up this answer, a commenter noted that the execution time of my test could be reduced by 45%. I've reformatted his code a bit:
my @keep;
while (<>) {
    my @data = split;
    unless (@keep) {
        @keep = (0, 1, 0, 1, 1);
        for (my $i = 5; $i < @data; $i += 3) {
            push @keep, 1, 1, 0;
        }
    }
    my $i = 0;
    print join(' ', grep $keep[$i++], @data), "\n";
}
This runs in almost half the time; my original solution clocked in at:
$ time ./zz.pl input.data > /dev/null
real 0m21.861s
user 0m21.310s
sys 0m0.280s
Now you can squeeze out another 45% by getting rather dirty with Inline::C:
#!/usr/bin/env perl
use strict;
use warnings;

use Inline C => <<'END_C';
/* This code 'works' only in a limited set of circumstances!
   Don't expect anything good if you feed it anything other than plain ASCII */
#include <ctype.h>

SV *
extract_fields(char *line, AV *wanted_fields) {
    int ch;
    IV current_field = 0;
    IV wanted_field = -1;
    unsigned char *cursor = (unsigned char *) line;
    unsigned char *field_begin = (unsigned char *) line;
    unsigned char *save_field_begin;
    STRLEN field_len = 0;
    IV i_wanted = 0;
    IV n_wanted = av_len(wanted_fields);
    AV *ret = newAV();

    while (i_wanted <= n_wanted) {
        SV **p_wanted = av_fetch(wanted_fields, i_wanted, 0);
        if (!p_wanted || !(*p_wanted)) {
            croak("av_fetch returned NULL pointer");
        }
        wanted_field = SvIV(*p_wanted);
        while ((ch = *(cursor++))) {
            if (!isspace(ch)) {
                continue;
            }
            field_len = cursor - field_begin - 1;
            save_field_begin = field_begin;
            field_begin = cursor;
            current_field += 1;
            if (current_field != wanted_field) {
                continue;
            }
            av_push(ret, newSVpvn((char *) save_field_begin, field_len));
            break;
        }
        i_wanted += 1;
    }
    return newRV_noinc((SV *) ret);
}
END_C
And here is the Perl part. Note that we call split only once, to find out the indices of the fields that need to be kept. Once we know them, we hand each line and the (1-based) indices over to the C routine for slicing and dicing.
my @keep;
while (my $line = <>) {
    unless (@keep) {
        @keep = (2, 4, 5);
        my @data = split ' ', $line;
        push @keep, grep +(($_ - 5) % 3), 6 .. scalar(@data);
    }
    my $fields = extract_fields($line, \@keep);
    print join(' ', @$fields), "\n";
}
$ time ./ww.pl input.data > /dev/null
real 0m11.539s
user 0m11.083s
sys 0m0.300s
input.data was generated using:
$ perl -E 'say join (" ", "A" .. "ZZZZ") for 1 .. 100' > input.data

It is about 225 MB in size.