How can I randomly sample the contents of a file?

I have a file with the following contents:

abc
def
high
lmn
...
...

The file contains more than 2 million lines. I want to randomly sample lines from the file and output 50K lines. Any thoughts on how to approach this problem? I was thinking along the lines of Perl and its rand function (or a handy shell command would be neat).


+3
5 answers

Assuming you basically want to output about 2.5% of all lines, this will do:

print if 0.025 > rand while <$input>;
+12

Shell:

sort -R file | head -n 50000
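If GNU coreutils is available, shuf may be a simpler fit than sort -R. It picks exactly the requested number of lines and does no sorting, and it avoids a quirk of GNU sort -R, which orders lines by a hash of their content so that identical lines end up next to each other. A minimal sketch, assuming GNU shuf is installed; lines.txt and sample.txt are hypothetical names, with the input generated by seq as a stand-in for the real file:

```shell
# Build a 2,000,000-line demo input (stand-in for the real file).
seq 1 2000000 > lines.txt

# Pick exactly 50,000 lines without replacement, in random order.
shuf -n 50000 lines.txt > sample.txt

# Report the sample size.
wc -l < sample.txt
```

Unlike the sort -R pipeline, the result here never contains the same input line twice unless the input itself has duplicates.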
+5

Perl:

use strict;
use warnings;

# Number of lines to pick and file to pick from
# Error checking omitted!
my ($pick, $file) = @ARGV;

open(my $fh, '<', $file)
    or die "Can't read file '$file' [$!]\n";

# count lines in file
my ($lines, $buffer) = (0, '');   # start at 0 so an empty file does not trigger warnings
while (sysread $fh, $buffer, 4096) {
    $lines += ($buffer =~ tr/\n//);
}

# limit number of lines to pick to number of lines in file
$pick = $lines if $pick > $lines;

# build list of N lines to pick, use a hash to prevent picking the
# same line multiple times
my %picked;
for (1 .. $pick) {
    my $n = int(rand($lines)) + 1;
    redo if $picked{$n}++
}

# loop over file extracting selected lines
seek($fh, 0, 0);
while (<$fh>) {
    print if $picked{$.};
}
close $fh;
+2

Perl:

Use CPAN. The File::RandomLine module does exactly what you need.

+2

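From perlfaq5: "How do I select a random line from a file?"

Short of loading the file into a database or pre-indexing the lines in the file, there are a couple of things that you can do.

Here's a reservoir-sampling algorithm from the Camel Book: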

srand;
rand($.) < 1 && ($line = $_) while <>;

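This has a significant advantage in space over reading the whole file in. You can find a proof of this method in The Art of Computer Programming, Volume 2, Section 3.4.2, by Donald E. Knuth.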

You can use the File::Random module, which provides a function for that algorithm:

use File::Random qw/random_line/;
my $line = random_line($filename);

Another way is to use the Tie::File module, which treats the entire file as an array. Simply access a random array element.

+2

Source: https://habr.com/ru/post/1711120/

