How to mix lines in a file without first reading the entire file?

What is a good algorithm for shuffling lines in a file without first reading the entire file?

I assume that it will look something like this: first start reading line by line, storing line by line at each point and deciding whether you want to print one of the lines that are still stored (and then delete from the repository) or do nothing and go to the next line.

Can someone check / confirm this and / or possibly publish working code (perl, python, etc.)?

Related questions, but not looking at memory-efficient algorithms:

+3
source share
3 answers

I can’t think of a way to randomly execute the entire file without any kind of saving a list of what has already been written. I think that if I had to do an efficient memory shuffling, I would scan the file by building a list of offsets for new lines. As soon as I have a list of new line offsets, I would randomly select one of them, write it to stdout and remove it from the list of offsets.

I am not familiar with perl or python, but can demonstrate with php.

<?php
$offsets = array();

$f = fopen("file.txt", "r");
$offsets[] = ftell($f);
while (! feof($f))
{
  if (fgetc($f) == "\n") $offsets[] = ftell($f);
}

shuffle($offsets);
foreach ($offsets as $offset)
{
  fseek($f, $offset);
  echo fgets($f);
}
fclose($f);
?>

, , , ( ):

  • , stdout
  • Loop bytes_written == filesize
  • ,
  • ,
  • ,
  • 3.
+4

linear .

:

  • n ( ), ( - ), , . , 2 , , s e n, , s[i] e[i] , i th. ++ vector.

  • 1 n ( ++ random_shuffle) , , p (, 1 2 3 4 p = [3 1 4 2]). , i p[i] (.. ).

  • , s[p[0]] e[p[0]] .

  • , 2, .

, ( random_shuffle ), , / ( ) .

+3

N N . , . . , . , , N (, , N ).

Edit

python:

import sys
import random

CACHE_SIZE = 23

lines = {}

for l in sys.stdin: # you can replace sys.stdin with xrange(200) to get a test output
    i = random.randint(0, CACHE_SIZE-1)
    old = lines.get(i)
    if old:
        print old,
    lines[i] = l

for ignored, p in lines.iteritems():
    print p,
0

Source: https://habr.com/ru/post/1757246/


All Articles