Skip the last x lines when reading a text file

I am reading text data from a large file line by line.
But I only need to read the first n-x lines (that is, I must not read the last x lines).

How can I do this without reading the entire file more than once?
(I read each line and process it right away, so I can't go back.)

3 answers

In this post I will give you two completely different approaches to your problem; depending on your use case, one will suit you better than the other.

Alternative #1

This method is memory efficient, although fairly complicated. It is the recommended one if you are going to skip a large amount of content, since you only hold one line in memory at a time during processing.

The implementation in this post may not be highly optimized, but the idea behind it should be clear.

You start by reading the file backward, counting N line breaks. Once you have found the position in the file where you will later want to stop processing, you seek back to the beginning of the file.

Alternative #2

This method is easy to understand and very simple. At runtime you keep N lines in memory, where N is the number of lines you want to skip at the end.

The lines are stored in a FIFO (First In, First Out) container. Each time you read a line, you append it to the FIFO; once the FIFO holds more than N entries, you remove and process the first one. This way the last N lines of the file are never processed. For example, with N = 2 and the lines A, B, C, D, line A is processed when C is read, B is processed when D is read, and C and D are still in the FIFO at end of file.



Alternative #1

This may seem strange, but it is definitely doable, and I would recommend it: start by reading the file backward.

  • Seek to the end of the file
  • Read (and discard) bytes backward, toward the beginning of the file, until you find SKIP_N line breaks
  • Save this position
  • Seek to the beginning of the file
  • Read (and process) lines until you reach the saved position

Code example:

In the code below, the last 42 lines of /tmp/sample_file are skipped and the rest are printed, using the method described above.

    import java.io.IOException;
    import java.io.RandomAccessFile;

    public class Example {
        protected static final int SKIP_N = 42;

        public static void main(String[] args) throws IOException {
            try (RandomAccessFile rafHandle = new RandomAccessFile("/tmp/sample_file", "r")) {
                // Byte offset at which the first of the last SKIP_N lines begins.
                long endOffset = findEndOffset(SKIP_N, rafHandle);

                rafHandle.seek(0);
                String s1;
                // getFilePointer() is the start of the next line, so this also
                // works for "\r\n" endings. Note: RandomAccessFile.readLine()
                // reads one byte at a time and maps each byte to one char, so it
                // only suits ASCII/Latin-1 text.
                while (rafHandle.getFilePointer() < endOffset
                        && (s1 = rafHandle.readLine()) != null) {
                    System.out.println(s1);
                }
            }
        }

        // Scans backward and returns the offset of the first byte of the
        // skipNLines-th line from the end, or 0 if the file has skipNLines
        // lines or fewer. Assumes lines end in '\n' (or "\r\n").
        protected static long findEndOffset(int skipNLines, RandomAccessFile rafHandle) throws IOException {
            long length = rafHandle.length();
            if (skipNLines <= 0) return length; // skip nothing

            long currentOffset = length;
            // A trailing '\n' terminates the last line; it does not start a new one.
            if (length > 0) {
                rafHandle.seek(length - 1);
                if (rafHandle.read() == '\n') currentOffset = length - 1;
            }

            byte[] buffer = new byte[1024];
            int foundLines = 0;
            while (currentOffset > 0) {
                int chunk = (int) Math.min(buffer.length, currentOffset);
                currentOffset -= chunk; // chunks never overlap, so no newline is counted twice
                rafHandle.seek(currentOffset);
                rafHandle.readFully(buffer, 0, chunk);
                for (int i = chunk - 1; i >= 0; --i) {
                    if (buffer[i] == '\n' && ++foundLines == skipNLines) {
                        return currentOffset + i + 1; // first byte AFTER the newline
                    }
                }
            }
            return 0; // fewer lines than skipNLines: skip everything
        }
    }


Alternative #2

  • Read the file line by line
  • For each line successfully read, append it to the end of a LinkedList<String>
  • If the LinkedList<String> contains more lines than you want to skip, remove the first entry and process it
  • Repeat until there are no more lines to read

Code example:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.LinkedList;

    public class Example {
        protected static final int SKIP_N = 42;

        public static void main(String[] args) throws IOException {
            // FIFO holding the most recently read lines that are not yet processed.
            LinkedList<String> lli = new LinkedList<String>();
            // try-with-resources closes the reader even if an exception is thrown.
            try (BufferedReader bre = new BufferedReader(new FileReader("/tmp/sample_file"))) {
                String line;
                while ((line = bre.readLine()) != null) {
                    lli.addLast(line);
                    // With more than SKIP_N lines buffered, the oldest one cannot
                    // be among the last SKIP_N lines, so it is safe to process.
                    if (lli.size() > SKIP_N) {
                        System.out.println(lli.removeFirst());
                    }
                }
            }
            // The lines still in lli are the last SKIP_N lines, which are skipped.
        }
    }

You just need some simple buffering logic.

First read x lines and put them in a buffer. Then read one line at a time, append it to the end of the buffer, and process the first line in the buffer. When you reach EOF, there are x unprocessed lines left in the buffer.

Update: I noticed the comments on this answer and on my own, so just to clarify: my suggestion works when n is unknown. x must be known, of course. All you have to do is create a simple buffer, fill it with x lines, and then start processing.
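The prefill-then-process loop described above can be sketched as follows. This is a minimal sketch, not the answerer's code: the class name `SkipLastLines`, the method `processAllButLast` and the `Consumer<String>` callback are hypothetical, and a `StringReader` stands in for the real file. It shows that only x has to be known; the total line count n is never computed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.function.Consumer;

public class SkipLastLines {
    // Hypothetical helper: passes every line of reader to process except the last x.
    public static void processAllButLast(BufferedReader reader, int x, Consumer<String> process)
            throws IOException {
        LinkedList<String> buffer = new LinkedList<>();
        String line;
        // Phase 1: fill the buffer with up to x lines without processing any of them.
        // buffer.size() < x is checked first, so no line is consumed once it is full.
        while (buffer.size() < x && (line = reader.readLine()) != null) {
            buffer.addLast(line);
        }
        // Phase 2: for every further line, append it and process the oldest one.
        while ((line = reader.readLine()) != null) {
            buffer.addLast(line);
            process.accept(buffer.removeFirst());
        }
        // At EOF the buffer holds the last x lines (fewer if the input was shorter).
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader("a\nb\nc\nd\ne\n"));
        List<String> processed = new ArrayList<>();
        processAllButLast(reader, 2, processed::add);
        System.out.println(processed); // [a, b, c]
    }
}
```

If the input has x lines or fewer, phase 1 consumes all of them and nothing is processed.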

As for the buffer implementation, since we're talking about Java, a simple LinkedList will do. Because you remove one line from the front of the buffer for every line you add at the back, an ArrayList will perform poorly: each removal from index 0 shifts all remaining elements. More generally, an array-backed buffer should be circular (a ring buffer) to avoid poor performance.
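An array-backed circular buffer, as suggested here, might look like the sketch below. The names `RingBufferSkip` and `processAllButLast` are hypothetical, and a `StringReader` again stands in for the file. A fixed `String[x]` is reused: after x lines have been read, the slot about to be overwritten always holds the line read exactly x lines earlier, so that line can be processed first.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class RingBufferSkip {
    // Hypothetical helper: processes every line except the last x, holding the
    // pending lines in a fixed-size array used as a ring buffer.
    public static void processAllButLast(BufferedReader reader, int x, Consumer<String> process)
            throws IOException {
        String line;
        if (x <= 0) {
            // Nothing to hold back: process every line immediately.
            while ((line = reader.readLine()) != null) {
                process.accept(line);
            }
            return;
        }
        String[] ring = new String[x];
        long count = 0; // total lines read so far
        while ((line = reader.readLine()) != null) {
            int slot = (int) (count % x);
            if (count >= x) {
                // This slot holds the line read x lines ago; it is no longer
                // among the last x lines, so process it before overwriting.
                process.accept(ring[slot]);
            }
            ring[slot] = line;
            count++;
        }
        // The ring now holds the last min(count, x) lines, which are skipped.
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader("a\nb\nc\nd\ne\n"));
        List<String> processed = new ArrayList<>();
        processAllButLast(reader, 2, processed::add);
        System.out.println(processed); // [a, b, c]
    }
}
```

Unlike a LinkedList, this allocates no node per line and never shifts elements. The standard `java.util.ArrayDeque`, a resizable-array `Deque`, is a ready-made alternative that supports the same `addLast`/`removeFirst` pattern.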


Just read x lines ahead, keeping them in a queue of x strings.
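Reading x lines ahead can also be packaged as a reusable iterator, assuming a lazy `Iterator` fits the caller's code. In this sketch `SkipLastIterator` is a hypothetical name and `java.util.ArrayDeque` plays the role of the queue: the iterator holds back x elements and releases the oldest one only once an (x + 1)-th element is available.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical adapter: yields every element of source except the last x.
public class SkipLastIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final ArrayDeque<T> queue = new ArrayDeque<>(); // the read-ahead queue
    private final int x;

    public SkipLastIterator(Iterator<T> source, int x) {
        if (x < 0) throw new IllegalArgumentException("x must be >= 0");
        this.source = source;
        this.x = x;
    }

    // Read ahead until the queue holds x + 1 elements or the source runs out.
    private void fill() {
        while (queue.size() <= x && source.hasNext()) {
            queue.addLast(source.next());
        }
    }

    @Override
    public boolean hasNext() {
        fill();
        // Only an element with x more elements behind it may be released.
        return queue.size() > x;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return queue.removeFirst();
    }

    public static void main(String[] args) {
        Iterator<String> lines = Arrays.asList("a", "b", "c", "d", "e").iterator();
        SkipLastIterator<String> it = new SkipLastIterator<>(lines, 2);
        while (it.hasNext()) {
            System.out.println(it.next()); // prints a, b, c on separate lines
        }
    }
}
```

For a file, `new SkipLastIterator<>(reader.lines().iterator(), x)` would wrap a `BufferedReader`. `ArrayDeque` rejects `null` elements, which is harmless here because lines are never `null`.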


Source: https://habr.com/ru/post/1386168/

