What is the best way to read the last lines from a file in PHP?

In my PHP application, I need to read several lines starting from the end of many files (mostly logs). Sometimes I only need the last one, sometimes I need tens or hundreds. Basically, I want something as flexible as the Unix tail command.

There are questions here about how to get the single last line from a file (but I need N lines), and different solutions were given. I'm not sure which one is the best and which performs best.

+65
performance php logging
Feb 22 '13 at 13:59
6 answers

Method Overview

Searching the Internet, I came across different solutions. I can group them in three approaches:

  • naive ones that use PHP's file() function;
  • cheating ones that run the tail command on the system;
  • mighty ones that happily jump around an open file using fseek() .

In the end, I chose (or wrote) five solutions: a naive one, a cheating one and three mighty ones.

  • The most concise naive solution, using the built-in array functions.
  • The only possible solution based on the tail command, which has a little big problem: it does not run if tail is unavailable, i.e. on non-Unix systems (Windows) or on restricted environments that don't allow system functions.
  • The solution in which single bytes are read from the end of the file, searching for (and counting) newline characters, found here .
  • The multi-byte buffered solution optimized for large files, found here .
  • A slightly modified version of solution #4, in which the buffer length is dynamic, decided according to the number of lines to retrieve.
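For reference, the naive approach (#1) takes only a few lines. This is just a sketch, not the exact benchmark source, and tail_naive is a name I'm using here for illustration:

```php
// A minimal sketch of the "naive" approach (#1): load the whole file
// into memory with file() and slice off the last $n lines.
// Note it reads the entire file, which is exactly what can break
// PHP's memory limit on large inputs.
function tail_naive($filepath, $n) {
    $lines = file($filepath, FILE_IGNORE_NEW_LINES);
    if ($lines === false) return false;
    return implode("\n", array_slice($lines, -$n));
}
```

The cheating approach (#2) is essentially a shell_exec() call to tail -n, which is why it disappears on systems where tail or the system functions are unavailable.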

All solutions work, in the sense that they return the expected result from any file and for any number of lines we ask for (except for solution #1, which can break PHP's memory limits on large files, returning nothing). But which one is the best?

Performance tests

To answer the question, I ran tests. That's how these things are done, isn't it?

I prepared a sample 100 KB file, joining together different files found in my /var/log . Then I wrote a PHP script that uses each of the five solutions to retrieve 1, 2, ..., 10, 20, ..., 100, 200, ..., 1000 lines from the end of the file. Each single test was repeated ten times (that's something like 5 × 28 × 10 = 1400 tests), measuring average elapsed time in microseconds.
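The harness can be as simple as timing repeated calls with microtime(). This is only a sketch of the idea, with an illustrative name, not my actual benchmark script:

```php
// Average execution time of $fn over $repeats runs, in microseconds.
function benchmark(callable $fn, array $args, $repeats = 10) {
    $start = microtime(true);
    for ($i = 0; $i < $repeats; $i++) {
        call_user_func_array($fn, $args);
    }
    return (microtime(true) - $start) / $repeats * 1e6;
}
```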

I ran the script on my local development machine (Xubuntu 12.04, PHP 5.3.10, dual-core 2.70 GHz processor, 2 GB of RAM) using the PHP command-line interpreter. Here are the results:

Execution time on sample 100 KB log file

Solutions #1 and #2 seem to be the worst ones. Solution #3 is good only when we need to read a few lines. Solutions #4 and #5 seem to be the best ones. Note how the dynamic buffer size can optimize the algorithm: the execution time is a little smaller for a few lines, because of the reduced buffer.
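The dynamic buffer heuristic boils down to choosing a smaller read size when few lines are requested, so tiny requests don't read (and then trim away) far more data than necessary. Pulled out as a standalone function with an illustrative name, it looks like this (the thresholds are the ones used by solution #5):

```php
// Pick a back-read buffer size based on how many lines we want:
// 1 line -> 64 bytes, up to 9 lines -> 512 bytes, more -> 4 KB.
function pick_buffer_size($lines) {
    return $lines < 2 ? 64 : ($lines < 10 ? 512 : 4096);
}
```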

Let's try with a bigger file. What if we have to read a 10 MB log file?

Execution time on sample 10 MB log file

Now solution #1 is by far the worst one: in fact, loading the whole 10 MB file into memory is not a great idea. I ran the tests also on 1 MB and 100 MB files, and it's practically the same situation.
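One way to see why is to watch peak memory grow when a file is slurped wholesale. This sketch loads this very script just to have a file at hand:

```php
// Demonstrate the memory cost of the solution-#1 style: file() keeps
// every line of the file in memory at once.
$before = memory_get_peak_usage();
$lines  = file(__FILE__);
$after  = memory_get_peak_usage();
printf("%d lines loaded, peak memory grew by %d bytes\n",
    count($lines), $after - $before);
```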

And for tiny log files? That's the graph for a 10 KB file:

Execution time on sample 10 KB log file

Solution #1 is the best one now! Loading 10 KB into memory isn't a big deal for PHP. Also #4 and #5 perform well. However, this is an edge case: a 10 KB log means something like 150/200 lines...

You can download all my test files, sources and results here .

Final thoughts

Solution #5 is heartily recommended for the general use case: it works great with every file size and is especially good when reading a few lines.

Avoid solution #1 if you may have to read files bigger than 10 KB.

Solutions #2 and #3 are not the best ones for any of the tests I ran: #2 never runs in less than 2 ms, and #3 is heavily influenced by the number of lines you ask for (it works quite well only with 1 or 2 lines).

+236
Feb 22 '13 at 13:59

This is a modified version that can also skip the last lines:

 /**
  * Modified version of http://www.geekality.net/2011/05/28/php-tail-tackling-large-files/
  * and of https://gist.github.com/lorenzos/1711e81a9162320fde20
  * @author Kinga the Witch (Trans-dating.com), Torleif Berger, Lorenzo Stanco
  * @link http://stackoverflow.com/a/15025877/995958
  * @license http://creativecommons.org/licenses/by/3.0/
  */
 function tailWithSkip($filepath, $lines = 1, $skip = 0, $adaptive = true)
 {
     // Open file (check the handle before trying to lock it)
     $f = @fopen($filepath, "rb");
     if ($f === false) return false;
     if (@flock($f, LOCK_SH) === false) return false;
     if (!$adaptive) $buffer = 4096;
     else {
         // Set buffer size according to the number of lines to retrieve.
         // This gives a performance boost when reading a few lines from the file.
         $max = max($lines, $skip);
         $buffer = ($max < 2 ? 64 : ($max < 10 ? 512 : 4096));
     }
     // Jump to last character
     fseek($f, -1, SEEK_END);
     // Read it and adjust line number if necessary
     // (Otherwise the result would be wrong if the file does not end with a blank line)
     if (fread($f, 1) == "\n") {
         if ($skip > 0) { $skip++; $lines--; }
     } else {
         $lines--;
     }
     // Start reading
     $output = '';
     $chunk = '';
     // While we would like more
     while (ftell($f) > 0 && $lines >= 0) {
         // Figure out how far back we should jump
         $seek = min(ftell($f), $buffer);
         // Do the jump (backwards, relative to where we are)
         fseek($f, -$seek, SEEK_CUR);
         // Read a chunk
         $chunk = fread($f, $seek);
         // Calculate chunk parameters
         $count = substr_count($chunk, "\n");
         $strlen = mb_strlen($chunk, '8bit');
         // Move the file pointer
         fseek($f, -$strlen, SEEK_CUR);
         if ($skip > 0) { // There are some lines to skip
             if ($skip > $count) {
                 // Chunk contains fewer newline symbols than we need to skip
                 $skip -= $count;
                 $chunk = '';
             } else {
                 $pos = 0;
                 while ($skip > 0) {
                     if ($pos > 0) $offset = $pos - $strlen - 1; // Calculate the offset - NEGATIVE position of last newline symbol
                     else $offset = 0;                           // First search (without offset)
                     $pos = strrpos($chunk, "\n", $offset);      // Search for last (including offset) newline symbol
                     if ($pos !== false) $skip--;                // Found newline symbol - skip the line
                     else break;                                 // Protection against infinite loop (just in case)
                 }
                 $chunk = substr($chunk, 0, $pos);   // Truncated chunk
                 $count = substr_count($chunk, "\n"); // Count newline symbols in truncated chunk
             }
         }
         if (strlen($chunk) > 0) {
             // Add chunk to the output
             $output = $chunk . $output;
             // Decrease our line counter
             $lines -= $count;
         }
     }
     // While we have too many lines
     // (Because of buffer size we might have read too many)
     while ($lines++ < 0) {
         // Find first newline and remove all text before that
         $output = substr($output, strpos($output, "\n") + 1);
     }
     // Close file and return
     @flock($f, LOCK_UN);
     fclose($f);
     return trim($output);
 }
+4
Apr 26 '17 at 12:27

This will also work:

 $file = new SplFileObject("/path/to/file");
 $file->seek(PHP_INT_MAX);    // cheap trick to seek to EoF
 $total_lines = $file->key(); // last line number

 // output the last twenty lines
 $reader = new LimitIterator($file, $total_lines - 20);
 foreach ($reader as $line) {
     echo $line; // includes newlines
 }

Or without LimitIterator :

 $file = new SplFileObject($filepath);
 $file->seek(PHP_INT_MAX);
 $total_lines = $file->key();
 $file->seek($total_lines - 20);
 while (!$file->eof()) {
     echo $file->current();
     $file->next();
 }

Unfortunately, your test file segfaults on my machine, so I can't say how this performs.

+1
Jan 13 '17 at 12:16

My little copy-and-paste solution, after reading all of this here. tail() does not close $fp because you have to kill it with Ctrl-C anyway. usleep saves your CPU time; only tested on Windows so far. You need to put this code into a class!

 /**
  * @param $pathname
  */
 private function tail($pathname)
 {
     $realpath = realpath($pathname);
     $fp = fopen($realpath, 'r', FALSE);
     $lastline = '';
     fseek($fp, $this->tailonce($pathname, 1, false), SEEK_END);
     do {
         $line = fread($fp, 1000);
         if ($line == $lastline) {
             usleep(50);
         } else {
             $lastline = $line;
             echo $lastline;
         }
     } while ($fp);
 }

 /**
  * @param $pathname
  * @param $lines
  * @param bool $echo
  * @return int
  */
 private function tailonce($pathname, $lines, $echo = true)
 {
     $realpath = realpath($pathname);
     $fp = fopen($realpath, 'r', FALSE);
     $flines = 0;
     $a = -1;
     while ($flines <= $lines) {
         fseek($fp, $a--, SEEK_END);
         $char = fread($fp, 1);
         if ($char == "\n") $flines++;
     }
     $out = fread($fp, 1000000);
     fclose($fp);
     if ($echo) echo $out;
     return $a + 2;
 }
+1
Aug 21 '19 at 12:07

Yet another function; you can use regular expressions to separate the items.

Usage:

 $last_rows_array = file_get_tail('logfile.log', 100, array(
     'regex'             => true,        // use regex
     'separator'         => '#\n{2,}#',  // separator: at least two newlines
     'typical_item_size' => 200,         // line length
 ));

Function:

 // public domain
 function file_get_tail($file, $requested_num = 100, $args = array())
 {
     // default arg values
     $regex             = true;
     $separator         = null;
     $typical_item_size = 100;  // estimated size
     $more_size_mul     = 1.01; // +1%
     $max_more_size     = 4000;
     extract($args);
     if ($separator === null) $separator = $regex ? '#\n+#' : "\n";

     if (is_string($file))
         $f = fopen($file, 'rb');
     else if (is_resource($file) && in_array(get_resource_type($file), array('file', 'stream'), true))
         $f = $file;
     else
         throw new \Exception(__METHOD__.': file must be either filename or a file or stream resource');

     // get file size
     fseek($f, 0, SEEK_END);
     $fsize = ftell($f);
     $fpos = $fsize;
     $bytes_read = 0;

     $all_items = array(); // array of array
     $all_item_num = 0;
     $remaining_num = $requested_num;
     $last_junk = '';

     while (true) {
         // calc size and position of next chunk to read
         $size = $remaining_num * $typical_item_size - strlen($last_junk);
         // reading a bit more can't hurt
         $size += (int)min($size * $more_size_mul, $max_more_size);
         if ($size < 1) $size = 1;

         // set and fix read position
         $fpos = $fpos - $size;
         if ($fpos < 0) {
             $size -= -$fpos;
             $fpos = 0;
         }

         // read chunk + add junk from prev iteration
         fseek($f, $fpos, SEEK_SET);
         $chunk = fread($f, $size);
         if (strlen($chunk) !== $size) throw new \Exception(__METHOD__.": read error?");
         $bytes_read += strlen($chunk);
         $chunk .= $last_junk;

         // chunk -> items, with at least one element
         $items = $regex ? preg_split($separator, $chunk) : explode($separator, $chunk);

         // first item is probably cut in half, use it in next iteration ("junk") instead
         // also skip the very first '' item
         if ($fpos > 0 || $items[0] === '') {
             $last_junk = $items[0];
             unset($items[0]);
         } // … else noop, because this is the last iteration

         // ignore last empty item. end( empty [] ) === false
         if (end($items) === '') array_pop($items);

         // if we got items, push them
         $num = count($items);
         if ($num > 0) {
             $remaining_num -= $num;
             // if we read too much, use only needed items
             if ($remaining_num < 0) $items = array_slice($items, -$remaining_num);
             // don't fix $remaining_num, we will exit anyway
             $all_items[] = array_reverse($items);
             $all_item_num += $num;
         }

         // are we ready?
         if ($fpos === 0 || $remaining_num <= 0) break;

         // calculate a better estimate
         if ($all_item_num > 0)
             $typical_item_size = (int)max(1, round($bytes_read / $all_item_num));
     }

     fclose($f);
     //tr( $all_items );
     return call_user_func_array('array_merge', $all_items);
 }
0
Jan 11 '18

I like the following method, but it won't work on files larger than 2 GB.

 <?php
 function lastLines($file, $lines)
 {
     $size = filesize($file);
     $fd = fopen($file, 'r+');
     $pos = $size;
     $n = 0;
     while ($n < $lines + 1 && $pos > 0) {
         fseek($fd, $pos);
         $a = fread($fd, 1);
         if ($a === "\n") {
             ++$n;
         }
         $pos--;
     }
     $ret = array();
     for ($i = 0; $i < $lines; $i++) {
         array_push($ret, fgets($fd));
     }
     return $ret;
 }
 print_r(lastLines('hola.php', 4));
 ?>
0
Feb 20 '19 at 8:40


