Reading the last n lines from a huge text file

I tried something like this

 file_in <- file("myfile.log","r")
 x <- readLines(file_in, n=-100)

but I'm still waiting ...

Any help would be greatly appreciated.

4 answers

I would use scan for this if you know how many lines the log has, skipping everything before the lines you want:

 scan("foo.txt",sep="\n",what="char(0)",skip=100) 

If you don't know how many lines you need to skip, you have no choice but to resort to one of the following:

  • reading in everything and taking the last n lines (if possible),
  • using scan("foo.txt",sep="\n",what=list(NULL)) to find out how many entries there are, or
  • using some algorithm to go through the file, saving only the last n lines each time
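As a rough sketch of the second option (count first, then skip), something like the following could work. The helper names `count_lines` and `last_lines_by_skip` and the chunk size are my own assumptions, not part of any package, and the counting is done here with `readLines` in blocks rather than with `scan`, so that the lines are never all held in memory at once.

```r
# Hypothetical helper: count the lines of a file in blocks, without keeping them.
count_lines <- function(path, chunk = 10000L) {
  con <- file(path, "r")
  on.exit(close(con))
  total <- 0
  repeat {
    got <- length(readLines(con, n = chunk, warn = FALSE))
    if (got == 0L) break          # end of file reached
    total <- total + got
  }
  total
}

# Hypothetical helper: read the last n lines by skipping everything before them.
last_lines_by_skip <- function(path, n) {
  total <- count_lines(path)
  # blank.lines.skip = FALSE keeps blank lines, so the result matches the count
  scan(path, what = character(), sep = "\n", skip = max(total - n, 0),
       quiet = TRUE, blank.lines.skip = FALSE)
}
```

Note that this passes over the file twice, so it trades reading time for memory.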

The last option might look like this:

 ReadLastLines <- function(x,n,...){
   con <- file(x)
   open(con)
   out <- scan(con,n,what="char(0)",sep="\n",quiet=TRUE,...)
   while(TRUE){
     tmp <- scan(con,1,what="char(0)",sep="\n",quiet=TRUE)
     if(length(tmp)==0) {close(con) ; break }
     out <- c(out[-1],tmp)
   }
   out
 }

allowing:

 ReadLastLines("foo.txt",100) 

or

 ReadLastLines("foo.txt",100,skip=1e+7) 

if you know that you have over 10 million lines. This can save reading time once your logs become very large.
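Reading one line per call can be slow on a really big file. A possible variant, as a sketch only, reads blocks of lines and keeps just the last n after each block. The name `read_last_lines_chunked` and the default block size are assumptions of mine.

```r
# Hypothetical variant: read blocks of lines instead of one line at a time.
read_last_lines_chunked <- function(path, n, chunk = 10000L) {
  con <- file(path, "r")
  on.exit(close(con))
  out <- character()
  repeat {
    block <- readLines(con, n = chunk, warn = FALSE)
    if (length(block) == 0L) break   # end of file reached
    out <- tail(c(out, block), n)    # keep only the last n lines seen so far
  }
  out
}
```

At most `n + chunk` lines are held in memory at any time, so the block size can be tuned to what the machine tolerates.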


EDIT: Actually, I wouldn't even use R for this, given the size of your file. On Unix, you can use the tail command. There is a Windows version of it as well, somewhere in a toolkit, but I have not tried it yet.
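If you still want the result inside R, tail can be called from R itself. This is only a sketch: it assumes a Unix-like system with `tail` on the PATH, and the wrapper name `tail_via_shell` is made up for illustration.

```r
# Hypothetical wrapper; assumes a Unix-like system with `tail` on the PATH.
tail_via_shell <- function(path, n = 100L) {
  # sprintf("%d") avoids scientific notation such as "1e+05" for large n
  args <- c("-n", sprintf("%d", as.integer(n)), shQuote(path))
  system2("tail", args = args, stdout = TRUE)   # one element per line
}
```

Because tail seeks from the end of the file, this does not need to read the whole log.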


You can do this with read.table by specifying the skip parameter. If your lines are not to be parsed into variables, specify the separator as '\n', as @Joris Meys does in his answer, and set as.is=TRUE to get character vectors instead of factors.

A small example (skipping the first 2000 lines):

 df <- read.table('foo.txt', sep='\n', as.is=TRUE, skip=2000) 
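One caveat with log files: by default read.table treats quote characters and `#` specially, which can mangle lines containing apostrophes or hashes. A hedged variant that switches both off and returns a plain character vector might look like this. The helper name `read_tail_table` is hypothetical, and `colClasses` is used here in place of as.is so that numeric-looking lines stay as text.

```r
# Hypothetical helper: read everything after `skip` lines as raw text lines.
read_tail_table <- function(path, skip) {
  df <- read.table(path, sep = "\n", skip = skip,
                   colClasses = "character",  # no type conversion
                   quote = "",                # treat ' and " as ordinary text
                   comment.char = "")         # treat # as ordinary text
  df[[1]]
}
```

Blank lines are still dropped, since blank.lines.skip is left at its default here.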

As @JorisMeys already mentioned, the Unix tail command would be the easiest way to solve this problem. However, I want to offer a seek-based R solution that starts reading from the end of the file:

 tailfile <- function(file, n) {
   # assumes a single-byte encoding (e.g. ASCII) and "\n" line endings
   bufferSize <- 1024L
   pos <- file.info(file)$size      # start at the end of the file
   text <- ""
   k <- 0L                          # number of newlines seen so far
   f <- file(file, "rb")
   on.exit(close(f))
   while (pos > 0 && k <= n) {
     len <- min(bufferSize, pos)    # the chunk at the file start may be shorter
     pos <- pos - len
     seek(f, where=pos)
     chars <- readChar(f, nchars=len)
     k <- k + sum(gregexpr("\n", chars, fixed=TRUE)[[1L]] > 0L)
     text <- paste0(chars, text)    # prepend: chunks are read back to front
   }
   tail(strsplit(text, "\n", fixed=TRUE)[[1L]], n)
 }
 tailfile("myfile.log", n=100)

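To view the last few lines once the file has been read in (this reads the whole file, so it only works if the file fits in memory):

 tail(readLines(file_in),100)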

Source: https://habr.com/ru/post/885513/

