How to skip extra lines before the header of a tab-delimited file in R

The software I use creates log files with a variable number of lines of summary information, followed by a lot of tab-delimited data. I am trying to write a function that reads the data from these log files into a data frame, ignoring the summary information. The summary information never contains a tab, so the following function works:

read.parameters <- function(file.name, ...){
  lines <- scan(file.name, what="character", sep="\n")
  first.line <- min(grep("\\t", lines))
  return(read.delim(file.name, skip=first.line-1, ...))
}

However, these log files are quite large, so reading each file twice is very slow. Surely there is a better way?

Edited to add:

Marek suggested using a textConnection object. The method he proposed in his answer fails on a large file, but the following works:

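read.parameters <- function(file.name, ...){
  conn <- file(file.name, "r")
  on.exit(close(conn))
  repeat{
    line <- readLines(conn, 1)
    # Stop at end of file instead of looping forever when no line has a tab
    if (length(line) == 0) stop("no tab-delimited line found in ", file.name)
    if (grepl("\\t", line)) {
      # Put the header line back so read.delim sees it
      pushBack(line, conn)
      break
    }
  }
  read.delim(conn, ...)
}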

Credit: Marek.

You don't need to read the file twice. Pass the lines you have already read to textConnection:

read.parameters <- function(file.name, ...){
  lines <- scan(file.name, what="character", sep="\n") # the question had "tmp.log" here; file.name is presumably what was meant
  first.line <- min(grep("\\t", lines))
  return(read.delim(textConnection(lines), skip=first.line-1, ...))
}

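If you can be sure the summary information never takes more than N lines, say N = 200, limit the first read:

scan(..., nlines = N)

That way, at most N lines are read twice.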
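
A minimal sketch of that bounded first pass as a complete function. The names read.parameters.bounded and header.limit are hypothetical, and the default of 200 is only the example value of N from above. The sketch uses readLines(n = ...) in place of scan(..., nlines = N) because readLines keeps blank lines, so the position it finds matches the raw line count that skip expects.

```r
# Hypothetical helper: header.limit plays the role of N, an assumed upper
# bound on the number of summary lines before the header.
read.parameters.bounded <- function(file.name, header.limit = 200, ...) {
  # First pass: read only the first header.limit lines
  head.lines <- readLines(file.name, n = header.limit)
  # Positions of the lines that contain a literal tab character
  tab.lines <- grep("\t", head.lines, fixed = TRUE)
  if (length(tab.lines) == 0) {
    stop("no tab-delimited line found in the first ", header.limit, " lines")
  }
  # Second pass: skip everything before the first line that contains a tab
  read.delim(file.name, skip = tab.lines[1] - 1, ...)
}

# Usage on a small temporary file standing in for a real log
log.file <- tempfile(fileext = ".log")
writeLines(c("Run summary", "", "Samples: 2",
             "id\tvalue", "1\t10", "2\t20"), log.file)
params <- read.parameters.bounded(log.file)
print(params)
```

If the assumed bound is too small, the function stops with an error instead of silently returning the wrong rows, so the caller can retry with a larger header.limit.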


Source: https://habr.com/ru/post/1750288/

