Skip all leading blank lines in read.csv

I want to import CSV files into R, with the first non-empty line supplying the column names of the data frame. I know that the skip argument (e.g. skip = 2) tells read.csv how many lines to skip before it starts reading. However, the line number of the first non-empty line varies between files.

How can I determine how many leading lines are empty and skip them dynamically for each file?

As requested in the comments, I should clarify what “empty” means. My CSV files look like this:

    ,,,
    w,x,y,z
    a,b,5,c
    a,b,5,c
    a,b,5,c
    a,b,4,c
    a,b,4,c
    a,b,4,c

That is, the leading “empty” lines are not blank: they contain only commas.

3 answers

read.csv automatically skips blank lines (unless you set blank.lines.skip = FALSE). See ?read.csv.
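To check this behaviour without creating a file, read.csv also accepts a `text =` argument and reads from a character string. A minimal sketch with made-up sample data; note that this only covers truly blank lines, not lines that contain commas:

```r
# Two truly empty lines precede the header. read.csv ignores them because
# blank.lines.skip = TRUE by default, so "w,x,y,z" becomes the header.
csv_text <- "\n\nw,x,y,z\na,b,5,c\na,b,4,c\n"
df <- read.csv(text = csv_text)
print(names(df))
print(nrow(df))
```

Replacing the two empty lines with `,,,` lines would make read.csv use `,,,` as the header instead, which is exactly the case raised in the question.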

After I wrote the above, the poster clarified that the “empty” lines are not actually empty: they contain commas with nothing between them. In that case, use fread from the data.table package, which handles this. Its skip= argument can be set to any character string found in the header line, and fread then starts reading at the first line that contains that string:

    library(data.table)
    DT <- fread("myfile.csv", skip = "w")  # assuming "w" appears in the header line
    DF <- as.data.frame(DT)

The last line can be omitted if a data.table is acceptable as the return value.


Depending on your file size, this may not be the most efficient solution, but it does the job.

Instead of reading the file as delimited data, this strategy reads it line by line, counts the characters on each line, and stores the counts. A while loop then finds the first line with a nonzero character count, and the file is read starting from that line and saved as data_<filename>.

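    flist = list.files()
    for (onefile in flist) {
      # number of characters on each line of the file
      temp = nchar(readLines(onefile))
      # advance to the first line that is not empty
      i = 1
      while (temp[i] == 0) {
        i = i + 1
      }
      # header = FALSE (the default), so the first non-empty line is read as data
      temp = read.table(onefile, sep = ",", skip = (i - 1))
      # store the result in a variable named data_<filename>
      assign(paste0("data_", onefile), temp)
    }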

If the file contains a header row, you can start i from 2.
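Note that nchar() counts a line such as `,,,` as three characters, so the loop above only helps when the leading lines are truly empty. A minimal sketch of a comma-aware variant, assuming comma-separated files whose leading "empty" lines contain only commas and/or whitespace. The helper names `read_skip_empty` and `read_all_csv` are hypothetical, and the results go into a named list instead of separate variables created with assign():

```r
# Hypothetical helper: treat lines made of commas/whitespace as empty and
# use the first remaining line as the header.
read_skip_empty <- function(path) {
  lines <- readLines(path)
  is_empty <- grepl("^[,[:space:]]*$", lines)
  i <- which(!is_empty)[1]            # first line with real content
  if (is.na(i)) stop("no non-empty line in ", path)
  read.csv(path, skip = i - 1)        # that line becomes the header
}

# Hypothetical helper: read every .csv file in a directory into a named
# list, keyed as data_<filename> like the loop above.
read_all_csv <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  result <- lapply(files, read_skip_empty)
  names(result) <- paste0("data_", basename(files))
  result
}
```

Keeping the data frames in one list makes it easy to iterate over them later (for example with lapply or do.call(rbind, ...)) without having to look up variable names built with paste0.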


If the leading empty lines are really empty, read.csv should automatically skip to the first non-empty line. If they contain commas but no values, you can use:

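    # first pass: the comma-only first line is used as an (empty) header
    df = read.csv(file = 'd.csv')
    # second pass: skip up to the first row whose first column is non-empty
    df = read.csv(file = 'd.csv', skip = as.numeric(rownames(df[which(df[,1] != ''), ])[1]))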

This is inefficient for large files (the file is read twice), but it works.
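One way to avoid parsing a large file twice is to peek at only its first few lines with readLines(n = ...) to locate the header, and then let read.csv parse the file once. A minimal sketch, assuming the header appears within the first `max_scan` lines and that "empty" lines contain only the separator and/or whitespace; the function name `read_csv_peek` and the `max_scan` default are hypothetical choices:

```r
# Hypothetical helper: find the header by reading only the first `max_scan`
# lines, then parse the whole file a single time with read.csv.
read_csv_peek <- function(path, sep = ",", max_scan = 100L) {
  head_lines <- readLines(path, n = max_scan)
  # Lines made up only of separators and/or whitespace count as empty.
  is_empty <- grepl(paste0("^[", sep, "[:space:]]*$"), head_lines)
  first <- which(!is_empty)[1]
  if (is.na(first)) {
    stop("no non-empty line within the first ", max_scan, " lines of ", path)
  }
  read.csv(path, sep = sep, skip = first - 1)
}
```

Because the separator is a parameter, the same sketch covers tab-delimited files via `sep = "\t"`. The separator is placed inside a regex character class, so a character such as `]` would need escaping.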

If you want to import a tab-delimited file with the same problem (a variable number of empty lines), use:

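    # first pass without a header, so row k corresponds to line k of the file
    df = read.table(file = 'd.txt', sep = '\t')
    first = which(df[, 1] != '')[1]
    # second pass: skip the lines before the header and read it as column names
    df = read.table(file = 'd.txt', sep = '\t', header = TRUE, skip = first - 1)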

Source: https://habr.com/ru/post/976952/

