How can I split large csv files into chunks using R packages like ff or data.table?

I want to split large csv files (file size larger than RAM size) into chunks and either use the chunks directly or save each one to disk for later use. Which R package is best for large files?

4 answers

To read a large file, you can use read.csv.ffdf from the ff package with specific parameters like this:

library(ff)
a <- read.csv.ffdf(file="big.csv", header=TRUE, VERBOSE=TRUE, first.rows=1000000, next.rows=1000000, colClasses=NA)

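Once the large file has been read into an ff object, you can subset the ffdf object into ordinary data frames with, for example: a[1000:1000000,]

The code below then splits the object into blocks of roughly fixed size and saves each one.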

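totalrows = dim(a)[1]
#row.size is the approximate size of one row in bytes, estimated from the first 10000 rows
row.size = as.integer(object.size(a[1:10000,]))/10000

#block.size is the target size of each block in bytes (200 MB)
block.size = 200000000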

#rows.block is rows per block
rows.block = ceiling(block.size/row.size)

#nmaps is the index of the last chunk of the big ffdf, i.e. number of chunks - 1
#ceiling() avoids an empty trailing chunk when totalrows is an exact multiple of rows.block
nmaps = ceiling(totalrows/rows.block) - 1


for(i in (0:nmaps)){
  if(i==nmaps){
    df = a[(i*rows.block+1) : totalrows,]
  }
  else{
    df = a[(i*rows.block+1) : ((i+1)*rows.block),]
  }
  #process df or save it
  write.csv(df,paste0("M",i+1,".csv"))
  #remove df
  rm(df)
}
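
The index arithmetic in the loop above can be checked on small numbers without loading ff. The following is a minimal sketch in base R; chunk_ranges is a hypothetical helper name that does not appear in the original answer.

```r
# Hypothetical helper: first and last row of every chunk.
# totalrows and rows.block have the same meaning as in the loop above.
chunk_ranges <- function(totalrows, rows.block) {
  # chunk i starts one row after the end of chunk i - 1
  starts <- seq(1, totalrows, by = rows.block)
  # the last chunk is cut off at totalrows
  ends <- pmin(starts + rows.block - 1, totalrows)
  data.frame(start = starts, end = ends)
}

ranges <- chunk_ranges(10, 4)
print(ranges)
```

For 10 rows and 4 rows per block this gives the ranges 1 to 4, 5 to 8 and 9 to 10, so only the last chunk is shorter than the others.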
+3

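Alternatively, you can use the skip and nrows parameters of read.table or read.csv. From ?read.table:

skip integer: the number of lines of the data file to skip before beginning to read data.

nrows integer: the maximum number of rows to read in. Negative and other invalid values are ignored.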

, , . , , , , csv.

p.s. , header=TRUE , .
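
As an illustration of the skip and nrows approach, here is a minimal sketch in base R. The function name split_csv and the output naming scheme are assumptions made for this example. The column names are read once from the first line and passed to every later read through col.names, because chunks after the first no longer begin with a header line.

```r
# Hypothetical helper: split a csv file into chunks of chunk_rows data rows.
# Returns the number of chunk files written.
split_csv <- function(file, chunk_rows, stem = "chunk") {
  # read the column names from the header line
  header <- names(read.csv(file, nrows = 1, header = TRUE))
  i <- 0
  repeat {
    # skip the header line plus all data rows of earlier chunks
    df <- tryCatch(
      read.csv(file, skip = 1 + i * chunk_rows, nrows = chunk_rows,
               header = FALSE, col.names = header),
      error = function(e) NULL  # treat a failed read as end of file
    )
    if (is.null(df) || nrow(df) == 0) break
    i <- i + 1
    # process df here, or save it for later use
    write.csv(df, paste0(stem, i, ".csv"), row.names = FALSE)
    # a short chunk means the end of the file was reached
    if (nrow(df) < chunk_rows) break
  }
  i
}
```

Each call to read.csv has to scan the file from the start to reach its skip offset, so this sketch gets slower as the chunk index grows. It is also possible for column types to be guessed differently in different chunks, which colClasses can prevent.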

+3

, bu @berkorbay, , , . , , , .

One option is a PERL script that splits the file into chunks, keeping the header in each one:

#!/usr/bin/perl
system("cls");
print("Fragment .csv file keeping header in each chunk\n") ;

print("\nEnter input file name  = ") ;
chomp($entrada = <STDIN>) ;   # chomp strips the trailing newline from the answer
print("\nEnter maximum number of lines in each fragment = ") ;
chomp($nlineas = <STDIN>) ;
print("\nEnter output file name stem   = ") ;
chomp($salida = <STDIN>) ;
open(IN,$entrada)    || die "Cannot open input file: $!\n" ;

$cabecera  = <IN> ;
$leidas    = 0  ;
$fragmento = 1  ;
$fichero   = $salida.$fragmento ;
open(OUT,">$fichero") || die "Cannot open output file: $!\n" ;
print OUT $cabecera ;
while(<IN>) {
    # start a new fragment once the current one holds $nlineas data lines
    if ($leidas >= $nlineas) {
        close(OUT) ;
        $fragmento++ ;
        $fichero   = $salida.$fragmento ;
        open(OUT,">$fichero") || die "Cannot open output file: $!\n" ;
        print OUT $cabecera ;
        $leidas = 0;
    }
    $leidas++ ;
    print OUT $_ ;
}
close(OUT) ;

To use it, you need PERL installed (for example, on Windows, run the script with "perl name-of-script").
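
For readers who prefer to stay in R, the same line-based splitting can be sketched with a file connection and readLines. This is an assumed equivalent written for illustration, not part of the original answer; the function name split_lines is hypothetical. Like the PERL script, it treats every line as one record, so it would break quoted fields that contain embedded newlines.

```r
# Hypothetical helper: split a text file into fragments of at most max_lines
# data lines, repeating the header line at the top of every fragment.
# Returns the number of fragments written.
split_lines <- function(infile, max_lines, stem) {
  con <- file(infile, open = "r")
  on.exit(close(con))
  # the first line is the header
  header <- readLines(con, n = 1)
  part <- 0
  repeat {
    # an open connection keeps its position, so each call continues
    # where the previous one stopped
    lines <- readLines(con, n = max_lines)
    if (length(lines) == 0) break
    part <- part + 1
    writeLines(c(header, lines), paste0(stem, part))
  }
  part
}
```

Because the connection stays open, the file is read only once from start to end, and only max_lines lines are held in memory at a time.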

+3

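Alternatively, you can first load the data into mysql using dbWriteTable, and then use read.dbi.ffdf from the ETLUtils package to read it into R. Consider the function below: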

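library(RMySQL)    # provides MySQL(), dbConnect(), dbWriteTable()
library(ETLUtils)  # provides read.dbi.ffdf()

read.csv.sql.ffdf <- function(file, name, overwrite = TRUE, header = TRUE, drv = MySQL(), dbname = "new", username = "root", host = "localhost", password = "1234"){
  # load the csv file into a temporary MySQL table
  conn = dbConnect(drv, user = username, password = password, host = host, dbname = dbname)
  dbWriteTable(conn, name, file, header = header, overwrite = overwrite)
  # drop the table and close the connection when the function exits
  on.exit({dbRemoveTable(conn, name); dbDisconnect(conn)})
  command = paste0("select * from ", name)
  # read the table back in chunks as an ffdf
  ret = read.dbi.ffdf(command, dbConnect.args = list(drv = drv, dbname = dbname, username = username, password = password, host = host))
  return(ret)
}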

Source: https://habr.com/ru/post/1542444/

