How to fetch a file over FTP in Ruby without first saving it to disk, and pass it to CSV

I already fetch the records from the FTP server with the gettextfile method and work on each record inside its block, finally putting it somewhere else.

This file is a CSV file, and I need to keep using CSV to get the headers and data and put them in the DB after some processing. Since I have many different files, I need a general approach. I do not want to load all the records into memory or onto disk, because the files can be very large! So streaming would be good.

One idea is to give CSV an IO object, but I don't see how to do this with Net::FTP.

I have already seen "http://stackoverflow.com/questions/5223763/how-to-ftp-in-ruby-without-first-saving-the-text-file", but it deals with PUT.

Any help?

3 answers

The technique Justin mentions creates a temporary file.

You can use retrlines:

    filedata = ''
    ftp.retrlines("RETR " + filename) do |line|
      # retrlines yields each line without its line ending, so add it back
      filedata << line << "\n"
    end

or retrbinary instead:

    filedata = ''
    ftp.retrbinary("RETR " + filename, Net::FTP::DEFAULT_BLOCKSIZE) do |block|
      filedata << block
    end
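
Both snippets collect the whole file in filedata, which the question wants to avoid for large files. A minimal sketch of handling each line as it arrives is shown here. It assumes that the first non-empty line holds the headers and that no quoted field contains a line break, since parsing line by line would split such a field. The helper name each_csv_row is hypothetical and not part of Net::FTP; ftp can be any object that responds to retrlines the way Net::FTP does.

```ruby
require 'csv'

# Hypothetical helper (not part of Net::FTP): yields one Hash per data row.
# Assumption: the first non-empty line is the header row and no quoted
# field spans several lines.
def each_csv_row(ftp, filename)
  headers = nil
  ftp.retrlines("RETR #{filename}") do |line|
    fields = CSV.parse_line(line)
    next if fields.nil?            # CSV.parse_line returns nil for an empty line
    if headers.nil?
      headers = fields             # remember the header row
    else
      yield headers.zip(fields).to_h
    end
  end
end
```

It could be called as each_csv_row(ftp, 'file.csv') { |row| ... }, so only one row is held in memory at a time.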

I think you are most of the way to a solution with gettextfile. You can simply copy part of the file into an Array and process it with CSV once it reaches a certain threshold. Here is some untested code that processes ten lines at a time:

    CHUNK_SIZE = 10

    # Defined before use: a top-level method must exist when it is called.
    def process_chunk!(lines_in_chunk)
      # process the partial chunk of lines as if it were the whole file
      lines_in_chunk.clear # empty the caller's array in place
    end

    chunk = []
    # Passing nil as the local file name keeps gettextfile from writing a local copy.
    ftp.gettextfile('file.csv', nil) do |line|
      chunk << line
      process_chunk!(chunk) if chunk.size >= CHUNK_SIZE
    end
    process_chunk!(chunk) unless chunk.empty? # any remaining lines in the final partial chunk

This seems like one of the easiest solutions to me, but you could probably also use several Unix processes (writing to and reading from STDOUT) or Ruby threads in a producer-consumer model.
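
The thread-based producer-consumer idea can be sketched with Ruby's SizedQueue, which blocks the producer when the consumer falls behind, so only a bounded number of lines is in memory at once. This is an illustration under assumptions, not a tested FTP integration: stream_lines, producer, and max_pending are hypothetical names, and the producer lambda stands in for the gettextfile call.

```ruby
# Hypothetical helper: runs the producer in a background thread and yields
# each line it emits to the caller's block in the calling thread.
def stream_lines(producer, max_pending: 100)
  queue = SizedQueue.new(max_pending)   # push blocks while the queue is full
  done = Object.new                     # unique end-of-stream marker

  worker = Thread.new do
    begin
      producer.call(->(line) { queue << line })
    ensure
      queue << done                     # always signal the end to the consumer
    end
  end

  until (item = queue.pop).equal?(done)
    yield item
  end
  worker.join                           # re-raises an error from the producer, if any
end
```

With Net::FTP the producer might look like ->(emit) { ftp.gettextfile('file.csv', nil) { |line| emit.call(line) } }, and the consuming block would do the CSV work. If the consuming block stops early, the producer thread stays blocked on the queue, so a real version would need a way to cancel it.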


The solution I came up with uses IO.pipe and a thread that iterates over the lines of text from the FTP file (some of which may be fragments of lines inside quotation marks) and puts each line to the write IO.

In the main thread, I create a CSV instance based on the read IO and iterate over the parsed rows from it.

    require 'csv'
    require 'net/ftp'

    def stream_ftp_csv_test(ftp, filename)
      read_io, write_io = IO.pipe

      # Producer thread: copy each line from the FTP server into the pipe.
      fetcher = Thread.new do
        begin
          # nil as the local file name: do not also save a copy on disk
          ftp.gettextfile(filename, nil) do |line|
            write_io.puts line
          end
        ensure
          write_io.close
        end
      end

      csv = CSV.new(read_io, headers: :first_row)
      csv.each do |row|
        # Printing the row hashes here as an example.
        # You could yield each one to a given block
        # argument or whatever else makes sense.
        p row.to_h
      end

      fetcher.join
    ensure
      read_io.close if read_io
    end

Source: https://habr.com/ru/post/1446814/

