Why does open("url") sometimes return a File and sometimes a StringIO?

I have two CSV files stored on S3. When I open one of them, a File is returned. When I open the other, a StringIO is returned.

 fn1 #=> "http://SOMEWHERE.s3.amazonaws.com/setup_data/d1/file1.csv"
 open(fn1) #=> #<File:/var/folders/sm/k7kyd0ns4k9bhfy7yqpjl2mh0000gn/T/open-uri20140814-26070-11cyjn1>
 fn2 #=> "http://SOMEWHERE.s3.amazonaws.com/setup_data/d2/d3/file2.csv"
 open(fn2) #=> #<StringIO:0x007f9718670ff0>

Why? Is there a way to open them using a consistent data type?

I need to pass a consistent data type to CSV.read(open(file_url)), which does not work if it sometimes receives a File and sometimes a StringIO.

They were created using different Ruby scripts (they contain very different data).

On my Mac, they both look like regular CSV text files. They were uploaded through the AWS console and have the same permissions and identical metadata (content-type: application/octet-stream).

2 answers

CSV::read expects a file path as its argument, not an already open I/O object. It then opens the file itself and reads the contents. Your code works in the Tempfile case because Ruby calls to_path behind the scenes on anything passed to File::open, and File responds to that method. What actually happens is that CSV opens a second IO on the same file. StringIO has no to_path, so that case fails.

Instead of using CSV::read, you can create a new CSV object and call read on it (the instance method, not the class method). CSV::new handles I/O objects correctly:

 CSV.new(open(file_url)).read 
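
A minimal sketch of why this works for both return types, using local stand-ins instead of a real S3 URL. The helper name `rows_from` and the sample data are hypothetical; the `StringIO` and `Tempfile` objects play the role of what open-uri hands back for small and large responses.

```ruby
require 'csv'
require 'stringio'
require 'tempfile'

# Hypothetical helper: parse CSV from any readable IO-like object.
# CSV.new wraps the object directly, so no file path is needed.
def rows_from(io)
  CSV.new(io).read
end

data = "a,b\n1,2\n"

# Stand-in for a small response (open-uri keeps it in a StringIO).
small = StringIO.new(data)

# Stand-in for a large response (open-uri spills it to a Tempfile).
large = Tempfile.new('open-uri')
large.write(data)
large.rewind

small_rows = rows_from(small)
large_rows = rows_from(large)
large.close! # remove the temporary file

puts small_rows == large_rows
```

On Ruby 3.0 and later, open-uri no longer extends Kernel#open, so the download itself would be written as `URI.open(file_url)`.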

This is by design. open-uri buffers the response in a StringIO and switches to a temporary file once the size of the data exceeds 10,240 bytes. From the source:

 StringMax = 10240
 def <<(str)
   @io << str
   @size += str.length
   if StringIO === @io && StringMax < @size
     require 'tempfile'
     io = Tempfile.new('open-uri')
     io.binmode
     Meta.init io, @io if Meta === @io
     io << @io.string
     @io = io
   end
 end
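
The threshold behavior can be reproduced without any network access. The sketch below is a simplified stand-in, not open-uri's own class: `SketchBuffer` and `STRING_MAX` are hypothetical names, and the `Meta` handling from the quoted source is left out.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical stand-in that mirrors the buffering logic quoted above.
class SketchBuffer
  STRING_MAX = 10_240

  attr_reader :io

  def initialize
    @io = StringIO.new
    @size = 0
  end

  # Append data; once the total exceeds STRING_MAX, move everything
  # collected so far from the StringIO into a Tempfile.
  def <<(str)
    @io << str
    @size += str.length
    if @io.is_a?(StringIO) && STRING_MAX < @size
      file = Tempfile.new('sketch-buffer')
      file.binmode
      file << @io.string
      @io = file
    end
    self
  end
end

at_limit = SketchBuffer.new
at_limit << ('x' * 10_240)
puts at_limit.io.class   # StringIO: the limit is reached but not exceeded

over_limit = SketchBuffer.new
over_limit << ('x' * 10_241)
puts over_limit.io.class # Tempfile: one byte over the limit
```

This matches the symptom in the question: the smaller CSV stays in memory, while the larger one ends up in a temporary file.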

If you need a StringIO object, you can use fastercsv .


Source: https://habr.com/ru/post/973855/

