How do I parse an Excel file so that I get the data exactly as it appears visually?

I'm on Rails 5 (Ruby 2.4). I want to read a .xls document and get its data in CSV format, exactly as it appears in the Excel file. Someone recommended using Roo, so I have

book = Roo::Spreadsheet.open(file_location)
sheet = book.sheet(0)
text = sheet.to_csv
arr_of_arrs = CSV.parse(text)

However, what is returned does not match what I see in the spreadsheet. For instance, a cell in the spreadsheet shows

16:45.81

and when I get the CSV data from above, it returns

"0.011641319444444444"

How can I parse an Excel document and get exactly what I see? I don't care whether I use Roo for parsing or not, as long as the CSV data represents what I see rather than some internal representation. For reference, this is what I get when I run "file name_of_file.xls" on the file I parsed:

Composite Document File V2 Document, Little Endian, Os: Windows, Version 5.1, Code page: 1252, Author: Dwight Schroot, Last Saved By: Dwight Schroot, Name of Creating Application: Microsoft Excel, Create Time/Date: Tue Sep 21 17:05:21 2010, Last Saved Time/Date: Wed Oct 13 16:52:14 2010, Security: 0

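One option is to apply the formatting on the .xls side. This only works if you can edit the file, or if whoever produces it can add an extra column. In that column, use a formula such as =TEXT(A2, "mm:ss.0"), where A2 is just the cell used as an example, so that the value is stored as text.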

book = ::Roo::Spreadsheet.open(file_location)
puts book.cell('B', 2)
=> '16:45.8'

Alternatively, convert the value on the Ruby side by passing a custom converter to CSV.new().

require 'roo-xls'
require 'csv'

# Turn an Excel time serial (a fraction of a day) in the "time" column
# into a "minutes:seconds" string; leave every other field untouched.
CSV::Converters[:time_parser] = lambda do |field, info|
  case info.header.to_s.strip
  when "time"
    begin
      # 0.011641319444444444 * 24 hours * 3600 seconds = 1005.81
      total_seconds = Float(field) * 24 * 3600
      # 1005.81.divmod(60) => [16, 45.81] (up to floating-point error)
      mm, ss = total_seconds.divmod(60)
      # => "16:45.81"
      "#{mm}:#{ss.round(2)}"
    rescue ArgumentError, TypeError
      # Non-numeric or empty field: return it unchanged
      field
    end
  else
    field
  end
end

book = ::Roo::Spreadsheet.open(file_location)
sheet = book.sheet(0)
csv = CSV.new(sheet.to_csv, headers: true, converters: [:time_parser]).map { |row| row.to_hash }
puts csv
=> {"time "=>"16:45.81"}
   {"time "=>"12:46.0"}
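
The converter above rounds the seconds but does not zero-pad them, so a value that Excel displays as 16:05.30 would come out as 16:5.3. If the goal is to mirror a two-decimal display, a sketch along these lines may help. The helper name excel_serial_to_mmss is hypothetical (it is not part of Roo or CSV), and it assumes the cell holds a duration that should be shown as total minutes, seconds and hundredths:

```ruby
# Hypothetical helper, not part of Roo or the CSV library.
# Formats an Excel time serial (a fraction of a 24-hour day)
# as "minutes:seconds.hundredths" with zero-padded seconds.
def excel_serial_to_mmss(serial)
  # Work in whole hundredths of a second so that rounding cannot
  # produce an out-of-range value such as "16:60.00".
  hundredths = (Float(serial) * 24 * 3600 * 100).round
  minutes, rest = hundredths.divmod(6000)
  seconds, frac = rest.divmod(100)
  format('%d:%02d.%02d', minutes, seconds, frac)
end

puts excel_serial_to_mmss('0.011641319444444444')
```

Inside the converter, the string interpolation could be swapped for a call to this helper. Note that minutes are not wrapped at 60 here, which may differ from how a plain mm:ss format in Excel treats values of an hour or more.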

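The roo-xls gem is what reads .xls files. In the .xls file, the cell that displays 16:45.81 is stored as a Number, and the library returns that stored value.

The mm:ss.0 appearance comes only from the cell's display format, so it has to be reapplied after reading.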

You can use CSV converters. For example:

arr_of_arrs = CSV.parse(text, converters: :date_time)

http://ruby-doc.org/stdlib-2.0.0/libdoc/csv/rdoc/CSV.html
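
As a minimal sketch of how the converters option behaves, the snippet below uses the built-in :float converter on an inline CSV string. The sample data is made up for illustration. It only shows that converters run on every field and leave non-matching fields alone; whether a particular built-in converter does anything useful with Excel's fractional-day values is a separate question that this sketch does not settle.

```ruby
require 'csv'

# Made-up sample data standing in for the output of sheet.to_csv.
text = "time,name\n0.011641319444444444,run 1\n"

# Each field is passed through the converter; fields that do not
# look like floats (the header row, "run 1") are returned unchanged.
rows = CSV.parse(text, converters: :float)

rows.each { |row| p row }
```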

Your problem seems to be related to how you parse (read) the input file.

Roo itself parses only Excel 2007-2013 (.xlsx) files. From your question, you want to parse .xls, which is a different format.

As the documentation says, use the roo-xls gem instead.

Source: https://habr.com/ru/post/1016121/

