TSV rules are actually slightly different from CSV rules. The main difference is that CSV lets you put a comma inside a field by wrapping the field in quote characters, and it then needs a way to escape quotes that appear inside the field. I wrote a quick example to show how a simple answer fails:
require 'csv'

# Double-quoted string so that \t is a real tab; the inner quotes are escaped.
line = "boogie\ttime\tis \"now\""

begin
  CSV.parse_line(line, col_sep: "\t")
  puts "parsed correctly"
rescue CSV::MalformedCSVError
  puts "failed to parse line"
end

begin
  # Use a quote character that is not expected to appear in the data.
  CSV.parse_line(line, col_sep: "\t", quote_char: "Ζ")
  puts "parsed correctly with random quote char"
rescue CSV::MalformedCSVError
  puts "failed to parse line with random quote char"
end
If you want to use the CSV library, you can pick a quote character that you don't expect to see in your file (the second attempt in the example shows this), but you can also use a simpler approach like the StrictTsv class shown below to get the same effect without worrying about field quotes.
# The main parse method is mostly borrowed from a tweet by @JEG2
class StrictTsv
  attr_reader :filepath

  def initialize(filepath)
    @filepath = filepath
  end

  # Yields one Hash per data row, keyed by the header names.
  def parse
    File.open(filepath) do |f|
      headers = f.gets.chomp.split("\t")
      f.each do |line|
        # chomp so the last field does not keep the trailing newline
        fields = Hash[headers.zip(line.chomp.split("\t"))]
        yield fields
      end
    end
  end
end
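To see what the split-based approach does with quotes, here is a minimal sketch that applies the same idea to an in-memory string instead of a file. The helper name `each_strict_tsv_row` and the sample data are made up for illustration and are not part of the class above.

```ruby
require 'stringio'

# Hypothetical helper (an assumption, not part of StrictTsv): yields one Hash
# per data row from any IO-like object, splitting strictly on tabs.
def each_strict_tsv_row(io)
  headers = io.gets.chomp.split("\t", -1)
  io.each_line do |line|
    # A limit of -1 keeps trailing empty fields; quotes are never interpreted.
    yield headers.zip(line.chomp.split("\t", -1)).to_h
  end
end

# Made-up sample data: the last field contains literal double quotes.
sample = "word\tnoun\tphrase\nboogie\ttime\tis \"now\"\n"

rows = []
each_strict_tsv_row(StringIO.new(sample)) { |row| rows << row }
rows.each { |row| p row }
```

The double quotes come through as ordinary characters in the `phrase` field, which is the behavior strict TSV calls for. The sketch assumes the input has at least a header line.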
The choice between the CSV library and something stricter depends on who is sending you the file and whether they adhere to the strict TSV standard.
Details on the TSV standard can be found at http://en.wikipedia.org/wiki/Tab-separated_values
mmmries Apr 25 '13 at 15:57