quote_char causes problems in Ruby CSV import

I have a simple CSV file that uses | (pipe) as the quote character. After upgrading my Rails application from Ruby 1.9.2 to 1.9.3, I get the error "CSV::MalformedCSVError: Missing or stray quote in line 1".

If I open the file in vim and replace | with regular double quotes, single quotes, or even "=", the file parses fine, but | and * lead to the error. Does anyone have any thoughts on what might be causing this? Here is a one-liner that reproduces the error:

 @csv = CSV.read("public/sample_file.csv", quote_char: '|', headers: false)

This also reproduces in Ruby 2.0, and in plain irb without loading Rails.

Edit: here are some sample lines from the CSV file:

 |076N102 |,|CARD |,| 1|,|NEW|,|PCS |
 |07-1801 |,|BASE |,| 18|,|NEW|,|PCS |
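
For a reproduction that does not depend on a file on disk, the same rows can be parsed from a string. This is only a sketch: `SAMPLE` and `try_parse` are names made up here, and the comments restate what is reported above rather than output checked on every Ruby version.

```ruby
require "csv"

# The two sample rows from above, inlined as a string (hypothetical constant).
SAMPLE = "|076N102 |,|CARD |,| 1|,|NEW|,|PCS |\n" \
         "|07-1801 |,|BASE |,| 18|,|NEW|,|PCS |\n"

# Hypothetical helper: returns the parsed rows, or the error object if
# the CSV library rejects the input.
def try_parse(data, quote_char)
  CSV.parse(data, quote_char: quote_char)
rescue CSV::MalformedCSVError => e
  e
end

# Reported to fail on Ruby 1.9.3 and 2.0; other versions may parse it fine.
p try_parse(SAMPLE, "|")

# Same data with ordinary double quotes, which is reported to work.
p try_parse(SAMPLE.tr("|", '"'), '"')
```

Returning the error instead of raising it makes it easy to compare both quote characters side by side in irb.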
1 answer

I think you have just found a bug in Ruby's CSV module. From csv.rb (line 1587):

 @re_chars = /#{%"[-][\\.^$?*+{}()|# \r\n\t\f\v]".encode(@encoding)}/

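This regular expression is used to escape characters that conflict with special regular expression characters, including the pipe character |. As written, it is two character classes in a row, so it only matches a special character that directly follows a hyphen. I see no reason for the leading [-]; if you remove it, your example starts working.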
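
To see why the leading [-] matters, the pattern can be tried outside of CSV. The sketch below assumes the pattern is applied with gsub to put a backslash in front of each match; `BUGGY`, `PATCHED` and `escape_with` are illustrative names, not part of csv.rb.

```ruby
# Pattern as quoted from csv.rb: "[-]" followed by a second character class,
# so a match needs a hyphen AND a special character right after it.
BUGGY = Regexp.new("[-][\\.^$?*+{}()|# \r\n\t\f\v]")

# Same pattern with the leading "[-]" removed, as suggested above.
PATCHED = Regexp.new("[\\.^$?*+{}()|# \r\n\t\f\v]")

# Illustrative helper (an assumption about how the pattern is used):
# prefix every match with a backslash.
def escape_with(pattern, str)
  str.gsub(pattern) { |match| "\\" + match }
end

p escape_with(BUGGY, "|")    # => "|"    (pipe left unescaped)
p escape_with(PATCHED, "|")  # => "\\|"  (backslash + pipe)
p escape_with(PATCHED, "=")  # => "="    (not special, untouched)
```

An unescaped | or * would then carry its regexp meaning into any pattern built from the quote character, which would fit the observation that | and * fail while " and = work.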

Edit: a hyphen needs to be escaped inside a character set expression (surrounded by brackets []) only when it is not the leading character. Therefore, I had to update the fixed Regexp:

 @re_chars = /#{%"(?<!\\[)-(?=.*\\])|[\\.^$?*+{}()|# \r\n\t\f\v]".encode(@encoding)}/

 CSV.read('sample.csv', quote_char: '|')
 # [["076N102 ", "CARD ", " 1", "NEW", "PCS "],
 #  ["07-1801 ", "BASE ", " 18", "NEW", "PCS "]]

Since most languages, Ruby included, do not support lookbehind expressions with quantifiers, I had to write it as a negative lookbehind for the left bracket. This also matches hyphens where the left bracket of the pair is missing. If you find a better solution, please leave a comment.
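
The behaviour described above can be checked directly. In this sketch, `REVISED` is an illustrative name for the updated pattern, built from the same string as in the fix, and the sample inputs are made up for the demonstration.

```ruby
# Updated pattern: a hyphen that is not directly after "[" and has a "]"
# somewhere ahead of it, or any single character from the special set.
REVISED = Regexp.new("(?<!\\[)-(?=.*\\])|[\\.^$?*+{}()|# \r\n\t\f\v]")

p "|".scan(REVISED)      # => ["|"]  special character, always matched
p "[-]".scan(REVISED)    # => []     leading hyphen, left alone
p "[a-z]".scan(REVISED)  # => ["-"]  non-leading hyphen inside brackets
p "a-z]".scan(REVISED)   # => ["-"]  also matched although "[" is missing
p "a-z".scan(REVISED)    # => []     no "]" ahead, so no match
```

The fourth case is the limitation mentioned above: the lookbehind only inspects the single character before the hyphen, so it cannot tell whether an opening bracket exists further to the left.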

I would be glad to hear any comments before filing a bug report at ruby-lang.org.


Source: https://habr.com/ru/post/1480365/

