... so is this a wise idea ...?
Since you are doing this as a learning exercise, you may want to dig deeper into lexing and parsing. Your current approach will show its flaws pretty quickly, as described in Stop Rolling Your Own CSV Parser!. It is not that parsing CSV data is difficult. (It isn't.) It is just that most CSV parser projects treat the problem as a text-splitting problem rather than a parsing problem. If you take the time to define the CSV language, the parser almost writes itself.
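To make the difference between splitting and parsing concrete, here is a small sketch. The sample line is invented for illustration; it compares a plain split on commas with Python's standard `csv` module, which honors quoted fields.

```python
import csv
import io

# Made-up sample line: a quoted field containing a comma,
# and a quoted field containing doubled (escaped) quotes.
line = '1,"Smith, John","He said ""hi"""'

# Text-splitting view: break on every comma and ignore quoting.
naive = line.split(",")

# Parsing view: csv.reader understands quoted fields and "" escapes.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive), naive)
print(len(parsed), parsed)
```

The split version cuts the quoted name in two and leaves the quote characters in place, while the parsed version yields the three fields the author of the data presumably intended.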
RFC 4180 defines the grammar of CSV data in ABNF:

    file = [header CRLF] record *(CRLF record) [CRLF]
    header = name *(COMMA name)
    record = field *(COMMA field)
    name = field
    field = (escaped / non-escaped)
    escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
    non-escaped = *TEXTDATA
    COMMA = %x2C
    CR = %x0D ;as per section 6.1 of RFC 2234
    DQUOTE = %x22 ;as per section 6.1 of RFC 2234
    LF = %x0A ;as per section 6.1 of RFC 2234
    CRLF = CR LF ;as per section 6.1 of RFC 2234
    TEXTDATA = %x20-21 / %x23-2B / %x2D-7E
This grammar shows how individual characters are combined to build more complex language elements. (As written, the definitions run in the opposite direction, from complex to simple.)
If you start with the grammar, you can write parsing functions that mirror its non-terminal elements (the lowercase names). Julian M. Bucknall describes the process in Writing a Parser for CSV Data. Take a look at Test-Driven Development with ANTLR for an example of the same process using a parser generator.
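As a rough illustration of "one parsing function per non-terminal", here is a minimal recursive-descent sketch in Python. It is not taken from either article. The names `CsvParser`, `CsvSyntaxError`, and `parse_csv` are made up for this example. It treats the optional header as an ordinary record, and it follows the grammar strictly (CRLF line endings, ASCII TEXTDATA only), so it rejects a lot of real-world input.

```python
class CsvSyntaxError(ValueError):
    """Raised when the input does not match the RFC 4180 grammar."""


def _is_textdata(ch):
    # TEXTDATA = %x20-21 / %x23-2B / %x2D-7E
    code = ord(ch)
    return 0x20 <= code <= 0x21 or 0x23 <= code <= 0x2B or 0x2D <= code <= 0x7E


class CsvParser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def _peek(self, n=1):
        return self.text[self.pos:self.pos + n]

    def _at_end(self):
        return self.pos >= len(self.text)

    # file = [header CRLF] record *(CRLF record) [CRLF]
    # Assumption: the header is returned as the first record.
    def parse_file(self):
        records = [self.parse_record()]
        while self._peek(2) == "\r\n":
            self.pos += 2
            if self._at_end():  # optional trailing CRLF
                break
            records.append(self.parse_record())
        if not self._at_end():
            raise CsvSyntaxError(f"unexpected character at offset {self.pos}")
        return records

    # record = field *(COMMA field)
    def parse_record(self):
        fields = [self.parse_field()]
        while self._peek() == ",":
            self.pos += 1
            fields.append(self.parse_field())
        return fields

    # field = (escaped / non-escaped)
    def parse_field(self):
        if self._peek() == '"':
            return self.parse_escaped()
        return self.parse_non_escaped()

    # escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
    def parse_escaped(self):
        self.pos += 1  # opening DQUOTE
        chars = []
        while not self._at_end():
            if self._peek(2) == '""':  # 2DQUOTE stands for one literal quote
                chars.append('"')
                self.pos += 2
            elif self._peek() == '"':  # closing DQUOTE
                self.pos += 1
                return "".join(chars)
            else:
                ch = self._peek()
                if not (ch in ",\r\n" or _is_textdata(ch)):
                    raise CsvSyntaxError(f"invalid character at offset {self.pos}")
                chars.append(ch)
                self.pos += 1
        raise CsvSyntaxError("unterminated quoted field")

    # non-escaped = *TEXTDATA
    def parse_non_escaped(self):
        start = self.pos
        while not self._at_end() and _is_textdata(self._peek()):
            self.pos += 1
        return self.text[start:self.pos]


def parse_csv(text):
    return CsvParser(text).parse_file()


if __name__ == "__main__":
    sample = 'id,name\r\n1,"Smith, John"\r\n2,"He said ""hi"""\r\n'
    for record in parse_csv(sample):
        print(record)
```

Each method corresponds to one lowercase rule in the grammar, and the terminals (COMMA, DQUOTE, CRLF, TEXTDATA) turn into simple character checks. Relaxing the strictness, for example accepting a bare LF as a line ending, then becomes a deliberate, local change to one rule.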
Keep in mind that there is no universally accepted definition of CSV. CSV data in the wild is not guaranteed to follow everything RFC 4180 specifies.