Writing and polishing a CSV parser

Question

As part of a recent project, I had to read from and write to a CSV file and display its contents in a grid view in C#. In the end, I decided to use a ready-made parser to do the job.

Because I like to do such things, I wondered how to write my own.

So far, all I have managed to do is:

    // Read the header
    StreamReader reader = new StreamReader(dialog.FileName);
    string row = reader.ReadLine();
    string[] cells = row.Split(',');

    // Create the columns of the dataGridView
    for (int i = 0; i < cells.Count() - 1; i++)
    {
        DataGridViewTextBoxColumn column = new DataGridViewTextBoxColumn();
        column.Name = cells[i];
        column.HeaderText = cells[i];
        dataGridView1.Columns.Add(column);
    }

    // Display the contents of the file
    while (reader.Peek() != -1)
    {
        row = reader.ReadLine();
        cells = row.Split(',');
        dataGridView1.Rows.Add(cells);
    }

My question is: is this a wise idea, and whether it is or not, how would I test it properly?

+4

c# parsing

Tom kealy Feb 13 '12 at 19:33

5 answers

Get (or make) some CSV data and write unit tests using NUnit or the Visual Studio testing tools.

Be sure to test the edge cases, e.g.

 "csv","Data","with","a","trailing","comma",

and

 "csv","Data","with,","commas","and","""quotes""","in","it"

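As a rough sketch of where such tests might start, the checks below pin down what a plain `Split(',')` does with the two lines above. They are written as plain checks rather than against a particular test framework; `CsvEdgeCases`, `NaiveSplit` and `Check` are hypothetical names introduced only for illustration.

```csharp
using System;

public static class CsvEdgeCases
{
    // The two edge-case lines from the answer above.
    public const string TrailingComma =
        "\"csv\",\"Data\",\"with\",\"a\",\"trailing\",\"comma\",";
    public const string QuotesAndCommas =
        "\"csv\",\"Data\",\"with,\",\"commas\",\"and\",\"\"\"quotes\"\"\",\"in\",\"it\"";

    // Hypothetical stand-in for the question's row.Split(',') approach.
    public static string[] NaiveSplit(string row)
    {
        return row.Split(',');
    }

    // Minimal assertion helper so the sketch needs no test framework.
    public static void Check(bool condition, string message)
    {
        if (!condition) throw new InvalidOperationException("Failed: " + message);
    }

    public static void Run()
    {
        // A trailing comma produces a final empty piece: 7 pieces for 6 values.
        string[] trailing = NaiveSplit(TrailingComma);
        Check(trailing.Length == 7, "trailing comma piece count");
        Check(trailing[6] == "", "last piece is empty");

        // The second line holds 8 fields, but splitting on every comma gives
        // 9 pieces because the comma inside "with," is treated as a separator.
        Check(NaiveSplit(QuotesAndCommas).Length == 9, "naive split over-splits");
    }
}
```

Once a quote-aware parser exists, the same two lines can be reused with the intended results (8 fields for the second line, and a deliberate decision about whether a trailing comma means an extra empty field).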
+2

Chris shain Feb 13 '12 at 19:38

"... is this a wise idea ...?"

Since you are doing this as a learning exercise, you may want to go deeper into lexing and parsing. Your current approach will show its flaws pretty quickly, as described in Stop Rolling Your Own CSV Parser!. It is not that parsing CSV data is difficult. (It isn't.) It is just that most CSV parser projects treat the problem as a text-splitting problem rather than a parsing problem. If you take the time to define the CSV language, the parser almost writes itself.

RFC 4180 defines the grammar of CSV data in ABNF:

    file = [header CRLF] record *(CRLF record) [CRLF]
    header = name *(COMMA name)
    record = field *(COMMA field)
    name = field
    field = (escaped / non-escaped)
    escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
    non-escaped = *TEXTDATA
    COMMA = %x2C
    CR = %x0D ;as per section 6.1 of RFC 2234
    DQUOTE = %x22 ;as per section 6.1 of RFC 2234
    LF = %x0A ;as per section 6.1 of RFC 2234
    CRLF = CR LF ;as per section 6.1 of RFC 2234
    TEXTDATA = %x20-21 / %x23-2B / %x2D-7E

This grammar shows how individual characters are combined into more complex language elements. (As written, the definitions run in the opposite direction, from complex to simple.)

If you start with the grammar, you can write parsing functions that mirror its non-terminal elements (the lower-case names). Julian M. Bucknall describes the process in Writing a Parser for CSV Data. Take a look at Test-Driven Development with ANTLR for an example of the same process using a parser generator.
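To make that concrete, here is a minimal hand-written sketch with one method per non-terminal of the RFC 4180 grammar. It is not taken from either of the articles mentioned; the class name `Rfc4180Parser` and its members are hypothetical. It also makes a few assumptions of its own: the header is treated as an ordinary record, empty input yields no records, a bare LF is accepted as a line break, and any character other than a comma, quote, CR or LF counts as TEXTDATA.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical recursive-descent parser: one method per grammar non-terminal.
public sealed class Rfc4180Parser
{
    private readonly string _text;
    private int _pos;

    private Rfc4180Parser(string text) { _text = text; }

    public static List<List<string>> Parse(string text)
    {
        return new Rfc4180Parser(text).ParseFile();
    }

    private bool AtEnd { get { return _pos >= _text.Length; } }

    // file = record *(CRLF record) [CRLF]  (header handled as a plain record)
    private List<List<string>> ParseFile()
    {
        var records = new List<List<string>>();
        if (AtEnd) return records;           // assumption: empty input, no records
        records.Add(ParseRecord());
        while (TryConsumeLineBreak())
        {
            if (AtEnd) break;                // optional trailing line break
            records.Add(ParseRecord());
        }
        if (!AtEnd) throw new FormatException("Unexpected character at offset " + _pos + ".");
        return records;
    }

    // record = field *(COMMA field)
    private List<string> ParseRecord()
    {
        var fields = new List<string> { ParseField() };
        while (!AtEnd && _text[_pos] == ',')
        {
            _pos++;                          // consume COMMA
            fields.Add(ParseField());
        }
        return fields;
    }

    // field = (escaped / non-escaped)
    private string ParseField()
    {
        return !AtEnd && _text[_pos] == '"' ? ParseEscaped() : ParseNonEscaped();
    }

    // escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
    private string ParseEscaped()
    {
        var value = new StringBuilder();
        _pos++;                              // opening DQUOTE
        while (true)
        {
            if (AtEnd) throw new FormatException("Unterminated quoted field.");
            char c = _text[_pos++];
            if (c != '"') { value.Append(c); continue; }
            // 2DQUOTE is one literal quote; a lone DQUOTE closes the field.
            if (!AtEnd && _text[_pos] == '"') { value.Append('"'); _pos++; }
            else return value.ToString();
        }
    }

    // non-escaped = *TEXTDATA
    private string ParseNonEscaped()
    {
        int start = _pos;
        while (!AtEnd && _text[_pos] != ',' && _text[_pos] != '"'
               && _text[_pos] != '\r' && _text[_pos] != '\n')
        {
            _pos++;
        }
        return _text.Substring(start, _pos - start);
    }

    // CRLF = CR LF; a bare LF is also accepted (an assumption beyond the RFC).
    private bool TryConsumeLineBreak()
    {
        if (_pos + 1 < _text.Length && _text[_pos] == '\r' && _text[_pos + 1] == '\n')
        {
            _pos += 2;
            return true;
        }
        if (!AtEnd && _text[_pos] == '\n') { _pos++; return true; }
        return false;
    }
}
```

Because the quoted-field method consumes CR and LF like any other character, a field that spans several lines needs no special handling, which is hard to achieve when the input is first cut into lines with `ReadLine()`.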

Keep in mind that there is no universally accepted definition of CSV. CSV data in the wild is not guaranteed to follow everything RFC 4180 specifies.

+1

Corbin March Feb 13 '12 at 20:56

Parsing a CSV file is not difficult, but it involves more than just calling String.Split().

You are splitting lines at every comma. But fields can contain embedded commas. In those cases, CSV wraps the field in double quotes, so you also need to look for double quotes and ignore any commas inside them. In addition, fields can even contain embedded double quotes. These must appear inside a double-quoted field and be "doubled" to indicate that the quote is a literal character.

If you want to see how I did this, you can check out this article.
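One possible way to apply these rules to a single line is a small state machine that tracks whether the current position is inside quotes. This is only a sketch and not the code from the linked article; `CsvLine.SplitFields` is a hypothetical name, and it assumes each record fits on one line.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Hypothetical helper: splits one CSV line into fields, honouring double quotes.
    public static List<string> SplitFields(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c != '"')
                {
                    current.Append(c);           // commas inside quotes are plain data
                }
                else if (i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"');         // doubled quote: one literal quote
                    i++;
                }
                else
                {
                    inQuotes = false;            // closing quote
                }
            }
            else if (c == '"')
            {
                inQuotes = true;                 // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());  // field separator
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());          // last field (may be empty)
        return fields;
    }
}
```

Given the line `"a","b,c","say ""hi"""`, this returns the three fields `a`, `b,c` and `say "hi"`. It is lenient about malformed input (for example a quote in the middle of an unquoted field), so a stricter parser would need extra checks.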

0

Jonathan wood Feb 13 '12 at 21:52

This comes from http://www.gigawebsolution.com/Posts/Details/61/Building-a-Simple-CSV-Parser-in-C#

    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text;

    public interface ICsvReaderWriter
    {
        List<string[]> Read(string filePath, char delimiter);
        void Write(string filePath, List<string[]> lines, char delimiter);
    }

    public class CsvReaderWriter : ICsvReaderWriter
    {
        // Reads every non-empty line and splits it on the delimiter.
        public List<string[]> Read(string filePath, char delimiter)
        {
            var fileContent = new List<string[]>();

            using (var reader = new StreamReader(filePath, Encoding.Unicode))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    if (!string.IsNullOrEmpty(line))
                    {
                        fileContent.Add(line.Split(delimiter));
                    }
                }
            }

            return fileContent;
        }

        // Appends each row to the file, joining its columns with the delimiter.
        public void Write(string filePath, List<string[]> lines, char delimiter)
        {
            using (var writer = new StreamWriter(filePath, true, Encoding.Unicode))
            {
                foreach (var line in lines)
                {
                    var data = line.Aggregate(string.Empty,
                            (current, column) => current + string.Format("{0}{1}", column, delimiter))
                        .TrimEnd(delimiter);
                    writer.WriteLine(data);
                }
            }
        }
    }
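The reader and writer above split and join on the delimiter without any quoting, so a value that itself contains the delimiter or a double quote would not survive a round trip. A possible sketch of quoting on the write side follows; `CsvFieldWriter`, `Escape` and `ToLine` are hypothetical names and not part of the linked post.

```csharp
using System.Linq;

public static class CsvFieldWriter
{
    // Hypothetical helper: wraps a field in double quotes only when it
    // contains the delimiter, a double quote, or a line break.
    public static string Escape(string field, char delimiter)
    {
        bool needsQuotes = field.IndexOf(delimiter) >= 0
            || field.IndexOf('"') >= 0
            || field.IndexOf('\r') >= 0
            || field.IndexOf('\n') >= 0;

        if (!needsQuotes) return field;

        // Embedded quotes are doubled, then the whole field is quoted.
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Joins the escaped fields of one row with the delimiter.
    public static string ToLine(string[] fields, char delimiter)
    {
        return string.Join(delimiter.ToString(), fields.Select(f => Escape(f, delimiter)));
    }
}
```

With a comma delimiter, the fields `a`, `b,c` and `say "hi"` become the line `a,"b,c","say ""hi"""`. Using `string.Join` also keeps trailing empty fields, which a `TrimEnd(delimiter)` on the joined line would drop.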

0

Gigapr Jul 19 '12 at 12:23

Mark wilkins · Accepted Answer · 2012-02-13T19:37:59+0000

As a programming exercise (for learning and gaining experience), this is probably a very reasonable thing to do. For production code, it is probably better to use an existing library, mainly because the work has already been done. There are quite a few things a CSV parser has to deal with. For example (off the top of my head, in no particular order):

Quoted values (strings)
Embedded quotation marks in quoted strings
Null values (NULL ... or maybe even NULL versus empty)
Rows without the correct number of entries
Header row versus no header row
Recognition of different types of data (e.g., different date formats)

If you have a very specific input format in a very controlled environment, you may not have to deal with all of these.
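As one small illustration of the list above, rows with the wrong number of entries can be detected after parsing by comparing each row's length with the expected column count (for example, the header's). This is only a sketch; `CsvChecks.FindRaggedRows` is a hypothetical helper, and what to do with such rows (reject, pad, or log) is left to the caller.

```csharp
using System.Collections.Generic;

public static class CsvChecks
{
    // Hypothetical helper: returns the zero-based indexes of rows whose
    // field count differs from the expected column count.
    public static List<int> FindRaggedRows(IReadOnlyList<string[]> rows, int expectedCount)
    {
        var bad = new List<int>();
        for (int i = 0; i < rows.Count; i++)
        {
            if (rows[i].Length != expectedCount)
            {
                bad.Add(i);
            }
        }
        return bad;
    }
}
```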
