Most suitable data structure for CSV table?

I am looking for advice on the most suitable data structure for holding a CSV (comma-separated values) table in memory. It should cover both cases: tables with and without a header row. If the table has a header, every field of every row is a key -> value pair, where the key is the column name from the header and the value is the corresponding field content. If the table has no header, the rows are simply lists of fields or, equally, key -> value pairs with generated key names (for example, "COL1", "COL2", ... "COLN").

I am looking for a solution that is both the simplest (least code) and the most general.

I am thinking of the following subclass, but I doubt that it is a correct or efficient way to implement this:

    TCSV = class (TObjectList<TDictionary<string, string>>)
      ...
    public
      constructor Create(fileName: string; header: Boolean; encoding: string = ''; delimiter: Char = ';'; quoteChar: Char = '"'); overload;
      ...
    end;

It looks like I need to store keys for each row of fields. What about TDictionary<string, TStringList> ? Would this be the best solution?

+4
6 answers

What about TClientDataSet? It seems very easy to use.

A simple guide to using TClientDataSet as a dataset in memory can be found here.
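
As a rough illustration of the in-memory use suggested here, the following is a minimal sketch rather than a full CSV loader. It assumes a Delphi version with unit scope names (XE2 or later) and uses hard-coded sample values ('Name', 'City' and the two rows are invented for the example); parsing the file is left out. Including MidasLib is one way to avoid deploying midas.dll, assuming that unit is available in your edition.

```pascal
program CsvClientDataSetSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, Data.DB, Datasnap.DBClient,
  MidasLib; // assumption: links the MIDAS code statically, so midas.dll is not needed

var
  Table: TClientDataSet;
begin
  Table := TClientDataSet.Create(nil);
  try
    // One string field per CSV column. The names would come from the header
    // row, or be generated (COL1, COL2, ...) when the file has no header.
    Table.FieldDefs.Add('Name', ftString, 50);
    Table.FieldDefs.Add('City', ftString, 50);
    Table.CreateDataSet; // builds the in-memory table; no database connection

    // Each parsed CSV row becomes one record (sample values only).
    Table.AppendRecord(['Alice', 'Oslo']);
    Table.AppendRecord(['Bob', 'Lima']);

    // Walk the rows and read fields by column name.
    Table.First;
    while not Table.Eof do
    begin
      Writeln(Table.FieldByName('Name').AsString, ' / ',
        Table.FieldByName('City').AsString);
      Table.Next;
    end;
  finally
    Table.Free;
  end;
end.
```

Note that ftString fields need a fixed maximum size (50 here, chosen arbitrarily), which is one of the costs of using a dataset for free-form CSV text.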

+5

The structure you suggest means that you will have a TDictionary instance for each line of your CSV file, essentially duplicating the column names for every row. That seems like a waste.

I assume that with a TDictionary<string, TStringList> you would populate each TStringList with the values of a single column. That may work, but it still will not be easy to iterate over all columns of one row of data.

As GolezTrol suggested, TClientDataSet comes with Delphi as standard, is very powerful and, being a dataset, is designed for use with columnar data. In addition, although it is a dataset, it does not require a database (connection), and it is used in many applications for exactly the purpose you are trying to achieve: an in-memory dataset.
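
To make the row-iteration problem concrete, here is a small sketch of the column-per-key layout. The column names and values are invented for the example, and TObjectDictionary with doOwnsValues is used only so that the string lists are freed automatically.

```pascal
program CsvColumnDictSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Generics.Collections;

var
  Columns: TObjectDictionary<string, TStringList>;
  ColName: string;
  RowNum: Integer;
begin
  // doOwnsValues: the dictionary frees the TStringList instances itself.
  Columns := TObjectDictionary<string, TStringList>.Create([doOwnsValues]);
  try
    // Each key holds ONE column: all values of that column, top to bottom.
    Columns.Add('Name', TStringList.Create);
    Columns.Add('City', TStringList.Create);
    Columns['Name'].Add('Alice');
    Columns['City'].Add('Oslo');
    Columns['Name'].Add('Bob');
    Columns['City'].Add('Lima');

    // Reading a single column is easy ...
    Writeln(Columns['Name'].CommaText);

    // ... but reading one ROW means visiting every column list, and the
    // dictionary does not preserve the original column order.
    RowNum := 1;
    for ColName in Columns.Keys do
      Writeln(ColName, ' = ', Columns[ColName][RowNum]);
  finally
    Columns.Free;
  end;
end.
```

Because the key order of a TDictionary is unspecified, a separate ordered list of column names would still be needed to write the rows back out in their original layout.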

+3

I recommend that you try the TJvCsvDataSet, which I wrote and contributed to the JEDI JVCL. It works with CSV files with and without headers. It works with data controls, including DB Grids.

It parses CSV data and otherwise works just like the client dataset that others have suggested.

Internally, it uses an array of byte records, parses each row, and saves an integer "search" index so that it knows where each individual column begins within that particular row. This makes replacing one value with another (changing a field in a row) a very quick operation.

It supports most common field types (though not blob or currency right now), and it handles CSV features such as carriage returns + line feeds embedded inside a field value, and embedded CSV "escape codes", so that you can put a double-quote character inside a string, for example.

It has the FieldDef property, which can be used to define the column types, or it can simply read the file header and treat each value internally as a string (unless you tell it otherwise).

It can modify the CSV by adding or removing columns, and do most of the common things you would want to do with a CSV table. I have used and tested it heavily, and it works great.

+3

Instead of using a TDataSet, you can also use Synopse TSynBigTable, which performs better and has fewer restrictions.

For any application that is not time- or size-critical, a TDataSet is fine.

+1

So, you basically want to be able to access elements like this:

    for RowNum := 0 to csv.Count - 1 do
    begin
      Name := csv[RowNum]['Name'];
      // Do something
    end;

TObjectList<TDictionary<string, string>> would probably do the job, but it's not very efficient.

Loading the CSV into a dataset is likely to take the least code, but will have a bit more overhead.

You could consider keeping a simple TStringList or TList<string> for the header and moving the row data into a new class that accepts the header list in its constructor. You will get the same result:

    TCSVRow = class
    private
      FHeaders: TList<string>; // shared header list, not owned by the row
      FFields: TList<string>;
    public
      constructor Create(Headers: TList<string>);
      function GetField(index: string): string;
      property Fields[index: string]: string read GetField; default;
    end;

    TCSV = class
    private
      FHeaders: TList<string>;
      FRows: TList<TCSVRow>;
    public
      function GetRow(Index: integer): TCSVRow;
      property Rows[index: integer]: TCSVRow read GetRow; default;
    end;

    implementation

    function TCSVRow.GetField(index: string): string;
    begin
      // Look up the column position by name, then read that field
      Result := FFields[FHeaders.IndexOf(index)];
    end;

    function TCSV.GetRow(Index: integer): TCSVRow;
    begin
      Result := FRows[Index];
    end;

This is incomplete, and I typed it directly into the browser, so I have not checked it for correctness, but you get the general idea. This way, the header information is stored only once and is not duplicated for each row.
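
For readers who want something they can compile, here is one possible way to flesh out the "headers stored once" idea as a console program. It is a sketch under assumptions, not the answer's actual implementation: the class name TCsvTable, the AddRow and RowCount members and the sample data are all hypothetical, file parsing is omitted, and rows are kept as plain string lists rather than TCSVRow objects to keep the example short. When no header names are passed, names COL1..COLN are generated, as the question asks.

```pascal
program CsvHeaderOnceSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections;

type
  // Hypothetical class: column names are stored once, rows are string lists.
  TCsvTable = class
  private
    FHeaders: TList<string>;
    FRows: TObjectList<TList<string>>;
    function GetData(Row: Integer; Column: string): string;
  public
    constructor Create(const Headers: array of string; ColumnCount: Integer);
    destructor Destroy; override;
    procedure AddRow(const Fields: array of string);
    function RowCount: Integer;
    property Data[Row: Integer; Column: string]: string read GetData; default;
  end;

constructor TCsvTable.Create(const Headers: array of string; ColumnCount: Integer);
var
  I: Integer;
begin
  inherited Create;
  FHeaders := TList<string>.Create;
  FRows := TObjectList<TList<string>>.Create(True); // owns the row lists
  if Length(Headers) > 0 then
    FHeaders.AddRange(Headers)
  else
    for I := 1 to ColumnCount do
      FHeaders.Add('COL' + IntToStr(I)); // generated names for headerless files
end;

destructor TCsvTable.Destroy;
begin
  FRows.Free;
  FHeaders.Free;
  inherited;
end;

procedure TCsvTable.AddRow(const Fields: array of string);
var
  RowList: TList<string>;
begin
  RowList := TList<string>.Create;
  RowList.AddRange(Fields);
  FRows.Add(RowList);
end;

function TCsvTable.RowCount: Integer;
begin
  Result := FRows.Count;
end;

function TCsvTable.GetData(Row: Integer; Column: string): string;
var
  ColIndex: Integer;
begin
  ColIndex := FHeaders.IndexOf(Column); // case-sensitive with the default comparer
  if ColIndex < 0 then
    raise EArgumentException.CreateFmt('Unknown column "%s"', [Column]);
  Result := FRows[Row][ColIndex];
end;

var
  Csv: TCsvTable;
  RowNum: Integer;
begin
  Csv := TCsvTable.Create([], 2); // no header row: COL1 and COL2 are generated
  try
    Csv.AddRow(['Alice', 'Oslo']);
    Csv.AddRow(['Bob', 'Lima']);
    for RowNum := 0 to Csv.RowCount - 1 do
      Writeln(Csv[RowNum, 'COL1'], ' / ', Csv[RowNum, 'COL2']);
  finally
    Csv.Free;
  end;
end.
```

The lookup by name is a linear search over the header list on every access. For wide tables, a TDictionary<string, Integer> mapping column name to index would be a reasonable replacement, still stored only once per table.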

You can save a little memory by making FFields a string array instead of a TList<string>, but TList<string> is easier to work with, IMHO.

Update: David makes a good point. The TCSVRow class can be eliminated. You could just have either a TList<TList<string>> or a 2D array. Either way, I still think you should keep the headers in a separate list. In that case, TCSV would look more like this:

    TCSV = class
    private
      FHeaders: TList<string>;
      FData: TList<TList<string>>;
    public
      function GetData(Row: integer; Column: string): string;
      property Data[Row: integer; Column: string]: string read GetData; default;
    end;

    function TCSV.GetData(Row: integer; Column: string): string;
    begin
      Result := FData[Row][FHeaders.IndexOf(Column)];
    end;

0

There are many possible solutions. If you want something really simple and general, as you requested (not necessarily the fanciest solution), why not just ...

    TMyRec = record
      HeaderNames: array of string;
      StringValues: array of array of string;
    end;

Just set the length of the arrays as needed (using SetLength).
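
A short sketch of how such a record could be sized and filled. The two-by-two sample data is invented for the example, and reading the actual file is left out.

```pascal
program CsvRecordSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  TMyRec = record
    HeaderNames: array of string;
    StringValues: array of array of string;
  end;

var
  Table: TMyRec;
  Row, Col: Integer;
begin
  // Column names: taken from the header row, or generated if there is none.
  SetLength(Table.HeaderNames, 2);
  Table.HeaderNames[0] := 'Name';
  Table.HeaderNames[1] := 'City';

  // Two rows of two columns each; SetLength accepts one length per dimension.
  SetLength(Table.StringValues, 2, 2);
  Table.StringValues[0][0] := 'Alice';
  Table.StringValues[0][1] := 'Oslo';
  Table.StringValues[1][0] := 'Bob';
  Table.StringValues[1][1] := 'Lima';

  // Print every row as name=value pairs.
  for Row := 0 to High(Table.StringValues) do
  begin
    for Col := 0 to High(Table.HeaderNames) do
      Write(Table.HeaderNames[Col], '=', Table.StringValues[Row][Col], ' ');
    Writeln;
  end;
end.
```

Dynamic arrays are reference-counted and released automatically, so no cleanup code is needed. Growing the table one row at a time with SetLength can reallocate repeatedly, so sizing it once after counting the lines is likely cheaper.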

0

Source: https://habr.com/ru/post/1381949/

