MATLAB: using textscan and converting an array of cells into a matrix

I have a large csv file (there should be about 1 million lines) with parameter data with the following structure (content changed):

    secid, date, days, delta, impl_volatility, impl_strike, impl_premium, dispersion, cp_flag, ticker, index_flag, industry_group
    100000, 02/05/1986, 60, -80, 0.270556, 74.2511, 5.2415, 0.021514, C, ASC, 0, 481
    100000, 03/05/1986, 30, -40, 0.251556, 74.2571, 6.2415, 0.025524, P, ASC, 0, 481

I successfully imported the test file using the following:

    ftest = fopen('test.csv');
    C = textscan(ftest,'%f %s %f %f %f %f %f %f %s %s %f %f','Headerlines',1,'Delimiter',',');
    fclose(ftest);

However, C is a cell array, and this makes it difficult to process the contents of the file in MATLAB. It would be easier to have it as a "regular" array (forgive me for not knowing the correct nomenclature, I just started working with MATLAB).

If I display C, I get:

    Columns 1 through 6
      [2x1 double]    {2x1 cell}    [2x1 double]    [2x1 double]    [2x1 double]    [2x1 double]
    Columns 7 through 12
      [2x1 double]    [2x1 double]    {2x1 cell}    {2x1 cell}    [2x1 double]    [2x1 double]

Thus, inside the cell array C there are numeric arrays and cell arrays: numeric arrays for the numbers and cell arrays for the strings. If I try to check element (1,2), I have to use C{1}(2), but if I want to check element (2,2), I have to use C{2}{2}. Ideally, I would like to access both as C(1,2) and C(2,2). The question is, how do I do this?

I searched for solutions and found cell2mat, but it only works if all the contents are numeric (I think). I also found this solution: "Converting a cell array of cell arrays into a matrix of matrices", but horzcat throws an error, which I suppose could be due to the same problem.

Thank you in advance for your time.

2 answers

Since you have an array containing both numeric and character data, what you want is impossible (and, believe me, this would also be impractical).

Referencing individual numbers in a numeric array is different from referencing whole strings. There is no escaping that, nor should there be: you treat flowers differently than you treat people (I hope, anyway).

In MATLAB, a string is a regular array, with the difference that each entry in the array is not a number but a character. Indexing individual characters works the same way as indexing numbers in arrays:

    >> a = 'my string'
    >> a(4)
    ans =
    s
    >> a+0 % cast to double to show the "true character" of strings
    ans =
       109   121    32   115   116   114   105   110   103

However, textscan assumes (rightfully) that you do not want to do this, but rather want to extract whole strings from the file. And whole strings have to be referenced differently, to indicate that you mean the whole string and not individual characters.
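To make the distinction concrete, here is a minimal sketch using a small hand-built cell array of strings. The variable name `dates` and its contents are made up for illustration and are not read from the file:

```matlab
% Hypothetical sample data: a cell array of strings, like textscan returns for %s
dates = {'02/05/1986'; '03/05/1986'};

whole  = dates{2};      % curly braces return the whole string '03/05/1986'
sub    = dates(2);      % parentheses return a 1x1 cell that contains the string
oneChr = dates{2}(2);   % index into the string itself: the character '3'
```

Curly braces get you the content of a cell, parentheses get you a smaller cell array, and only after extracting the content can you index individual characters.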

I think you will find it all more intuitive if you split the results from textscan into a regular numeric array and a cell array of strings, for example:

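    % load the data
    ftest = fopen('test.csv');
    C = textscan(ftest,...
        '%f %s %f %f %f %f %f %f %s %s %f %f',...
        'Headerlines', 1,...        % skip the header row of the file
        'collectoutput', true,...   % merge adjacent columns of the same type
        'Delimiter',',\n');
    fclose(ftest);

    % split into numeric and char arrays
    % (with collectoutput, C has 5 cells: numeric, string, numeric, string, numeric)
    numeric = [C{[1 3 5]}]
    alpha = [C{[2 4]}]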

Referencing data in numeric then follows the same rules as any regular array, and referencing strings in alpha follows the normal cell indexing rules (as in alpha{2,1} to get '03/05/1986').
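As a rough illustration of how the two arrays can be used together, here is a sketch with stand-in values typed in by hand from the two sample rows in the question, so it runs without the file. The variable names `days2`, `date2`, `isPut` and `putVols` are only illustrative:

```matlab
% Stand-in for the numeric array (columns: secid, days, delta, impl_volatility,
% impl_strike, impl_premium, dispersion, index_flag, industry_group)
numeric = [100000 60 -80 0.270556 74.2511 5.2415 0.021514 0 481;
           100000 30 -40 0.251556 74.2571 6.2415 0.025524 0 481];

% Stand-in for the string array (columns: date, cp_flag, ticker)
alpha = {'02/05/1986', 'C', 'ASC';
         '03/05/1986', 'P', 'ASC'};

days2   = numeric(2,2);               % plain indexing: 30
date2   = alpha{2,1};                 % curly braces: '03/05/1986'
isPut   = strcmp(alpha(:,2), 'P');    % logical column marking the put rows
putVols = numeric(isPut, 4);          % implied volatility of the put rows
```

Because both arrays keep the same row order, a logical index computed on alpha can be applied directly to numeric.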

EDIT: Based on your comments, I think you want to do the following:

    % Read the data
    fid = fopen('test.csv', 'r');
    C = textscan(fid,...
        '%f %s %f %f %f %f %f %f %s %s %f %f',...
        'Headerlines', 1,...
        'Delimiter',',');
    fclose(fid);

    % Delete 10th element ('ASC')
    C(10) = [];

    % Mass-convert dates to datenums
    C{2} = datenum(C{2}, 'dd/mm/yyyy');

    % Map 'P' to 1 and 'C' to 2
    map('PC') = [1 2];
    C{9} = map([C{9}{:}]).';

    % Convert whole array to numeric array
    C = [C{:}];
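The 'P'/'C' mapping step is compact, so here is a small sketch of what it does, run on made-up flags rather than on the file. The names `flags`, `joined` and `codes` are hypothetical, and the trick assumes every flag is exactly one character long:

```matlab
% Hypothetical flags, shaped like the cp_flag column returned by textscan
flags = {'C'; 'P'; 'P'};

% Indexing with a char uses its character code, so this builds a numeric
% row vector with map(80) = 1 (for 'P') and map(67) = 2 (for 'C')
map = [];
map('PC') = [1 2];

joined = [flags{:}];       % 'CPP': all single-character flags in one row
codes  = map(joined).';    % look up each character, transpose to a column
```

Here `codes` ends up as the column [2; 1; 1], which has the same orientation as the other numeric columns and can therefore be concatenated with them.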

I had the same problem. I would rather have a two-dimensional cell array, for easy access and so that I can use MATLAB's built-in sorting functions.

Here is another solution that might work for you (it is what The MathWorks does in the code auto-generated by the import tool). It turns the numeric arrays into cell arrays so that you can combine everything into one two-dimensional cell array.

    C([1,3,4,5,6,7,8,11,12]) = cellfun(@(x) num2cell(x), C([1,3,4,5,6,7,8,11,12]),'UniformOutput', false);
    C = [C{1:end}];
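A reduced sketch of the same idea, using a hypothetical two-column stand-in for the textscan output instead of the real file, shows the resulting row/column access:

```matlab
% Hypothetical textscan-style output: one numeric column, one string column
C = {[100000; 100001], {'02/05/1986'; '03/05/1986'}};

% Wrap each number in its own cell, then concatenate the columns
C(1) = cellfun(@(x) num2cell(x), C(1), 'UniformOutput', false);
C = [C{:}];            % now a 2x2 cell array

id2   = C{2,1};        % 100001
date2 = C{2,2};        % '03/05/1986'
```

Every element is now reachable as C{row,col}, whatever its type. The price is that each number sits in its own cell, which carries per-cell overhead and may be noticeable with around a million rows.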

Source: https://habr.com/ru/post/1491430/

