Preallocating cell arrays in MATLAB

This is more a question about understanding behavior than about a specific problem.

MathWorks states that numeric data is stored contiguously, which makes preallocation important. This is not the case for cell arrays.

Are they something similar to a vector or an array of pointers in C++?

This would mean that preallocation is not so important, since a pointer is half the size of a double (according to whos, although there is surely some overhead somewhere to store the mxArray data type).

Running this code:

    clear all
    n = 1e6;

    % Numeric array, grown one element at a time
    tic
    A = [];
    for i=1:n
        A(end + 1) = 1;
    end
    fprintf('Numerical without preallocation %fs\n',toc)

    % Numeric array, preallocated
    clear A
    tic
    A = zeros(1,n);
    for i=1:n
        A(i) = 1;
    end
    fprintf('Numerical with preallocation %fs\n',toc)

    % Cell array, grown one cell at a time
    clear A
    tic
    A = cell(0);
    for i=1:n
        A{end + 1} = 1;
    end
    fprintf('Cell without preallocation %fs\n',toc)

    % Cell array, preallocated
    tic
    A = cell(1,n);
    for i=1:n
        A{i} = 1;
    end
    fprintf('Cell with preallocation %fs\n',toc)

returns:

    Numerical without preallocation 0.429240s
    Numerical with preallocation 0.025236s
    Cell without preallocation 4.960297s
    Cell with preallocation 0.554257s

There are no surprises for the numeric arrays. The cell array results did surprise me, though, since only the container of pointers, and not the data itself, would need to be reallocated. That should (since a pointer is smaller than a double) lead to a difference of less than 0.2 s. Where does this overhead come from?

A related question: how should I build a data container for heterogeneous data in MATLAB when preallocation is not possible because the final size is unknown at the beginning? I think handle classes are not a great fit either, since they also have a huge overhead.

Looking forward to learning something.

mage _

Edit: I tried the linked list suggested by Eitan T, but I think the overhead from MATLAB is still pretty big. I tried it with a double array as the data (rand(200000,1)).

I made a little plot to illustrate:

[Figure: time (s) versus #iter, one curve for 'cell' and one for 'list']

Code for the plot (I used the dlnode class from the MATLAB homepage, as suggested in the answer):

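    D = rand(200000,1);   % payload stored in every cell / list node

    s = linspace(10,20000,50);   % number of insertions per run
    nC = zeros(50,1);            % timings for the cell array
    nL = zeros(50,1);            % timings for the linked list
    for i = 1:50
        % Grow a cell array one cell at a time
        a = cell(0);
        tic
        for ii = 1:s(i)
            a{end + 1} = D;
        end
        nC(i) = toc;

        % Insert the same payload into a linked list
        a = list([]);
        tic
        for ii = 1:s(i)
            a.insertAfter(list(D));
        end
        nL(i) = toc;
    end

    figure
    plot(s,nC,'r',s,nL,'g')
    xlabel('#iter')
    ylabel('time (s)')
    legend({'cell' 'list'})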

Don't get me wrong: I like the idea of linked lists, as they are quite flexible, but I think the overhead can be large.

1 answer

Are cell arrays something similar to a vector or an array of pointers in C++?

Cell arrays do allow you to store data of different types and sizes, but each cell also adds a constant overhead of 112 bytes (see this other answer). This is far more than the 8 bytes of a double, and it is not negligible, especially when working with large cell arrays as in your example.

It is reasonable to assume that a cell array is implemented as a contiguous array of pointers, each pointing to the actual contents of its cell.
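The per-cell overhead can be checked roughly with whos. The following is only a sketch: the variable names are made up for illustration, a 64-bit MATLAB is assumed, and the exact byte counts reported vary between releases, so the figure printed on a given machine may not be exactly 112.

```matlab
% Rough measurement of per-element memory cost, as reported by whos.
% Assumption: 64-bit MATLAB; exact numbers differ between releases.
n = 1000;

numArr     = zeros(1, n);        % n doubles in one contiguous block
emptyCells = cell(1, n);         % n cells that hold nothing yet
fullCells  = num2cell(numArr);   % n cells, each holding one scalar double

infoNum   = whos('numArr');
infoEmpty = whos('emptyCells');
infoFull  = whos('fullCells');

bytesPerDouble    = infoNum.bytes   / n;
bytesPerEmptyCell = infoEmpty.bytes / n;
bytesPerFullCell  = infoFull.bytes  / n;

fprintf('double array: %g bytes per element\n', bytesPerDouble);
fprintf('empty cells:  %g bytes per cell\n',    bytesPerEmptyCell);
fprintf('filled cells: %g bytes per cell\n',    bytesPerFullCell);
```

The gap between the filled cells and the plain double array is the per-cell header cost that the timings in the question have to pay for on every insertion.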

This means that you can change the contents of each cell separately without resizing the cell array container itself. However, it also means that adding new cells to the cell array requires dynamic memory allocation, which is why preallocating memory for the cell array improves performance.

A related question: how should I build a data container for heterogeneous data in MATLAB when preallocation is not possible because the final size is unknown at the beginning?

Not knowing the final size can indeed be a problem, but you can always preallocate a cell array with the maximum possible size (if there is one) and remove the empty cells at the end. I also suggest looking into implementing linked lists in MATLAB.
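A minimal sketch of the preallocate-then-trim idea, assuming an upper bound is known. Here maxN, the loop length of 37 and the stored values are all hypothetical stand-ins, not anything from the question:

```matlab
% Preallocate for the worst case, fill as data arrives, then trim.
maxN  = 1000;            % hypothetical upper bound on the number of items
C     = cell(1, maxN);   % allocate the container once
count = 0;               % number of cells actually used

for k = 1:37             % stand-in for a loop whose length is not known in advance
    count = count + 1;
    if mod(k, 2) == 0
        C{count} = rand(1, k);             % numeric item of varying size
    else
        C{count} = sprintf('item %d', k);  % char item
    end
end

C = C(1:count);          % drop the unused tail in a single operation
```

Keeping an explicit counter and trimming with C(1:count) preserves any cells that are legitimately empty. If empty entries never carry meaning, C(cellfun(@isempty, C)) = [] removes them without the counter.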


Source: https://habr.com/ru/post/950249/

