tl;dr: You should probably use a one-dimensional approach.
Note: It's impossible to pin down every detail that affects performance when comparing dynamic 1d and dynamic 2d storage patterns without writing books about it, because code performance depends on a very large number of parameters. Profile if possible.
1. What is faster?
For dense matrices, the 1d approach is likely to be faster since it offers better memory locality and less allocation and deallocation overhead.
2. Which is smaller?
Dynamic 1d consumes less memory than the 2d approach. The latter also requires more allocations.
Notes
I've written a rather long answer below for several reasons, but first I'd like to make some remarks about your assumptions.
I can imagine that recalculating indices for 1D arrays (y + x * n) can be slower than using a 2D array (x, y)
Compare these two functions:
int const C = 4; // column count (compile-time constant)

int get_2d (int **p, int r, int c) { return p[r][c]; }
int get_1d (int  *p, int r, int c) { return p[c + C*r]; }
The assembly (not inlined) generated by Visual Studio 2015 RC for these functions (with optimizations turned on):
?get_1d@@YAHPAHII@Z PROC
push ebp
mov ebp, esp
mov eax, DWORD PTR _c$[ebp]
lea eax, DWORD PTR [eax+edx*4]
mov eax, DWORD PTR [ecx+eax*4]
pop ebp
ret 0

?get_2d@@YAHPAPAHII@Z PROC
push ebp
mov ebp, esp
mov ecx, DWORD PTR [ecx+edx*4]
mov eax, DWORD PTR _c$[ebp]
mov eax, DWORD PTR [ecx+eax*4]
pop ebp
ret 0
The difference is mov (2d) vs. lea (1d). The former has a latency of 3 cycles and a maximum throughput of 2 per cycle, while the latter has a latency of 2 cycles and a maximum throughput of 3 per cycle (according to the instruction tables by Agner Fog). Since the differences are minor, I don't think there is a significant performance difference arising from index recalculation. I expect it to be very unlikely that this difference by itself would be the bottleneck in any program.
This leads us to the following (and more interesting) point:
... but I could imagine that 1D could be in the CPU cache ...
True, but a 2d array can be in the CPU cache, too. See the section Memory locality under Disadvantages below for an explanation of why 1d is still better.
The long answer, or why dynamic 2-dimensional data storage (pointer-to-pointer or vector-of-vectors) is "bad" for simple/small matrices.
Note: This applies to dynamic arrays/allocation schemes (malloc/new/vector, etc.). A static 2d array is a contiguous block of memory and is therefore not subject to the downsides I am going to present here.
Problem
To understand why a dynamic array of dynamic arrays or a vector of vectors is most likely not the data storage pattern of choice, you need to understand the memory layout of such structures.
Example case, using pointer syntax
#include <cstddef> // size_t

int main (void)
{
    // allocate memory for 4x4 integers; quick & dirty
    int ** p = new int*[4];
    for (size_t i=0; i<4; ++i) p[i] = new int[4];

    // do some stuff here, using p[x][y]

    // deallocate memory
    for (size_t i=0; i<4; ++i) delete[] p[i];
    delete[] p;
}
Disadvantages
Memory locality
For this "matrix" you allocate one block of four pointers and four blocks of four integers. All of the allocations are unrelated and can therefore end up at arbitrary memory positions.
The following image will give you an idea of what the memory may look like.
For the real 2d case:
- The purple square is the memory position occupied by p itself.
- The green squares assemble the memory region p points to (4 x int*).
- The 4 regions of 4 contiguous blue squares are the ones pointed to by each int* of the green region.

For the 2d mapped on 1d case:
- The green square is the only required pointer (int*).
- The blue squares assemble the memory region for all matrix elements (16 x int).

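To make the right-hand ("2d mapped on 1d") layout concrete, here is a minimal sketch; the variable name q and the zero-fill loop are purely illustrative:

#include <cstddef>

int main()
{
    // a single allocation: all 16 ints are contiguous in memory
    int * q = new int[4*4];

    // element (r, c) is found by mapping the 2d index onto the 1d block
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c)
            q[r*4 + c] = 0;

    delete[] q; // one matching deallocation
}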
This means that (when using the left layout) you will probably observe worse performance than for a contiguous storage pattern (as seen on the right), due to caching, for instance.
A cache line is, roughly speaking, "the amount of data transferred into the cache at once"; now imagine a program that accesses the whole matrix one element after another.
If you have a properly aligned 4x4 matrix of 32-bit values, a processor with a 64-byte cache line (a typical value) is able to fetch the data "in one go" (4*4*4 = 64 bytes). If you start processing and the data isn't already in the cache, you'll encounter a cache miss and the data will be fetched from main memory. This load can fetch the whole matrix at once, because it fits into a cache line, if and only if it is stored contiguously (and properly aligned). There will probably not be any further misses while processing this data.
In case of a dynamic, "real two-dimensional" system with unrelated locations of each row/column, the processor needs to load every memory location separately. Even though only 64 bytes are required, loading 4 cache lines for 4 unrelated memory positions would, in a worst-case scenario, actually transfer 256 bytes and waste 75% of the throughput. If you process the data using the 2d scheme, you'll again (if not already cached) face a cache miss on the first element. But now, only the first row/column will be in the cache after the first load from main memory, because all other rows are located somewhere else in memory and not adjacent to the first one. As soon as you reach a new row/column, there will again be a cache miss and the next load from main memory is performed.
In short: the 2d pattern has a higher chance of cache misses, while the 1d scheme offers better performance potential due to the locality of the data.
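If you want to see the effect on your own machine, a rough microbenchmark along the following lines can serve as a starting point. This is only a sketch: the size N, the use of std::vector<std::vector<int>> as the "real 2d" stand-in, and the summing loops are my choices, and results depend heavily on hardware, allocator, and compiler, so profile rather than trust any single number.

#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    std::size_t const N = 1024;

    // one contiguous block of N*N ints
    std::vector<int> flat(N*N, 1);
    // N separately allocated rows (plus the outer bookkeeping)
    std::vector<std::vector<int>> nested(N, std::vector<int>(N, 1));

    long long sum_flat = 0, sum_nested = 0;

    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t r = 0; r < N; ++r)
        for (std::size_t c = 0; c < N; ++c)
            sum_flat += flat[r*N + c];
    auto t1 = std::chrono::steady_clock::now();
    for (std::size_t r = 0; r < N; ++r)
        for (std::size_t c = 0; c < N; ++c)
            sum_nested += nested[r][c];
    auto t2 = std::chrono::steady_clock::now();

    std::cout << "1d sum " << sum_flat << " in "
              << std::chrono::duration<double, std::milli>(t1 - t0).count() << " ms\n"
              << "2d sum " << sum_nested << " in "
              << std::chrono::duration<double, std::milli>(t2 - t1).count() << " ms\n";
}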
Frequent allocation and deallocation
- To create the desired NxM (4x4) matrix, N + 1 (4 + 1 = 5) allocations are necessary (using new, malloc, allocator::allocate, or whatever else).
- The same number of proper, matching deallocation operations must be applied as well.
Therefore, creating or copying such matrices is more costly than with a single-allocation scheme.
This gets even worse with a growing number of rows.
Memory overhead
I'll assume a size of 32 bits for int and 32 bits for pointers. (Note: system dependent.)
Remember: we want to store a 4x4 matrix, which means 64 bytes.
For the NxM matrix stored with the presented pointer-to-pointer scheme, we consume
N*M*sizeof(int) [the actual blue data]
+ N*sizeof(int*) [the green pointers]
+ sizeof(int**) [the purple variable p] bytes.
This makes 4*4*4 + 4*4 + 4 = 84 bytes in the case of this example, and it gets even worse when using std::vector<std::vector<int>>. It would require N*M*sizeof(int) + N*sizeof(vector<int>) + sizeof(vector<vector<int>>) bytes, i.e. 4*4*4 + 4*16 + 16 = 144 bytes, instead of the 64 bytes actually needed for 4x4 ints.
In addition, depending on the allocator used, each single allocation may well (and most likely will) have another 16 bytes of memory overhead. (Some "infobytes" that store the number of allocated bytes for the purpose of proper deallocation.)
This is the worst case:
N*(16+M*sizeof(int)) + 16+N*sizeof(int*) + sizeof(int**)
= 4*(16+4*4) + 16+4*4 + 4 = 164 bytes (overhead: 156%!)
The share of the overhead will shrink as the size of the matrix grows, but it will still be present.
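To play with the formulas above for other matrix sizes, here is a small sketch. The 16 bytes per allocation are the assumption from the text, and the sizeof values depend on your platform, so a 64-bit system will print different numbers than the 32-bit sizes assumed here.

#include <cstddef>
#include <iostream>

int main()
{
    std::size_t const N = 4, M = 4;    // rows, columns
    std::size_t const overhead = 16;   // assumed bookkeeping bytes per allocation

    std::size_t const payload = N * M * sizeof(int);   // the actual data
    std::size_t const worst   = N * (overhead + M * sizeof(int))
                              + overhead + N * sizeof(int*)
                              + sizeof(int**);

    std::cout << "payload:  " << payload << " bytes\n"
              << "worst:    " << worst   << " bytes\n"
              << "overhead: " << 100.0 * (worst - payload) / payload << " %\n";
}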
Memory leak hazard
A bunch of allocations requires appropriate exception handling to avoid memory leaks if one of the allocations fails! You need to keep track of the allocated memory blocks, and you must not forget any of them when deallocating.
If new runs out of memory and the next row cannot be allocated (especially likely when the matrix is very large), a std::bad_alloc is thrown by new.
Example:
Considering the new/delete example above, we'd be faced with some more code if we want to avoid leaks in case of bad_alloc exceptions.
#include <cstddef> // size_t
#include <new>     // std::bad_alloc

// allocate memory for 4x4 integers; quick & dirty
size_t const N = 4;
// we don't need try for this allocation
// if it fails there is no leak
int ** p = new int*[N];
size_t allocs(0U);
try
{ // try block doing further allocations
    for (size_t i=0; i<N; ++i)
    {
        p[i] = new int[4]; // allocate
        ++allocs;          // advance counter if no exception occurred
    }
}
catch (std::bad_alloc & be)
{ // if an exception occurs we need to free our memory
    for (size_t i=0; i<allocs; ++i) delete[] p[i]; // free all allocated p[i]s
    delete[] p; // free p
    throw;      // rethrow bad_alloc
}

/* do some stuff here, using p[x][y] */

// deallocate memory according to the number of allocations
for (size_t i=0; i<allocs; ++i) delete[] p[i];
delete[] p;
Summary
There are cases where "real 2d" memory layouts fit and make sense (i.e. if the number of columns per row is not constant), but in the most simple and common 2d data storage cases they just bloat the complexity of your code and reduce the performance and memory efficiency of your program.
Alternative
You should use a contiguous block of memory and map your rows onto that block.
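Before wrapping it in a class, note that the core idea fits in a few lines. A sketch using the row-major mapping used throughout this answer:

#include <cstddef>
#include <vector>

int main()
{
    std::size_t const rows = 4, cols = 4;

    // one contiguous, RAII-managed block for all elements
    std::vector<int> m(rows*cols);

    // row-major access: element (r, c) lives at index r*cols + c
    m[2*cols + 3] = 42;
}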
The "C++ way" of doing this is probably to write a class that manages that memory for you, while considering important things like the Rule of Three, RAII, and container concepts.
Example
To give you an idea of what such a class may look like, here's a simple example with some basic features:
- 2d-size constructible
- 2d-resizable
- operator()(size_t, size_t) for 2d row-major element access
- at(size_t, size_t) for checked 2d row-major element access
- fulfills the concept requirements for a container
Source:
#include <vector>
#include <algorithm>
#include <iterator>
#include <utility>

namespace matrices
{

  template<class T>
  class simple
  {
  public:
    // misc types
    using data_type              = std::vector<T>;
    using value_type             = typename std::vector<T>::value_type;
    using size_type              = typename std::vector<T>::size_type;
    // ref
    using reference              = typename std::vector<T>::reference;
    using const_reference        = typename std::vector<T>::const_reference;
    // iter
    using iterator               = typename std::vector<T>::iterator;
    using const_iterator         = typename std::vector<T>::const_iterator;
    // reverse iter
    using reverse_iterator       = typename std::vector<T>::reverse_iterator;
    using const_reverse_iterator = typename std::vector<T>::const_reverse_iterator;

    // empty construction
    simple() = default;

    // default-insert rows*cols values
    simple(size_type rows, size_type cols)
      : m_rows(rows), m_cols(cols), m_data(rows*cols)
    {}

    // copy initialized matrix rows*cols
    simple(size_type rows, size_type cols, const_reference val)
      : m_rows(rows), m_cols(cols), m_data(rows*cols, val)
    {}

    // 1d-iterators
    iterator begin() { return m_data.begin(); }
    iterator end() { return m_data.end(); }
    const_iterator begin() const { return m_data.begin(); }
    const_iterator end() const { return m_data.end(); }
    const_iterator cbegin() const { return m_data.cbegin(); }
    const_iterator cend() const { return m_data.cend(); }
    reverse_iterator rbegin() { return m_data.rbegin(); }
    reverse_iterator rend() { return m_data.rend(); }
    const_reverse_iterator rbegin() const { return m_data.rbegin(); }
    const_reverse_iterator rend() const { return m_data.rend(); }
    const_reverse_iterator crbegin() const { return m_data.crbegin(); }
    const_reverse_iterator crend() const { return m_data.crend(); }

    // element access (row major indexation)
    reference operator() (size_type const row, size_type const column)
    {
      return m_data[m_cols*row + column];
    }
    const_reference operator() (size_type const row, size_type const column) const
    {
      return m_data[m_cols*row + column];
    }
    reference at (size_type const row, size_type const column)
    {
      return m_data.at(m_cols*row + column);
    }
    const_reference at (size_type const row, size_type const column) const
    {
      return m_data.at(m_cols*row + column);
    }

    // resizing
    void resize(size_type new_rows, size_type new_cols)
    {
      // new matrix new_rows times new_cols
      simple tmp(new_rows, new_cols);
      // select smaller row and col size
      auto mc = std::min(m_cols, new_cols);
      auto mr = std::min(m_rows, new_rows);
      for (size_type i(0U); i < mr; ++i)
      {
        // iterators to begin of rows
        auto row = begin() + i*m_cols;
        auto tmp_row = tmp.begin() + i*new_cols;
        // move mc elements to tmp
        std::move(row, row + mc, tmp_row);
      }
      // move assignment to this
      *this = std::move(tmp);
    }

    // size and capacity
    size_type size() const { return m_data.size(); }
    size_type max_size() const { return m_data.max_size(); }
    bool empty() const { return m_data.empty(); }
    // dimensionality
    size_type rows() const { return m_rows; }
    size_type cols() const { return m_cols; }

    // data swapping
    void swap(simple &rhs)
    {
      using std::swap;
      m_data.swap(rhs.m_data);
      swap(m_rows, rhs.m_rows);
      swap(m_cols, rhs.m_cols);
    }

  private:
    // content
    size_type m_rows{ 0u };
    size_type m_cols{ 0u };
    data_type m_data{};
  };

  template<class T>
  void swap(simple<T> & lhs, simple<T> & rhs)
  {
    lhs.swap(rhs);
  }

  template<class T>
  bool operator== (simple<T> const &a, simple<T> const &b)
  {
    if (a.rows() != b.rows() || a.cols() != b.cols()) { return false; }
    return std::equal(a.begin(), a.end(), b.begin(), b.end());
  }

  template<class T>
  bool operator!= (simple<T> const &a, simple<T> const &b)
  {
    return !(a == b);
  }

}
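A short usage sketch of the class above (assuming the matrices::simple<T> definition is visible; the concrete values are just for illustration):

#include <cstddef>
#include <iostream>

int main()
{
    matrices::simple<int> m(3, 4); // 3 rows, 4 columns, value-initialized to 0

    m(1, 2) = 42;    // unchecked row-major access
    m.at(2, 3) = 7;  // checked access; throws std::out_of_range when out of bounds

    m.resize(4, 5);  // grow the matrix, keeping the existing values

    for (std::size_t r = 0; r < m.rows(); ++r)
    {
        for (std::size_t c = 0; c < m.cols(); ++c)
            std::cout << m(r, c) << ' ';
        std::cout << '\n';
    }
}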
Pay attention to a few things here:
- T must satisfy the requirements of the std::vector member functions used
- operator() does not perform any range checks
- you do not need to manage the data yourself
- no destructor, copy constructor, or assignment operator is required
Therefore, you do not need to worry about proper memory handling in every application, only once, for the class you are writing.
Limitations
There are times when a dynamic "real" two-dimensional structure is favorable. This is, for example, the case if
- the matrix is very large and sparse (if some of the rows do not even need to be allocated but can be represented by a nullptr), or if
- the rows do not have the same number of columns (that is, if you don't have a matrix at all but only another two-dimensional construct).