1D or 2D, which is faster?

I need to represent a 2D field (x, y axis), and I am facing a problem: should I use a 1D array or a 2D array?

I can imagine that recalculating indices for 1D arrays (y + x * n) can be slower than using a 2D array (x, y), but I could imagine that 1D could be in the CPU cache.

I did some searching, but only found pages about static arrays (stating that 1D and 2D are basically the same). But my arrays must change dynamically.

So which is

  • faster,
  • smaller (RAM),

1D dynamic arrays or 2D dynamic arrays?

Thanks :)

+58
c++ c arrays
Jun 23 '13 at 10:36
7 answers

tl;dr: You should probably use a one-dimensional approach.

Note: It's impossible to nail down all the details that affect performance when comparing dynamic 1D and dynamic 2D storage patterns without filling books, since code performance depends on a very large number of parameters. Profile if possible.

1. What is faster?

For dense matrices, the 1D approach is likely to be faster, since it offers better memory locality and lower allocation and deallocation costs.

2. Which uses less memory?

The dynamic 1D approach consumes less memory than the 2D approach. The latter also requires more allocations.

Notes

I've outlined a rather long answer below for several reasons, but first I want to make some remarks about your assumptions.

I can imagine that recalculating indices for 1D arrays (y + x * n) can be slower than using a 2D array (x, y)

Compare these two functions:

 int const C = 4; // number of columns, known at compile time

 int get_2d (int **p, int r, int c) { return p[r][c]; }
 int get_1d (int *p, int r, int c) { return p[c + C*r]; }

The assembly (not inlined) generated by Visual Studio 2015 RC for these functions (with optimizations turned on):

 ?get_1d@@YAHPAHII@Z PROC
 push ebp
 mov ebp, esp
 mov eax, DWORD PTR _c$[ebp]
 lea eax, DWORD PTR [eax+edx*4]
 mov eax, DWORD PTR [ecx+eax*4]
 pop ebp
 ret 0

 ?get_2d@@YAHPAPAHII@Z PROC
 push ebp
 mov ebp, esp
 mov ecx, DWORD PTR [ecx+edx*4]
 mov eax, DWORD PTR _c$[ebp]
 mov eax, DWORD PTR [ecx+eax*4]
 pop ebp
 ret 0

The difference is mov (2d) vs. lea (1d). The former has a latency of 3 cycles and a maximum throughput of 2 per cycle, while the latter has a latency of 2 cycles and a maximum throughput of 3 per cycle (according to Agner Fog's instruction tables). Since the differences are minor, I don't think there should be any big performance difference due to index recalculation; I expect it is very unlikely that this difference, by itself, would ever be the bottleneck in any program.

This leads us to the following (and more interesting) point:

... but I could imagine that 1D could be in the CPU cache ...

True, but so can a 2D array. See the downsides under "Memory locality" below for an explanation of why 1D is still better.

The long answer, or why dynamic 2-dimensional data storage (pointer-to-pointer or vector-of-vectors) is "bad" for simple/small matrices.

Note: This applies to dynamic arrays/allocation schemes [malloc/new/vector etc.]. A static 2D array is a contiguous block of memory and therefore not subject to the downsides I'm going to present here.

Problem

To understand why a dynamic array of dynamic arrays or a vector of vectors is most likely not the data storage pattern of choice, you need to understand the memory layout of such structures.

Example using pointer-to-pointer syntax

 int main (void)
 {
     // allocate memory for 4x4 integers; quick & dirty
     int ** p = new int*[4];
     for (size_t i=0; i<4; ++i) p[i] = new int[4];

     // do some stuff here, using p[x][y]

     // deallocate memory
     for (size_t i=0; i<4; ++i) delete[] p[i];
     delete[] p;
 }

Disadvantages

Memory locality

For this "matrix" you allocate one block of four pointers and four blocks of four integers. All of the allocations are unrelated to each other and can therefore end up at arbitrary memory positions.

The following image will give you an idea of what the memory might look like.

For a real 2d case:

  • The purple square is the memory position occupied by p itself.
  • The green squares assemble the memory region p points to (4 x int*).
  • The 4 regions of 4 contiguous blue squares are those pointed to by each int* of the green region.

For 2D mapped onto 1D:

  • The green square is the only required pointer, int*.
  • The blue squares assemble the memory region for all matrix elements (16 x int).

Real 2d vs mapped 2d memory layout

This means that (when using the left layout) you will probably observe worse performance than for a contiguous storage pattern (as on the right), due to caching, for instance.

Let's say a cache line is "the amount of data transferred into the cache at once", and imagine a program accessing the whole matrix one element after another.

If you have a properly aligned 4×4 matrix of 32-bit values, a processor with a 64-byte cache line (typical value) is able to cache the data "in one go" (4*4*4 = 64 bytes). If you start processing and the data isn't already in the cache, you'll face a cache miss and the data will be fetched from main memory. This load can fetch the whole matrix at once, since it fits into a cache line, if and only if it is contiguously stored (and properly aligned). There will probably be no further misses while processing that data.

In the case of a dynamic, "real two-dimensional" system with unrelated locations for each row/column, the processor needs to load every memory location separately. Even though only 64 bytes are required, loading 4 cache lines for 4 unrelated memory positions would, in the worst case, actually transfer 256 bytes and waste 75% of the bandwidth. If you process the data using the 2D scheme, you'll again (if not already cached) face a cache miss on the first element. But now, only the first row will be in the cache after the first load from main memory, because all other rows are located somewhere else in memory and not adjacent to the first one. As soon as you reach a new row, there will again be a cache miss and the next load from main memory is performed.

In short: the 2D pattern has a higher chance of cache misses, while the 1D scheme offers better performance potential due to the locality of the data.

Frequent allocation/deallocation

  • Creating the desired NxM (4×4) matrix requires as many as N + 1 (4 + 1 = 5) allocations (using new, malloc, allocator::allocate or whatever).
  • The same number of proper, matching deallocation operations must be applied as well.

Therefore, creating or copying such matrices is more costly than with a single-allocation scheme.

It gets worse with increasing number of rows.

Memory overhead

I'll assume 32 bits for int and 32 bits for pointers. (Note: system dependent.)

Remember: we want to store a 4×4 matrix, which means 64 bytes.

For the NxM matrix stored with the presented pointer-to-pointer scheme, we consume

  • N*M*sizeof(int) [actual blue data] +
  • N*sizeof(int*) [green pointers] +
  • sizeof(int**) [purple variable p] bytes.

This makes 4*4*4 + 4*4 + 4 = 84 bytes in the case of this example, and it gets even worse with std::vector<std::vector<int>>, which requires N * M * sizeof(int) + N * sizeof(vector<int>) + sizeof(vector<vector<int>>) bytes, that is 4*4*4 + 4*16 + 16 = 144 bytes, instead of just 64 bytes for the 4 x 4 ints.

In addition, depending on the allocator used, each single allocation may well (and most likely will) have another 16 bytes of memory overhead. (Some "infobytes" which store the number of allocated bytes for the sake of proper deallocation.)

This is the worst case:

N*(16+M*sizeof(int)) + 16+N*sizeof(int*) + sizeof(int**)
= 4*(16+4*4) + 16+4*4 + 4 = 164 bytes! (Overhead: 156%)

The share of the overhead will shrink as the matrix size grows, but it will still be present.

Risk of memory leaks

The bunch of allocations requires appropriate exception handling to avoid memory leaks if one of the allocations fails! You'll need to keep track of the allocated memory blocks, and you must not forget them when deallocating.

If new runs out of memory and the next row cannot be allocated (especially likely when the matrix is very large), a std::bad_alloc is thrown by new.

Example:

In the new/delete example above, we'll face some more code if we want to avoid leaks in case of bad_alloc exceptions.

 // allocate memory for 4x4 integers; quick & dirty
 size_t const N = 4;
 // we don't need try for this allocation
 // if it fails there is no leak
 int ** p = new int*[N];
 size_t allocs(0U);
 try
 { // try block doing further allocations
     for (size_t i=0; i<N; ++i)
     {
         p[i] = new int[4]; // allocate
         ++allocs; // advance counter if no exception occurred
     }
 }
 catch (std::bad_alloc & be)
 { // if an exception occurs we need to free our memory
     for (size_t i=0; i<allocs; ++i) delete[] p[i]; // free all allocated p[i]s
     delete[] p; // free p
     throw; // rethrow bad_alloc
 }
 /*
     do some stuff here, using p[x][y]
 */
 // deallocate memory according to the number of allocations
 for (size_t i=0; i<allocs; ++i) delete[] p[i];
 delete[] p;

Summary

There are cases where "real 2D" memory layouts fit and make sense (i.e. if the number of columns per row is not constant), but in the simplest and most common 2D data storage cases they just bloat the complexity of your code and reduce the performance and memory efficiency of your program.

Alternative

Use a contiguous block of memory and map your rows onto that block.

The "C++ way" of doing this is probably to write a class that manages your memory while taking care of important things like correct copying, resizing and cleanup.

Example

To give you an idea of what such a class may look like, here's a simple example with some basic features:

  • Construction with a 2D size
  • 2D resizing
  • operator(size_t, size_t) for 2D row-major element access
  • at(size_t, size_t) for checked 2D row-major element access
  • Fulfills the concept requirements of a container

Source:

 #include <vector>
 #include <algorithm>
 #include <iterator>
 #include <utility>

 namespace matrices
 {

 template<class T>
 class simple
 {
 public:
     // misc types
     using data_type = std::vector<T>;
     using value_type = typename std::vector<T>::value_type;
     using size_type = typename std::vector<T>::size_type;
     // ref
     using reference = typename std::vector<T>::reference;
     using const_reference = typename std::vector<T>::const_reference;
     // iter
     using iterator = typename std::vector<T>::iterator;
     using const_iterator = typename std::vector<T>::const_iterator;
     // reverse iter
     using reverse_iterator = typename std::vector<T>::reverse_iterator;
     using const_reverse_iterator = typename std::vector<T>::const_reverse_iterator;

     // empty construction
     simple() = default;

     // default-insert rows*cols values
     simple(size_type rows, size_type cols)
         : m_rows(rows), m_cols(cols), m_data(rows*cols)
     {}

     // copy-initialized matrix rows*cols
     simple(size_type rows, size_type cols, const_reference val)
         : m_rows(rows), m_cols(cols), m_data(rows*cols, val)
     {}

     // 1d-iterators
     iterator begin() { return m_data.begin(); }
     iterator end() { return m_data.end(); }
     const_iterator begin() const { return m_data.begin(); }
     const_iterator end() const { return m_data.end(); }
     const_iterator cbegin() const { return m_data.cbegin(); }
     const_iterator cend() const { return m_data.cend(); }
     reverse_iterator rbegin() { return m_data.rbegin(); }
     reverse_iterator rend() { return m_data.rend(); }
     const_reverse_iterator rbegin() const { return m_data.rbegin(); }
     const_reverse_iterator rend() const { return m_data.rend(); }
     const_reverse_iterator crbegin() const { return m_data.crbegin(); }
     const_reverse_iterator crend() const { return m_data.crend(); }

     // element access (row-major indexation)
     reference operator() (size_type const row, size_type const column)
     {
         return m_data[m_cols*row + column];
     }
     const_reference operator() (size_type const row, size_type const column) const
     {
         return m_data[m_cols*row + column];
     }
     reference at (size_type const row, size_type const column)
     {
         return m_data.at(m_cols*row + column);
     }
     const_reference at (size_type const row, size_type const column) const
     {
         return m_data.at(m_cols*row + column);
     }

     // resizing
     void resize(size_type new_rows, size_type new_cols)
     {
         // new matrix new_rows times new_cols
         simple tmp(new_rows, new_cols);
         // select smaller row and col size
         auto mc = std::min(m_cols, new_cols);
         auto mr = std::min(m_rows, new_rows);
         for (size_type i(0U); i < mr; ++i)
         {
             // iterators to begin of rows
             auto row = begin() + i*m_cols;
             auto tmp_row = tmp.begin() + i*new_cols;
             // move mc elements to tmp
             std::move(row, row + mc, tmp_row);
         }
         // move assignment to this
         *this = std::move(tmp);
     }

     // size and capacity
     size_type size() const { return m_data.size(); }
     size_type max_size() const { return m_data.max_size(); }
     bool empty() const { return m_data.empty(); }
     // dimensionality
     size_type rows() const { return m_rows; }
     size_type cols() const { return m_cols; }

     // data swapping
     void swap(simple &rhs)
     {
         using std::swap;
         m_data.swap(rhs.m_data);
         swap(m_rows, rhs.m_rows);
         swap(m_cols, rhs.m_cols);
     }

 private:
     // content
     size_type m_rows{ 0u };
     size_type m_cols{ 0u };
     data_type m_data{};
 };

 template<class T>
 void swap(simple<T> & lhs, simple<T> & rhs)
 {
     lhs.swap(rhs);
 }

 template<class T>
 bool operator== (simple<T> const &a, simple<T> const &b)
 {
     if (a.rows() != b.rows() || a.cols() != b.cols()) { return false; }
     return std::equal(a.begin(), a.end(), b.begin(), b.end());
 }

 template<class T>
 bool operator!= (simple<T> const &a, simple<T> const &b)
 {
     return !(a == b);
 }

 }

Pay attention to a few things here:

  • T must satisfy the requirements of the std::vector member functions that are used
  • operator() doesn't do any "range" checks
  • No need to manage the data yourself
  • No destructor, copy constructor or assignment operators required

So you don't have to bother with proper memory handling for each application, but only once, for the class you write.

Limitations

There are cases where a dynamic "real" two-dimensional structure is favorable. This is, for instance, the case if

  • the matrix is very large and sparse (if some of the rows don't even need to be allocated but can be handled via nullptr), or if
  • the rows do not have the same number of columns (that is, if you don't have a matrix at all but another two-dimensional construct).
+178
Jun 23 '13 at 11:59

Unless you're talking about static arrays, 1D is faster.

Here's the memory layout of a 1D array ( std::vector<T> ):

 +---+---+---+---+---+---+---+---+---+ | | | | | | | | | | +---+---+---+---+---+---+---+---+---+ 

And here's the same for a dynamic 2D array ( std::vector<std::vector<T>> ):

 +---+---+---+ | * | * | * | +-|-+-|-+-|-+ | | V | | +---+---+---+ | | | | | | | | +---+---+---+ | V | +---+---+---+ | | | | | | +---+---+---+ V +---+---+---+ | | | | +---+---+---+ 

Obviously, the 2D case loses cache locality and uses more memory. It also introduces an extra indirection (and thus an extra pointer to follow), but the first array has the overhead of calculating the indices, so these even out more or less.

+18
Jun 23 '13 at 10:54

1D and 2D static arrays

  • Size: Both will need the same amount of memory.

  • Speed: You can assume there will be no difference in speed, because the memory for both of these arrays should be contiguous (the whole 2D array should appear as one chunk in memory rather than a bunch of chunks scattered across memory). (This may be compiler dependent, however.)

1D and 2D dynamic arrays

  • Size: The 2D array will require a tiny bit more memory than the 1D array, because of the pointers needed in the 2D array to point to the set of allocated 1D arrays. (This bit is only tiny when we're talking about really big arrays; for small arrays it can be quite significant, relatively speaking.)

  • Speed: The 1D array may be faster than the 2D array, because the memory for the 2D array is not contiguous, so cache misses become a problem.




Use whatever works and seems most logical, and if you run into speed problems, then refactor.

+9
Jun 23 '13 at 10:39

Existing answers only compare 1-D arrays with pointer arrays.

In C (but not C++) there is a third option: you can have a contiguous 2-D array that is dynamically allocated and has run-time dimensions:

 int (*p)[num_columns] = malloc(num_rows * sizeof *p); 

and it is accessed as p[row_index][col_index] .

I expect this to perform very similarly to the 1-D array case, and it gives you nicer syntax for accessing the cells.

In C++, you can achieve something similar by writing a class that holds a 1-D array internally but exposes it through 2-D array access syntax using overloaded operators. Again, I would expect that to have similar or identical performance to a plain 1-D array.

+5
Jun 10 '15 at 0:58

Another difference between 1D and 2D arrays appears in memory allocation: we cannot be sure that the rows of a dynamically allocated 2D array are contiguous.

+4
Jun 24 '13 at 2:26

It really depends on how your 2D array is implemented.

Consider the code below:

 int a[200], b[10][20], *c[10], *d[10];
 for (int ii = 0; ii < 10; ++ii)
 {
     c[ii] = &b[ii][0];
     d[ii] = (int*) malloc(20 * sizeof(int)); // The cast for C++ only.
 }

There are 3 implementations here: b, c and d.

There won't be much difference in accessing b[x][y] or a[x*20 + y], since in one case you are doing the calculation and in the other the compiler does it for you. c[x][y] and d[x][y] are slower, because the machine must find the address that c[x] points to and then access the y-th element from there. That is not one straight calculation. On some machines (e.g. the AS400, which has 36-byte (not bit) pointers), pointer access is extremely slow. It all depends on the architecture in use. On x86-type architectures, a and b are the same speed, and c and d are slower than b.

+1
Jun 23 '13 at 11:11

I like Pixelchemist's thorough answer. A simpler version of that solution may be as follows. First, declare the dimensions:

 constexpr int M = 16; // rows
 constexpr int N = 16; // columns
 constexpr int P = 16; // planes

Then create an alias and the get and set methods:

 template<typename T>
 using Vector = std::vector<T>;

 template<typename T>
 inline T& set_elem(Vector<T>& m_, size_t i_, size_t j_, size_t k_)
 {
     // check indexes here...
     return m_[i_*N*P + j_*P + k_];
 }

 template<typename T>
 inline const T& get_elem(const Vector<T>& m_, size_t i_, size_t j_, size_t k_)
 {
     // check indexes here...
     return m_[i_*N*P + j_*P + k_];
 }

Finally, a vector can be created and indexed as follows:

 Vector<int> array3d(M*N*P, 0); // create 3-d array containing M*N*P zero ints
 set_elem(array3d, 0, 0, 1) = 5; // array3d[0][0][1] = 5
 auto n = get_elem(array3d, 0, 0, 1); // n = 5

Sizing the vector at initialization provides optimal performance. This solution is modified from this answer. The functions can be overloaded to support varying dimensions with a single vector. The downside of this solution is that the M, N, P parameters are implicitly passed to the get and set functions. This can be solved by implementing the solution within a class, as Pixelchemist did.

0
Feb 13 '18 at 23:59


