I'm new to R. I assume the memory layout is the same for a data frame and a matrix.
Consider the following matrix:
a = matrix(1:10000000, 1000000, 10)
It has 1M rows and 10 columns. Is the memory for a row, or for a column, physically contiguous? In other words, does physical memory store [1,1], [2,1], [3,1], ..., [1M,1], [1,2], ... first, or [1,1], [1,2], ..., [1,10], [2,1], ...?
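One way to check this on a small matrix might be the sketch below. It relies on `as.vector()` dropping the `dim` attribute and returning the elements in the order they are stored; the names `m`, `flat` and `first_col_contiguous` are only illustrative.

```r
# Small stand-in for the 1M x 10 matrix, so the element order is easy to read.
m <- matrix(1:6, nrow = 3, ncol = 2)

# as.vector() drops the dim attribute and returns the elements in storage order.
flat <- as.vector(m)

# TRUE if the first three stored values are exactly column 1 of the matrix.
first_col_contiguous <- identical(flat[1:3], m[, 1])

print(flat)
print(first_col_contiguous)
```

If `first_col_contiguous` comes out `TRUE`, the elements of one column sit next to each other and the elements of one row are `nrow(m)` positions apart.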
Assume the matrix with 10M elements is 100 MB in size and the L2 cache is 4 MB, so the L2 cache cannot hold all 10M elements. If we process the data sequentially in the order it is stored, we should get a lower L2 cache miss rate. In our case, we need to process the data row by row and read several columns at the same time, for example columns A, B and C, and then produce some result. If the memory layout stored the 10 items of the 1st row first, then the 10 items of the 2nd row, and so on, performance might be better.
Is there a way to control the memory layout?
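The only workaround I can think of is to transpose the matrix once, so that each original row becomes a column. This is a minimal sketch under the assumption that R stores a matrix column by column; the small matrix and the names `ta`, `rec` and `row_sums` are made up for illustration, and the sum stands in for whatever the real per-row computation is.

```r
# Small stand-in for the real matrix: 4 rows, 3 columns named A, B, C.
a <- matrix(1:12, nrow = 4, ncol = 3,
            dimnames = list(NULL, c("A", "B", "C")))

# Transpose once up front: each original row becomes one column of `ta`.
ta <- t(a)

# Process "row by row" by walking the columns of the transposed matrix.
row_sums <- numeric(ncol(ta))
for (i in seq_len(ncol(ta))) {
  rec <- ta[, i]  # named vector holding A, B, C of original row i
  row_sums[i] <- rec[["A"]] + rec[["B"]] + rec[["C"]]
}

print(row_sums)
```

Note that `t(a)` makes a full copy, so this only pays off if the transposed matrix is reused for many passes. I have not measured whether it actually reduces cache misses.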