Let's take a very simple C++ class with only one data member:
class Container {
public:
    std::vector<Element> elements;  // the only data member
    Container(int elemCount);       // sizes the vector to elemCount elements
};
Now create N threads that each perform a very simple task:
- create a local container with a given vector size
- sweep the vector and increment each element's val member
- repeat step 2 10,000 times (to get timings in seconds rather than milliseconds)
The complete code listing can be found on Pastebin.
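For readers who do not want to open the Pastebin listing, the steps above can be sketched roughly as follows. This is a minimal reconstruction, not the original code: the Element layout (a single 8-byte val), the constructor body, and the names WorkerResult, runWorker and runBenchmark are assumptions made for illustration.

```cpp
#include <chrono>
#include <thread>
#include <vector>

// Assumption: Element holds a single 8-byte member named val.
struct Element {
    long long val = 0;
};

class Container {
public:
    std::vector<Element> elements;
    explicit Container(int elemCount) : elements(elemCount) {}
};

// Hypothetical result type: elapsed time plus a value read back from the
// vector, so the compiler cannot discard the loop as dead code.
struct WorkerResult {
    long long ms = 0;
    long long checksum = 0;
};

// One worker: build a local container, then sweep it `loops` times,
// incrementing val on every element.
WorkerResult runWorker(int elemCount, int loops) {
    Container c(elemCount);
    auto start = std::chrono::steady_clock::now();
    for (int l = 0; l < loops; ++l)
        for (Element& e : c.elements)
            ++e.val;
    auto end = std::chrono::steady_clock::now();

    WorkerResult r;
    r.ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    r.checksum = c.elements.empty() ? 0 : c.elements.front().val;
    return r;
}

// Split totalLoops evenly across threadCount threads; each thread owns
// its own container and writes only to its own slot in `results`.
std::vector<WorkerResult> runBenchmark(int elemCount, int totalLoops, int threadCount) {
    std::vector<WorkerResult> results(threadCount);
    std::vector<std::thread> threads;
    const int loopsPerThread = totalLoops / threadCount;
    for (int t = 0; t < threadCount; ++t)
        threads.emplace_back([&results, t, elemCount, loopsPerThread] {
            results[t] = runWorker(elemCount, loopsPerThread);
        });
    for (auto& th : threads)
        th.join();
    return results;
}
```

Calling runBenchmark(100000, 10000, 1) and runBenchmark(100000, 10000, 4) would correspond to the "Threads: 1" and "Threads: 4" runs shown further down, with each thread reporting its own loop count and time.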
According to CoreInfo, my processor (Intel Core i5 2400) has 4 cores, each with its own L1/L2 caches:
Logical to Physical Processor Map:
*--- Physical Processor 0
-*-- Physical Processor 1
--*- Physical Processor 2
---* Physical Processor 3
Logical Processor to Cache Map:
*--- Data Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
*--- Instruction Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
*--- Unified Cache 0, Level 2, 256 KB, Assoc 8, LineSize 64
-*-- Data Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
-*-- Instruction Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
-*-- Unified Cache 1, Level 2, 256 KB, Assoc 8, LineSize 64
--*- Data Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
--*- Instruction Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
--*- Unified Cache 2, Level 2, 256 KB, Assoc 8, LineSize 64
---* Data Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
---* Instruction Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
---* Unified Cache 3, Level 2, 256 KB, Assoc 8, LineSize 64
**** Unified Cache 4, Level 3, 6 MB, Assoc 12, LineSize 64
For vector sizes of up to 100,000 elements, the timings are exactly as expected:
Elements count: 100.000
Threads: 1
loops: 10000 ms: 650
Threads: 4
loops: 2500 ms: 168
loops: 2500 ms: 169
loops: 2500 ms: 169
loops: 2500 ms: 171
However, for larger vector sizes, running on several cores is slower than running on a single one:
Elements count: 300.000
Threads: 1
loops: 10000 ms: 1968
Threads: 4
loops: 2500 ms: 3817
loops: 2500 ms: 3864
loops: 2500 ms: 3927
loops: 2500 ms: 4008
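To put the two cases side by side, here is a rough sketch of the working-set arithmetic. It assumes each Element occupies 8 bytes (the size mentioned in the edit at the end of this post) and takes the cache sizes from the CoreInfo output above; the constant and function names are hypothetical.

```cpp
#include <cstddef>

// Assumption: each Element occupies 8 bytes.
constexpr std::size_t kElementSize = 8;
// Cache sizes as reported by CoreInfo for this processor.
constexpr std::size_t kL2Bytes = 256 * 1024;       // per core
constexpr std::size_t kL3Bytes = 6 * 1024 * 1024;  // shared by all cores

// Total bytes touched when every thread sweeps its own vector.
constexpr std::size_t workingSetBytes(std::size_t elemCount, std::size_t threads) {
    return elemCount * kElementSize * threads;
}

// True if a single thread's vector fits in that core's private L2.
constexpr bool fitsInL2(std::size_t elemCount) {
    return workingSetBytes(elemCount, 1) <= kL2Bytes;
}

// True if the combined vectors of all threads fit in the shared L3.
constexpr bool fitsInL3(std::size_t elemCount, std::size_t threads) {
    return workingSetBytes(elemCount, threads) <= kL3Bytes;
}
```

Under these assumptions, four vectors of 100,000 elements add up to about 3.2 MB, which is below the 6 MB L3, while four vectors of 300,000 elements add up to about 9.6 MB, which is above it. A single 300,000-element vector is 2.4 MB and still fits in L3, and none of these sizes fits in a 256 KB L2.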
My questions:
- - , ? ? , , , L1/L2 ?
- ( ) ?
EDIT: , . :
@user2079303: member. sizeof() = 8. . Pastebin.
@bku_drytt: (). , elemCount ( ).
@Jorge González Lorenzo: L3. , :
Elements count: 50.000
Threads: 1
loops: 50000 ms: 1615
Elements count: 200.000 (4 times bigger)
Threads: 1
loops: 50000 ms: 1615 (slightly more than 4 times longer)
Elements count: 800.000 (4 times bigger again)
Threads: 1
loops: 50000 ms: 42181 (MUCH more than 4 times longer)