It is safe to read an STL container from multiple threads in parallel. However, the performance is terrible. Why?
I create a small object that stores some data in a multiset. This makes the constructors quite expensive (about 5 microseconds on my machine). I store hundreds of thousands of these small objects in a large multiset. Processing each object is independent of the others, so I split the work between threads running on a multi-core machine. Each thread reads the objects it needs from the large multiset and processes them.
The problem is that reading from a large multiset does not occur in parallel. It appears that reading in one thread blocks reading in another.
Below is the simplest code I could write that still shows the problem. First, it creates a large multiset containing 100,000 small objects, each of which contains its own empty multiset. Then it calls the multiset copy constructor twice in series, then twice again in parallel.
The profiling tool shows that the serial copy constructors take about 0.23 seconds each, while the parallel ones take about twice as long. Somehow the parallel copies interfere with each other.
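    #include <set>
    #include <tchar.h>
    #include <boost/bind.hpp>
    #include <boost/thread.hpp>
    // cRavenProfile is the profiling tool mentioned above; its header is not shown.

    using std::multiset;

    // Small object that owns its own (empty) multiset.
    class cTest
    {
        multiset<int> mine;
        int id;
    public:
        cTest( int i ) : id( i ) {}
        bool operator<(const cTest& o) const { return id < o.id; }
    };

    // Fill m with 100,000 small objects.
    void Populate( multiset<cTest>& m )
    {
        for( int k = 0; k < 100000; k++ )
        {
            m.insert(cTest(k));
        }
    }

    // Copy the multiset; profiled as "copy_main".
    void Copy( const multiset<cTest>& m )
    {
        cRavenProfile profile("copy_main");
        multiset<cTest> copy( m );
    }

    // Same copy; profiled as "copy_thread".
    void Copy2( const multiset<cTest>& m )
    {
        cRavenProfile profile("copy_thread");
        multiset<cTest> copy( m );
    }

    int _tmain(int argc, _TCHAR* argv[])
    {
        cRavenProfile profile("test");
        profile.Start();
        multiset<cTest> master;
        Populate( master );

        // Two copies in series.
        Copy( master );
        Copy( master );

        // Two copies in parallel.
        boost::thread* pt1 = new boost::thread( boost::bind( Copy2, master ));
        boost::thread* pt2 = new boost::thread( boost::bind( Copy2, master ));
        pt1->join();
        pt2->join();

        cRavenProfile print_profile;
        return 0;
    }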
Here is the output:
    Scope         Calls   Mean (secs)   Total (secs)
    copy_thread   2       0.472498      0.944997
    copy_main     2       0.233529      0.467058
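For anyone who wants to try this without the profiler class or Boost, the same measurement can be sketched with only the standard library. This is a minimal sketch, not the code that produced the numbers above: it assumes C++11 `std::thread` and `std::chrono::steady_clock` in place of `boost::thread` and `cRavenProfile`, and `Timings`, `TimedCopy` and `Measure` are names introduced here for illustration. One difference to be aware of: the threads here read `master` through a reference, whereas `boost::bind( Copy2, master )` stores its own copy of `master` for each thread.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <set>
#include <thread>

// Same shape as cTest in the question: a small object owning an empty multiset.
class cTest
{
    std::multiset<int> mine;
    int id;
public:
    explicit cTest( int i ) : id( i ) {}
    bool operator<( const cTest& o ) const { return id < o.id; }
};

// Hypothetical result record: wall-clock seconds for each copy.
struct Timings
{
    double serial[2];
    double parallel[2];
    std::size_t copied;   // element count of the last serial copy
};

// Copy m once, store the elapsed seconds in secs, return the copy's size.
std::size_t TimedCopy( const std::multiset<cTest>& m, double& secs )
{
    std::size_t n = 0;
    const auto start = std::chrono::steady_clock::now();
    {
        std::multiset<cTest> copy( m );
        n = copy.size();
    }   // copy is destroyed here, inside the timed region
    const auto stop = std::chrono::steady_clock::now();
    secs = std::chrono::duration<double>( stop - start ).count();
    return n;
}

// Build a multiset of n objects, copy it twice in series, then twice in parallel.
Timings Measure( int n )
{
    std::multiset<cTest> master;
    for( int k = 0; k < n; k++ )
        master.insert( cTest( k ) );

    Timings t = {};
    TimedCopy( master, t.serial[0] );
    t.copied = TimedCopy( master, t.serial[1] );
    assert( t.copied == master.size() );

    // Both threads only read master; each writes its own slot of t.
    std::thread t1( [&] { TimedCopy( master, t.parallel[0] ); } );
    std::thread t2( [&] { TimedCopy( master, t.parallel[1] ); } );
    t1.join();
    t2.join();
    return t;
}
```

Calling `Measure( 100000 )` from `main` and printing the `serial` and `parallel` fields should give per-copy times to compare with the means in the table above.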