Read in parallel from STL containers

It is safe to read an STL container from multiple parallel threads. However, the performance is terrible. Why?

I create a small object that stores some data in a multiset. This makes the constructors quite expensive (about 5 microseconds on my machine). I store hundreds of thousands of these small objects in a large multiset. Processing each object is independent of the others, so I split the work between threads running on a multi-core machine. Each thread reads the objects it needs from the large multiset and processes them.
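A minimal sketch of this kind of setup, assuming std::thread in place of whatever threading library the real program uses. The names SumRange and ParallelSum and the int payload are illustrative only and do not come from the original code. Each thread only reads from the shared container, which is safe as long as no thread modifies it at the same time.

```cpp
#include <cstddef>
#include <functional>
#include <iterator>
#include <set>
#include <thread>

// Hypothetical worker: reads the elements in positions [first, last) of the
// shared multiset and sums them. The container is only read, never modified.
void SumRange(const std::multiset<int>& shared,
              std::size_t first, std::size_t last, long long& result)
{
    std::multiset<int>::const_iterator it = shared.begin();
    std::advance(it, static_cast<std::ptrdiff_t>(first));
    long long sum = 0;
    for (std::size_t i = first; i < last; ++i, ++it)
        sum += *it;
    result = sum;
}

// Splits the work between two threads, each reading its own half.
long long ParallelSum(const std::multiset<int>& shared)
{
    const std::size_t half = shared.size() / 2;
    long long low = 0, high = 0;
    std::thread t1(SumRange, std::cref(shared), std::size_t(0), half, std::ref(low));
    std::thread t2(SumRange, std::cref(shared), half, shared.size(), std::ref(high));
    t1.join();
    t2.join();
    return low + high;
}
```

Because each worker writes only to its own result variable, the threads share nothing that is written, so no locking is needed in this sketch.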

The problem is that the reads from the large multiset do not proceed in parallel. It looks as if a read in one thread blocks reads in the others.

Below is the simplest code I could write that still shows the problem. First it creates a large multiset containing 100,000 small objects, each of which contains its own empty multiset. Then it calls the multiset copy constructor twice in sequence, then twice again in parallel.

The profiling tool shows that the sequential copy constructions take about 0.23 seconds each, while the parallel ones take twice as long. Somehow the parallel copies interfere with each other.

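#include <set>
#include <tchar.h>
#include <boost/bind.hpp>
#include <boost/thread.hpp>
// cRavenProfile is the profiling tool mentioned above; its header is not shown

using namespace std;

// a trivial class with a significant ctor and ability to populate an associative container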
class cTest
{
    multiset<int> mine;
    int id;
public:
    cTest( int i ) : id( i ) {}
    bool operator<(const cTest& o) const { return  id < o.id;  }
};
// add 100,000 objects to multiset
void Populate( multiset<cTest>& m )
{
    for( int k = 0; k < 100000; k++ )
    {
        m.insert(cTest(k));
    }
}
// copy construct multiset, called from mainline
void Copy( const multiset<cTest>& m )
{
    cRavenProfile profile("copy_main");
    multiset<cTest> copy( m );
}
// copy construct multiset, called from thread
void Copy2( const multiset<cTest>& m )
{
    cRavenProfile profile("copy_thread");
    multiset<cTest> copy( m );
}
int _tmain(int argc, _TCHAR* argv[])
{
    cRavenProfile profile("test");
    profile.Start();

    multiset<cTest> master;

    Populate( master );

    // two calls to copy ctor from mainline
    Copy( master );
    Copy( master );

    // call copy ctor in parallel
    boost::thread* pt1 = new boost::thread( boost::bind( Copy2, master ));
    boost::thread* pt2 = new boost::thread( boost::bind( Copy2, master ));

    pt1->join();
    pt2->join();

    // display profiler results
    cRavenProfile print_profile;

    return 0;
}

Here is the output

            Scope   Calls   Mean (secs)   Total (secs)
      copy_thread       2      0.472498       0.944997
        copy_main       2      0.233529       0.467058
+3
5 answers

You mentioned copy constructors. I assume they also allocate memory from the heap?


+10


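  • boost::bind copies its arguments, so each thread is handed its own copy of master. To pass a reference instead, wrap the argument in boost::cref:

    boost::thread* pt1 = new boost::thread(boost::bind(Copy2, boost::cref(master)));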

  • As Zan Lynx noted, the copy constructors allocate memory from the heap.
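The boost::bind point above can be checked without Boost. The sketch below uses std::bind and std::thread, which follow the same rule: bound arguments are copied unless they are wrapped in std::ref or std::cref. CopyCounter and the two helper functions are hypothetical names introduced only to count copies; they are not part of the original code.

```cpp
#include <atomic>
#include <functional>
#include <thread>

// Hypothetical helper type: counts how many times it is copy-constructed.
struct CopyCounter
{
    static std::atomic<int> copies;
    CopyCounter() {}
    CopyCounter(const CopyCounter&) { ++copies; }
};
std::atomic<int> CopyCounter::copies(0);

// Stand-in for Copy2: takes its argument by const reference and does nothing.
void Work(const CopyCounter&) {}

// Binds the argument by value and returns how many copies were made.
int CopiesWhenBoundByValue(const CopyCounter& c)
{
    CopyCounter::copies = 0;
    std::thread t(std::bind(Work, c));             // c is copied into the bound functor
    t.join();
    return CopyCounter::copies;
}

// Wraps the argument in std::cref, so only a reference travels to the thread.
int CopiesWhenBoundByCref(const CopyCounter& c)
{
    CopyCounter::copies = 0;
    std::thread t(std::bind(Work, std::cref(c)));  // no copy of c is made
    t.join();
    return CopyCounter::copies;
}
```

With the reference wrapper, the referenced object has to outlive the thread; in the question's program that holds because master lives until after both join() calls.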


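One option is to switch to an allocator built for multithreaded programs, such as Hoard (hoard.org), which works as a drop-in replacement for malloc.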

Alternatively, boost::pool can be combined with boost::thread_specific_ptr.
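The idea of giving each thread its own pool can be sketched with the standard library instead of Boost. The example below swaps in C++17 std::pmr::unsynchronized_pool_resource for boost::pool plus boost::thread_specific_ptr, so it illustrates the approach rather than the code this answer had in mind. It assumes a C++17 compiler and uses a plain multiset of int rather than cTest; CopyWithLocalPool and ParallelPooledCopies are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <memory_resource>
#include <set>
#include <thread>

// Copies the elements into a multiset whose nodes come from a pool owned by
// the calling thread alone. The pool itself does no locking; it only goes to
// the global heap occasionally, when it needs another large chunk.
void CopyWithLocalPool(const std::multiset<int>& source, std::size_t& copied)
{
    std::pmr::unsynchronized_pool_resource pool;        // thread-private pool
    std::pmr::polymorphic_allocator<int> alloc(&pool);
    std::pmr::multiset<int> copy(source.begin(), source.end(), alloc);
    copied = copy.size();
}   // copy is destroyed first, then the pool releases its memory

// Runs two such copies in parallel and returns the total number of elements.
std::size_t ParallelPooledCopies(const std::multiset<int>& source)
{
    std::size_t n1 = 0, n2 = 0;
    std::thread t1(CopyWithLocalPool, std::cref(source), std::ref(n1));
    std::thread t2(CopyWithLocalPool, std::cref(source), std::ref(n2));
    t1.join();
    t2.join();
    return n1 + n2;
}
```

Whether this actually removes the slowdown on a given platform would have to be measured; the sketch only shows how the allocations can be kept off the shared heap.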

+2

Are the threads actually running in parallel? If they are merely interleaved, the steps look like this:

        step 0 1 2 3 4 5 6 7 8 9
threaded:    1,2,1,2,1,2,1,2,1,2
sequential:  1,1,1,1,1,2,2,2,2,2

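Threaded, task 1 runs from step 0 to step 8, so it takes 8; task 2 runs from step 1 to step 9, also 8. Run sequentially, each takes 5. So you measure 16 in total instead of 10.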


0

Try running the test many times and averaging the result, something like:

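double ts, te, tt = 0;   // timer() is a placeholder for any call that returns the current time
for(int loop=0;loop < 100;++loop)
{
   ts = timer();
   Copy( master );
   Copy( master );
   te = timer();
   tt += te - ts;
}
tt /= 100;               // mean time for one pair of copies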
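A runnable version of the same averaging loop might look like the sketch below. It assumes std::chrono::steady_clock as the timer and a plain multiset of int as the data; CopyOnce and MeanSecondsPerPair are hypothetical names standing in for Copy() and the loop above.

```cpp
#include <chrono>
#include <set>

// Copy-constructs the multiset once; stands in for Copy() from the question.
void CopyOnce(const std::multiset<int>& m)
{
    std::multiset<int> copy(m);
}

// Returns the mean time in seconds of one pair of copies over `runs` repeats.
double MeanSecondsPerPair(const std::multiset<int>& m, int runs)
{
    typedef std::chrono::steady_clock Clock;
    std::chrono::duration<double> total(0);
    for (int loop = 0; loop < runs; ++loop)
    {
        const Clock::time_point ts = Clock::now();
        CopyOnce(m);
        CopyOnce(m);
        const Clock::time_point te = Clock::now();
        total += te - ts;   // accumulate elapsed time for this pair
    }
    return total.count() / runs;
}
```

Averaging over many runs smooths out one-off effects such as the first allocation or a context switch in the middle of a measurement.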


0

On two cores, the steps should instead look like this:

        step 0 1 2 3 4 5 6 7 8 9
core1:       1,1,1,1,1
core2:       2,2,2,2,2
sequential:  1,1,1,1,1,2,2,2,2,2


As an experiment, I replaced the large multiset with an array of pointers to cTest. The code now has huge memory leaks, but never mind. Interestingly, the relative performance is worse: running the copies in parallel slows them down by a factor of 4!

            Scope   Calls   Mean (secs)   Total (secs)
copy_array_thread       2      0.454432       0.908864
  copy_array_main       2      0.116905       0.233811
0

Source: https://habr.com/ru/post/1716935/

