Why does std::condition_variable make scheduling unfair?

I am trying to create a simple pool object that more or less fairly distributes access to a set of shared resources among the threads that request them. On Windows, I would normally have an array of mutexes and call WaitForMultipleObjects with bWaitAll = FALSE (see windows_pool_of_n_t below). But I hope to someday port this to other OSes, so I would like to stick to the standard. A deque of resources with a condition_variable on size() != 0 seemed the obvious solution (see pool_of_n_t below).

But for reasons I don't understand, this code serializes thread access. I don't expect strict fairness, but this is just about the worst possible case: the thread that held the lock last time always gets the lock the next time. It isn't that std::mutex fails to match Windows' more-or-less fair scheduling, since using just the mutex without a condition variable works as expected, albeit for a pool of one, of course (see pool_of_one_t below).

Can anyone explain this? Is there any way around this?

results:

    C:\temp\stdpool>bin\stdpool.exe
    pool:pool_of_one_t
    thread 0:19826 ms
    thread 1:19846 ms
    thread 2:19866 ms
    thread 3:19886 ms
    thread 4:19906 ms
    thread 5:19926 ms
    thread 6:19946 ms
    thread 7:19965 ms
    thread 8:19985 ms
    thread 9:20004 ms
    pool:windows_pool_of_n_t(1)
    thread 0:19819 ms
    thread 1:19838 ms
    thread 2:19858 ms
    thread 3:19878 ms
    thread 4:19898 ms
    thread 5:19918 ms
    thread 6:19938 ms
    thread 7:19958 ms
    thread 8:19978 ms
    thread 9:19997 ms
    pool:pool_of_n_t(1)
    thread 9:3637 ms
    thread 0:4538 ms
    thread 6:7558 ms
    thread 4:9779 ms
    thread 8:9997 ms
    thread 2:13058 ms
    thread 1:13997 ms
    thread 3:17076 ms
    thread 5:17995 ms
    thread 7:19994 ms
    pool:windows_pool_of_n_t(2)
    thread 1:9919 ms
    thread 0:9919 ms
    thread 2:9939 ms
    thread 3:9939 ms
    thread 5:9958 ms
    thread 4:9959 ms
    thread 6:9978 ms
    thread 7:9978 ms
    thread 9:9997 ms
    thread 8:9997 ms
    pool:pool_of_n_t(2)
    thread 2:6019 ms
    thread 0:7882 ms
    thread 4:8102 ms
    thread 5:8182 ms
    thread 1:8382 ms
    thread 8:8742 ms
    thread 7:9162 ms
    thread 9:9641 ms
    thread 3:9802 ms
    thread 6:10201 ms
    pool:windows_pool_of_n_t(5)
    thread 4:3978 ms
    thread 3:3978 ms
    thread 2:3979 ms
    thread 0:3980 ms
    thread 1:3980 ms
    thread 9:3997 ms
    thread 7:3999 ms
    thread 6:3999 ms
    thread 5:4000 ms
    thread 8:4001 ms
    pool:pool_of_n_t(5)
    thread 2:3080 ms
    thread 0:3498 ms
    thread 8:3697 ms
    thread 3:3699 ms
    thread 6:3797 ms
    thread 7:3857 ms
    thread 1:3978 ms
    thread 4:4039 ms
    thread 9:4057 ms
    thread 5:4059 ms

code:

    #include <iostream>
    #include <deque>
    #include <vector>
    #include <mutex>
    #include <thread>
    #include <sstream>
    #include <chrono>
    #include <iomanip>
    #include <cassert>
    #include <condition_variable>
    #include <windows.h>

    using namespace std;

    class pool_t {
    public:
        virtual void check_in(int resource) = 0;
        virtual int check_out() = 0;
        virtual string pool_name() = 0;
    };

    // Trivial "pool" of one resource built on a bare std::mutex.
    class pool_of_one_t : public pool_t {
        mutex lock;
    public:
        virtual void check_in(int resource) { lock.unlock(); }
        virtual int check_out() { lock.lock(); return 0; }
        virtual string pool_name() { return "pool_of_one_t"; }
    };

    // Windows-specific pool: one mutex handle per resource,
    // WaitForMultipleObjects with bWaitAll = FALSE grabs any free one.
    class windows_pool_of_n_t : public pool_t {
        vector<HANDLE> resources;
    public:
        windows_pool_of_n_t(int size)
        {
            for (int i = 0; i < size; ++i)
                resources.push_back(CreateMutex(NULL, FALSE, NULL));
        }
        ~windows_pool_of_n_t()
        {
            for (auto resource : resources)
                CloseHandle(resource);
        }
        virtual void check_in(int resource) { ReleaseMutex(resources[resource]); }
        virtual int check_out()
        {
            DWORD result = WaitForMultipleObjects(resources.size(),
                                                  resources.data(), FALSE, INFINITE);
            assert(result >= WAIT_OBJECT_0
                   && result < WAIT_OBJECT_0 + resources.size());
            return result - WAIT_OBJECT_0;
        }
        virtual string pool_name()
        {
            ostringstream name;
            name << "windows_pool_of_n_t(" << resources.size() << ")";
            return name.str();
        }
    };

    // Standard C++ pool: a deque of free resource ids guarded by a mutex,
    // with a condition variable signalling "deque not empty".
    class pool_of_n_t : public pool_t {
        deque<int> resources;
        mutex lock;
        condition_variable not_empty;
    public:
        pool_of_n_t(int size)
        {
            for (int i = 0; i < size; ++i)
                check_in(i);
        }
        virtual void check_in(int resource)
        {
            unique_lock<mutex> resources_guard(lock);
            resources.push_back(resource);
            resources_guard.unlock();
            not_empty.notify_one();
        }
        virtual int check_out()
        {
            unique_lock<mutex> resources_guard(lock);
            not_empty.wait(resources_guard, [this]() { return resources.size() > 0; });
            auto resource = resources.front();
            resources.pop_front();
            bool notify_others = resources.size() > 0;
            resources_guard.unlock();
            if (notify_others)
                not_empty.notify_one();
            return resource;
        }
        virtual string pool_name()
        {
            ostringstream name;
            name << "pool_of_n_t(" << resources.size() << ")";
            return name.str();
        }
    };

    void worker_thread(int id, pool_t& resource_pool)
    {
        auto start_time = chrono::system_clock::now();
        for (int i = 0; i < 100; ++i) {
            auto resource = resource_pool.check_out();
            this_thread::sleep_for(chrono::milliseconds(20));  // simulate work
            resource_pool.check_in(resource);
            this_thread::yield();  // intended to give waiting threads a chance to run
        }

        static mutex cout_lock;
        {
            unique_lock<mutex> cout_guard(cout_lock);
            cout << "thread " << id << ":"
                 << chrono::duration_cast<chrono::milliseconds>(
                        chrono::system_clock::now() - start_time).count()
                 << " ms" << endl;
        }
    }

    void test_it(pool_t& resource_pool)
    {
        cout << "pool:" << resource_pool.pool_name() << endl;
        vector<thread> threads;
        for (int i = 0; i < 10; ++i)
            threads.push_back(thread(worker_thread, i, ref(resource_pool)));
        for (auto& thread : threads)
            thread.join();
    }

    int main(int argc, char* argv[])
    {
        // Named pool objects: binding a temporary to a non-const pool_t& is a
        // non-standard MSVC extension, so avoid it for portability.
        pool_of_one_t pool_one;        test_it(pool_one);
        windows_pool_of_n_t win_1(1);  test_it(win_1);
        pool_of_n_t std_1(1);          test_it(std_1);
        windows_pool_of_n_t win_2(2);  test_it(win_2);
        pool_of_n_t std_2(2);          test_it(std_2);
        windows_pool_of_n_t win_5(5);  test_it(win_5);
        pool_of_n_t std_5(5);          test_it(std_5);
        return 0;
    }
2 answers

I ran your pool:pool_of_n_t(2) test on Linux, and the problem turned out to be in

 this_thread::yield(); 

Here are the results on my machine for the pool_of_n_t(2) test:

1) With this_thread::yield():

    $ ./a.out
    pool:pool_of_n_t(2)
    thread 0, run for:2053 ms
    thread 9, run for:3721 ms
    thread 5, run for:4830 ms
    thread 6, run for:6854 ms
    thread 3, run for:8229 ms
    thread 4, run for:8353 ms
    thread 7, run for:9441 ms
    thread 2, run for:9482 ms
    thread 1, run for:10127 ms
    thread 8, run for:10426 ms

These look like your results.

2) The same test with this_thread::yield() replaced by pthread_yield():

    $ ./a.out
    pool:pool_of_n_t(2)
    thread 0, run for:7922 ms
    thread 3, run for:8853 ms
    thread 4, run for:8854 ms
    thread 1, run for:9077 ms
    thread 5, run for:9364 ms
    thread 9, run for:9446 ms
    thread 7, run for:9594 ms
    thread 2, run for:9615 ms
    thread 8, run for:10170 ms
    thread 6, run for:10416 ms

This is much fairer. You are assuming that this_thread::yield() actually gives the CPU to another thread, but it does not.

This is the disassembly of this_thread::yield() for gcc 4.8:

    (gdb) disassembl this_thread::yield
    Dump of assembler code for function std::this_thread::yield():
       0x0000000000401fb2 <+0>:   push   %rbp
       0x0000000000401fb3 <+1>:   mov    %rsp,%rbp
       0x0000000000401fb6 <+4>:   pop    %rbp
       0x0000000000401fb7 <+5>:   retq
    End of assembler dump.

I do not see any rescheduling there.

And this is the disassembly of pthread_yield:

    (gdb) disassemble pthread_yield
    Dump of assembler code for function pthread_yield:
       0x0000003149c084c0 <+0>:   jmpq   0x3149c05448 <sched_yield@plt>
    End of assembler dump.
    (gdb) disassemble sched_yield
    Dump of assembler code for function sched_yield:
       0x00000031498cf520 <+0>:   mov    $0x18,%eax
       0x00000031498cf525 <+5>:   syscall
       0x00000031498cf527 <+7>:   cmp    $0xfffffffffffff001,%rax
       0x00000031498cf52d <+13>:  jae    0x31498cf530 <sched_yield+16>
       0x00000031498cf52f <+15>:  retq
       0x00000031498cf530 <+16>:  mov    0x2bea71(%rip),%rcx        # 0x3149b8dfa8
       0x00000031498cf537 <+23>:  xor    %edx,%edx
       0x00000031498cf539 <+25>:  sub    %rax,%rdx
       0x00000031498cf53c <+28>:  mov    %edx,%fs:(%rcx)
       0x00000031498cf53f <+31>:  or     $0xffffffffffffffff,%rax
       0x00000031498cf543 <+35>:  jmp    0x31498cf52f <sched_yield+15>
    End of assembler dump.
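
Given that this_thread::yield() compiles to a no-op on this build, here is a minimal sketch of the substitution on a Linux target (sched_yield() from <sched.h> is the call pthread_yield() jumps to above; the wrapper name yield_cpu is just for illustration):

    // Hypothetical wrapper: on Linux, yield via sched_yield() directly
    // (the syscall pthread_yield() forwards to); elsewhere fall back to
    // std::this_thread::yield().
    #if defined(__linux__)
    #include <sched.h>
    inline void yield_cpu() { sched_yield(); }
    #else
    #include <thread>
    inline void yield_cpu() { std::this_thread::yield(); }
    #endif

worker_thread would then call yield_cpu() in place of this_thread::yield().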

I do not think the condition variable is the culprit.

Both Linux's Completely Fair Scheduler and the Windows scheduler work on the assumption that the ideal is to give each thread its entire timeslice (that is, to be fair). They carry this so far that if a thread yields or blocks before using up its whole timeslice, it goes back near the front of the run queue (this is a gross simplification), because that is the "fair" thing to do.

I find this deeply unfortunate. If you have three threads, one of which can do useful work and two of which are blocked waiting on it, the Windows and Linux schedulers will bounce back and forth between the waiting threads many times before giving the "right" thread a chance to run.
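
Given that behaviour, one possible workaround (not taken from either answer, just a sketch) is to enforce FIFO hand-off inside the pool itself instead of relying on the scheduler: every check_out draws a ticket and resources are handed out in ticket order. The class name fair_pool_t and the ticket bookkeeping below are hypothetical:

    #include <condition_variable>
    #include <deque>
    #include <mutex>

    // Sketch of a pool that hands resources to waiters in arrival (ticket)
    // order, so fairness no longer depends on which thread the OS wakes first.
    class fair_pool_t {
        std::deque<int> resources;
        std::mutex lock;
        std::condition_variable not_empty;
        unsigned long next_ticket = 0;   // next ticket to hand out
        unsigned long now_serving = 0;   // ticket allowed to take a resource
    public:
        explicit fair_pool_t(int size) {
            for (int i = 0; i < size; ++i)
                resources.push_back(i);
        }
        int check_out() {
            std::unique_lock<std::mutex> guard(lock);
            const unsigned long ticket = next_ticket++;
            // Proceed only when it is this ticket's turn and a resource is free.
            not_empty.wait(guard, [&] {
                return ticket == now_serving && !resources.empty();
            });
            ++now_serving;
            int resource = resources.front();
            resources.pop_front();
            not_empty.notify_all();      // let the next ticket holder re-check
            return resource;
        }
        void check_in(int resource) {
            std::lock_guard<std::mutex> guard(lock);
            resources.push_back(resource);
            not_empty.notify_all();
        }
    };

The price is that notify_all wakes every waiter just so the one whose ticket is up can proceed, but the check-out order no longer depends on scheduler heuristics.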


Source: https://habr.com/ru/post/989136/

