Here is my view from a Windows perspective. I have a lot of experience writing server applications for Windows.
First, creating 20k semaphores in a single process is no problem at all. A semaphore is a fairly lightweight kernel object, and that holds even for "interprocess" semaphores.
I do see another problem with your design, however. Be aware that every operation you perform on a kernel object (a semaphore or a mutex, for example) involves a heavy transition into kernel mode (a system call). Each such call can cost you about two thousand processor cycles, even when there is no contention at all.
So you can end up in a situation where most of the processor time is spent just calling synchronization functions.
Alternatively, you can use interlocked operations to synchronize threads. They cost far less (usually dozens of processor cycles), because they stay in user mode.
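To make the idea concrete, here is a minimal sketch of a counter shared between threads using only atomic read-modify-write instructions. It uses portable C++ `std::atomic` as a stand-in; on Windows the matching call would be `InterlockedIncrement`. The names `g_hits`, `record_hits`, and `run_workers` are made up for the illustration.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Shared counter updated with atomic instructions only; no kernel object
// is created and no system call is made to update it.
std::atomic<long> g_hits{0};

// Hypothetical worker: bumps the shared counter `iterations` times.
void record_hits(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        g_hits.fetch_add(1, std::memory_order_relaxed);
    }
}

// Starts `threads` workers, waits for them, and returns the final count.
long run_workers(int threads, int iterations) {
    g_hits.store(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back(record_hits, iterations);
    }
    for (auto& th : pool) {
        th.join();
    }
    return g_hits.load();
}
```

No increments are lost even though no lock is taken. This only covers simple updates to a single variable; anything that must change several fields together still needs a real lock.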
There is also an object called a critical section. It is a kind of hybrid of an interlocked operation and a kernel object (the kernel object is used only when there is an actual collision). You should check how long you typically hold your locks. If they are usually short, just use critical sections and forget about complicated reader-writer locks.
If you do have long-held locks, you need reader-writer locks, and you see a lot of CPU time going to kernel-mode transitions, consider writing your own hybrid implementation of such a lock (or look for an existing one).
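As a rough sketch of what such a hybrid could look like, the class below tries to take the lock with atomic operations for a bounded number of spins and only falls back to a blocking wait when that fails. It is written in portable C++ rather than against the Windows API, and `HybridRwLock`, `kSpinLimit`, and `count_with_lock` are names invented for this example. It makes no fairness guarantees (a steady stream of readers can starve a writer), so treat it as an illustration of the idea, not a production lock.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Illustrative hybrid reader-writer lock (hypothetical, not a Windows API).
// state_: -1 = a writer holds the lock, 0 = free, n > 0 = n readers hold it.
class HybridRwLock {
    static constexpr int kSpinLimit = 200;  // arbitrary; tune by measurement
    std::atomic<int> state_{0};
    std::atomic<int> waiters_{0};           // threads parked on the slow path
    std::mutex m_;                          // used only by the slow path
    std::condition_variable cv_;

    bool try_shared() {
        int s = state_.load();
        while (s >= 0) {                    // s is refreshed when the CAS fails
            if (state_.compare_exchange_weak(s, s + 1)) return true;
        }
        return false;                       // a writer holds the lock
    }
    bool try_exclusive() {
        int expected = 0;
        return state_.compare_exchange_strong(expected, -1);
    }
    template <typename TryFn>
    void acquire(TryFn try_fn) {
        for (int i = 0; i < kSpinLimit; ++i) {
            if (try_fn()) return;           // fast path: atomics only
        }
        std::unique_lock<std::mutex> guard(m_);
        waiters_.fetch_add(1);
        cv_.wait(guard, try_fn);            // slow path: block until acquired
        waiters_.fetch_sub(1);
    }
    void wake_if_needed() {
        if (waiters_.load() > 0) {
            // Taking the mutex orders this wake-up after any waiter that has
            // checked the state but has not gone to sleep yet.
            { std::lock_guard<std::mutex> guard(m_); }
            cv_.notify_all();
        }
    }

public:
    void lock_shared()   { acquire([this] { return try_shared(); }); }
    void lock()          { acquire([this] { return try_exclusive(); }); }
    void unlock_shared() { if (state_.fetch_sub(1) == 1) wake_if_needed(); }
    void unlock()        { state_.store(0); wake_if_needed(); }
};

// Usage sketch: several threads bump a plain int under the exclusive lock.
int count_with_lock(int threads, int iterations) {
    HybridRwLock lock;
    int total = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++total;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return total;
}
```

When nobody is waiting, both acquire and release touch only the atomics, which is the whole point of the hybrid approach. The spin limit is a guess; the right value depends on how long the locks are actually held.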