Setting thread affinity ( SetThreadAffinityMask ) alone will not be enough. It does not reserve a processor core; it does the opposite: it only binds the thread to the cores you specify (this is not the same thing!).
By constraining affinity, you reduce the likelihood that your thread will get to run. If another thread with a higher priority is running on the same core, your thread will not be scheduled until that other thread is done (this is how Windows schedules threads).
Without constraining affinity, your thread may be migrated to another core (the time it last ran is used as the metric for that decision). Thread migration is undesirable if it happens often and soon after the thread has run (or while it is running), but it is harmless, even beneficial, if a few dozen milliseconds have passed since the thread was last scheduled (the caches will have been overwritten by then anyway).
You can "sort of" ensure that your thread will run by giving it a higher priority class (no guarantee, but a high probability). If you then also use SetThreadAffinityMask , you have a reasonable chance that the cache will always be warm on most common desktop processors (fortunately, their caches are usually VIPT and PIPT). With the TLB you will probably be less fortunate, but there is nothing you can do about that.
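As a minimal sketch of combining the two calls, assuming the current process and thread are the ones to adjust: the helper names affinity_mask and prefer_and_pin_current_thread are made up for illustration, and HIGH_PRIORITY_CLASS is just one possible choice of "higher priority class". The Windows calls sit behind a guard only so the file still compiles on other systems.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Build an affinity bitmask: bit i set means "may run on logical
   processor i". Indices of 64 and above are ignored, since a single
   mask covers at most 64 logical processors. */
uint64_t affinity_mask(const unsigned *cores, size_t count)
{
    uint64_t mask = 0;
    for (size_t i = 0; i < count; ++i) {
        if (cores[i] < 64)
            mask |= (uint64_t)1 << cores[i];
    }
    return mask;
}

#ifdef _WIN32
/* Hypothetical helper: raise the priority class of the process, then
   restrict the calling thread to the cores in `mask`.
   Returns 1 on success, 0 on failure. This binds the thread to the
   cores; it does not reserve them. */
int prefer_and_pin_current_thread(uint64_t mask)
{
    if (!SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS))
        return 0;
    /* SetThreadAffinityMask returns the previous mask, or 0 on failure.
       On 32-bit builds DWORD_PTR keeps only the low 32 bits. */
    return SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)mask) != 0;
}
#endif
```

For example, passing the indices 0 and 2 to affinity_mask yields the mask 0x5, which would allow the thread to run on logical processors 0 and 2 only.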
The problem with a high-priority thread is that it will starve other threads: scheduling is implemented so that higher priority classes are served first, and as long as they are not satisfied, the lower classes get zero CPU time. Thus, the solution in this case must be to block. Otherwise, you may impair the system in an unfavorable way.
Try the following:
- create a semaphore and share it with the other process
- set the thread priority to THREAD_PRIORITY_TIME_CRITICAL
- block on the semaphore
- in the other process, after writing the data, call SignalObjectAndWait on the semaphore with a timeout of 1 (or even a zero timeout)
- if you want, you can experiment with binding them both to the same core
This will create a thread that will be the first (or among the first) to get CPU time, but which is not running. When the writer thread calls SignalObjectAndWait , it atomically signals and blocks (even if it waits for "zero time", that is enough to reschedule). The other thread will wake from the semaphore and do its work. Thanks to its high priority, it will not be interrupted by other "normal" (that is, non-realtime) threads. It will keep hogging CPU time until it is done, and then block on the semaphore again. At this point, SignalObjectAndWait returns.
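The steps above could be sketched roughly as follows. This is an illustration under assumptions, not a tested implementation: the semaphore name, the function names reader_loop and writer_notify, and the constant WRITER_YIELD_TIMEOUT_MS are all made up here. SignalObjectAndWait takes a second handle to wait on; which one to use is not specified above, so the sketch assumes a private event that is never signaled, which makes the wait simply time out.

```c
/* Timeout for the writer's wait: 1 ms, or 0 for a "zero time" wait. */
enum { WRITER_YIELD_TIMEOUT_MS = 1 };

#ifdef _WIN32
#include <windows.h>

/* Hypothetical name; the "Local\" prefix shares the object between
   processes in the same session. */
#define DATA_READY_NAME L"Local\\DataReadySketch"

/* Reader process: very high priority, but blocked unless signaled. */
int reader_loop(void (*handle_data)(void))
{
    /* Maximum count 1: at most one notification can be pending. */
    HANDLE ready = CreateSemaphoreW(NULL, 0, 1, DATA_READY_NAME);
    if (ready == NULL)
        return 0;
    if (!SetThreadPriority(GetCurrentThread(),
                           THREAD_PRIORITY_TIME_CRITICAL)) {
        CloseHandle(ready);
        return 0;
    }
    /* While blocked here the thread uses no CPU time and starves nobody. */
    while (WaitForSingleObject(ready, INFINITE) == WAIT_OBJECT_0)
        handle_data();
    CloseHandle(ready);
    return 1;
}

/* Writer process: call after the data has been written.
   `never_signaled` is an assumption of this sketch (see above).
   Returns 1 if the signal was delivered, 0 on failure. */
int writer_notify(HANDLE ready, HANDLE never_signaled)
{
    /* Atomically signals `ready` and blocks, giving up the CPU. */
    DWORD r = SignalObjectAndWait(ready, never_signaled,
                                  WRITER_YIELD_TIMEOUT_MS, FALSE);
    return r == WAIT_TIMEOUT || r == WAIT_OBJECT_0;
}
#endif
```

In the writer process, `ready` could be obtained by calling CreateSemaphoreW with the same name (which opens the existing object), and `never_signaled` by CreateEventW(NULL, TRUE, FALSE, NULL). With a maximum count of 1, writer_notify fails if the previous notification has not been consumed yet; a larger maximum would be needed if the writer can get ahead of the reader.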
Damon