How do I reserve a core for one thread on Windows?

I am working on a very time-sensitive application that polls a shared memory area and takes action when it detects that a change has occurred. Changes are rare, but I need to minimize the time from change to action. Given how rarely changes occur, I think the processor cache is getting cold. Is there a way to reserve a core for my polling thread so that it does not have to compete with other threads for the cache or the processor?

+4
source share
5 answers

Thread affinity alone ( SetThreadAffinityMask ) will not be enough. It does not reserve a processor core for your thread; it does the opposite: it restricts your thread to the cores you specify (which is not the same thing!).

By restricting the affinity, you reduce the chance that your thread will run at all. If another thread with a higher priority runs on the same core, your thread will not be scheduled until that other thread is done (this is how Windows schedules threads).

Without restricting affinity, your thread may migrate to another core (the scheduler uses the core a thread last ran on as a hint for that decision). Thread migration is undesirable if it happens often and soon after the thread last ran (or while it is running), but it is a harmless, useful thing if several tens of milliseconds have passed since it was last scheduled (the caches will be stale by then anyway).

You can "mostly" guarantee that your thread will run by giving it a higher priority class (no hard guarantee, but a high probability). If you also use SetThreadAffinityMask , you have a reasonable chance that the caches will always be warm on most common desktop processors (which, fortunately, are usually VIPT and PIPT). For the TLB you will probably be less lucky, but there is nothing you can do about that.

The problem with a high-priority thread is that it will starve other threads, because scheduling serves the higher priority classes first, and as long as they are runnable, the lower classes get zero CPU time. So the solution in this case is to block; otherwise you may disturb the system in an unfavorable way.

Try the following:

  • create a semaphore and share it with the other process
  • set the priority to THREAD_PRIORITY_TIME_CRITICAL
  • block on the semaphore
  • in the other process, after writing the data, call SignalObjectAndWait on the semaphore with a timeout of 1 (or even a timeout of zero)
  • if you like, you can experiment with binding both to the same core

This gives you a thread that is the first (or among the first) to get CPU time, but that does not busy-wait. When the writer thread calls SignalObjectAndWait , it atomically signals and blocks (even if it waits for "zero time", that is enough to reschedule). The other thread wakes from the semaphore and does its work. Thanks to its high priority, it will not be preempted by other "normal" (i.e. non-realtime) threads. It keeps the CPU until it is done, and then blocks on the semaphore again, at which point SignalObjectAndWait returns.

+6
source

With a tool like Sysinternals Process Explorer, you can set the affinity of processes.

You would set the affinity of your time-critical application to core 4, and the affinity of all other processes to cores 1, 2, and 3, assuming the machine has four cores.

0
source

You can call SetProcessAffinityMask on every process except yours with a mask that excludes the one core that will "belong" to your process, and in your own process use a mask that restricts it to that core (or, even better, use SetThreadAffinityMask on just the thread that does the critical work).

0
source

Given how rarely changes occur, I think the processor cache is getting cold.

That sounds very strange to me.

Suppose your polling thread and your writing thread are on different cores.

The polling thread will read the shared memory address and cache the data; that cache line will probably be marked Exclusive. Then, eventually, the writer thread writes: first it reads the line into its own cache (so the line is now marked Shared on both cores), and then it writes, which marks the line Invalid in the polling core's cache. The next time the polling thread reads, if the data is still sitting in the writer core's cache, it has to fetch the line from that second core, invalidating it there and taking ownership of the line for itself. All of this generates a lot of traffic on the interconnect.

Another problem is that the writer thread, if it does not write often, will almost certainly have lost the TLB entry for the page holding the shared memory address, and recomputing the physical address is a long, slow process. Since the polling thread reads often, that page is probably always in its core's TLB; so in latency terms you might actually be better off with both threads on the same core. (Although if both are compute-intensive, they could interfere destructively and the cost could be much higher. I cannot know, because I do not know what the threads do.)

One thing you could try is to use the hyperthread sibling on the writer thread's core: if you know early on that you are going to write, have the sibling hyperthread read the shared memory address. That loads the TLB and cache while the writer thread is still busy computing, buying you some parallelism.

0
source

The Win32 function SetThreadAffinityMask () is what you are looking for.

-1
source

Source: https://habr.com/ru/post/1344212/

