ReadProcessMemory is faster than memcpy on SharedMemory

I am trying to improve my multi-process application, which uses shared memory for communication. I did some profiling with simple tests, and something strange came up: when I copy data stored in shared memory, it is faster with ReadProcessMemory than with memcpy.

I know that I should not use shared memory this way (it is better to read directly from the shared memory), but I am still wondering why this happens. Further investigation turned up one more thing: if I run two consecutive memcpy calls on the same region of shared memory, the second copy is about twice as fast as the first.

Here is sample code that shows the problem. In this example there is only one process, but the problem still appears: running memcpy on a shared memory region is slower than running ReadProcessMemory on the same region in my own process!

#include <tchar.h>
#include <basetsd.h>
#include <cstdlib>   // malloc, free
#include <cstring>   // memcpy, memset
#include <iostream>
#include <boost/interprocess/mapped_region.hpp>
#include <boost/interprocess/windows_shared_memory.hpp>
#include <time.h>

namespace bip = boost::interprocess;

#include <boost/asio.hpp>

// Create the shared memory segment and fill it with 1s.
bip::windows_shared_memory* AllocateSharedMemory(UINT32 a_UI32_Size)
{
    bip::windows_shared_memory* l_pShm = new bip::windows_shared_memory(
        bip::create_only, "Global\\testSharedMemory", bip::read_write, a_UI32_Size);
    bip::mapped_region l_region(*l_pShm, bip::read_write);
    std::memset(l_region.get_address(), 1, l_region.get_size());
    return l_pShm;
}

// Copy the shared memory with memcpy
void CopySharedMemory(UINT32 a_UI32_Size)
{
    bip::windows_shared_memory m_shm(bip::open_only, "Global\\testSharedMemory", bip::read_only);
    bip::mapped_region l_region(m_shm, bip::read_only);
    void* l_pData = malloc(a_UI32_Size);
    memcpy(l_pData, l_region.get_address(), a_UI32_Size);
    free(l_pData);
}

// Copy the shared memory with ReadProcessMemory
void ProcessCopySharedMemory(UINT32 a_UI32_Size)
{
    bip::windows_shared_memory m_shm(bip::open_only, "Global\\testSharedMemory", bip::read_only);
    bip::mapped_region l_region(m_shm, bip::read_only);
    void* l_pData = malloc(a_UI32_Size);
    HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, (DWORD)GetCurrentProcessId());
    size_t l_szt_CurRemote_Readsize;
    ReadProcessMemory(hProcess, (LPCVOID)((void*)l_region.get_address()), l_pData,
                      a_UI32_Size, (SIZE_T*)&l_szt_CurRemote_Readsize);
    CloseHandle(hProcess);  // release the handle opened above
    free(l_pData);
}

// Do 2 memcpy on the same shared memory
void CopySharedMemory2(UINT32 a_UI32_Size)
{
    bip::windows_shared_memory m_shm(bip::open_only, "Global\\testSharedMemory", bip::read_only);
    bip::mapped_region l_region(m_shm, bip::read_only);

    clock_t begin = clock();
    void* l_pData = malloc(a_UI32_Size);
    memcpy(l_pData, l_region.get_address(), a_UI32_Size);
    clock_t end = clock();
    std::cout << "FirstCopy: " << (end - begin) * 1000 / CLOCKS_PER_SEC << " ms" << std::endl;
    free(l_pData);

    begin = clock();
    l_pData = malloc(a_UI32_Size);
    memcpy(l_pData, l_region.get_address(), a_UI32_Size);
    end = clock();
    std::cout << "SecondCopy: " << (end - begin) * 1000 / CLOCKS_PER_SEC << " ms" << std::endl;
    free(l_pData);
}

int _tmain(int argc, _TCHAR* argv[])
{
    UINT32 l_UI32_Size = 1048576000;
    bip::windows_shared_memory* l_pShm = AllocateSharedMemory(l_UI32_Size);

    clock_t begin = clock();
    for (int i = 0; i < 10; i++)
        CopySharedMemory(l_UI32_Size);
    clock_t end = clock();
    std::cout << "MemCopy: " << (end - begin) * 1000 / CLOCKS_PER_SEC << " ms" << std::endl;

    begin = clock();
    for (int i = 0; i < 10; i++)
        ProcessCopySharedMemory(l_UI32_Size);
    end = clock();
    std::cout << "ReadProcessMemory: " << (end - begin) * 1000 / CLOCKS_PER_SEC << " ms" << std::endl;

    for (int i = 0; i < 10; i++)
        CopySharedMemory2(l_UI32_Size);

    delete l_pShm;
    return 0;
}

And here is the output:

MemCopy: 8891 ms
ReadProcessMemory: 6068 ms
FirstCopy: 796 ms
SecondCopy: 327 ms
FirstCopy: 795 ms
SecondCopy: 328 ms
FirstCopy: 780 ms
SecondCopy: 344 ms
FirstCopy: 780 ms
SecondCopy: 343 ms
FirstCopy: 780 ms
SecondCopy: 327 ms
FirstCopy: 795 ms
SecondCopy: 343 ms
FirstCopy: 780 ms
SecondCopy: 344 ms
FirstCopy: 796 ms
SecondCopy: 343 ms
FirstCopy: 796 ms
SecondCopy: 327 ms
FirstCopy: 780 ms
SecondCopy: 328 ms
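To make these timings easier to compare, they can be converted to throughput. The helper below is only a sketch and its name, throughput_mib_per_s, is hypothetical. Note that the MemCopy and ReadProcessMemory totals cover 10 iterations of 1048576000 bytes (exactly 1000 MiB each) and also include the open, map, malloc and free done on every call, so they are not pure copy times.

```cpp
#include <cassert>
#include <cstdint>

// Convert a byte count and an elapsed time in milliseconds to MiB per second.
// Hypothetical helper for reading the benchmark output, not part of the
// original test program.
double throughput_mib_per_s(std::uint64_t bytes, double elapsed_ms) {
    assert(elapsed_ms > 0.0);  // a zero or negative duration is meaningless here
    const double mib = static_cast<double>(bytes) / (1024.0 * 1024.0);
    return mib / (elapsed_ms / 1000.0);
}
```

With the numbers above, throughput_mib_per_s(10ULL * 1048576000ULL, 8891.0) gives roughly 1125 MiB/s for the memcpy loop and the same call with 6068.0 gives roughly 1648 MiB/s for ReadProcessMemory. A single 327 ms second copy works out to roughly 3058 MiB/s.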

If anyone has an idea why memcpy is so slow, and if there is a solution to this problem, I'm all ears.

Thanks.

1 answer

My comment as an answer for reference.

Using memcpy on a large chunk of memory requires the OS to look through its process/memory tables for each new page that is copied. Using ReadProcessMemory, on the other hand, tells the OS directly which pages should be copied, and from which process to which process.

This difference went away when you benchmarked with a single page, which confirms some of this.
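One way to probe the per-page explanation, assuming the extra cost is paid the first time each page of the mapped view is accessed, is to read one byte from every page before starting the timer and then time the memcpy. The function below is a minimal sketch. The name touch_pages and the default page size of 4096 are assumptions; on Windows the real page size can be obtained with GetSystemInfo.

```cpp
#include <cassert>
#include <cstddef>

// Read one byte from every page of [base, base + size) so that each page has
// been accessed before the timed copy. Returns the number of pages touched.
// Assumes base is page-aligned, as the start of a mapped view is.
// page_size = 4096 is an assumption, not a value taken from the question.
std::size_t touch_pages(const void* base, std::size_t size,
                        std::size_t page_size = 4096) {
    assert(page_size > 0);
    const volatile unsigned char* p =
        static_cast<const volatile unsigned char*>(base);
    std::size_t touched = 0;
    for (std::size_t offset = 0; offset < size; offset += page_size) {
        unsigned char byte = p[offset];  // volatile read, not optimized away
        (void)byte;
        ++touched;
    }
    return touched;
}
```

If calling touch_pages(l_region.get_address(), a_UI32_Size) before the first timed memcpy brings FirstCopy close to SecondCopy, that would support the per-page explanation. If it does not, something else is responsible for the gap.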

My guess is that memcpy is faster in the small case because ReadProcessMemory involves an additional switch from user mode to kernel mode. memcpy, on the other hand, leaves the work to the underlying memory management system, which always runs alongside your process and is supported to some extent by hardware.
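Comparing the small and large cases needs a finer timer than clock(): the timings in the question move in steps of about 15 ms, which suggests that is the resolution being measured. A minimal sketch using std::chrono::steady_clock follows. The function name time_memcpy_ms is hypothetical, and it times only the copy itself, not the mapping or allocation.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>

// Time a single memcpy of `size` bytes and return the elapsed milliseconds
// as a floating-point value. Hypothetical helper, not from the original post.
double time_memcpy_ms(void* dst, const void* src, std::size_t size) {
    assert(dst != nullptr && src != nullptr);
    const auto start = std::chrono::steady_clock::now();
    std::memcpy(dst, src, size);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Running this for a single page and again for the full region, and doing the same around a ReadProcessMemory call, would show at which size the two approaches cross over on a given machine.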


Source: https://habr.com/ru/post/1447885/

