Copying an unmanaged byte vector (System.IntPtr) into a row of a 2D device array

I am using C# and CUDAfy.NET (yes, this problem is easier in straight C with pointers, but I have reasons for using this approach, given the larger system).

I have a video capture card that collects image data of size [1024 x 1024] at 30 FPS. Every 33.3 ms, it fills a slot in a circular buffer and returns a System.IntPtr that points to that unmanaged 1D byte vector (byte*). The circular buffer has 15 slots.

On the GPU device (Tesla K40), I want a global buffer laid out as a dense 2D array. That is, I want something like the circular queue, but living on the GPU as a dense 2D array.

byte[15, 1024*1024] rawdata;
// if CUDAfy.NET supported jagged arrays I could use byte[15][1024*1024], but it does not

How do I fill in another row every 33 ms? Would I use something like:

gpu.CopyToDevice<byte>(inputPtr, 0, rawdata, offset, length); // length = 1024*1024
// offset is computed as rowID*(1024*1024), where rowID wraps to 0 via modulo 15.
// inputPtr is the System.IntPtr that points to the (unmanaged) buffer in the circular queue.
// rawdata is a device buffer allocated with gpu.Allocate<byte>(1024*1024);
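The offset arithmetic described in the comments above can be sketched in plain C#. This is only an illustration: `RingLayout`, `SlotFor`, and `OffsetFor` are hypothetical names, not part of CUDAfy.NET, and the sketch assumes the 2D device buffer is row-major so that row `slot` starts at element `slot * FrameSize`.

```csharp
using System;

// Hypothetical helper (not a CUDAfy.NET type): maps a running frame
// counter onto a slot of the 15-row buffer described in the question.
public static class RingLayout
{
    public const int SlotCount = 15;           // rows in rawdata
    public const int FrameSize = 1024 * 1024;  // bytes per frame

    // Slot index for a frame; wraps back to 0 after SlotCount frames.
    public static int SlotFor(long frameIndex)
    {
        return (int)(frameIndex % SlotCount);
    }

    // Element offset of that slot when the 2D buffer is viewed as 1D,
    // assuming a dense row-major layout.
    public static int OffsetFor(long frameIndex)
    {
        return SlotFor(frameIndex) * FrameSize;
    }
}
```

With this layout each copy moves exactly one frame (1024*1024 bytes), and the largest offset, for slot 14, still fits comfortably in an `int`.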

And my kernel signature is:

[Cudafy]
public static void filter(GThread thread, byte[,] rawdata, int frameSize, byte[] result)

I tried something along these lines, but CUDAfy has no overload of the form:

GPGPU.CopyToDevice(T) Method (IntPtr, Int32, T[,], Int32, Int32, Int32)

So I used the gpu.Cast function to view the 2D device array as 1D.

I tried the code below, but I get a CUDA.NET exception: ErrorLaunchFailed

FYI: When I try the CUDA emulator, it aborts on CopyToDevice, claiming that the data is not host allocated.

public static byte[] process(System.IntPtr data, int slot)
{
    Stopwatch watch = new Stopwatch();
    watch.Start();
    byte[] output = new byte[FrameSize];
    int offset = slot*FrameSize;
    gpu.Lock();
    byte[] rawdata = gpu.Cast<byte>(grawdata, FrameSize); // What is the size supposed to be? Documentation lacking
    gpu.CopyToDevice<byte>(data, 0, rawdata, offset, FrameSize * frameCount);
    byte[] goutput = gpu.Allocate<byte>(output);
    gpu.Launch(height, width).filter(rawdata, FrameSize, goutput);
    runTime = watch.Elapsed.ToString();
    gpu.CopyFromDevice(goutput, output);
    gpu.Free(goutput);
    gpu.Synchronize();
    gpu.Unlock();
    watch.Stop();
    totalRunTime = watch.Elapsed.ToString();
    return output;
}
3 answers

You should consider the built-in async GPGPU functionality, which is a really efficient way to move data between host and device, and launch the kernel with gpuKern.LaunchAsync(...).

See http://www.codeproject.com/Articles/276993/Base-Encoding-on-a-GPU for an example. Another is PinnedAsyncIO.cs in the CudafyExamples project.

CudaGPU.cs, in the Cudafy.Host project, declares the following async overloads:

public void CopyToDeviceAsync<T>(IntPtr hostArray, int hostOffset, DevicePtrEx devArray,
                                 int devOffset, int count, int streamId = 0) where T : struct;
public void CopyToDeviceAsync<T>(IntPtr hostArray, int hostOffset, T[, ,] devArray,
                                 int devOffset, int count, int streamId = 0) where T : struct;
public void CopyToDeviceAsync<T>(IntPtr hostArray, int hostOffset, T[,] devArray,
                                 int devOffset, int count, int streamId = 0) where T : struct;
public void CopyToDeviceAsync<T>(IntPtr hostArray, int hostOffset, T[] devArray,
                                 int devOffset, int count, int streamId = 0) where T : struct;

If I understand your question correctly, you want to convert the byte* that you get from
the circular buffer into a multidimensional byte array that you can send to
the graphics card API.

            int slots = 15;
            int rows = 1024;
            int columns = 1024;

//Try this (needs using System.Runtime.InteropServices for Marshal)
            for (int currentSlot = 0; currentSlot < slots; currentSlot++)
            {
                IntPtr intPtrToUnManagedMemory = CopyContextFrom(currentSlot);
                byte[,] rawForGpu = new byte[rows, columns];

                // Read the frame one byte at a time straight from unmanaged memory.
                int offset = 0;
                for (int m = 0; m < rows; m++)
                    for (int n = 0; n < columns; n++)
                    {
                        rawForGpu[m, n] = Marshal.ReadByte(intPtrToUnManagedMemory,
                                                           offset++);
                    }
                //then send rawForGpu to your GPU method
            }
//or try this
            for (int currentSlot = 0; currentSlot < slots; currentSlot++)
            {
                IntPtr intPtrToUnManagedMemory = CopyContextFrom(currentSlot);

                // Copy the whole frame into managed memory in one call.
                byte[] byteData = new byte[rows * columns];
                Marshal.Copy(intPtrToUnManagedMemory, byteData, 0, byteData.Length);

                byte[,] rawForGpu = ConvertTo2DArray(byteData, rows, columns);
            }

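        private static byte[,] ConvertTo2DArray(byte[] byteArr, int rows, int columns)
        {
            byte[,] data = new byte[rows, columns];
            int totalElements = rows * columns;
            // Convert 1D to 2D (rows, columns). A byte[,] is stored row-major,
            // so a single block copy fills it in the right order.
            Buffer.BlockCopy(byteArr, 0, data, 0, totalElements);
            return data;
        }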

        private static IntPtr CopyContextFrom(int slotNumber)
        {
            // placeholder: code that returns the byte* from the circular buffer goes here.
            return IntPtr.Zero;
        }
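
As a host-side illustration of the second variant, here is a minimal sketch that copies one unmanaged frame into a chosen row of a managed byte[slots, frameSize] array. It does not touch CUDAfy.NET at all; `FrameStaging`, `CopyFrameToRow`, and `Demo` are hypothetical names, and the unmanaged block allocated with Marshal.AllocHGlobal merely stands in for a slot of the capture card's circular buffer. Tiny dimensions are used so the result is easy to inspect.

```csharp
using System;
using System.Runtime.InteropServices;

public static class FrameStaging
{
    // Copies one frame from unmanaged memory into row `slot` of a
    // row-major managed byte[slots, frameSize] array. `scratch` is a
    // reusable managed buffer of at least frameSize bytes.
    public static void CopyFrameToRow(IntPtr frame, byte[,] ring, int slot, byte[] scratch)
    {
        int frameSize = ring.GetLength(1);
        // Unmanaged -> managed 1D staging buffer.
        Marshal.Copy(frame, scratch, 0, frameSize);
        // Staging buffer -> row `slot`; offsets are in bytes, which for
        // byte arrays equals the element index.
        Buffer.BlockCopy(scratch, 0, ring, slot * frameSize, frameSize);
    }

    // Small demonstration with 3 slots of 8 bytes each.
    public static byte[,] Demo()
    {
        const int slots = 3, frameSize = 8;
        byte[,] ring = new byte[slots, frameSize];
        byte[] scratch = new byte[frameSize];

        // Stand-in for one slot of the capture card's buffer.
        IntPtr frame = Marshal.AllocHGlobal(frameSize);
        try
        {
            for (int i = 0; i < frameSize; i++)
                Marshal.WriteByte(frame, i, (byte)(i + 1));
            CopyFrameToRow(frame, ring, 1, scratch);
        }
        finally
        {
            Marshal.FreeHGlobal(frame);
        }
        return ring;
    }
}
```

Reusing one scratch buffer avoids allocating a fresh 1 MB array every 33 ms. Note that this route adds an extra host-side copy per frame compared with handing the IntPtr directly to a device copy.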

Source: https://habr.com/ru/post/1568974/

