PyCUDA: querying device status (in particular, memory)

The PyCUDA documentation mentions the driver interface calls only in passing, and I cannot figure out how to get information such as 'SHARED_SIZE_BYTES' from my code.

Can someone point me to any examples of querying the device this way?

Is it possible, and if so how, to check the state of a device (for example, between malloc / memcpy and kernel launch) in order to implement some machine-dynamic operations? (I want to handle devices that support multiple kernels in a "friendly" way.)

1 answer

For everyone who comes across this: spending half an hour with the CUDA API in one hand and the PyCUDA documentation in the other does wonders. It's much simpler than my initial experiments suggested.

Runtime Kernel Information

Incoming lazy code

    ...
    kernel = mod.get_function("foo")
    meminfo(kernel)
    ...
    def meminfo(kernel):
        # Per-kernel resource usage, read from the compiled function
        shared = kernel.shared_size_bytes
        regs = kernel.num_regs
        local = kernel.local_size_bytes
        const = kernel.const_size_bytes
        mbpt = kernel.max_threads_per_block
        print("""=MEM=\nLocal:%d,\nShared:%d,\nRegisters:%d,\nConst:%d,\nMax Threads/B:%d""" % (local, shared, regs, const, mbpt))

Output example

    =MEM=
    Local:24,
    Shared:64,
    Registers:18,
    Const:0,
    Max Threads/B:512
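The max_threads_per_block value shown above can feed directly into a launch decision, which is one way to get the "machine-dynamic" behaviour asked about. A minimal sketch, assuming a 1D launch; the helper name `launch_config`, the `preferred_block` default, and the fallback warp size of 32 are illustrative assumptions, not part of PyCUDA:

```python
def launch_config(n_items, kernel_max_threads, preferred_block=256, warp_size=32):
    """Pick a 1D block and grid shape for n_items work items.

    Hypothetical helper (not a PyCUDA function). kernel_max_threads is
    meant to be the kernel.max_threads_per_block value read at runtime.
    """
    # Never exceed what this kernel can launch with on this device.
    block = min(preferred_block, kernel_max_threads)
    # Round down to a whole number of warps when at least one warp fits.
    if block >= warp_size:
        block -= block % warp_size
    # Ceiling division so that every item is covered by some thread.
    grid = (n_items + block - 1) // block
    return (block, 1, 1), (grid, 1)


if __name__ == "__main__":
    print(launch_config(1000, 512))
    print(launch_config(1000, 200))
```

In a PyCUDA program the result would typically be passed on as `block, grid = launch_config(n, kernel.max_threads_per_block)` followed by `kernel(..., block=block, grid=grid)`. The sketch does not check the grid against MAX_GRID_DIM_X, so very large inputs would need an extra guard.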

Static Device Information

Incoming lazy code

    import pycuda.autoinit
    import pycuda.driver as cuda

    # Free and total global memory (in bytes) for the current context
    (free, total) = cuda.mem_get_info()
    print("Global memory occupancy:%f%% free" % (free * 100 / total))

    for devicenum in range(cuda.Device.count()):
        device = cuda.Device(devicenum)
        attrs = device.get_attributes()

        # Beyond this point is just pretty printing
        print("\n===Attributes for device %d" % devicenum)
        # .items() on Python 3; the original used the Python 2 .iteritems()
        for (key, value) in attrs.items():
            print("%s:%s" % (str(key), str(value)))

Output example

    Global memory occupancy:70.000000% free

    ===Attributes for device 0
    MAX_THREADS_PER_BLOCK:512
    MAX_BLOCK_DIM_X:512
    MAX_BLOCK_DIM_Y:512
    MAX_BLOCK_DIM_Z:64
    MAX_GRID_DIM_X:65535
    MAX_GRID_DIM_Y:65535
    MAX_GRID_DIM_Z:1
    MAX_SHARED_MEMORY_PER_BLOCK:16384
    TOTAL_CONSTANT_MEMORY:65536
    WARP_SIZE:32
    MAX_PITCH:2147483647
    MAX_REGISTERS_PER_BLOCK:8192
    CLOCK_RATE:1500000
    TEXTURE_ALIGNMENT:256
    GPU_OVERLAP:1
    MULTIPROCESSOR_COUNT:14
    KERNEL_EXEC_TIMEOUT:1
    INTEGRATED:0
    CAN_MAP_HOST_MEMORY:1
    COMPUTE_MODE:DEFAULT
    MAXIMUM_TEXTURE1D_WIDTH:8192
    MAXIMUM_TEXTURE2D_WIDTH:65536
    MAXIMUM_TEXTURE2D_HEIGHT:32768
    MAXIMUM_TEXTURE3D_WIDTH:2048
    MAXIMUM_TEXTURE3D_HEIGHT:2048
    MAXIMUM_TEXTURE3D_DEPTH:2048
    MAXIMUM_TEXTURE2D_ARRAY_WIDTH:8192
    MAXIMUM_TEXTURE2D_ARRAY_HEIGHT:8192
    MAXIMUM_TEXTURE2D_ARRAY_NUMSLICES:512
    SURFACE_ALIGNMENT:256
    CONCURRENT_KERNELS:0
    ECC_ENABLED:0
    PCI_BUS_ID:1
    PCI_DEVICE_ID:0
    TCC_DRIVER:0
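For the "check between malloc / memcpy and kernel launch" part of the question, the free/total pair from cuda.mem_get_info() can gate an allocation. A rough sketch, assuming the two byte counts are passed in by the caller; the function name `can_allocate` and the 5% reserve are made up for illustration and are not part of PyCUDA:

```python
def can_allocate(nbytes, free_bytes, total_bytes, reserve_fraction=0.05):
    """Return True if nbytes fits in free memory while keeping a reserve.

    Hypothetical helper (not a PyCUDA function). free_bytes and
    total_bytes are meant to come from cuda.mem_get_info().
    """
    # Hold back a slice of total memory as headroom (assumed policy).
    reserve = int(total_bytes * reserve_fraction)
    return nbytes <= free_bytes - reserve


if __name__ == "__main__":
    # Example numbers only: 1000 bytes free out of 2000 total.
    print(can_allocate(100, 1000, 2000))
    print(can_allocate(950, 1000, 2000))
```

In a PyCUDA program this would sit right before the allocation, for example `free, total = cuda.mem_get_info()` followed by `if can_allocate(nbytes, free, total): buf = cuda.mem_alloc(nbytes)`. The numbers reported by mem_get_info() can change between the check and the allocation if other processes use the same device, so the allocation itself should still be allowed to fail.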

Source: https://habr.com/ru/post/886310/
