How can I use 100% VRAM on a secondary GPU from a single process in Windows 10?

This is on a Windows 10 machine with no monitor connected to the Nvidia card. I have included nvidia-smi output below showing more than 5.04 GiB free.

Here is the TensorFlow code that asks it to allocate slightly more than I had seen it succeed with before (I want to get as close to per_process_gpu_memory_fraction = 1.0 as possible):

    config = tf.ConfigProto()
    # config.gpu_options.allow_growth = True
    config.gpu_options.per_process_gpu_memory_fraction = 0.84
    config.log_device_placement = True
    sess = tf.Session(config=config)

Just before running the lines above in a Jupyter notebook, I ran nvidia-smi:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 376.51                 Driver Version: 376.51                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 106...  WDDM | 0000:01:00.0     Off |                  N/A |
    |  0%   27C    P8     5W / 120W |     43MiB /  6144MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID  Type  Process name                               Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+

The TF output below reports freeMemory: 5.01GiB and then "failed to allocate 5.04G (5411658752 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY":

    2017-12-17 03:53:13.959871: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Found device 0 with properties:
    name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7845
    pciBusID: 0000:01:00.0
    totalMemory: 6.00GiB freeMemory: 5.01GiB
    2017-12-17 03:53:13.960006: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
    2017-12-17 03:53:13.961152: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\stream_executor\cuda\cuda_driver.cc:936] failed to allocate 5.04G (5411658752 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
    Device mapping:
    /job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1
    2017-12-17 03:53:14.151073: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\direct_session.cc:299] Device mapping:
    /job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1
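A quick arithmetic check shows where the 5,411,658,752-byte figure in the OOM message comes from: it is 84% of the card's 6 GiB, which exceeds the 5.01 GiB that stream_executor reports free. (The 256-byte rounding of the request is my inference to match the logged byte count exactly; it is not stated in the log.)

```python
# Reproduce the 5,411,658,752-byte request from the OOM message above.
total_bytes = 6 * 1024**3          # totalMemory: 6.00GiB reported by TF
fraction = 0.84                    # per_process_gpu_memory_fraction
requested = int(total_bytes * fraction) // 256 * 256  # assumed 256-byte rounding
print(requested)                   # 5411658752, matching the log

free_bytes = int(5.01 * 1024**3)   # freeMemory: 5.01GiB reported by TF
print(requested > free_bytes)      # True: the request cannot fit in free memory
```

So the allocator never had a chance: any fraction above roughly 0.835 asks for more than the driver reports free.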

My best guess is that some policy at the Nvidia user-mode driver (DLL) level prevents use of all the memory (perhaps a reservation in case a monitor is connected?).

If this theory is correct, I am looking for any user-accessible knob that turns it off on Windows 10. If I am mistaken, any help pointing me in the right direction is appreciated.

Edit #1:

I realized that I did not include this part of my investigation: the following TensorFlow code shows that it is stream_exec that "tells" TensorFlow only 5.01 GB is free. This is the main driver of my current theory that some Nvidia component is interfering with the allocation. (However, I may be misunderstanding which component implements the stream_exec instance involved.)

    auto stream_exec = executor.ValueOrDie();
    int64 free_bytes;
    int64 total_bytes;
    if (!stream_exec->DeviceMemoryUsage(&free_bytes, &total_bytes)) {
      // Logs internally on failure.
      free_bytes = 0;
      total_bytes = 0;
    }
    const auto& description = stream_exec->GetDeviceDescription();
    int cc_major;
    int cc_minor;
    if (!description.cuda_compute_capability(&cc_major, &cc_minor)) {
      // Logs internally on failure.
      cc_major = 0;
      cc_minor = 0;
    }
    LOG(INFO) << "Found device " << i << " with properties: "
              << "\nname: " << description.name() << " major: " << cc_major
              << " minor: " << cc_minor
              << " memoryClockRate(GHz): " << description.clock_rate_ghz()
              << "\npciBusID: " << description.pci_bus_id()
              << "\ntotalMemory: " << strings::HumanReadableNumBytes(total_bytes)
              << " freeMemory: " << strings::HumanReadableNumBytes(free_bytes);

Edit #2:

The thread below reports that Windows 10 prevents full use of VRAM across the board on secondary graphics cards used for compute, capping the usable percentage of VRAM: https://social.technet.microsoft.com/Forums/windows/en-US/15b9654e-5da7-45b7-93de-e8b63faef064/windows-10-does-not-let-cuda-applications-to-use-all-vram-on-especially-secondary-graphics-cards?forum=win10itprohardware

This seems implausible to me, because it would mean every Windows 10 box is inherently worse than Windows 7 for any workload where VRAM on a compute-dedicated graphics card can be the bottleneck.

Edit #3:

I updated the title to be clearer. Feedback suggests this might be better filed as a bug with Microsoft or Nvidia. I am pursuing those avenues as well, but I do not want to assume it cannot be solved directly.
Further experiments show that the problem is specific to one large allocation from a single process. All of the VRAM can be used once a second process comes into play.

Edit #4:

The error here is an allocation denial, and per the nvidia-smi output above, 43 MiB is in use (by the system, perhaps?) without any identifiable process attached. The failure I see is for one monolithic allocation, and a typical allocation model requires contiguous address space. So the relevant question may be: what is using that 43 MiB? Is it placed in the address space such that a 5.01 GB allocation is the largest contiguous region available?
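Putting numbers on that question (a back-of-the-envelope sketch; 5.01 GiB is the rounded value TF printed, so the result is approximate): the 43 MiB visible in nvidia-smi is nowhere near enough to account for the gap between the 6 GiB total and the 5.01 GiB reported free.

```python
GiB = 1024**3
MiB = 1024**2

total = 6 * GiB              # totalMemory per TF and nvidia-smi
free_reported = 5.01 * GiB   # freeMemory per TF's stream_executor
smi_in_use = 43 * MiB        # usage shown by nvidia-smi, with no process attached

# Memory that neither the free count nor nvidia-smi's process list explains.
unaccounted = total - free_reported - smi_in_use
print(unaccounted / GiB)     # roughly 0.95 GiB reserved by something else
```

That unexplained ~0.95 GiB is consistent with a driver- or OS-level reservation rather than fragmentation around a 43 MiB island.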

2 answers

I believe that for cards that support the TCC driver, this is a solvable problem. Unfortunately, my GTX 1060 doesn't seem to support it.
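For reference, on a TCC-capable card (Quadro/Tesla class; GeForce parts like this GTX 1060 refuse the switch), the driver model can reportedly be changed with nvidia-smi from an elevated prompt. Untested here, for the reason above:

```shell
REM Switch GPU 0 from WDDM (0) to TCC (1); run as Administrator.
nvidia-smi -i 0 -dm 1

REM Reboot, then confirm the active driver model:
nvidia-smi -i 0 -q | findstr "Driver Model"
```

In TCC mode the GPU is removed from WDDM's management entirely, which is why it cannot drive a display afterwards.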

I don't have such a card to verify with. I would gladly award a bounty to anyone who can demonstrate a single process using 100% of VRAM on Windows 10 via the TCC driver on a GTX 1060.


At the moment this is simply not possible: Windows Display Driver Model (WDDM) 2.x imposes a hard limit, and no process can override it (legitimately).

Assuming you have already played with the "Prefer Maximum Performance" power management setting, you can push usage up to a maximum of about 92%.
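For scale, using this card's 6 GiB (my arithmetic, with the 92% ceiling taken from the claim above): even the best case leaves roughly half a GiB reserved, and the asker's card is seeing noticeably less than that.

```python
GiB = 1024**3
total = 6 * GiB

best_case = 0.92 * total / GiB   # claimed usable ceiling under WDDM 2.x
observed = 5.01                  # GiB actually reported free on this card

print(best_case)                 # 5.52 GiB usable at best
print(round(observed / 6, 3))    # 0.835 -> only ~83.5% free in practice here
```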

This page has the details if you would like to know more about WDDM 2.x:

https://docs.microsoft.com/en-us/windows-hardware/drivers/display/what-s-new-for-windows-threshold-display-drivers--wddm-2-0-


Source: https://habr.com/ru/post/1274128/

