How to use Media Foundation Source Reader hardware acceleration to decode video?

I am working on an H.264 hardware-accelerated decoder using the Media Foundation Source Reader, but have run into a problem. I followed this tutorial and used the Windows SDK Media Foundation samples for support.


My application works fine when hardware acceleration is disabled, but it does not provide the required performance. When I enable acceleration by passing an IMFDXGIDeviceManager to the IMFAttributes used to create the reader, things get more complicated.

If I create the ID3D11Device using the D3D_DRIVER_TYPE_NULL driver, the application works fine and the frames are processed faster than in software mode, but judging by CPU and GPU usage, it still does most of the processing on the CPU.

On the other hand, when I create the ID3D11Device using the D3D_DRIVER_TYPE_HARDWARE driver and run the application, one of these four things can happen.

  • I get an unpredictable number of frames (usually 1-3) before IMFMediaBuffer::Lock returns 0x887a0005, which is described as "The GPU device instance has been suspended. Use GetDeviceRemovedReason to determine the appropriate action." When I call ID3D11Device::GetDeviceRemovedReason, I get 0x887a0020, which is described as "The driver encountered a problem and was put into the device removed state.", which is not as useful as I would like.

  • The application crashes in an external DLL when IMFMediaBuffer::Lock is called. The DLL seems to depend on the GPU used: for the integrated Intel GPU it is igd10iumd32.dll, and for the NVIDIA mobile GPU it is mfplat.dll. The message for this particular failure is: "Exception thrown at 0x53C6DB8C (mfplat.dll) in decoder_tester.exe: 0xC0000005: Access violation reading location 0x00000024." The addresses differ between runs, and sometimes the violation is a read, sometimes a write.

  • The graphics driver stops responding, the system freezes for a short time, and then the application either crashes as in case 2 or fails as in case 1.

  • The application works great and processes all frames using hardware acceleration.

Most of the time it is case 1 or 2; cases 3 and 4 are rare.
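When logging failures like case 1, it can help to print the symbolic name next to the system message. On Windows you would simply compare against DXGI_ERROR_DEVICE_REMOVED and related macros from <winerror.h>; the minimal sketch below hard-codes the same values so it compiles anywhere. The helper names `hresult_facility`, `hresult_code` and `dxgi_error_name` are hypothetical, not part of any SDK.

```cpp
#include <cstdint>
#include <string>

// Facility code shared by all DXGI errors (_FACDXGI in <winerror.h>).
constexpr std::uint32_t kFacilityDxgi = 0x87A;

// Same bit extraction as the HRESULT_FACILITY / HRESULT_CODE macros.
constexpr std::uint32_t hresult_facility(std::uint32_t hr) { return (hr >> 16) & 0x1FFFu; }
constexpr std::uint32_t hresult_code(std::uint32_t hr) { return hr & 0xFFFFu; }

// Symbolic names for the "device removed" family of DXGI errors.
std::string dxgi_error_name(std::uint32_t hr) {
    switch (hr) {
    case 0x887A0005u: return "DXGI_ERROR_DEVICE_REMOVED";
    case 0x887A0006u: return "DXGI_ERROR_DEVICE_HUNG";
    case 0x887A0007u: return "DXGI_ERROR_DEVICE_RESET";
    case 0x887A0020u: return "DXGI_ERROR_DRIVER_INTERNAL_ERROR";
    default:
        // Anything else: at least say whether it came from DXGI.
        return hresult_facility(hr) == kFacilityDxgi ? "other DXGI error" : "not a DXGI error";
    }
}
```

For the two codes above, `dxgi_error_name(0x887A0005)` gives "DXGI_ERROR_DEVICE_REMOVED" (the Lock failure) and `dxgi_error_name(0x887A0020)` gives "DXGI_ERROR_DRIVER_INTERNAL_ERROR" (the removal reason).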


Here is the CPU/GPU usage when processing without throttling in the different modes on my machine (Intel Core i5-6500 with HD Graphics 530, Windows 10 Pro):

  • NULL: CPU ~90%, GPU ~15%
  • HARDWARE: CPU ~15%, GPU ~60%
  • SOFTWARE: CPU ~40%, GPU ~7%

I tested the application on three machines. All of them have integrated Intel GPUs (HD 4400, HD 4600, HD 530). One of them also has a switchable NVIDIA GPU (GF 840M). The behavior is the same on all of them; the only difference is that it crashes in a different DLL when using the NVIDIA GPU.


I have no previous experience with COM or DirectX, but all this is so inconsistent and unpredictable that to me it looks like memory corruption. However, I do not know where I am making a mistake. Could you help me find what I am doing wrong?

Below is the most minimal code example I could come up with. I am using Visual Studio Professional 2015 to compile it as a C++ project. I added defines for enabling hardware acceleration and for choosing the hardware driver; comment them out to change the behavior. In addition, the code expects this video file to be present in the project directory.

```cpp
#include <iostream>
#include <string>
#include <atlbase.h>
#include <d3d11.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <windows.h>

#pragma comment(lib, "d3d11.lib")
#pragma comment(lib, "mf.lib")
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")

#define ENABLE_HW_ACCELERATION
#define ENABLE_HW_DRIVER

void handle_result(HRESULT hr)
{
    if (SUCCEEDED(hr))
        return;

    WCHAR message[512];

    FormatMessage(FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS, nullptr, hr,
        MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT), message, ARRAYSIZE(message), nullptr);

    printf("%ls", message);
    abort();
}

int main(int argc, char** argv)
{
    handle_result(CoInitializeEx(nullptr, COINIT_APARTMENTTHREADED | COINIT_DISABLE_OLE1DDE));
    handle_result(MFStartup(MF_VERSION));

    {
        CComPtr<IMFAttributes> attributes;

        handle_result(MFCreateAttributes(&attributes, 3));

#if defined(ENABLE_HW_ACCELERATION)
        CComPtr<ID3D11Device> device;
        D3D_FEATURE_LEVEL levels[] = { D3D_FEATURE_LEVEL_11_1, D3D_FEATURE_LEVEL_11_0 };

#if defined(ENABLE_HW_DRIVER)
        handle_result(D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr,
            D3D11_CREATE_DEVICE_SINGLETHREADED | D3D11_CREATE_DEVICE_VIDEO_SUPPORT,
            levels, ARRAYSIZE(levels), D3D11_SDK_VERSION, &device, nullptr, nullptr));
#else
        handle_result(D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_NULL, nullptr,
            D3D11_CREATE_DEVICE_SINGLETHREADED, levels, ARRAYSIZE(levels),
            D3D11_SDK_VERSION, &device, nullptr, nullptr));
#endif

        UINT token;
        CComPtr<IMFDXGIDeviceManager> manager;

        handle_result(MFCreateDXGIDeviceManager(&token, &manager));
        handle_result(manager->ResetDevice(device, token));

        handle_result(attributes->SetUnknown(MF_SOURCE_READER_D3D_MANAGER, manager));
        handle_result(attributes->SetUINT32(MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS, TRUE));
        handle_result(attributes->SetUINT32(MF_SOURCE_READER_ENABLE_ADVANCED_VIDEO_PROCESSING, TRUE));
#else
        handle_result(attributes->SetUINT32(MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING, TRUE));
#endif

        CComPtr<IMFSourceReader> reader;

        handle_result(MFCreateSourceReaderFromURL(L"Rogue One - A Star Wars Story - Trailer.mp4", attributes, &reader));

        CComPtr<IMFMediaType> output_type;

        handle_result(MFCreateMediaType(&output_type));
        handle_result(output_type->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video));
        handle_result(output_type->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32));
        handle_result(reader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_VIDEO_STREAM, nullptr, output_type));

        unsigned int frame_count{};

        std::cout << "Started processing frames" << std::endl;

        while (true)
        {
            CComPtr<IMFSample> sample;
            DWORD flags;

            handle_result(reader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM,
                0, nullptr, &flags, nullptr, &sample));

            if (flags & MF_SOURCE_READERF_ENDOFSTREAM || sample == nullptr)
                break;

            std::cout << "Frame " << frame_count++ << std::endl;

            CComPtr<IMFMediaBuffer> buffer;
            BYTE* data;

            handle_result(sample->ConvertToContiguousBuffer(&buffer));
            handle_result(buffer->Lock(&data, nullptr, nullptr));

            // Use the frame here.

            buffer->Unlock();
        }

        std::cout << "Finished processing frames" << std::endl;
    }

    MFShutdown();
    CoUninitialize();

    return 0;
}
```
2 answers

Your code is conceptually correct, with a single caveat that is not entirely obvious: the Media Foundation decoder is multithreaded, and you are feeding it a Direct3D device created in single-threaded mode. You have to work around this, otherwise you get what you are currently getting: access violations and freezes, that is, undefined behavior.

```cpp
// NOTE: No single threading (the SINGLETHREADED flag is multiplied by 0, i.e. dropped)
handle_result(D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr,
    (0 * D3D11_CREATE_DEVICE_SINGLETHREADED) | D3D11_CREATE_DEVICE_VIDEO_SUPPORT,
    levels, ARRAYSIZE(levels), D3D11_SDK_VERSION, &device, nullptr, nullptr));

// NOTE: Getting ready for multi-threaded operation
// (ID3D10Multithread is declared in <d3d10.h>)
const CComQIPtr<ID3D10Multithread> pMultithread = device;
pMultithread->SetMultithreadProtected(TRUE);
```

Also note that this simple code example has a performance bottleneck around the lines you added to get a contiguous buffer. Apparently this is your way to access the data; however, by design the decoded data is already in video memory, and transferring it to system memory is an expensive operation. In other words, you have added a major performance hit to the loop. You may want to check the data this way, but when it comes to benchmarking performance, you should rather comment it out.
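To put a rough number on that copy, here is a back-of-the-envelope sketch. It assumes a 1920x1080 stream at 24 fps (the trailer's actual resolution and frame rate are not stated in the question) and ignores stride padding, so real figures would be somewhat higher. The helper names are hypothetical.

```cpp
#include <cstdint>

// RGB32: 4 bytes per pixel.
constexpr std::uint64_t rgb32_frame_bytes(std::uint64_t width, std::uint64_t height) {
    return width * height * 4;
}

// NV12: full-resolution Y plane (1 byte per pixel) plus a half-height plane
// of interleaved U/V bytes, i.e. 1.5 bytes per pixel. Assumes even dimensions.
constexpr std::uint64_t nv12_frame_bytes(std::uint64_t width, std::uint64_t height) {
    return width * height * 3 / 2;
}

// Bytes per second moved from video memory to system memory if every
// decoded frame is read back (e.g. via ConvertToContiguousBuffer + Lock).
constexpr std::uint64_t readback_bytes_per_second(std::uint64_t frame_bytes, std::uint64_t fps) {
    return frame_bytes * fps;
}
```

Under those assumptions, `rgb32_frame_bytes(1920, 1080)` is 8,294,400 bytes, so reading back every frame moves about 199 MB per second across the bus; the same frames in NV12 would be 3,110,400 bytes each.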


The output types of the H.264 video decoder are listed here: https://msdn.microsoft.com/en-us/library/windows/desktop/dd797815(v=vs.85).aspx . RGB32 is not one of them, so your application relies on the video processor MFT to convert from one of MFVideoFormat_I420, MFVideoFormat_IYUV, MFVideoFormat_NV12, MFVideoFormat_YUY2 or MFVideoFormat_YV12 to RGB32. I believe it is the video processor MFT that acts weirdly and makes your program misbehave. So by setting NV12 as the output subtype for the decoder, you get rid of the video processor MFT, and the following lines of code also become useless:

```cpp
handle_result(attributes->SetUINT32(MF_SOURCE_READER_ENABLE_ADVANCED_VIDEO_PROCESSING, TRUE));
```

and

```cpp
handle_result(attributes->SetUINT32(MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING, TRUE));
```

In addition, as you noticed, NV12 is the only format that works correctly. I think the reason is that it is the only one used by the D3D and DXGI device manager in accelerated scenarios.
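With NV12 output, the bytes returned by Lock are no longer RGB32 pixels. The sketch below (helper names hypothetical) shows where the luma and chroma samples of a given pixel live, assuming a single contiguous buffer whose Y plane has exactly `height` rows of `stride` bytes, followed directly by the interleaved UV plane. Decoder surfaces may use a padded pitch or an aligned height; IMF2DBuffer::Lock2D reports the actual pitch, and the padded values would have to be used instead.

```cpp
#include <cstddef>

struct Nv12Offsets {
    std::size_t y;  // luma sample
    std::size_t u;  // Cb sample shared by a 2x2 block
    std::size_t v;  // Cr sample, stored right after U
};

// Offset of the U byte of the 2x2 block containing pixel (x, y).
constexpr std::size_t nv12_uv_offset(std::size_t x, std::size_t y,
                                     std::size_t stride, std::size_t height) {
    return stride * height       // skip the whole Y plane
         + (y / 2) * stride      // one UV row per two luma rows
         + (x / 2) * 2;          // U and V are interleaved byte pairs
}

constexpr Nv12Offsets nv12_offsets(std::size_t x, std::size_t y,
                                   std::size_t stride, std::size_t height) {
    return { y * stride + x,
             nv12_uv_offset(x, y, stride, height),
             nv12_uv_offset(x, y, stride, height) + 1 };
}
```

For a 1920x1080 frame with a 1920-byte stride, pixel (0, 0) has its Y sample at offset 0 and its U/V pair at offsets 2,073,600 and 2,073,601. Switching the question's code to this layout is a matter of passing MFVideoFormat_NV12 instead of MFVideoFormat_RGB32 for MF_MT_SUBTYPE.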


Source: https://habr.com/ru/post/1260652/

