Image processing using CUDA: Python (PyCUDA) or C++ implementation?

I am working on image processing with CUDA. The project is simply adding or subtracting images.

May I ask for your professional opinion on which is best, and what the advantages and disadvantages of the two approaches are?

I appreciate all opinions and/or suggestions, as this project is very important to me.

+4
4 answers

General answer: It does not matter. Use the language that suits you best.

Keep in mind, however, that PyCUDA is just a wrapper around the CUDA C interface, so it may not always be up to date, and it also adds another potential source of bugs, ...

Python is great for rapid prototyping, so I would personally go with Python. You can always switch to C++ later if you need to.

+6

If the rest of your pipeline is in Python and you are already using NumPy to speed things up, PyCUDA is a good addition for accelerating expensive operations. However, depending on the size of your images and the flow of your program, you may not get much of a speedup from PyCUDA: there is a latency cost for transferring data back and forth across the PCI bus, so it only pays off for large data sizes.

In your case (addition and subtraction), PyCUDA has built-in operations that you can take advantage of. However, in my experience, using PyCUDA for anything non-trivial requires a meaningful understanding of how CUDA works in the first place. For someone starting with no CUDA background, PyCUDA can be a steep learning curve.
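For example, here is a minimal sketch of that built-in path using pycuda.gpuarray (the array shapes are placeholders standing in for your real images; the point is that + and - already run on the GPU):

    import numpy as np
    import pycuda.autoinit                 # creates a CUDA context on import
    import pycuda.gpuarray as gpuarray

    # Two float32 "images" standing in for your real data.
    a = np.random.rand(1080, 1920).astype(np.float32)
    b = np.random.rand(1080, 1920).astype(np.float32)

    a_gpu = gpuarray.to_gpu(a)             # host -> device copy over PCIe
    b_gpu = gpuarray.to_gpu(b)

    total = a_gpu + b_gpu                  # element-wise add, runs on the GPU
    diff  = a_gpu - b_gpu                  # element-wise subtract

    result = total.get()                   # device -> host copy over PCIe

Note that the two to_gpu() calls and the final get() are exactly the PCI-bus transfers mentioned above, so for small images they can easily cost more than the arithmetic itself.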

+3

Take a look at OpenCV: it contains many image processing functions and all the helpers for loading/saving/displaying images and managing cameras.

It now also supports CUDA: some image processing functions have been reimplemented in CUDA, which gives you a good basis to build your own work on.
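As a rough sketch of the add/subtract task with OpenCV's Python bindings (the file names are placeholders, and both images must have the same size and type):

    import cv2

    img1 = cv2.imread("frame1.png")        # placeholder file names
    img2 = cv2.imread("frame2.png")

    added      = cv2.add(img1, img2)       # saturating add (clips at 255 instead of wrapping)
    subtracted = cv2.subtract(img1, img2)  # saturating subtract (clips at 0)

    cv2.imwrite("sum.png", added)
    cv2.imwrite("diff.png", subtracted)

If your OpenCV build was compiled with CUDA support, a cv2.cuda module with GPU versions of many routines is available as well, though the plain CPU calls above are usually fast enough for simple per-pixel arithmetic.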

+2

Alex's answer is right. The overhead added by the wrapper is minimal. Note that PyCUDA has some nice metaprogramming constructs for generating kernels, which can be useful.
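One example of those constructs is ElementwiseKernel, which generates and compiles a CUDA kernel from a small snippet of C; the sketch below assumes the inputs are already float32 GPUArrays:

    import numpy as np
    import pycuda.autoinit
    import pycuda.gpuarray as gpuarray
    from pycuda.elementwise import ElementwiseKernel

    # PyCUDA expands this C snippet into a full __global__ kernel at runtime.
    add_sub = ElementwiseKernel(
        "float *s, float *d, float *a, float *b",
        "s[i] = a[i] + b[i]; d[i] = a[i] - b[i]",
        "add_sub_images")

    a = gpuarray.to_gpu(np.random.rand(512 * 512).astype(np.float32))
    b = gpuarray.to_gpu(np.random.rand(512 * 512).astype(np.float32))
    s = gpuarray.empty_like(a)
    d = gpuarray.empty_like(a)

    add_sub(s, d, a, b)                    # one fused launch produces both results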

If all you are doing is adding or subtracting image elements, you probably shouldn't use CUDA for this at all. The time spent transferring data back and forth across the PCI-E bus will dwarf whatever savings you get from parallelism.

Whenever you are dealing with CUDA, it is useful to think about the CGMA ratio (compute to global memory access ratio). Your addition/subtraction is only 1 floating-point operation per element against two or three global memory accesses (reading the inputs and writing the result). That makes it a very poor fit for CUDA.
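As a rough illustration of both points (the bandwidth figures below are assumptions chosen for easy arithmetic, not measurements): the add kernel itself is bandwidth-bound, and the PCI-E copies needed to get the images onto the card and back dwarf even that.

    # Adding two 1920x1080 float32 images: back-of-envelope timing.
    n_bytes = 1920 * 1080 * 4                  # one image in bytes (~8.3 MB)

    gpu_mem_bw = 200e9                         # assumed GPU memory bandwidth, bytes/s
    pcie_bw    = 8e9                           # assumed effective PCI-E bandwidth, bytes/s

    kernel_time   = 3 * n_bytes / gpu_mem_bw   # read a, read b, write the result
    transfer_time = 3 * n_bytes / pcie_bw      # copy a and b over, copy the result back

    print(f"kernel   ~ {kernel_time * 1e6:.0f} us")    # ~124 us
    print(f"transfer ~ {transfer_time * 1e6:.0f} us")  # ~3100 us, about 25x the kernel

Under these assumptions the transfers alone take roughly 25 times longer than the computation, which is exactly what the low CGMA ratio is warning you about.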

0

Source: https://habr.com/ru/post/1339375/

