If the rest of your pipeline is in Python and you are already using NumPy to speed things up, PyCUDA is a good addition for accelerating expensive operations. However, depending on the size of your images and the flow of your program, you may not gain much speed from PyCUDA. Transferring data back and forth over the PCI bus incurs a delay, and that cost only pays off for large data sizes.
In your case (addition and subtraction), PyCUDA has built-in operations that you can take advantage of. However, in my experience, using PyCUDA for anything non-trivial requires a meaningful study of how CUDA works in the first place. For someone starting without CUDA knowledge, PyCUDA can have a steep learning curve.
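
To illustrate the built-in operations, here is a minimal sketch of elementwise addition and subtraction with `pycuda.gpuarray`. The function name `add_and_subtract`, the 512x512 random "images", and the NumPy fallback when PyCUDA or a CUDA device is unavailable are all my own assumptions for the example, not something from your code.

```python
import numpy as np


def add_and_subtract(a, b):
    """Return (a + b, a - b), computed on the GPU when PyCUDA is usable."""
    try:
        import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
        import pycuda.gpuarray as gpuarray
    except Exception:
        # Assumption for this sketch: fall back to plain NumPy if PyCUDA
        # is not installed or no CUDA device can be initialised.
        return a + b, a - b

    # Host -> device copies; this is the transfer cost mentioned above.
    a_gpu = gpuarray.to_gpu(a)
    b_gpu = gpuarray.to_gpu(b)

    # GPUArray overloads + and -, so no hand-written kernel is needed.
    # .get() copies each result back to a NumPy array on the host.
    total = (a_gpu + b_gpu).get()
    diff = (a_gpu - b_gpu).get()
    return total, diff


# Hypothetical stand-ins for two single-channel float32 images.
rng = np.random.default_rng(0)
img_a = rng.random((512, 512), dtype=np.float32)
img_b = rng.random((512, 512), dtype=np.float32)

total, diff = add_and_subtract(img_a, img_b)
```

Each call here pays for two uploads and two downloads, so for small images plain NumPy will likely be faster. If you chain several operations, keeping the intermediate results on the GPU and calling `.get()` only once at the end should reduce that overhead.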