I am currently implementing some image-processing algorithms in OpenCL. My algorithm requires solving a linear system of equations for every pixel. Each system is independent of the others, so a parallel implementation is a natural fit.
I looked at several BLAS packages, such as ViennaCL and AMD APPML, but they all seem to follow the same usage pattern: a BLAS routine is called from the host and is then executed on the CL device.
I need a BLAS library whose routines can be called from inside an OpenCL kernel, so that I can solve several linear systems in parallel.
I found a similar question on the AMD forums.
Thanks.