Depth Search in CUDA / OpenCL

I'm halfway through the implementation of a parallel depth search algorithm in MPI, and I'm thinking of trying to do this in CUDA / OpenCL, just for fun / out of curiosity. The algorithm is simple, but not trivial. The single-core version in C is about 200 lines of code.

How many GPGPUs are suitable for this kind of problem?

+4
source share
1 answer

Tree search operations are not so easy to implement CUDA. There are several articles such as

And one more rather simple implementation (not quite large-scale implementation in my opinion)

  • "Acceleration of large graph algorithms on a GPU using CUDA" Pavan Harish and P. J. Narayanan

The difficulty arises from the fact that tree operations usually involve decision making and different branches are made in accordance with decisions. Therefore, massively parallelizing operations without overlapping and doing redundant operations is quite difficult.

There are several approaches that use the Stack and Queue implementations to move trees.

Here you can find a similar question: Error: BFS when synchronizing CUDA

+5
source

Source: https://habr.com/ru/post/1437150/


All Articles