Tree search operations are not easy to implement in CUDA. There are several articles on the subject. One rather simple implementation (not really a large-scale one, in my opinion) is:
- "Accelerating Large Graph Algorithms on the GPU Using CUDA", Pawan Harish and P. J. Narayanan
The difficulty is that tree operations usually involve decisions, and execution branches according to those decisions. That makes it hard to parallelize them massively without threads overlapping or doing redundant work.
Several approaches use stack or queue implementations to traverse trees.
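To make the queue-style idea concrete, here is a minimal CPU-side C++ sketch of a level-synchronous (frontier-based) BFS, which is the general shape many GPU traversals take. It is not code from the paper above or from any library. The names `CsrGraph`, `expand_vertex` and `bfs_levels` are hypothetical, and the inner `for` loop over vertices only stands in for what a kernel launch with one thread per vertex would do.

```cpp
#include <algorithm>
#include <vector>

// Graph in CSR form (an assumption of this sketch): the neighbors of
// vertex v are col[row[v]] .. col[row[v + 1] - 1].
struct CsrGraph {
    std::vector<int> row;  // size = vertex count + 1
    std::vector<int> col;  // concatenated adjacency lists
};

// Work done for ONE vertex in ONE BFS level. On a GPU this body would be
// what a single thread executes. Returns true if it discovered a vertex.
bool expand_vertex(const CsrGraph& g, int v,
                   const std::vector<char>& frontier,
                   std::vector<char>& next_frontier,
                   std::vector<int>& level, int depth) {
    if (!frontier[v]) return false;  // vertex not active: this "thread" idles
    bool discovered = false;
    for (int e = g.row[v]; e < g.row[v + 1]; ++e) {
        const int n = g.col[e];
        if (level[n] == -1) {        // not visited yet
            level[n] = depth + 1;    // every writer in this level stores the same value
            next_frontier[n] = 1;    // flag array plays the role of the queue
            discovered = true;
        }
    }
    return discovered;
}

// Returns the BFS depth of every vertex from `source` (-1 if unreachable).
std::vector<int> bfs_levels(const CsrGraph& g, int source) {
    const int n = static_cast<int>(g.row.size()) - 1;
    std::vector<int> level(n, -1);
    std::vector<char> frontier(n, 0), next_frontier(n, 0);
    level[source] = 0;
    frontier[source] = 1;

    bool changed = true;
    for (int depth = 0; changed; ++depth) {
        changed = false;
        // Stand-in for a kernel launch: one loop iteration per "thread".
        for (int v = 0; v < n; ++v) {
            if (expand_vertex(g, v, frontier, next_frontier, level, depth)) {
                changed = true;
            }
        }
        // Level boundary: swap the two frontiers and clear the new "next" one.
        frontier.swap(next_frontier);
        std::fill(next_frontier.begin(), next_frontier.end(), 0);
    }
    return level;
}
```

The early return in `expand_vertex` is the branching problem in miniature: in each level most vertices are inactive, so most threads would do nothing, and several active vertices may try to discover the same neighbor. Keeping two separate frontier arrays and only swapping them between levels is one common way to keep a level from reading flags that are being written in the same pass.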
Here you can find a similar question: Error: BFS when synchronizing CUDA
phoad