Over at TechEnablement, Rob Farber writes that the CUDA Thrust library now supports streams and concurrent kernels through Bulk, a new API created by Jared Hoberock at NVIDIA.
Bulk is designed to extend the parallel execution policies described in the evolving Technical Specification for C++ Extensions for Parallelism (N3960). It leverages Hyper-Q and CUDA streams to run concurrent tasks on the GPU, and lets the programmer describe a parallel task (e.g. sort, for_each, reduction, etc.) as a hierarchical grouping of execution agents. The big news is that concurrent kernel execution occurs with Bulk without having to:
- Specify a launch configuration
- Decompose the problem into sub-tasks
- Marshal parameters
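To make the stream story concrete, here is a minimal sketch of stream-aware Thrust code. It assumes the `thrust::cuda::par.on(stream)` execution policy that shipped alongside this work in Thrust 1.8; the vectors and stream names are illustrative, and this is not Bulk's own internal API.

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/execution_policy.h>
#include <cuda_runtime.h>

int main() {
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Two independent workloads on separate device vectors.
    thrust::device_vector<int>   a(1 << 20, 7);
    thrust::device_vector<float> b(1 << 20, 3.0f);

    // Each algorithm is submitted on its own stream; on a GPU with
    // Hyper-Q, the two sorts may execute concurrently. Note that
    // algorithms returning a value to the host (e.g. thrust::reduce)
    // still synchronize before returning.
    thrust::sort(thrust::cuda::par.on(s1), a.begin(), a.end());
    thrust::sort(thrust::cuda::par.on(s2), b.begin(), b.end());

    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    return 0;
}
```

Without the `.on(stream)` policy, both calls would land on the default stream and serialize; the policy is what exposes the concurrency described above.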
In this video from the GPU Technology Conference 2014, Jared Hoberock from NVIDIA presents: Inside Thrust: Building Parallel Algorithms with Bulk.