- This module contains a series of classes that implement a variety of CUDA work-queues. A work-queue is an object that allows executing (or consuming) a stream of work items according to different schedulers. Notably, there are two distinguishing features:
- continuations: work items can produce continuations; breaking up long computations into shorter pieces and using smart schedulers can yield better utilization when the execution time of individual work items is highly non-uniform;
- capacity constraints: queues can be assigned a maximum capacity, which can be used to control the amount of resources consumed to execute the work items in parallel (e.g. if each work item needs 1MB of temporary storage, on a 4GB GPU one might only execute 4096 items in parallel).
Work-Streams
- The user of these classes has to specify a WorkStream class responsible for feeding work to the queue in the shape of a nested WorkStream::WorkUnit class. The latter is responsible for specifying the data and execution of each unit. WorkStream has to implement the following interface:
uint32 size() const
void get(const uint32 i, WorkUnit* unit, const uint2 queue_slot) const
- queue_slot specifies a (queue position, queue id) pair, assuming there can be one input and one output continuation queue which are swapped between iterations. Knowing the queue slot can be useful to bind external data to a work-unit.
- When the method WorkQueue::consume( stream ) is called, the queue will launch a kernel to consume all WorkUnits in the stream. WorkUnit has to implement a single method:
bool WorkUnit::run(const WorkStream& context)
- which should run the associated work and indicate whether the unit has finished execution, or whether it has produced a continuation (stored in the WorkUnit itself) that has to be run further. The WorkQueue will automatically queue the continuation for later execution.
- Optionally, the class can also be passed a WorkMover which is responsible for moving external data attached to any WorkUnit when its continuation gets assigned a new execution slot. This must implement a method:
void move(
    const uint2 src_slot, WorkUnit* src_unit,
    const uint2 dst_slot, WorkUnit* dst_unit) const;
Example
struct MyWorkStream;

struct MyWorkUnit
{
    __device__ MyWorkUnit(const uint32 _i) : i(_i) {}

    __device__ bool run(MyWorkStream& stream);

private:
    uint32 i;
};
struct MyWorkStream
{
    MyWorkStream(const uint32 _size) : m_size(_size) {}

    __host__ __device__
    uint32 size() const { return m_size; }

    __device__
    void get(const uint32 i, MyWorkUnit* unit, const uint2 execution_slot) const;

private:
    uint32 m_size;
};
__device__
void MyWorkStream::get(const uint32 i, MyWorkUnit* unit, const uint2 execution_slot) const
{
    *unit = MyWorkUnit(i);
}
__device__ bool MyWorkUnit::run(MyWorkStream&)
{
    // odd values produce a continuation: halve and keep running
    if (i & 1) { i /= 2; return true; }
    return false;
}
{
    MyWorkStream work_stream( 1024*1024 );

    cuda::WorkQueue<cuda::InplaceQueueTag, MyWorkUnit> work_queue;

    work_queue.consume( work_stream );
}
Work-Queue Schedulers
The WorkQueue class is parameterized by a tag template parameter specifying the scheduler. The available schedulers are: