Fermat
Sorting
SortEnactor provides a convenient wrapper around the fastest CUDA sorting library available. It performs both key-only and key-value pair sorting of arrays with the following data types:
  • uint8
  • uint16
  • uint32
  • uint64
  • (uint8,uint32)
  • (uint16,uint32)
  • (uint32,uint32)
  • (uint64,uint32)
Most parallel sorting algorithms require a set of ping-pong buffers that are exchanged at every pass through the data. To support this, and to communicate where the sorted data lies once sorting is done, SortEnactor employs an auxiliary class, SortBuffers. The following example shows their combined usage.
void sort_test(const uint32 n, uint32* h_keys, uint32* h_data)
{
    // allocate twice as much storage as the input to accommodate the ping-pong buffers
    // (the declarations are missing from the source; a Thrust device vector is assumed here)
    thrust::device_vector<uint32> d_keys( 2 * n );
    thrust::device_vector<uint32> d_data( 2 * n );

    // copy the test data from the host to the device
    thrust::copy( h_keys, h_keys + n, d_keys.begin() );
    thrust::copy( h_data, h_data + n, d_data.begin() );

    // prepare the sorting buffers: the second half of each array is the alternate buffer
    cuda::SortBuffers<uint32*,uint32*> sort_buffers;
    sort_buffers.keys[0] = raw_pointer( d_keys );
    sort_buffers.keys[1] = raw_pointer( d_keys ) + n;
    sort_buffers.data[0] = raw_pointer( d_data );
    sort_buffers.data[1] = raw_pointer( d_data ) + n;

    // and sort the data
    cuda::SortEnactor sort_enactor;
    sort_enactor.sort( n, sort_buffers );

    // the sorted device data is now in here (either half of the storage may hold it):
    uint32* d_sorted_keys = sort_buffers.current_keys();
    uint32* d_sorted_data = sort_buffers.current_values();
    ...
}