CUB  
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros Groups Pages
Static Public Methods | List of all members
cub::DeviceMergeSort Struct Reference

Detailed description

DeviceMergeSort provides device-wide, parallel operations for computing a merge sort across a sequence of data items residing within device-accessible memory.

Overview
  • DeviceMergeSort arranges items into ascending order using a comparison functor with less-than semantics. Merge sort can handle arbitrary types (as long as a value of these types is a model of [LessThan Comparable]) and comparison functors, but is slower than DeviceRadixSort when sorting arithmetic types into ascending/descending order.
  • Another difference from RadixSort is the fact that DeviceMergeSort can handle arbitrary random-access iterators, as shown below.
A Simple Example
The code snippet below illustrates a thrust reverse iterator usage.
#include <cub/cub.cuh> // or equivalently <cub/device/device_merge_sort.cuh>
struct CustomLess
{
template <typename DataType>
__device__ bool operator()(const DataType &lhs, const DataType &rhs)
{
return lhs < rhs;
}
};
// Declare, allocate, and initialize device-accessible pointers
// for sorting data
thrust::device_vector<KeyType> d_keys(num_items);
thrust::device_vector<DataType> d_values(num_items);
// ...
// Initialize iterator
using KeyIterator = typename thrust::device_vector<KeyType>::iterator;
thrust::reverse_iterator<KeyIterator> reverse_iter(d_keys.end());
// Determine temporary device storage requirements
std::size_t temp_storage_bytes = 0;
nullptr,
temp_storage_bytes,
reverse_iter,
thrust::raw_pointer_cast(d_values.data()),
num_items,
CustomLess());
// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);
// Run sorting operation
d_temp_storage,
temp_storage_bytes,
reverse_iter,
thrust::raw_pointer_cast(d_values.data()),
num_items,
CustomLess());

Static Public Methods

template<typename KeyIteratorT , typename ValueIteratorT , typename OffsetT , typename CompareOpT >
static CUB_RUNTIME_FUNCTION
cudaError_t 
SortPairs (void *d_temp_storage, std::size_t &temp_storage_bytes, KeyIteratorT d_keys, ValueIteratorT d_items, OffsetT num_items, CompareOpT compare_op, cudaStream_t stream=0, bool debug_synchronous=false)
 Sorts items using a merge sorting method. More...
 
template<typename KeyInputIteratorT , typename ValueInputIteratorT , typename KeyIteratorT , typename ValueIteratorT , typename OffsetT , typename CompareOpT >
static CUB_RUNTIME_FUNCTION
cudaError_t 
SortPairsCopy (void *d_temp_storage, std::size_t &temp_storage_bytes, KeyInputIteratorT d_input_keys, ValueInputIteratorT d_input_items, KeyIteratorT d_output_keys, ValueIteratorT d_output_items, OffsetT num_items, CompareOpT compare_op, cudaStream_t stream=0, bool debug_synchronous=false)
 Sorts items using a merge sorting method. More...
 
template<typename KeyIteratorT , typename OffsetT , typename CompareOpT >
static CUB_RUNTIME_FUNCTION
cudaError_t 
SortKeys (void *d_temp_storage, std::size_t &temp_storage_bytes, KeyIteratorT d_keys, OffsetT num_items, CompareOpT compare_op, cudaStream_t stream=0, bool debug_synchronous=false)
 Sorts items using a merge sorting method. More...
 
template<typename KeyInputIteratorT , typename KeyIteratorT , typename OffsetT , typename CompareOpT >
static CUB_RUNTIME_FUNCTION
cudaError_t 
SortKeysCopy (void *d_temp_storage, std::size_t &temp_storage_bytes, KeyInputIteratorT d_input_keys, KeyIteratorT d_output_keys, OffsetT num_items, CompareOpT compare_op, cudaStream_t stream=0, bool debug_synchronous=false)
 Sorts items using a merge sorting method. More...
 
template<typename KeyIteratorT , typename ValueIteratorT , typename OffsetT , typename CompareOpT >
static CUB_RUNTIME_FUNCTION
cudaError_t 
StableSortPairs (void *d_temp_storage, std::size_t &temp_storage_bytes, KeyIteratorT d_keys, ValueIteratorT d_items, OffsetT num_items, CompareOpT compare_op, cudaStream_t stream=0, bool debug_synchronous=false)
 Sorts items using a merge sorting method. More...
 
template<typename KeyIteratorT , typename OffsetT , typename CompareOpT >
static CUB_RUNTIME_FUNCTION
cudaError_t 
StableSortKeys (void *d_temp_storage, std::size_t &temp_storage_bytes, KeyIteratorT d_keys, OffsetT num_items, CompareOpT compare_op, cudaStream_t stream=0, bool debug_synchronous=false)
 Sorts items using a merge sorting method. More...
 

Member Function Documentation

template<typename KeyIteratorT , typename ValueIteratorT , typename OffsetT , typename CompareOpT >
static CUB_RUNTIME_FUNCTION cudaError_t cub::DeviceMergeSort::SortPairs ( void *  d_temp_storage,
std::size_t &  temp_storage_bytes,
KeyIteratorT  d_keys,
ValueIteratorT  d_items,
OffsetT  num_items,
CompareOpT  compare_op,
cudaStream_t  stream = 0,
bool  debug_synchronous = false 
)
inlinestatic

Sorts items using a merge sorting method.

SortPairs is not guaranteed to be stable. That is, suppose that i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
Snippet
The code snippet below illustrates the sorting of a device vector of int keys with associated vector of int values.
#include <cub/cub.cuh>
// or equivalently <cub/device/device_merge_sort.cuh>
// Declare, allocate, and initialize device-accessible pointers for
// sorting data
int num_items; // e.g., 7
int *d_keys; // e.g., [8, 6, 6, 5, 3, 0, 9]
int *d_values; // e.g., [0, 1, 2, 3, 4, 5, 6]
...
// Initialize comparator
CustomOpT custom_op;
// Determine temporary device storage requirements
void *d_temp_storage = nullptr;
std::size_t temp_storage_bytes = 0;
d_temp_storage, temp_storage_bytes,
d_keys, d_values, num_items, custom_op);
// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);
// Run sorting operation
d_temp_storage, temp_storage_bytes,
d_keys, d_values, num_items, custom_op);
// d_keys <-- [0, 3, 5, 6, 6, 8, 9]
// d_values <-- [5, 4, 3, 2, 1, 0, 6]
Template Parameters
KeyIteratorTis a model of Random Access Iterator. KeyIteratorT is mutable, and its value_type is a model of LessThan Comparable. This value_type's ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.
ValueIteratorTis a model of Random Access Iterator, and ValueIteratorT is mutable.
OffsetTis an integer type for global offsets.
CompareOpTis a type of callable object with the signature bool operator()(KeyT lhs, KeyT rhs) that models the Strict Weak Ordering concept.
Parameters
[in]d_temp_storageDevice-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.
[in,out]temp_storage_bytesReference to size in bytes of d_temp_storage allocation
[in,out]d_keysPointer to the input sequence of unsorted input keys
[in,out]d_itemsPointer to the input sequence of unsorted input values
[in]num_itemsNumber of items to sort
[in]compare_opComparison function object which returns true if the first argument is ordered before the second
[in]stream**[optional]** CUDA stream to launch kernels within. Default is stream0.
[in]debug_synchronous**[optional]** Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false.
template<typename KeyInputIteratorT , typename ValueInputIteratorT , typename KeyIteratorT , typename ValueIteratorT , typename OffsetT , typename CompareOpT >
static CUB_RUNTIME_FUNCTION cudaError_t cub::DeviceMergeSort::SortPairsCopy ( void *  d_temp_storage,
std::size_t &  temp_storage_bytes,
KeyInputIteratorT  d_input_keys,
ValueInputIteratorT  d_input_items,
KeyIteratorT  d_output_keys,
ValueIteratorT  d_output_items,
OffsetT  num_items,
CompareOpT  compare_op,
cudaStream_t  stream = 0,
bool  debug_synchronous = false 
)
inlinestatic

Sorts items using a merge sorting method.

  • SortPairsCopy is not guaranteed to be stable. That is, suppose that i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
  • Input arrays d_input_keys and d_input_items are not modified.
  • Note that the behavior is undefined if the input and output ranges overlap in any way.
Snippet
The code snippet below illustrates the sorting of a device vector of int keys with associated vector of int values.
#include <cub/cub.cuh>
// or equivalently <cub/device/device_merge_sort.cuh>
// Declare, allocate, and initialize device-accessible pointers
// for sorting data
int num_items; // e.g., 7
int *d_keys; // e.g., [8, 6, 6, 5, 3, 0, 9]
int *d_values; // e.g., [0, 1, 2, 3, 4, 5, 6]
...
// Initialize comparator
CustomOpT custom_op;
// Determine temporary device storage requirements
void *d_temp_storage = nullptr;
std::size_t temp_storage_bytes = 0;
d_temp_storage, temp_storage_bytes,
d_keys, d_values, num_items, custom_op);
// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);
// Run sorting operation
d_temp_storage, temp_storage_bytes,
d_keys, d_values, num_items, custom_op);
// d_keys <-- [0, 3, 5, 6, 6, 8, 9]
// d_values <-- [5, 4, 3, 2, 1, 0, 6]
Template Parameters
KeyInputIteratorTis a model of Random Access Iterator. Its value_type is a model of LessThan Comparable. This value_type's ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.
ValueInputIteratorTis a model of Random Access Iterator.
KeyIteratorTis a model of Random Access Iterator. KeyIteratorT is mutable, and its value_type is a model of LessThan Comparable. This value_type's ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.
ValueIteratorTis a model of Random Access Iterator, and ValueIteratorT is mutable.
OffsetTis an integer type for global offsets.
CompareOpTis a type of callable object with the signature bool operator()(KeyT lhs, KeyT rhs) that models the Strict Weak Ordering concept.
Parameters
[in]d_temp_storageDevice-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.
[in,out]temp_storage_bytesReference to size in bytes of d_temp_storage allocation
[in]d_input_keysPointer to the input sequence of unsorted input keys
[in]d_input_itemsPointer to the input sequence of unsorted input values
[out]d_output_keysPointer to the output sequence of sorted input keys
[out]d_output_itemsPointer to the output sequence of sorted input values
[in]num_itemsNumber of items to sort
[in]compare_opComparison function object which returns true if the first argument is ordered before the second
[in]stream**[optional]** CUDA stream to launch kernels within. Default is stream0.
[in]debug_synchronous**[optional]** Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false.
template<typename KeyIteratorT , typename OffsetT , typename CompareOpT >
static CUB_RUNTIME_FUNCTION cudaError_t cub::DeviceMergeSort::SortKeys ( void *  d_temp_storage,
std::size_t &  temp_storage_bytes,
KeyIteratorT  d_keys,
OffsetT  num_items,
CompareOpT  compare_op,
cudaStream_t  stream = 0,
bool  debug_synchronous = false 
)
inlinestatic

Sorts items using a merge sorting method.

SortKeys is not guaranteed to be stable. That is, suppose that i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
Snippet
The code snippet below illustrates the sorting of a device vector of int keys.
#include <cub/cub.cuh>
// or equivalently <cub/device/device_merge_sort.cuh>
// Declare, allocate, and initialize device-accessible pointers
// for sorting data
int num_items; // e.g., 7
int *d_keys; // e.g., [8, 6, 7, 5, 3, 0, 9]
...
// Initialize comparator
CustomOpT custom_op;
// Determine temporary device storage requirements
void *d_temp_storage = nullptr;
std::size_t temp_storage_bytes = 0;
d_temp_storage, temp_storage_bytes,
d_keys, num_items, custom_op);
// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);
// Run sorting operation
d_temp_storage, temp_storage_bytes,
d_keys, num_items, custom_op);
// d_keys <-- [0, 3, 5, 6, 7, 8, 9]
Template Parameters
KeyIteratorTis a model of Random Access Iterator. KeyIteratorT is mutable, and its value_type is a model of LessThan Comparable. This value_type's ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.
OffsetTis an integer type for global offsets.
CompareOpTis a type of callable object with the signature bool operator()(KeyT lhs, KeyT rhs) that models the Strict Weak Ordering concept.
Parameters
[in]d_temp_storageDevice-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.
[in,out]temp_storage_bytesReference to size in bytes of d_temp_storage allocation
[in,out]d_keysPointer to the input sequence of unsorted input keys
[in]num_itemsNumber of items to sort
[in]compare_opComparison function object which returns true if the first argument is ordered before the second
[in]stream**[optional]** CUDA stream to launch kernels within. Default is stream0.
[in]debug_synchronous**[optional]** Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false.
template<typename KeyInputIteratorT , typename KeyIteratorT , typename OffsetT , typename CompareOpT >
static CUB_RUNTIME_FUNCTION cudaError_t cub::DeviceMergeSort::SortKeysCopy ( void *  d_temp_storage,
std::size_t &  temp_storage_bytes,
KeyInputIteratorT  d_input_keys,
KeyIteratorT  d_output_keys,
OffsetT  num_items,
CompareOpT  compare_op,
cudaStream_t  stream = 0,
bool  debug_synchronous = false 
)
inlinestatic

Sorts items using a merge sorting method.

  • SortKeysCopy is not guaranteed to be stable. That is, suppose that i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
  • Input array d_input_keys is not modified.
  • Note that the behavior is undefined if the input and output ranges overlap in any way.
Snippet
The code snippet below illustrates the sorting of a device vector of int keys.
#include <cub/cub.cuh>
// or equivalently <cub/device/device_merge_sort.cuh>
// Declare, allocate, and initialize device-accessible pointers for
// sorting data
int num_items; // e.g., 7
int *d_keys; // e.g., [8, 6, 7, 5, 3, 0, 9]
...
// Initialize comparator
CustomOpT custom_op;
// Determine temporary device storage requirements
void *d_temp_storage = nullptr;
std::size_t temp_storage_bytes = 0;
d_temp_storage, temp_storage_bytes,
d_keys, num_items, custom_op);
// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);
// Run sorting operation
d_temp_storage, temp_storage_bytes,
d_keys, num_items, custom_op);
// d_keys <-- [0, 3, 5, 6, 7, 8, 9]
Template Parameters
KeyInputIteratorTis a model of Random Access Iterator. Its value_type is a model of LessThan Comparable. This value_type's ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.
KeyIteratorTis a model of Random Access Iterator. KeyIteratorT is mutable, and its value_type is a model of LessThan Comparable. This value_type's ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.
OffsetTis an integer type for global offsets.
CompareOpTis a type of callable object with the signature bool operator()(KeyT lhs, KeyT rhs) that models the Strict Weak Ordering concept.
Parameters
[in]d_temp_storageDevice-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.
[in,out]temp_storage_bytesReference to size in bytes of d_temp_storage allocation
[in]d_input_keysPointer to the input sequence of unsorted input keys
[out]d_output_keysPointer to the output sequence of sorted input keys
[in]num_itemsNumber of items to sort
[in]compare_opComparison function object which returns true if the first argument is ordered before the second
[in]stream**[optional]** CUDA stream to launch kernels within. Default is stream0.
[in]debug_synchronous**[optional]** Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false.
template<typename KeyIteratorT , typename ValueIteratorT , typename OffsetT , typename CompareOpT >
static CUB_RUNTIME_FUNCTION cudaError_t cub::DeviceMergeSort::StableSortPairs ( void *  d_temp_storage,
std::size_t &  temp_storage_bytes,
KeyIteratorT  d_keys,
ValueIteratorT  d_items,
OffsetT  num_items,
CompareOpT  compare_op,
cudaStream_t  stream = 0,
bool  debug_synchronous = false 
)
inlinestatic

Sorts items using a merge sorting method.

StableSortPairs is stable: it preserves the relative ordering of equivalent elements. That is, if x and y are elements such that x precedes y, and if the two elements are equivalent (neither x < y nor y < x) then a postcondition of stable_sort is that x still precedes y.
Snippet
The code snippet below illustrates the sorting of a device vector of int keys with associated vector of int values.
#include <cub/cub.cuh>
// or equivalently <cub/device/device_merge_sort.cuh>
// Declare, allocate, and initialize device-accessible pointers for
// sorting data
int num_items; // e.g., 7
int *d_keys; // e.g., [8, 6, 6, 5, 3, 0, 9]
int *d_values; // e.g., [0, 1, 2, 3, 4, 5, 6]
...
// Initialize comparator
CustomOpT custom_op;
// Determine temporary device storage requirements
void *d_temp_storage = nullptr;
std::size_t temp_storage_bytes = 0;
d_temp_storage, temp_storage_bytes,
d_keys, d_values, num_items, custom_op);
// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);
// Run sorting operation
d_temp_storage, temp_storage_bytes,
d_keys, d_values, num_items, custom_op);
// d_keys <-- [0, 3, 5, 6, 6, 8, 9]
// d_values <-- [5, 4, 3, 1, 2, 0, 6]
Template Parameters
KeyIteratorTis a model of Random Access Iterator. KeyIteratorT is mutable, and its value_type is a model of LessThan Comparable. This value_type's ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.
ValueIteratorTis a model of Random Access Iterator, and ValueIteratorT is mutable.
OffsetTis an integer type for global offsets.
CompareOpTis a type of callable object with the signature bool operator()(KeyT lhs, KeyT rhs) that models the Strict Weak Ordering concept.
Parameters
[in]d_temp_storageDevice-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.
[in,out]temp_storage_bytesReference to size in bytes of d_temp_storage allocation
[in,out]d_keysPointer to the input sequence of unsorted input keys
[in,out]d_itemsPointer to the input sequence of unsorted input values
[in]num_itemsNumber of items to sort
[in]compare_opComparison function object which returns true if the first argument is ordered before the second
[in]stream**[optional]** CUDA stream to launch kernels within. Default is stream0.
[in]debug_synchronous**[optional]** Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false.
template<typename KeyIteratorT , typename OffsetT , typename CompareOpT >
static CUB_RUNTIME_FUNCTION cudaError_t cub::DeviceMergeSort::StableSortKeys ( void *  d_temp_storage,
std::size_t &  temp_storage_bytes,
KeyIteratorT  d_keys,
OffsetT  num_items,
CompareOpT  compare_op,
cudaStream_t  stream = 0,
bool  debug_synchronous = false 
)
inlinestatic

Sorts items using a merge sorting method.

StableSortKeys is stable: it preserves the relative ordering of equivalent elements. That is, if x and y are elements such that x precedes y, and if the two elements are equivalent (neither x < y nor y < x) then a postcondition of stable_sort is that x still precedes y.
Snippet
The code snippet below illustrates the sorting of a device vector of int keys.
#include <cub/cub.cuh>
// or equivalently <cub/device/device_merge_sort.cuh>
// Declare, allocate, and initialize device-accessible pointers for
// sorting data
int num_items; // e.g., 7
int *d_keys; // e.g., [8, 6, 7, 5, 3, 0, 9]
...
// Initialize comparator
CustomOpT custom_op;
// Determine temporary device storage requirements
void *d_temp_storage = nullptr;
std::size_t temp_storage_bytes = 0;
d_temp_storage, temp_storage_bytes,
d_keys, num_items, custom_op);
// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);
// Run sorting operation
d_temp_storage, temp_storage_bytes,
d_keys, num_items, custom_op);
// d_keys <-- [0, 3, 5, 6, 7, 8, 9]
Template Parameters
KeyIteratorTis a model of Random Access Iterator. KeyIteratorT is mutable, and its value_type is a model of LessThan Comparable. This value_type's ordering relation is a strict weak ordering as defined in the LessThan Comparable requirements.
OffsetTis an integer type for global offsets.
CompareOpTis a type of callable object with the signature bool operator()(KeyT lhs, KeyT rhs) that models the Strict Weak Ordering concept.
Parameters
[in]d_temp_storageDevice-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.
[in,out]temp_storage_bytesReference to size in bytes of d_temp_storage allocation
[in,out]d_keysPointer to the input sequence of unsorted input keys
[in]num_itemsNumber of items to sort
[in]compare_opComparison function object which returns true if the first argument is ordered before the second
[in]stream**[optional]** CUDA stream to launch kernels within. Default is stream0.
[in]debug_synchronous**[optional]** Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false.

The documentation for this struct was generated from the following file: