CUB
DeviceSegmentedSort provides device-wide, parallel operations for computing a batched sort across multiple, non-overlapping sequences of data items residing within device-accessible memory.
Supported key types include the built-in C++ numeric primitive types (unsigned char, int, double, etc.) as well as CUDA's __half and __nv_bfloat16 16-bit floating-point types.
Static Public Methods | |
Keys-only | |
template<typename KeyT , typename BeginOffsetIteratorT , typename EndOffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | SortKeys (void *d_temp_storage, std::size_t &temp_storage_bytes, const KeyT *d_keys_in, KeyT *d_keys_out, int num_items, int num_segments, BeginOffsetIteratorT d_begin_offsets, EndOffsetIteratorT d_end_offsets, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of keys into ascending order. Approximately num_items + 2*num_segments auxiliary storage required. More... | |
template<typename KeyT , typename BeginOffsetIteratorT , typename EndOffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | SortKeysDescending (void *d_temp_storage, std::size_t &temp_storage_bytes, const KeyT *d_keys_in, KeyT *d_keys_out, int num_items, int num_segments, BeginOffsetIteratorT d_begin_offsets, EndOffsetIteratorT d_end_offsets, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of keys into descending order. Approximately num_items + 2*num_segments auxiliary storage required. More... | |
template<typename KeyT , typename BeginOffsetIteratorT , typename EndOffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | SortKeys (void *d_temp_storage, std::size_t &temp_storage_bytes, DoubleBuffer< KeyT > &d_keys, int num_items, int num_segments, BeginOffsetIteratorT d_begin_offsets, EndOffsetIteratorT d_end_offsets, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of keys into ascending order. Approximately 2*num_segments auxiliary storage required. More... | |
template<typename KeyT , typename BeginOffsetIteratorT , typename EndOffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | SortKeysDescending (void *d_temp_storage, std::size_t &temp_storage_bytes, DoubleBuffer< KeyT > &d_keys, int num_items, int num_segments, BeginOffsetIteratorT d_begin_offsets, EndOffsetIteratorT d_end_offsets, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of keys into descending order. Approximately 2*num_segments auxiliary storage required. More... | |
template<typename KeyT , typename BeginOffsetIteratorT , typename EndOffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | StableSortKeys (void *d_temp_storage, std::size_t &temp_storage_bytes, const KeyT *d_keys_in, KeyT *d_keys_out, int num_items, int num_segments, BeginOffsetIteratorT d_begin_offsets, EndOffsetIteratorT d_end_offsets, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of keys into ascending order. Approximately num_items + 2*num_segments auxiliary storage required. More... | |
template<typename KeyT , typename BeginOffsetIteratorT , typename EndOffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | StableSortKeysDescending (void *d_temp_storage, std::size_t &temp_storage_bytes, const KeyT *d_keys_in, KeyT *d_keys_out, int num_items, int num_segments, BeginOffsetIteratorT d_begin_offsets, EndOffsetIteratorT d_end_offsets, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of keys into descending order. Approximately num_items + 2*num_segments auxiliary storage required. More... | |
template<typename KeyT , typename BeginOffsetIteratorT , typename EndOffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | StableSortKeys (void *d_temp_storage, std::size_t &temp_storage_bytes, DoubleBuffer< KeyT > &d_keys, int num_items, int num_segments, BeginOffsetIteratorT d_begin_offsets, EndOffsetIteratorT d_end_offsets, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of keys into ascending order. Approximately 2*num_segments auxiliary storage required. More... | |
template<typename KeyT , typename BeginOffsetIteratorT , typename EndOffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | StableSortKeysDescending (void *d_temp_storage, std::size_t &temp_storage_bytes, DoubleBuffer< KeyT > &d_keys, int num_items, int num_segments, BeginOffsetIteratorT d_begin_offsets, EndOffsetIteratorT d_end_offsets, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of keys into descending order. Approximately 2*num_segments auxiliary storage required. More... | |
Key-value pairs | |
template<typename KeyT , typename ValueT , typename BeginOffsetIteratorT , typename EndOffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | SortPairs (void *d_temp_storage, std::size_t &temp_storage_bytes, const KeyT *d_keys_in, KeyT *d_keys_out, const ValueT *d_values_in, ValueT *d_values_out, int num_items, int num_segments, BeginOffsetIteratorT d_begin_offsets, EndOffsetIteratorT d_end_offsets, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of key-value pairs into ascending order. Approximately 2*num_items + 2*num_segments auxiliary storage required. More... | |
template<typename KeyT , typename ValueT , typename BeginOffsetIteratorT , typename EndOffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | SortPairsDescending (void *d_temp_storage, std::size_t &temp_storage_bytes, const KeyT *d_keys_in, KeyT *d_keys_out, const ValueT *d_values_in, ValueT *d_values_out, int num_items, int num_segments, BeginOffsetIteratorT d_begin_offsets, EndOffsetIteratorT d_end_offsets, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of key-value pairs into descending order. Approximately 2*num_items + 2*num_segments auxiliary storage required. More... | |
template<typename KeyT , typename ValueT , typename BeginOffsetIteratorT , typename EndOffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | SortPairs (void *d_temp_storage, std::size_t &temp_storage_bytes, DoubleBuffer< KeyT > &d_keys, DoubleBuffer< ValueT > &d_values, int num_items, int num_segments, BeginOffsetIteratorT d_begin_offsets, EndOffsetIteratorT d_end_offsets, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of key-value pairs into ascending order. Approximately 2*num_segments auxiliary storage required. More... | |
template<typename KeyT , typename ValueT , typename BeginOffsetIteratorT , typename EndOffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | SortPairsDescending (void *d_temp_storage, std::size_t &temp_storage_bytes, DoubleBuffer< KeyT > &d_keys, DoubleBuffer< ValueT > &d_values, int num_items, int num_segments, BeginOffsetIteratorT d_begin_offsets, EndOffsetIteratorT d_end_offsets, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of key-value pairs into descending order. Approximately 2*num_segments auxiliary storage required. More... | |
template<typename KeyT , typename ValueT , typename BeginOffsetIteratorT , typename EndOffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | StableSortPairs (void *d_temp_storage, std::size_t &temp_storage_bytes, const KeyT *d_keys_in, KeyT *d_keys_out, const ValueT *d_values_in, ValueT *d_values_out, int num_items, int num_segments, BeginOffsetIteratorT d_begin_offsets, EndOffsetIteratorT d_end_offsets, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of key-value pairs into ascending order. Approximately 2*num_items + 2*num_segments auxiliary storage required. More... | |
template<typename KeyT , typename ValueT , typename BeginOffsetIteratorT , typename EndOffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | StableSortPairsDescending (void *d_temp_storage, std::size_t &temp_storage_bytes, const KeyT *d_keys_in, KeyT *d_keys_out, const ValueT *d_values_in, ValueT *d_values_out, int num_items, int num_segments, BeginOffsetIteratorT d_begin_offsets, EndOffsetIteratorT d_end_offsets, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of key-value pairs into descending order. Approximately 2*num_items + 2*num_segments auxiliary storage required. More... | |
template<typename KeyT , typename ValueT , typename BeginOffsetIteratorT , typename EndOffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | StableSortPairs (void *d_temp_storage, std::size_t &temp_storage_bytes, DoubleBuffer< KeyT > &d_keys, DoubleBuffer< ValueT > &d_values, int num_items, int num_segments, BeginOffsetIteratorT d_begin_offsets, EndOffsetIteratorT d_end_offsets, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of key-value pairs into ascending order. Approximately 2*num_segments auxiliary storage required. More... | |
template<typename KeyT , typename ValueT , typename BeginOffsetIteratorT , typename EndOffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | StableSortPairsDescending (void *d_temp_storage, std::size_t &temp_storage_bytes, DoubleBuffer< KeyT > &d_keys, DoubleBuffer< ValueT > &d_values, int num_items, int num_segments, BeginOffsetIteratorT d_begin_offsets, EndOffsetIteratorT d_end_offsets, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of key-value pairs into descending order. Approximately 2*num_segments auxiliary storage required. More... | |
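To compare the auxiliary storage figures quoted in the summaries above, the following sketch simply evaluates the four formulas for a given problem size. The names AuxEstimate and estimate_aux are hypothetical, and the unit of the result is an assumption, since the summaries give the counts without stating one.

```cpp
#include <cassert>
#include <cstddef>

// Approximate auxiliary storage, using only the formulas quoted in the summaries.
// Assumption: the unit is left unspecified here because the summaries do not state it.
struct AuxEstimate {
    std::size_t keys_in_out;      // keys-only overloads taking d_keys_in and d_keys_out
    std::size_t keys_double;      // keys-only overloads taking DoubleBuffer<KeyT>
    std::size_t pairs_in_out;     // key-value overloads taking separate in/out buffers
    std::size_t pairs_double;     // key-value overloads taking DoubleBuffer arguments
};

inline AuxEstimate estimate_aux(std::size_t num_items, std::size_t num_segments) {
    return AuxEstimate{
        num_items + 2 * num_segments,
        2 * num_segments,
        2 * num_items + 2 * num_segments,
        2 * num_segments};
}
```

For 7 items in 3 segments this gives 13 and 20 for the overloads with separate input and output buffers, and 6 for both DoubleBuffer forms.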
SortKeys (d_keys_in, d_keys_out) | inline static
Sorts segments of keys into ascending order. Approximately num_items + 2*num_segments auxiliary storage required.
When the segments are contiguous, a single sequence segment_offsets (of length num_segments+1) can be aliased for both the d_begin_offsets and d_end_offsets parameters (where the latter is specified as segment_offsets+1).
The sort is not guaranteed to be stable. Suppose i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
[Snippet: sorts segments of int keys]
KeyT | [inferred] Key type |
BeginOffsetIteratorT | [inferred] Random-access input iterator type for reading segment beginning offsets (may be a simple pointer type) |
EndOffsetIteratorT | [inferred] Random-access input iterator type for reading segment ending offsets (may be a simple pointer type) |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_keys_in | Device-accessible pointer to the input data of key data to sort |
[out] | d_keys_out | Device-accessible pointer to the sorted output sequence of key data |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Random-access input iterator to the sequence of beginning offsets of length num_segments , such that d_begin_offsets[i] is the first element of the ith data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Random-access input iterator to the sequence of ending offsets of length num_segments , such that d_end_offsets[i] - 1 is the last element of the ith data segment in d_keys_* and d_values_* . If d_end_offsets[i] - 1 <= d_begin_offsets[i] , the i-th segment is considered empty. |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false . |
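The result described above can be modelled on the host. The sketch below is a reference model of what the call computes, not the CUB implementation and not a device call: segmented_sort_reference and sort_three_segments are hypothetical names, and the data values are chosen only for illustration. It also shows one offsets array being passed as both the begin and the end iterator.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Host-side reference model: sorts each segment
// [begin_offsets[i], end_offsets[i]) into ascending order.
template <typename KeyT, typename BeginIt, typename EndIt>
std::vector<KeyT> segmented_sort_reference(const std::vector<KeyT>& keys_in,
                                           int num_segments,
                                           BeginIt begin_offsets,
                                           EndIt end_offsets) {
    std::vector<KeyT> keys_out = keys_in;  // the model works on a copy
    for (int i = 0; i < num_segments; ++i) {
        const int first = begin_offsets[i];
        const int last = end_offsets[i];
        if (last <= first) continue;  // zero-length segment: nothing to sort
        assert(last <= static_cast<int>(keys_out.size()));
        std::sort(keys_out.begin() + first, keys_out.begin() + last);
    }
    return keys_out;
}

// Contiguous segments: one offsets array serves as both iterators.
inline std::vector<int> sort_three_segments() {
    const std::vector<int> keys = {8, 6, 7, 5, 3, 0, 9};
    const std::vector<int> segment_offsets = {0, 3, 3, 7};  // num_segments + 1 entries
    const int num_segments = 3;
    return segmented_sort_reference(keys, num_segments,
                                    segment_offsets.data(),       // begin offsets
                                    segment_offsets.data() + 1);  // end offsets
}
```

With these inputs the three segments are {8, 6, 7}, an empty segment, and {5, 3, 0, 9}, so the model returns {6, 7, 8, 0, 3, 5, 9}.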
SortKeysDescending (d_keys_in, d_keys_out) | inline static
Sorts segments of keys into descending order. Approximately num_items + 2*num_segments auxiliary storage required.
When the segments are contiguous, a single sequence segment_offsets (of length num_segments+1) can be aliased for both the d_begin_offsets and d_end_offsets parameters (where the latter is specified as segment_offsets+1).
The sort is not guaranteed to be stable. Suppose i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
[Snippet: sorts segments of int keys]
KeyT | [inferred] Key type |
BeginOffsetIteratorT | [inferred] Random-access input iterator type for reading segment beginning offsets (may be a simple pointer type) |
EndOffsetIteratorT | [inferred] Random-access input iterator type for reading segment ending offsets (may be a simple pointer type) |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_keys_in | Device-accessible pointer to the input data of key data to sort |
[out] | d_keys_out | Device-accessible pointer to the sorted output sequence of key data |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Random-access input iterator to the sequence of beginning offsets of length num_segments , such that d_begin_offsets[i] is the first element of the ith data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Random-access input iterator to the sequence of ending offsets of length num_segments , such that d_end_offsets[i] - 1 is the last element of the ith data segment in d_keys_* and d_values_* . If d_end_offsets[i] - 1 <= d_begin_offsets[i] , the i-th segment is considered empty. |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false . |
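The begin and end offsets do not have to come from one aliased array. As a hedged host-side illustration (a reference model with hypothetical names, not the device routine), the sketch below takes two separate offset sequences of length num_segments and sorts each segment into descending order, skipping a zero-length segment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Host-side reference model: sorts each segment
// [begin_offsets[i], end_offsets[i]) into descending order, in place.
inline void segmented_sort_descending_reference(std::vector<int>& keys,
                                                const std::vector<int>& begin_offsets,
                                                const std::vector<int>& end_offsets) {
    assert(begin_offsets.size() == end_offsets.size());  // both have num_segments entries
    for (std::size_t i = 0; i < begin_offsets.size(); ++i) {
        const int first = begin_offsets[i];
        const int last = end_offsets[i];
        if (last <= first) continue;  // zero-length segment
        std::sort(keys.begin() + first, keys.begin() + last, std::greater<int>());
    }
}

inline std::vector<int> sort_three_segments_descending() {
    std::vector<int> keys = {8, 6, 7, 5, 3, 0, 9};
    const std::vector<int> begin_offsets = {0, 3, 3};
    const std::vector<int> end_offsets = {3, 3, 7};
    segmented_sort_descending_reference(keys, begin_offsets, end_offsets);
    return keys;
}
```

Here the second segment starts and ends at offset 3, so it is skipped, and the model returns {8, 7, 6, 9, 5, 3, 0}.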
SortKeys (DoubleBuffer) | inline static
Sorts segments of keys into ascending order. Approximately 2*num_segments auxiliary storage required.
When the segments are contiguous, a single sequence segment_offsets (of length num_segments+1) can be aliased for both the d_begin_offsets and d_end_offsets parameters (where the latter is specified as segment_offsets+1).
The sort is not guaranteed to be stable. Suppose i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
[Snippet: sorts segments of int keys]
KeyT | [inferred] Key type |
BeginOffsetIteratorT | [inferred] Random-access input iterator type for reading segment beginning offsets (may be a simple pointer type) |
EndOffsetIteratorT | [inferred] Random-access input iterator type for reading segment ending offsets (may be a simple pointer type) |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in,out] | d_keys | Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Random-access input iterator to the sequence of beginning offsets of length num_segments , such that d_begin_offsets[i] is the first element of the ith data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Random-access input iterator to the sequence of ending offsets of length num_segments , such that d_end_offsets[i] - 1 is the last element of the ith data segment in d_keys_* and d_values_* . If d_end_offsets[i] - 1 <= d_begin_offsets[i] , the i-th segment is considered empty. |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false . |
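The d_keys description above says the "current" buffer holds the input and, on return, points to the sorted output. A minimal host-side sketch of that contract follows. HostDoubleBuffer is a hypothetical stand-in, not the cub::DoubleBuffer type, and the model always finishes in the other buffer, which is an assumption made for the sketch; callers of the real routine should read the current buffer after the call rather than assume which one it is.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical host-side stand-in for a double buffer: two buffers plus a
// selector naming the "current" one.
template <typename T>
struct HostDoubleBuffer {
    T* buffers[2];
    int selector;
    T* current() const { return buffers[selector]; }
    T* alternate() const { return buffers[selector ^ 1]; }
};

// Reference model: reads from the current buffer, writes the segment-sorted
// result into the alternate buffer, then flips the selector.
inline void segmented_sort_double_buffer(HostDoubleBuffer<int>& keys,
                                         int num_items, int num_segments,
                                         const int* begin_offsets,
                                         const int* end_offsets) {
    assert(num_items >= 0 && num_segments >= 0);
    std::copy(keys.current(), keys.current() + num_items, keys.alternate());
    for (int i = 0; i < num_segments; ++i) {
        if (end_offsets[i] <= begin_offsets[i]) continue;  // zero-length segment
        std::sort(keys.alternate() + begin_offsets[i],
                  keys.alternate() + end_offsets[i]);
    }
    keys.selector ^= 1;  // "current" now names the sorted output
}
```

After the call, keys.current() is the only pointer the caller should treat as holding the sorted keys; the other buffer's contents are scratch.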
SortKeysDescending (DoubleBuffer) | inline static
Sorts segments of keys into descending order. Approximately 2*num_segments auxiliary storage required.
When the segments are contiguous, a single sequence segment_offsets (of length num_segments+1) can be aliased for both the d_begin_offsets and d_end_offsets parameters (where the latter is specified as segment_offsets+1).
The sort is not guaranteed to be stable. Suppose i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
[Snippet: sorts segments of int keys]
KeyT | [inferred] Key type |
BeginOffsetIteratorT | [inferred] Random-access input iterator type for reading segment beginning offsets (may be a simple pointer type) |
EndOffsetIteratorT | [inferred] Random-access input iterator type for reading segment ending offsets (may be a simple pointer type) |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When nullptr , the required allocation size is written to temp_storage_bytes and no work is done |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in,out] | d_keys | Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Random-access input iterator to the sequence of beginning offsets of length num_segments , such that d_begin_offsets[i] is the first element of the ith data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Random-access input iterator to the sequence of ending offsets of length num_segments , such that d_end_offsets[i] - 1 is the last element of the ith data segment in d_keys_* and d_values_* . If d_end_offsets[i] - 1 <= d_begin_offsets[i] , the i-th segment is considered empty. |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false . |
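The d_temp_storage and temp_storage_bytes parameters imply a two-call pattern: a first call with a null pointer that only reports the required size, then a second call with an allocation of that size. The sketch below imitates that convention on the host. scratch_sort_step and run_with_scratch are hypothetical, the size formula is made up for the sketch, and no CUB or CUDA call is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical routine following the same calling convention: with
// temp_storage == nullptr it only writes the size it needs and does no work.
inline int scratch_sort_step(void* temp_storage, std::size_t& temp_storage_bytes,
                             int num_items, int num_segments) {
    // Made-up size formula, for this sketch only.
    const std::size_t required =
        sizeof(int) * (static_cast<std::size_t>(num_items) +
                       2 * static_cast<std::size_t>(num_segments));
    if (temp_storage == nullptr) {
        temp_storage_bytes = required;  // size query
        return 0;
    }
    if (temp_storage_bytes < required) return 1;  // allocation too small
    // ... the actual work would use temp_storage here ...
    return 0;
}

// The two-call idiom: query the size, allocate, then run.
inline std::size_t run_with_scratch(int num_items, int num_segments) {
    std::size_t temp_storage_bytes = 0;
    int status = scratch_sort_step(nullptr, temp_storage_bytes, num_items, num_segments);
    assert(status == 0);
    std::vector<unsigned char> temp_storage(temp_storage_bytes);
    status = scratch_sort_step(temp_storage.data(), temp_storage_bytes,
                               num_items, num_segments);
    assert(status == 0);
    (void)status;
    return temp_storage_bytes;
}
```

Passing the same remaining arguments to both calls keeps the reported size consistent with the work done in the second call.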
StableSortKeys (d_keys_in, d_keys_out) | inline static
Sorts segments of keys into ascending order. Approximately num_items + 2*num_segments auxiliary storage required.
When the segments are contiguous, a single sequence segment_offsets (of length num_segments+1) can be aliased for both the d_begin_offsets and d_end_offsets parameters (where the latter is specified as segment_offsets+1).
The sort is stable: if x and y are elements such that x precedes y, and the two elements are equivalent (neither x < y nor y < x), then a postcondition of stable sort is that x still precedes y.
[Snippet: sorts segments of int keys]
KeyT | [inferred] Key type |
BeginOffsetIteratorT | [inferred] Random-access input iterator type for reading segment beginning offsets (may be a simple pointer type) |
EndOffsetIteratorT | [inferred] Random-access input iterator type for reading segment ending offsets (may be a simple pointer type) |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_keys_in | Device-accessible pointer to the input data of key data to sort |
[out] | d_keys_out | Device-accessible pointer to the sorted output sequence of key data |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Random-access input iterator to the sequence of beginning offsets of length num_segments , such that d_begin_offsets[i] is the first element of the ith data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Random-access input iterator to the sequence of ending offsets of length num_segments , such that d_end_offsets[i]-1 is the last element of the ith data segment in d_keys_* and d_values_* . If d_end_offsets[i]-1 <= d_begin_offsets[i] , the i-th segment is considered empty. |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false . |
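The stability postcondition is easiest to see with items that compare as equivalent yet remain distinguishable. In this host-side sketch (a reference model with hypothetical names, using std::stable_sort rather than any CUB call), items are ordered by their first field only, and the second field records the original position.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Items compare by .first only; .second records the original position so the
// order of equivalent items can be inspected afterwards.
using Tagged = std::pair<int, int>;

inline std::vector<Tagged> stable_segmented_sort_reference(
    std::vector<Tagged> items, const std::vector<int>& segment_offsets) {
    assert(!segment_offsets.empty());  // num_segments + 1 entries
    for (std::size_t i = 0; i + 1 < segment_offsets.size(); ++i) {
        std::stable_sort(items.begin() + segment_offsets[i],
                         items.begin() + segment_offsets[i + 1],
                         [](const Tagged& x, const Tagged& y) {
                             return x.first < y.first;
                         });
    }
    return items;
}
```

For the segment {(2,0), (1,1), (2,2), (1,3)} the model yields {(1,1), (1,3), (2,0), (2,2)}: within each group of equal keys the original positions stay in increasing order. With plain int keys the same guarantee holds but cannot be observed, because equivalent keys are identical.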
StableSortKeysDescending (d_keys_in, d_keys_out) | inline static
Sorts segments of keys into descending order. Approximately num_items + 2*num_segments auxiliary storage required.
When the segments are contiguous, a single sequence segment_offsets (of length num_segments+1) can be aliased for both the d_begin_offsets and d_end_offsets parameters (where the latter is specified as segment_offsets+1).
The sort is stable: if x and y are elements such that x precedes y, and the two elements are equivalent (neither x < y nor y < x), then a postcondition of stable sort is that x still precedes y.
[Snippet: sorts segments of int keys]
KeyT | [inferred] Key type |
BeginOffsetIteratorT | [inferred] Random-access input iterator type for reading segment beginning offsets (may be a simple pointer type) |
EndOffsetIteratorT | [inferred] Random-access input iterator type for reading segment ending offsets (may be a simple pointer type) |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_keys_in | Device-accessible pointer to the input data of key data to sort |
[out] | d_keys_out | Device-accessible pointer to the sorted output sequence of key data |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Random-access input iterator to the sequence of beginning offsets of length num_segments , such that d_begin_offsets[i] is the first element of the ith data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Random-access input iterator to the sequence of ending offsets of length num_segments , such that d_end_offsets[i]-1 is the last element of the ith data segment in d_keys_* and d_values_* . If d_end_offsets[i]-1 <= d_begin_offsets[i] , the i-th segment is considered empty. |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false . |
StableSortKeys (DoubleBuffer) | inline static
Sorts segments of keys into ascending order. Approximately 2*num_segments auxiliary storage required.
When the segments are contiguous, a single sequence segment_offsets (of length num_segments+1) can be aliased for both the d_begin_offsets and d_end_offsets parameters (where the latter is specified as segment_offsets+1).
The sort is stable: if x and y are elements such that x precedes y, and the two elements are equivalent (neither x < y nor y < x), then a postcondition of stable sort is that x still precedes y.
[Snippet: sorts segments of int keys]
KeyT | [inferred] Key type |
BeginOffsetIteratorT | [inferred] Random-access input iterator type for reading segment beginning offsets (may be a simple pointer type) |
EndOffsetIteratorT | [inferred] Random-access input iterator type for reading segment ending offsets (may be a simple pointer type) |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in,out] | d_keys | Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Random-access input iterator to the sequence of beginning offsets of length num_segments , such that d_begin_offsets[i] is the first element of the ith data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Random-access input iterator to the sequence of ending offsets of length num_segments , such that d_end_offsets[i] - 1 is the last element of the ith data segment in d_keys_* and d_values_* . If d_end_offsets[i] - 1 <= d_begin_offsets[i] , the i-th segment is considered empty. |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false . |
StableSortKeysDescending (DoubleBuffer) | inline static
Sorts segments of keys into descending order. Approximately 2*num_segments auxiliary storage required.
When the segments are contiguous, a single sequence segment_offsets (of length num_segments+1) can be aliased for both the d_begin_offsets and d_end_offsets parameters (where the latter is specified as segment_offsets+1).
The sort is stable: if x and y are elements such that x precedes y, and the two elements are equivalent (neither x < y nor y < x), then a postcondition of stable sort is that x still precedes y.
[Snippet: sorts segments of int keys]
KeyT | [inferred] Key type |
BeginOffsetIteratorT | [inferred] Random-access input iterator type for reading segment beginning offsets (may be a simple pointer type) |
EndOffsetIteratorT | [inferred] Random-access input iterator type for reading segment ending offsets (may be a simple pointer type) |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in,out] | d_keys | Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Random-access input iterator to the sequence of beginning offsets of length num_segments , such that d_begin_offsets[i] is the first element of the ith data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Random-access input iterator to the sequence of ending offsets of length num_segments , such that d_end_offsets[i]-1 is the last element of the ith data segment in d_keys_* and d_values_* . If d_end_offsets[i]-1 <= d_begin_offsets[i] , the i-th segment is considered empty. |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false . |
SortPairs (d_keys_in, d_keys_out, d_values_in, d_values_out) | inline static
Sorts segments of key-value pairs into ascending order. Approximately 2*num_items + 2*num_segments auxiliary storage required.
When the segments are contiguous, a single sequence segment_offsets (of length num_segments+1) can be aliased for both the d_begin_offsets and d_end_offsets parameters (where the latter is specified as segment_offsets+1).
The sort is not guaranteed to be stable. Suppose i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
[Snippet: sorts segments of int keys with an associated vector of int values]
KeyT | [inferred] Key type |
ValueT | [inferred] Value type |
BeginOffsetIteratorT | [inferred] Random-access input iterator type for reading segment beginning offsets (may be a simple pointer type) |
EndOffsetIteratorT | [inferred] Random-access input iterator type for reading segment ending offsets (may be a simple pointer type) |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When nullptr , the required allocation size is written to temp_storage_bytes and no work is done |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_keys_in | Device-accessible pointer to the input data of key data to sort |
[out] | d_keys_out | Device-accessible pointer to the sorted output sequence of key data |
[in] | d_values_in | Device-accessible pointer to the corresponding input sequence of associated value items |
[out] | d_values_out | Device-accessible pointer to the correspondingly-reordered output sequence of associated value items |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Random-access input iterator to the sequence of beginning offsets of length num_segments , such that d_begin_offsets[i] is the first element of the ith data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Random-access input iterator to the sequence of ending offsets of length num_segments , such that d_end_offsets[i]-1 is the last element of the ith data segment in d_keys_* and d_values_* . If d_end_offsets[i]-1 <= d_begin_offsets[i] , the i-th segment is considered empty. |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false . |
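For key-value pairs, each value travels with its key, as the d_values_out description states. The following host-side sketch models that behaviour for contiguous segments; it is a reference model with a hypothetical name, not the CUB implementation. It sorts the indices of each segment by key and then gathers both arrays through that order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Host-side reference model: within each segment, keys are sorted ascending
// and every value is moved to the position of its key.
inline void segmented_sort_pairs_reference(const std::vector<int>& keys_in,
                                           const std::vector<int>& values_in,
                                           const std::vector<int>& segment_offsets,
                                           std::vector<int>& keys_out,
                                           std::vector<int>& values_out) {
    assert(keys_in.size() == values_in.size());
    keys_out = keys_in;
    values_out = values_in;
    for (std::size_t s = 0; s + 1 < segment_offsets.size(); ++s) {
        const int first = segment_offsets[s];
        const int last = segment_offsets[s + 1];
        if (last <= first) continue;  // zero-length segment
        // Order the indices of this segment by key, then gather keys and values.
        std::vector<int> order(last - first);
        std::iota(order.begin(), order.end(), first);
        std::sort(order.begin(), order.end(),
                  [&](int a, int b) { return keys_in[a] < keys_in[b]; });
        for (int k = 0; k < last - first; ++k) {
            keys_out[first + k] = keys_in[order[k]];
            values_out[first + k] = values_in[order[k]];
        }
    }
}
```

With keys {8, 6, 7, 5, 3, 0, 9}, values {0, 1, 2, 3, 4, 5, 6}, and offsets {0, 3, 3, 7}, the model produces keys {6, 7, 8, 0, 3, 5, 9} and values {1, 2, 0, 5, 4, 3, 6}. The keys in this example are distinct, so the result does not depend on stability.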
SortPairsDescending (d_keys_in, d_keys_out, d_values_in, d_values_out) | inline static
Sorts segments of key-value pairs into descending order. Approximately 2*num_items + 2*num_segments auxiliary storage required.
When the segments are contiguous, a single sequence segment_offsets (of length num_segments+1) can be aliased for both the d_begin_offsets and d_end_offsets parameters (where the latter is specified as segment_offsets+1).
The sort is not guaranteed to be stable. Suppose i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
[Snippet: sorts segments of int keys with an associated vector of int values]
KeyT | [inferred] Key type |
ValueT | [inferred] Value type |
BeginOffsetIteratorT | [inferred] Random-access input iterator type for reading segment beginning offsets (may be a simple pointer type) |
EndOffsetIteratorT | [inferred] Random-access input iterator type for reading segment ending offsets (may be a simple pointer type) |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_keys_in | Device-accessible pointer to the input data of key data to sort |
[out] | d_keys_out | Device-accessible pointer to the sorted output sequence of key data |
[in] | d_values_in | Device-accessible pointer to the corresponding input sequence of associated value items |
[out] | d_values_out | Device-accessible pointer to the correspondingly-reordered output sequence of associated value items |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Random-access input iterator to the sequence of beginning offsets of length num_segments , such that d_begin_offsets[i] is the first element of the ith data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Random-access input iterator to the sequence of ending offsets of length num_segments , such that d_end_offsets[i]-1 is the last element of the ith data segment in d_keys_* and d_values_* . If d_end_offsets[i]-1 <= d_begin_offsets[i] , the i-th segment is considered empty. |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false . |
SortPairs (DoubleBuffer) | inline static
Sorts segments of key-value pairs into ascending order. Approximately 2*num_segments auxiliary storage required.
When the segments are contiguous, a single sequence segment_offsets (of length num_segments+1) can be aliased for both the d_begin_offsets and d_end_offsets parameters (where the latter is specified as segment_offsets+1).
The sort is not guaranteed to be stable. Suppose i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
[Snippet: sorts segments of int keys with an associated vector of int values]
KeyT | [inferred] Key type |
ValueT | [inferred] Value type |
BeginOffsetIteratorT | [inferred] Random-access input iterator type for reading segment beginning offsets (may be a simple pointer type) |
EndOffsetIteratorT | [inferred] Random-access input iterator type for reading segment ending offsets (may be a simple pointer type) |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in,out] | d_keys | Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys |
[in,out] | d_values | Double-buffer of values whose "current" device-accessible buffer contains the unsorted input values and, upon return, is updated to point to the sorted output values |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Random-access input iterator to the sequence of beginning offsets of length num_segments, such that d_begin_offsets[i] is the index of the first element of the i-th data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Random-access input iterator to the sequence of ending offsets of length num_segments, such that d_end_offsets[i]-1 is the index of the last element of the i-th data segment in d_keys_* and d_values_*. If d_end_offsets[i]-1 <= d_begin_offsets[i], the i-th segment is considered empty. |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
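The d_keys and d_values parameters above rely on a "current" buffer indicator, and the description mentions aliasing one segment_offsets sequence for both offset parameters. The host-only sketch below models both ideas together. HostDoubleBuffer is a hypothetical stand-in written for this example (the real cub::DoubleBuffer wraps device pointers), and this sketch always writes to the alternate buffer and flips the selector, which is an assumption: after a real call, read the result through the wrapper's current buffer rather than assuming which one it is.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical host-side model of a double buffer. `selector` names the
// buffer that currently holds valid data.
struct HostDoubleBuffer {
    std::vector<int> buffers[2];
    int selector = 0;
    std::vector<int>& current() { return buffers[selector]; }
    std::vector<int>& alternate() { return buffers[selector ^ 1]; }
};

// Sorts each segment ascending by key, carrying values along. segment_offsets
// has num_segments + 1 entries and is aliased as both offset sequences:
// begin = segment_offsets, end = segment_offsets + 1.
inline void sort_pairs_double_buffer_reference(HostDoubleBuffer& keys,
                                               HostDoubleBuffer& values,
                                               const std::vector<int>& segment_offsets)
{
    const std::vector<int>& k_in = keys.current();
    const std::vector<int>& v_in = values.current();
    std::vector<int>& k_out = keys.alternate();
    std::vector<int>& v_out = values.alternate();
    k_out = k_in;
    v_out = v_in;

    const int* begin_offsets = segment_offsets.data();
    const int* end_offsets = segment_offsets.data() + 1;
    const int num_segments = static_cast<int>(segment_offsets.size()) - 1;

    for (int s = 0; s < num_segments; ++s) {
        std::vector<std::pair<int, int>> items;
        for (int i = begin_offsets[s]; i < end_offsets[s]; ++i)
            items.emplace_back(k_in[i], v_in[i]);
        std::sort(items.begin(), items.end(),
                  [](const std::pair<int, int>& a, const std::pair<int, int>& b) {
                      return a.first < b.first;
                  });
        for (int i = begin_offsets[s]; i < end_offsets[s]; ++i) {
            k_out[i] = items[i - begin_offsets[s]].first;
            v_out[i] = items[i - begin_offsets[s]].second;
        }
    }

    // The sorted data now lives in what used to be the alternate buffers.
    keys.selector ^= 1;
    values.selector ^= 1;
}
```

Because the old "current" buffer may be overwritten by a real sort, code that still needs the unsorted input should keep its own copy.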
inline static
Sorts segments of key-value pairs into descending order. Approximately 2*num_segments auxiliary storage required.
When the input is a contiguous sequence of segments, a single sequence segment_offsets (of length num_segments+1) can be aliased for both the d_begin_offsets and d_end_offsets parameters (where the latter is specified as segment_offsets+1).
This sort is not guaranteed to be stable. Suppose elements i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by the sort.
Example use case: a batched sort of int keys with an associated vector of int values.
KeyT | [inferred] Key type |
ValueT | [inferred] Value type |
BeginOffsetIteratorT | [inferred] Random-access input iterator type for reading segment beginning offsets (may be a simple pointer type) |
EndOffsetIteratorT | [inferred] Random-access input iterator type for reading segment ending offsets (may be a simple pointer type) |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in,out] | d_keys | Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys |
[in,out] | d_values | Double-buffer of values whose "current" device-accessible buffer contains the unsorted input values and, upon return, is updated to point to the sorted output values |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Random-access input iterator to the sequence of beginning offsets of length num_segments, such that d_begin_offsets[i] is the index of the first element of the i-th data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Random-access input iterator to the sequence of ending offsets of length num_segments, such that d_end_offsets[i]-1 is the index of the last element of the i-th data segment in d_keys_* and d_values_*. If d_end_offsets[i]-1 <= d_begin_offsets[i], the i-th segment is considered empty. |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
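Since the sort described above may reorder equivalent keys, a test that compares its output against one fixed expected array can fail spuriously. The host-only sketch below shows one way to validate a descending segmented result without assuming any particular tie order: check that keys are non-increasing within each segment and that each segment holds the same multiset of (key, value) pairs as the input. The function name is hypothetical, and the inputs are assumed to have been copied back to the host.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical checker (not part of CUB). Returns true when keys_out and
// values_out form a valid descending sort of every segment
// [segment_offsets[s], segment_offsets[s + 1]) of the inputs, regardless of
// how ties were ordered.
inline bool is_valid_descending_segmented_sort(const std::vector<int>& keys_in,
                                               const std::vector<int>& values_in,
                                               const std::vector<int>& keys_out,
                                               const std::vector<int>& values_out,
                                               const std::vector<int>& segment_offsets)
{
    if (keys_in.size() != values_in.size() ||
        keys_out.size() != keys_in.size() ||
        values_out.size() != keys_in.size())
        return false;

    for (std::size_t s = 0; s + 1 < segment_offsets.size(); ++s) {
        const int first = segment_offsets[s];
        const int last = segment_offsets[s + 1];
        std::vector<std::pair<int, int>> before, after;
        for (int i = first; i < last; ++i) {
            // Keys must never increase inside a segment.
            if (i > first && keys_out[i - 1] < keys_out[i]) return false;
            before.emplace_back(keys_in[i], values_in[i]);
            after.emplace_back(keys_out[i], values_out[i]);
        }
        // Same pairs, in any order: compare canonical (fully sorted) forms.
        std::sort(before.begin(), before.end());
        std::sort(after.begin(), after.end());
        if (before != after) return false;
    }
    return true;
}
```

For keys {1, 1, 2} with values {10, 20, 30} in one segment, both {30, 10, 20} and {30, 20, 10} are accepted as value orders for output keys {2, 1, 1}.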
inline static
Sorts segments of key-value pairs into ascending order. Approximately 2*num_items + 2*num_segments auxiliary storage required.
When the input is a contiguous sequence of segments, a single sequence segment_offsets (of length num_segments+1) can be aliased for both the d_begin_offsets and d_end_offsets parameters (where the latter is specified as segment_offsets+1).
This sort is stable: if x and y are elements such that x precedes y, and the two elements are equivalent (neither x < y nor y < x), then a postcondition of the stable sort is that x still precedes y.
Example use case: a batched sort of int keys with an associated vector of int values.
KeyT | [inferred] Key type |
ValueT | [inferred] Value type |
BeginOffsetIteratorT | [inferred] Random-access input iterator type for reading segment beginning offsets (may be a simple pointer type) |
EndOffsetIteratorT | [inferred] Random-access input iterator type for reading segment ending offsets (may be a simple pointer type) |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_keys_in | Device-accessible pointer to the input sequence of key data to sort |
[out] | d_keys_out | Device-accessible pointer to the sorted output sequence of key data |
[in] | d_values_in | Device-accessible pointer to the corresponding input sequence of associated value items |
[out] | d_values_out | Device-accessible pointer to the correspondingly-reordered output sequence of associated value items |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Random-access input iterator to the sequence of beginning offsets of length num_segments, such that d_begin_offsets[i] is the index of the first element of the i-th data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Random-access input iterator to the sequence of ending offsets of length num_segments, such that d_end_offsets[i]-1 is the index of the last element of the i-th data segment in d_keys_* and d_values_*. If d_end_offsets[i]-1 <= d_begin_offsets[i], the i-th segment is considered empty. |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
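The stability postcondition above fixes the output completely, including the order of values whose keys tie. The host-only sketch below uses std::stable_sort per segment as a reference model of that guarantee. It is not CUB's implementation, and the Pair alias and function name are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Pair = std::pair<int, int>; // (key, value)

// Hypothetical host-side reference: stable ascending sort by key of every
// segment [segment_offsets[s], segment_offsets[s + 1]). Values with equal
// keys keep their input order.
inline std::vector<Pair> stable_segmented_sort_reference(std::vector<Pair> items,
                                                         const std::vector<int>& segment_offsets)
{
    for (std::size_t s = 0; s + 1 < segment_offsets.size(); ++s) {
        std::stable_sort(items.begin() + segment_offsets[s],
                         items.begin() + segment_offsets[s + 1],
                         [](const Pair& a, const Pair& b) { return a.first < b.first; });
    }
    return items;
}
```

For keys {2, 1, 2, 1, 3, 3} with values {0, 1, 2, 3, 4, 5} and offsets {0, 4, 6}, the result is keys {1, 1, 2, 2, 3, 3} with values {1, 3, 0, 2, 4, 5}: within each group of equal keys the values appear in their original order.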
inline static
Sorts segments of key-value pairs into descending order. Approximately 2*num_items + 2*num_segments auxiliary storage required.
When the input is a contiguous sequence of segments, a single sequence segment_offsets (of length num_segments+1) can be aliased for both the d_begin_offsets and d_end_offsets parameters (where the latter is specified as segment_offsets+1).
This sort is stable: if x and y are elements such that x precedes y, and the two elements are equivalent (neither x < y nor y < x), then a postcondition of the stable sort is that x still precedes y.
Example use case: a batched sort of int keys with an associated vector of int values.
KeyT | [inferred] Key type |
ValueT | [inferred] Value type |
BeginOffsetIteratorT | [inferred] Random-access input iterator type for reading segment beginning offsets (may be a simple pointer type) |
EndOffsetIteratorT | [inferred] Random-access input iterator type for reading segment ending offsets (may be a simple pointer type) |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_keys_in | Device-accessible pointer to the input sequence of key data to sort |
[out] | d_keys_out | Device-accessible pointer to the sorted output sequence of key data |
[in] | d_values_in | Device-accessible pointer to the corresponding input sequence of associated value items |
[out] | d_values_out | Device-accessible pointer to the correspondingly-reordered output sequence of associated value items |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Random-access input iterator to the sequence of beginning offsets of length num_segments, such that d_begin_offsets[i] is the index of the first element of the i-th data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Random-access input iterator to the sequence of ending offsets of length num_segments, such that d_end_offsets[i]-1 is the index of the last element of the i-th data segment in d_keys_* and d_values_*. If d_end_offsets[i]-1 <= d_begin_offsets[i], the i-th segment is considered empty. |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
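One consequence of the stability postcondition is easy to miss for descending order: a stable descending sort is not the same as a stable ascending sort followed by a reversal, because reversing also reverses the order of ties. The host-only sketch below contrasts the two on a single segment. It is a reference model built on std::stable_sort, and the KeyValue alias and both function names are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using KeyValue = std::pair<int, int>; // (key, value)

// Stable descending sort by key: equal keys keep their input order.
inline std::vector<KeyValue> stable_sort_descending_reference(std::vector<KeyValue> items)
{
    std::stable_sort(items.begin(), items.end(),
                     [](const KeyValue& a, const KeyValue& b) { return a.first > b.first; });
    return items;
}

// Stable ascending sort followed by a reversal: keys end up descending, but
// equal keys now appear in the reverse of their input order.
inline std::vector<KeyValue> reversed_stable_ascending(std::vector<KeyValue> items)
{
    std::stable_sort(items.begin(), items.end(),
                     [](const KeyValue& a, const KeyValue& b) { return a.first < b.first; });
    std::reverse(items.begin(), items.end());
    return items;
}
```

For keys {2, 1, 2, 1} with values {0, 1, 2, 3}, the stable descending result carries values {0, 2, 1, 3}, while the reversed ascending result carries {2, 0, 3, 1}. Only the first satisfies the postcondition stated above.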
inline static
Sorts segments of key-value pairs into ascending order. Approximately 2*num_segments auxiliary storage required.
When the input is a contiguous sequence of segments, a single sequence segment_offsets (of length num_segments+1) can be aliased for both the d_begin_offsets and d_end_offsets parameters (where the latter is specified as segment_offsets+1).
This sort is stable: if x and y are elements such that x precedes y, and the two elements are equivalent (neither x < y nor y < x), then a postcondition of the stable sort is that x still precedes y.
Example use case: a batched sort of int keys with an associated vector of int values.
KeyT | [inferred] Key type |
ValueT | [inferred] Value type |
BeginOffsetIteratorT | [inferred] Random-access input iterator type for reading segment beginning offsets (may be a simple pointer type) |
EndOffsetIteratorT | [inferred] Random-access input iterator type for reading segment ending offsets (may be a simple pointer type) |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in,out] | d_keys | Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys |
[in,out] | d_values | Double-buffer of values whose "current" device-accessible buffer contains the unsorted input values and, upon return, is updated to point to the sorted output values |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Random-access input iterator to the sequence of beginning offsets of length num_segments, such that d_begin_offsets[i] is the index of the first element of the i-th data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Random-access input iterator to the sequence of ending offsets of length num_segments, such that d_end_offsets[i]-1 is the index of the last element of the i-th data segment in d_keys_* and d_values_*. If d_end_offsets[i]-1 <= d_begin_offsets[i], the i-th segment is considered empty. |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
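The d_temp_storage and temp_storage_bytes rows above imply a two-call pattern: call once with nullptr to learn the required size, allocate, then call again to do the work. The host-only mock below imitates that convention so the control flow can be seen in isolation. Everything in it is an assumption made for illustration: mock_segmented_sort_keys is not a CUB function, it returns int instead of cudaError_t, and its size formula is a placeholder that only loosely echoes the storage note. Real sizes should come only from the query call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mock of the size-query convention. Returns 0 on success and
// 1 when the supplied storage is too small.
inline int mock_segmented_sort_keys(void* temp_storage,
                                    std::size_t& temp_storage_bytes,
                                    std::vector<int>& keys,
                                    const std::vector<int>& segment_offsets)
{
    const std::size_t num_segments =
        segment_offsets.empty() ? 0 : segment_offsets.size() - 1;
    // Placeholder size, never zero so a non-null allocation is always possible.
    const std::size_t required_bytes =
        std::max<std::size_t>(1, 2 * num_segments * sizeof(int));

    if (temp_storage == nullptr) {
        // Query call: report the size and do no work.
        temp_storage_bytes = required_bytes;
        return 0;
    }
    if (temp_storage_bytes < required_bytes) return 1;

    for (std::size_t s = 0; s < num_segments; ++s) {
        std::stable_sort(keys.begin() + segment_offsets[s],
                         keys.begin() + segment_offsets[s + 1]);
    }
    return 0;
}

// The two-call pattern: query, allocate, then run with the same arguments.
inline std::vector<int> sort_with_two_calls(std::vector<int> keys,
                                            const std::vector<int>& segment_offsets)
{
    std::size_t temp_storage_bytes = 0;
    mock_segmented_sort_keys(nullptr, temp_storage_bytes, keys, segment_offsets);

    std::vector<unsigned char> temp_storage(temp_storage_bytes);
    mock_segmented_sort_keys(temp_storage.data(), temp_storage_bytes, keys, segment_offsets);
    return keys;
}
```

Passing identical arguments to both calls matters in the pattern, since the reported size depends on the problem shape, here the number of segments.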
inline static
Sorts segments of key-value pairs into descending order. Approximately 2*num_segments auxiliary storage required.
When the input is a contiguous sequence of segments, a single sequence segment_offsets (of length num_segments+1) can be aliased for both the d_begin_offsets and d_end_offsets parameters (where the latter is specified as segment_offsets+1).
This sort is stable: if x and y are elements such that x precedes y, and the two elements are equivalent (neither x < y nor y < x), then a postcondition of the stable sort is that x still precedes y.
Example use case: a batched sort of int keys with an associated vector of int values.
KeyT | [inferred] Key type |
ValueT | [inferred] Value type |
BeginOffsetIteratorT | [inferred] Random-access input iterator type for reading segment beginning offsets (may be a simple pointer type) |
EndOffsetIteratorT | [inferred] Random-access input iterator type for reading segment ending offsets (may be a simple pointer type) |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in,out] | d_keys | Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys |
[in,out] | d_values | Double-buffer of values whose "current" device-accessible buffer contains the unsorted input values and, upon return, is updated to point to the sorted output values |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Random-access input iterator to the sequence of beginning offsets of length num_segments, such that d_begin_offsets[i] is the index of the first element of the i-th data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Random-access input iterator to the sequence of ending offsets of length num_segments, such that d_end_offsets[i]-1 is the index of the last element of the i-th data segment in d_keys_* and d_values_*. If d_end_offsets[i]-1 <= d_begin_offsets[i], the i-th segment is considered empty. |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |