CUB
Classes | |
struct | cub::DeviceAdjacentDifference |
DeviceAdjacentDifference provides device-wide, parallel operations for computing the differences of adjacent elements residing within device-accessible memory. More... | |
struct | cub::DeviceHistogram |
DeviceHistogram provides device-wide parallel operations for constructing histogram(s) from a sequence of sample data residing within device-accessible memory. | |
struct | cub::DeviceMergeSort |
DeviceMergeSort provides device-wide, parallel operations for computing a merge sort across a sequence of data items residing within device-accessible memory. More... | |
struct | cub::DevicePartition |
DevicePartition provides device-wide, parallel operations for partitioning sequences of data items residing within device-accessible memory. | |
struct | cub::DeviceRadixSort |
DeviceRadixSort provides device-wide, parallel operations for computing a radix sort across a sequence of data items residing within device-accessible memory. | |
struct | cub::DeviceReduce |
DeviceReduce provides device-wide, parallel operations for computing a reduction across a sequence of data items residing within device-accessible memory. | |
struct | cub::DeviceRunLengthEncode |
DeviceRunLengthEncode provides device-wide, parallel operations for demarcating "runs" of same-valued items within a sequence residing within device-accessible memory. | |
struct | cub::DeviceScan |
DeviceScan provides device-wide, parallel operations for computing a prefix scan across a sequence of data items residing within device-accessible memory. | |
struct | cub::DeviceSelect |
DeviceSelect provides device-wide, parallel operations for compacting selected items from sequences of data items residing within device-accessible memory. | |
struct | cub::DeviceSpmv |
DeviceSpmv provides device-wide parallel operations for performing sparse-matrix * dense-vector multiplication (SpMV). More... | |
Functions | |
template<typename InputIteratorT , typename OutputIteratorT , typename DifferenceOpT = cub::Difference> | |
static CUB_RUNTIME_FUNCTION cudaError_t | cub::DeviceAdjacentDifference::SubtractLeftCopy (void *d_temp_storage, std::size_t &temp_storage_bytes, InputIteratorT d_input, OutputIteratorT d_output, std::size_t num_items, DifferenceOpT difference_op={}, cudaStream_t stream=0, bool debug_synchronous=false) |
Subtracts the left element of each adjacent pair of elements residing within device-accessible memory. More... | |
template<typename RandomAccessIteratorT , typename DifferenceOpT = cub::Difference> | |
static CUB_RUNTIME_FUNCTION cudaError_t | cub::DeviceAdjacentDifference::SubtractLeft (void *d_temp_storage, std::size_t &temp_storage_bytes, RandomAccessIteratorT d_input, std::size_t num_items, DifferenceOpT difference_op={}, cudaStream_t stream=0, bool debug_synchronous=false) |
Subtracts the left element of each adjacent pair of elements residing within device-accessible memory. More... | |
template<typename InputIteratorT , typename OutputIteratorT , typename DifferenceOpT = cub::Difference> | |
static CUB_RUNTIME_FUNCTION cudaError_t | cub::DeviceAdjacentDifference::SubtractRightCopy (void *d_temp_storage, std::size_t &temp_storage_bytes, InputIteratorT d_input, OutputIteratorT d_output, std::size_t num_items, DifferenceOpT difference_op={}, cudaStream_t stream=0, bool debug_synchronous=false) |
Subtracts the right element of each adjacent pair of elements residing within device-accessible memory. More... | |
template<typename RandomAccessIteratorT , typename DifferenceOpT = cub::Difference> | |
static CUB_RUNTIME_FUNCTION cudaError_t | cub::DeviceAdjacentDifference::SubtractRight (void *d_temp_storage, std::size_t &temp_storage_bytes, RandomAccessIteratorT d_input, std::size_t num_items, DifferenceOpT difference_op={}, cudaStream_t stream=0, bool debug_synchronous=false) |
Subtracts the right element of each adjacent pair of elements residing within device-accessible memory. More... | |
cub::DeviceAdjacentDifference::SubtractLeftCopy | inline static |
Subtracts the left element of each adjacent pair of elements residing within device-accessible memory.
Computes the differences of adjacent elements in d_input. That is, *d_input is assigned to *d_output, and, for each iterator i in the range [d_input + 1, d_input + num_items), the result of difference_op(*i, *(i - 1)) is assigned to *(d_output + (i - d_input)).
DeviceAdjacentDifference can be used in this way to compute the difference between adjacent elements.
Template Parameters | |
InputIteratorT | is a model of Input Iterator. If x and y are objects of InputIteratorT's value_type, then x - y is defined, InputIteratorT's value_type is convertible to a type in OutputIteratorT's set of value_types, and the return type of x - y is convertible to a type in OutputIteratorT's set of value_types. |
OutputIteratorT | is a model of Output Iterator. |
DifferenceOpT | Its result_type is convertible to a type in OutputIteratorT's set of value_types. |
Parameters | |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When nullptr , the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_input | Pointer to the input sequence |
[out] | d_output | Pointer to the output sequence |
[in] | num_items | Number of items in the input sequence |
[in] | difference_op | The binary function used to compute differences |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false |
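The input/output relation described for SubtractLeftCopy can be checked against a small host-side reference. The sketch below is plain C++ and is not part of CUB: the name `subtract_left_copy_reference` and the use of `std::vector` are assumptions made for illustration, and `std::minus` stands in for the default `cub::Difference`.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side reference for the SubtractLeftCopy semantics.
// Not a CUB API; it only mirrors the documented input/output relation.
template <typename T, typename DifferenceOp = std::minus<T>>
std::vector<T> subtract_left_copy_reference(const std::vector<T>& input,
                                            DifferenceOp difference_op = {})
{
  std::vector<T> output(input.size());
  if (input.empty()) {
    return output;
  }

  // The first element has no left neighbour, so it is copied unchanged.
  output[0] = input[0];

  // Every other element becomes difference_op(current, left neighbour).
  for (std::size_t i = 1; i < input.size(); ++i) {
    output[i] = difference_op(input[i], input[i - 1]);
  }
  return output;
}
```

For the input `{1, 2, 1, 2, 1, 2, 1, 2}` this reference yields `{1, 1, -1, 1, -1, 1, -1, 1}`, which is a convenient expected value when validating a device run on the same data.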
cub::DeviceAdjacentDifference::SubtractLeft | inline static |
Subtracts the left element of each adjacent pair of elements residing within device-accessible memory.
Computes the differences of adjacent elements in d_input. That is, for each iterator i in the range [d_input + 1, d_input + num_items), the result of difference_op(*i, *(i - 1)) is assigned to *(d_input + (i - d_input)).
DeviceAdjacentDifference can be used in this way to compute the difference between adjacent elements.
Template Parameters | |
RandomAccessIteratorT | is a model of Random Access Iterator and is mutable. If x and y are objects of RandomAccessIteratorT's value_type and x - y is defined, then the return type of x - y should be convertible to a type in RandomAccessIteratorT's set of value_types. |
DifferenceOpT | Its result_type is convertible to a type in RandomAccessIteratorT's set of value_types. |
Parameters | |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When nullptr , the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in,out] | d_input | Pointer to the input sequence and the result |
[in] | num_items | Number of items in the input sequence |
[in] | difference_op | The binary function used to compute differences |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false . |
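Because SubtractLeft writes its results back into d_input, a sequential implementation has to be careful about the order in which it overwrites elements. The host-side sketch below is plain C++, not CUB code. It assumes the in-place variant produces the same values as the copy variant, so that every difference is taken over the original inputs; the name `subtract_left_reference` is hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side reference for the in-place SubtractLeft semantics.
// Assumption: each difference is computed from the original input values.
template <typename T, typename DifferenceOp = std::minus<T>>
void subtract_left_reference(std::vector<T>& data,
                             DifferenceOp difference_op = {})
{
  // Walk from the back so that data[i - 1] still holds its original value
  // when data[i] is overwritten. data[0] is left unchanged.
  for (std::size_t i = data.size(); i-- > 1;) {
    data[i] = difference_op(data[i], data[i - 1]);
  }
}
```

Iterating front to back instead would read a left neighbour that has already been replaced by a difference, which gives different results from the copy variant.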
cub::DeviceAdjacentDifference::SubtractRightCopy | inline static |
Subtracts the right element of each adjacent pair of elements residing within device-accessible memory.
Computes the right differences of adjacent elements in d_input. That is, *(d_input + num_items - 1) is assigned to *(d_output + num_items - 1), and, for each iterator i in the range [d_input, d_input + num_items - 1), the result of difference_op(*i, *(i + 1)) is assigned to *(d_output + (i - d_input)).
DeviceAdjacentDifference can be used in this way to compute the difference between adjacent elements.
Template Parameters | |
InputIteratorT | is a model of Input Iterator. If x and y are objects of InputIteratorT's value_type, then x - y is defined, InputIteratorT's value_type is convertible to a type in OutputIteratorT's set of value_types, and the return type of x - y is convertible to a type in OutputIteratorT's set of value_types. |
OutputIteratorT | is a model of Output Iterator. |
DifferenceOpT | Its result_type is convertible to a type in OutputIteratorT's set of value_types. |
Parameters | |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When nullptr , the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_input | Pointer to the input sequence |
[out] | d_output | Pointer to the output sequence |
[in] | num_items | Number of items in the input sequence |
[in] | difference_op | The binary function used to compute differences. |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false . |
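The right-difference relation mirrors the left one, with the last element copied through instead of the first. The following host-side sketch in plain C++ illustrates it, including a custom binary operator passed in place of the default. It is not CUB code: `subtract_right_copy_reference` and the `std::vector` interface are assumptions for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side reference for the SubtractRightCopy semantics.
// Not a CUB API; it only mirrors the documented input/output relation.
template <typename T, typename DifferenceOp = std::minus<T>>
std::vector<T> subtract_right_copy_reference(const std::vector<T>& input,
                                             DifferenceOp difference_op = {})
{
  std::vector<T> output(input.size());
  if (input.empty()) {
    return output;
  }

  const std::size_t last = input.size() - 1;

  // The last element has no right neighbour, so it is copied unchanged.
  output[last] = input[last];

  // Every other element becomes difference_op(current, right neighbour).
  for (std::size_t i = 0; i < last; ++i) {
    output[i] = difference_op(input[i], input[i + 1]);
  }
  return output;
}
```

With the default operator, `{1, 2, 1, 2}` maps to `{-1, 1, -1, 2}`. Passing a callable such as `[](int a, int b) { return b - a; }` reverses the sign of each difference and gives `{1, -1, 1, 2}`.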
cub::DeviceAdjacentDifference::SubtractRight | inline static |
Subtracts the right element of each adjacent pair of elements residing within device-accessible memory.
Computes the right differences of adjacent elements in d_input. That is, for each iterator i in the range [d_input, d_input + num_items - 1), the result of difference_op(*i, *(i + 1)) is assigned to *(d_input + (i - d_input)).
DeviceAdjacentDifference can be used in this way to compute the difference between adjacent elements.
Template Parameters | |
RandomAccessIteratorT | is a model of Random Access Iterator and is mutable. If x and y are objects of RandomAccessIteratorT's value_type and x - y is defined, then the return type of x - y should be convertible to a type in RandomAccessIteratorT's set of value_types. |
DifferenceOpT | Its result_type is convertible to a type in RandomAccessIteratorT's set of value_types. |
Parameters | |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When nullptr , the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in,out] | d_input | Pointer to the input sequence and the result |
[in] | num_items | Number of items in the input sequence |
[in] | difference_op | The binary function used to compute differences |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false . |
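For the in-place right difference, a sequential reference can walk front to back, since each write to position i only reads position i + 1, which has not been modified yet. The sketch below is plain host-side C++, not CUB code. It assumes the in-place variant matches the copy variant's values, and `subtract_right_reference` is a hypothetical name.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side reference for the in-place SubtractRight semantics.
// Assumption: each difference is computed from the original input values.
template <typename T, typename DifferenceOp = std::minus<T>>
void subtract_right_reference(std::vector<T>& data,
                              DifferenceOp difference_op = {})
{
  // Walk from the front: data[i + 1] is still original when data[i] is
  // overwritten. The last element is left unchanged.
  for (std::size_t i = 0; i + 1 < data.size(); ++i) {
    data[i] = difference_op(data[i], data[i + 1]);
  }
}
```

Applied to `{10, 13, 17, 22}`, the vector becomes `{-3, -4, -5, 22}`.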