CUB  

Classes

struct  cub::DeviceAdjacentDifference
 DeviceAdjacentDifference provides device-wide, parallel operations for computing the differences of adjacent elements residing within device-accessible memory.
 
struct  cub::DeviceHistogram
 DeviceHistogram provides device-wide, parallel operations for constructing histogram(s) from a sequence of sample data residing within device-accessible memory.

 
struct  cub::DeviceMergeSort
 DeviceMergeSort provides device-wide, parallel operations for computing a merge sort across a sequence of data items residing within device-accessible memory.
 
struct  cub::DevicePartition
 DevicePartition provides device-wide, parallel operations for partitioning sequences of data items residing within device-accessible memory.

 
struct  cub::DeviceRadixSort
 DeviceRadixSort provides device-wide, parallel operations for computing a radix sort across a sequence of data items residing within device-accessible memory.

 
struct  cub::DeviceReduce
 DeviceReduce provides device-wide, parallel operations for computing a reduction across a sequence of data items residing within device-accessible memory.

 
struct  cub::DeviceRunLengthEncode
 DeviceRunLengthEncode provides device-wide, parallel operations for demarcating "runs" of same-valued items within a sequence residing within device-accessible memory.

 
struct  cub::DeviceScan
 DeviceScan provides device-wide, parallel operations for computing a prefix scan across a sequence of data items residing within device-accessible memory.

 
struct  cub::DeviceSelect
 DeviceSelect provides device-wide, parallel operations for compacting selected items from sequences of data items residing within device-accessible memory.

 
struct  cub::DeviceSpmv
 DeviceSpmv provides device-wide, parallel operations for performing sparse-matrix * dense-vector multiplication (SpMV).
 

Functions

template<typename InputIteratorT, typename OutputIteratorT, typename DifferenceOpT = cub::Difference>
static CUB_RUNTIME_FUNCTION cudaError_t cub::DeviceAdjacentDifference::SubtractLeftCopy (void *d_temp_storage, std::size_t &temp_storage_bytes, InputIteratorT d_input, OutputIteratorT d_output, std::size_t num_items, DifferenceOpT difference_op={}, cudaStream_t stream=0, bool debug_synchronous=false)
 Subtracts the left element of each adjacent pair of elements residing within device-accessible memory.
 
template<typename RandomAccessIteratorT, typename DifferenceOpT = cub::Difference>
static CUB_RUNTIME_FUNCTION cudaError_t cub::DeviceAdjacentDifference::SubtractLeft (void *d_temp_storage, std::size_t &temp_storage_bytes, RandomAccessIteratorT d_input, std::size_t num_items, DifferenceOpT difference_op={}, cudaStream_t stream=0, bool debug_synchronous=false)
 Subtracts the left element of each adjacent pair of elements residing within device-accessible memory.
 
template<typename InputIteratorT, typename OutputIteratorT, typename DifferenceOpT = cub::Difference>
static CUB_RUNTIME_FUNCTION cudaError_t cub::DeviceAdjacentDifference::SubtractRightCopy (void *d_temp_storage, std::size_t &temp_storage_bytes, InputIteratorT d_input, OutputIteratorT d_output, std::size_t num_items, DifferenceOpT difference_op={}, cudaStream_t stream=0, bool debug_synchronous=false)
 Subtracts the right element of each adjacent pair of elements residing within device-accessible memory.
 
template<typename RandomAccessIteratorT, typename DifferenceOpT = cub::Difference>
static CUB_RUNTIME_FUNCTION cudaError_t cub::DeviceAdjacentDifference::SubtractRight (void *d_temp_storage, std::size_t &temp_storage_bytes, RandomAccessIteratorT d_input, std::size_t num_items, DifferenceOpT difference_op={}, cudaStream_t stream=0, bool debug_synchronous=false)
 Subtracts the right element of each adjacent pair of elements residing within device-accessible memory.
 

Function Documentation

template<typename InputIteratorT, typename OutputIteratorT, typename DifferenceOpT = cub::Difference>
static CUB_RUNTIME_FUNCTION cudaError_t cub::DeviceAdjacentDifference::SubtractLeftCopy (
    void *d_temp_storage,
    std::size_t &temp_storage_bytes,
    InputIteratorT d_input,
    OutputIteratorT d_output,
    std::size_t num_items,
    DifferenceOpT difference_op = {},
    cudaStream_t stream = 0,
    bool debug_synchronous = false
)
inline static

Subtracts the left element of each adjacent pair of elements residing within device-accessible memory.

Overview
  • Calculates the differences of adjacent elements in d_input. That is, *d_input is assigned to *d_output, and, for each iterator i in the range [d_input + 1, d_input + num_items), the result of difference_op(*i, *(i - 1)) is assigned to *(d_output + (i - d_input)).
  • Note that the behavior is undefined if the input and output ranges overlap in any way.
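The rule above can be mirrored by a sequential loop on the host. The sketch below is plain C++, not CUB code: the names `subtract_left_copy_reference` and `subtract_left_copy_example` are hypothetical, and `std::minus` stands in for the default difference operator. It is meant only as a CPU-side way to work out expected outputs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side reference for the SubtractLeftCopy rule.
// Not part of CUB; it applies the documented rule one element at a time.
template <typename T, typename DifferenceOp = std::minus<T>>
std::vector<T> subtract_left_copy_reference(const std::vector<T> &input,
                                            DifferenceOp difference_op = {})
{
    std::vector<T> output(input.size());
    if (input.empty())
    {
        return output;
    }

    // The first element is copied through unchanged.
    output[0] = input[0];

    // Every other element becomes difference_op(current, left neighbour).
    for (std::size_t i = 1; i < input.size(); ++i)
    {
        output[i] = difference_op(input[i], input[i - 1]);
    }
    return output;
}

// Example using the same data as the snippet in this section.
inline void subtract_left_copy_example()
{
    const std::vector<int> input{1, 2, 1, 2, 1, 2, 1, 2};
    const std::vector<int> output = subtract_left_copy_reference(input);
    assert((output == std::vector<int>{1, 1, -1, 1, -1, 1, -1, 1}));
}
```

With the default subtraction operator, this produces the same values as `std::adjacent_difference` from the C++ standard library.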
Snippet
The code snippet below illustrates how to use DeviceAdjacentDifference to compute the difference between adjacent elements.
#include <cub/cub.cuh>
// or equivalently <cub/device/device_adjacent_difference.cuh>

struct CustomDifference
{
  template <typename DataType>
  __device__ DataType operator()(DataType &lhs, DataType &rhs)
  {
    return lhs - rhs;
  }
};

// Declare, allocate, and initialize device-accessible pointers
int num_items;  // e.g., 8
int *d_input;   // e.g., [1, 2, 1, 2, 1, 2, 1, 2]
int *d_output;
...

// Determine temporary device storage requirements
void *d_temp_storage = nullptr;
size_t temp_storage_bytes = 0;
cub::DeviceAdjacentDifference::SubtractLeftCopy(
  d_temp_storage, temp_storage_bytes,
  d_input, d_output,
  num_items, CustomDifference());

// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);

// Run operation
cub::DeviceAdjacentDifference::SubtractLeftCopy(
  d_temp_storage, temp_storage_bytes,
  d_input, d_output,
  num_items, CustomDifference());

// d_input  <-- [1, 2, 1, 2, 1, 2, 1, 2]
// d_output <-- [1, 1, -1, 1, -1, 1, -1, 1]
Template Parameters
InputIteratorT: A model of Input Iterator. If x and y are objects of InputIteratorT's value_type, then x - y is defined, InputIteratorT's value_type is convertible to a type in OutputIteratorT's set of value_types, and the return type of x - y is convertible to a type in OutputIteratorT's set of value_types.
OutputIteratorT: A model of Output Iterator.
DifferenceOpT: Its result_type is convertible to a type in OutputIteratorT's set of value_types.
Parameters
[in] d_temp_storage: Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.
[in,out] temp_storage_bytes: Reference to size in bytes of d_temp_storage allocation.
[in] d_input: Pointer to the input sequence.
[out] d_output: Pointer to the output sequence.
[in] num_items: Number of items in the input sequence.
[in] difference_op: The binary function used to compute differences.
[in] stream: [optional] CUDA stream to launch kernels within. Default is stream 0.
[in] debug_synchronous: [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false.
template<typename RandomAccessIteratorT, typename DifferenceOpT = cub::Difference>
static CUB_RUNTIME_FUNCTION cudaError_t cub::DeviceAdjacentDifference::SubtractLeft (
    void *d_temp_storage,
    std::size_t &temp_storage_bytes,
    RandomAccessIteratorT d_input,
    std::size_t num_items,
    DifferenceOpT difference_op = {},
    cudaStream_t stream = 0,
    bool debug_synchronous = false
)
inline static

Subtracts the left element of each adjacent pair of elements residing within device-accessible memory.

Overview
Calculates the differences of adjacent elements in d_input in place. That is, for each iterator i in the range [d_input + 1, d_input + num_items), the result of difference_op(*i, *(i - 1)) is assigned to *(d_input + (i - d_input)), i.e. back to *i. The first element, *d_input, is left unchanged.
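Because the result overwrites the input, the order in which a sequential version reads and writes matters. The example output in this function's snippet ([1, 1, -1, 1, -1, 1, -1, 1]) is consistent with every difference being taken from the original values. The host-side sketch below reproduces that by walking the sequence from back to front. It is plain C++ with hypothetical names (`subtract_left_reference`, `subtract_left_example`) and says nothing about how CUB schedules the work on the device.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side reference for the in-place SubtractLeft rule.
// Not part of CUB.
template <typename T, typename DifferenceOp = std::minus<T>>
void subtract_left_reference(std::vector<T> &data,
                             DifferenceOp difference_op = {})
{
    // Iterate from the last element down to index 1 so that data[i - 2]
    // still holds its original value when data[i - 1] is computed.
    for (std::size_t i = data.size(); i > 1; --i)
    {
        data[i - 1] = difference_op(data[i - 1], data[i - 2]);
    }
    // data[0] is intentionally left untouched.
}

// Example using the same data as the snippet in this section.
inline void subtract_left_example()
{
    std::vector<int> data{1, 2, 1, 2, 1, 2, 1, 2};
    subtract_left_reference(data);
    assert((data == std::vector<int>{1, 1, -1, 1, -1, 1, -1, 1}));
}
```

A front-to-back loop would instead read neighbours that have already been overwritten and give a different result (for this input, the third element would become 0 rather than -1).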
Snippet
The code snippet below illustrates how to use DeviceAdjacentDifference to compute the difference between adjacent elements.
#include <cub/cub.cuh>
// or equivalently <cub/device/device_adjacent_difference.cuh>

struct CustomDifference
{
  template <typename DataType>
  __device__ DataType operator()(DataType &lhs, DataType &rhs)
  {
    return lhs - rhs;
  }
};

// Declare, allocate, and initialize device-accessible pointers
int num_items;  // e.g., 8
int *d_data;    // e.g., [1, 2, 1, 2, 1, 2, 1, 2]
...

// Determine temporary device storage requirements
void *d_temp_storage = nullptr;
size_t temp_storage_bytes = 0;
cub::DeviceAdjacentDifference::SubtractLeft(
  d_temp_storage, temp_storage_bytes,
  d_data, num_items, CustomDifference());

// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);

// Run operation
cub::DeviceAdjacentDifference::SubtractLeft(
  d_temp_storage, temp_storage_bytes,
  d_data, num_items, CustomDifference());

// d_data <-- [1, 1, -1, 1, -1, 1, -1, 1]
Template Parameters
RandomAccessIteratorT: A mutable model of Random Access Iterator. If x and y are objects of RandomAccessIteratorT's value_type and x - y is defined, then the return type of x - y should be convertible to a type in RandomAccessIteratorT's set of value_types.
DifferenceOpT: Its result_type is convertible to a type in RandomAccessIteratorT's set of value_types.
Parameters
[in] d_temp_storage: Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.
[in,out] temp_storage_bytes: Reference to size in bytes of d_temp_storage allocation.
[in,out] d_input: Pointer to the input sequence and the result.
[in] num_items: Number of items in the input sequence.
[in] difference_op: The binary function used to compute differences.
[in] stream: [optional] CUDA stream to launch kernels within. Default is stream 0.
[in] debug_synchronous: [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false.
template<typename InputIteratorT, typename OutputIteratorT, typename DifferenceOpT = cub::Difference>
static CUB_RUNTIME_FUNCTION cudaError_t cub::DeviceAdjacentDifference::SubtractRightCopy (
    void *d_temp_storage,
    std::size_t &temp_storage_bytes,
    InputIteratorT d_input,
    OutputIteratorT d_output,
    std::size_t num_items,
    DifferenceOpT difference_op = {},
    cudaStream_t stream = 0,
    bool debug_synchronous = false
)
inline static

Subtracts the right element of each adjacent pair of elements residing within device-accessible memory.

Overview
  • Calculates the right differences of adjacent elements in d_input. That is, *(d_input + num_items - 1) is assigned to *(d_output + num_items - 1), and, for each iterator i in the range [d_input, d_input + num_items - 1), the result of difference_op(*i, *(i + 1)) is assigned to *(d_output + (i - d_input)).
  • Note that the behavior is undefined if the input and output ranges overlap in any way.
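As a cross-check for the right-difference rule above, here is a sequential host-side sketch. It is plain C++ rather than CUB code; `subtract_right_copy_reference` and `subtract_right_copy_example` are hypothetical names, and `std::minus` stands in for the default difference operator.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side reference for the SubtractRightCopy rule.
// Not part of CUB.
template <typename T, typename DifferenceOp = std::minus<T>>
std::vector<T> subtract_right_copy_reference(const std::vector<T> &input,
                                             DifferenceOp difference_op = {})
{
    std::vector<T> output(input.size());
    if (input.empty())
    {
        return output;
    }

    const std::size_t last = input.size() - 1;

    // Every element except the last becomes
    // difference_op(current, right neighbour).
    for (std::size_t i = 0; i < last; ++i)
    {
        output[i] = difference_op(input[i], input[i + 1]);
    }

    // The last element has no right neighbour and is copied through.
    output[last] = input[last];
    return output;
}

// Example using the same data as the snippet in this section.
inline void subtract_right_copy_example()
{
    const std::vector<int> input{1, 2, 1, 2, 1, 2, 1, 2};
    const std::vector<int> output = subtract_right_copy_reference(input);
    assert((output == std::vector<int>{-1, 1, -1, 1, -1, 1, -1, 2}));
}
```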
Snippet
The code snippet below illustrates how to use DeviceAdjacentDifference to compute the difference between adjacent elements.
#include <cub/cub.cuh>
// or equivalently <cub/device/device_adjacent_difference.cuh>

struct CustomDifference
{
  template <typename DataType>
  __device__ DataType operator()(DataType &lhs, DataType &rhs)
  {
    return lhs - rhs;
  }
};

// Declare, allocate, and initialize device-accessible pointers
int num_items;  // e.g., 8
int *d_input;   // e.g., [1, 2, 1, 2, 1, 2, 1, 2]
int *d_output;
...

// Determine temporary device storage requirements
void *d_temp_storage = nullptr;
size_t temp_storage_bytes = 0;
cub::DeviceAdjacentDifference::SubtractRightCopy(
  d_temp_storage, temp_storage_bytes,
  d_input, d_output, num_items, CustomDifference());

// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);

// Run operation
cub::DeviceAdjacentDifference::SubtractRightCopy(
  d_temp_storage, temp_storage_bytes,
  d_input, d_output, num_items, CustomDifference());

// d_input  <-- [1, 2, 1, 2, 1, 2, 1, 2]
// d_output <-- [-1, 1, -1, 1, -1, 1, -1, 2]
Template Parameters
InputIteratorT: A model of Input Iterator. If x and y are objects of InputIteratorT's value_type, then x - y is defined, InputIteratorT's value_type is convertible to a type in OutputIteratorT's set of value_types, and the return type of x - y is convertible to a type in OutputIteratorT's set of value_types.
OutputIteratorT: A model of Output Iterator.
DifferenceOpT: Its result_type is convertible to a type in OutputIteratorT's set of value_types.
Parameters
[in] d_temp_storage: Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.
[in,out] temp_storage_bytes: Reference to size in bytes of d_temp_storage allocation.
[in] d_input: Pointer to the input sequence.
[out] d_output: Pointer to the output sequence.
[in] num_items: Number of items in the input sequence.
[in] difference_op: The binary function used to compute differences.
[in] stream: [optional] CUDA stream to launch kernels within. Default is stream 0.
[in] debug_synchronous: [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false.
template<typename RandomAccessIteratorT, typename DifferenceOpT = cub::Difference>
static CUB_RUNTIME_FUNCTION cudaError_t cub::DeviceAdjacentDifference::SubtractRight (
    void *d_temp_storage,
    std::size_t &temp_storage_bytes,
    RandomAccessIteratorT d_input,
    std::size_t num_items,
    DifferenceOpT difference_op = {},
    cudaStream_t stream = 0,
    bool debug_synchronous = false
)
inline static

Subtracts the right element of each adjacent pair of elements residing within device-accessible memory.

Overview
Calculates the right differences of adjacent elements in d_input in place. That is, for each iterator i in the range [d_input, d_input + num_items - 1), the result of difference_op(*i, *(i + 1)) is assigned to *(d_input + (i - d_input)), i.e. back to *i. The last element, *(d_input + num_items - 1), is left unchanged.
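For the in-place right difference, a sequential host-side version can walk front to back: each step writes position i and reads position i + 1, which has not been overwritten yet. The sketch below is plain C++ with hypothetical names (`subtract_right_reference`, `subtract_right_example`); it is only a CPU cross-check of the rule above, not a description of CUB's device implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side reference for the in-place SubtractRight rule.
// Not part of CUB.
template <typename T, typename DifferenceOp = std::minus<T>>
void subtract_right_reference(std::vector<T> &data,
                              DifferenceOp difference_op = {})
{
    // data[i + 1] is always read before it is written, so a forward
    // walk sees only original right neighbours.
    for (std::size_t i = 0; i + 1 < data.size(); ++i)
    {
        data[i] = difference_op(data[i], data[i + 1]);
    }
    // The last element is intentionally left untouched.
}

// Example using the same data as the snippet in this section.
inline void subtract_right_example()
{
    std::vector<int> data{1, 2, 1, 2, 1, 2, 1, 2};
    subtract_right_reference(data);
    assert((data == std::vector<int>{-1, 1, -1, 1, -1, 1, -1, 2}));
}
```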
Snippet
The code snippet below illustrates how to use DeviceAdjacentDifference to compute the difference between adjacent elements.
#include <cub/cub.cuh>
// or equivalently <cub/device/device_adjacent_difference.cuh>

// Declare, allocate, and initialize device-accessible pointers
int num_items;  // e.g., 8
int *d_data;    // e.g., [1, 2, 1, 2, 1, 2, 1, 2]
...

// Determine temporary device storage requirements
void *d_temp_storage = nullptr;
size_t temp_storage_bytes = 0;
cub::DeviceAdjacentDifference::SubtractRight(
  d_temp_storage, temp_storage_bytes, d_data, num_items);

// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);

// Run operation
cub::DeviceAdjacentDifference::SubtractRight(
  d_temp_storage, temp_storage_bytes, d_data, num_items);

// d_data <-- [-1, 1, -1, 1, -1, 1, -1, 2]
Template Parameters
RandomAccessIteratorT: A mutable model of Random Access Iterator. If x and y are objects of RandomAccessIteratorT's value_type and x - y is defined, then the return type of x - y should be convertible to a type in RandomAccessIteratorT's set of value_types.
DifferenceOpT: Its result_type is convertible to a type in RandomAccessIteratorT's set of value_types.
Parameters
[in] d_temp_storage: Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.
[in,out] temp_storage_bytes: Reference to size in bytes of d_temp_storage allocation.
[in,out] d_input: Pointer to the input sequence and the result.
[in] num_items: Number of items in the input sequence.
[in] difference_op: The binary function used to compute differences.
[in] stream: [optional] CUDA stream to launch kernels within. Default is stream 0.
[in] debug_synchronous: [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false.