CUB
DeviceScan provides device-wide, parallel operations for computing a prefix scan across a sequence of data items residing within device-accessible memory.
[Figure: performance chart for int32 keys]
Performance plots for other scenarios can be found in the detailed method descriptions below.
Static Public Methods | |
Exclusive scans | |
template<typename InputIteratorT , typename OutputIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | ExclusiveSum (void *d_temp_storage, size_t &temp_storage_bytes, InputIteratorT d_in, OutputIteratorT d_out, int num_items, cudaStream_t stream=0, bool debug_synchronous=false) |
Computes a device-wide exclusive prefix sum. The value of 0 is applied as the initial value, and is assigned to *d_out. | |
template<typename InputIteratorT , typename OutputIteratorT , typename ScanOpT , typename InitValueT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | ExclusiveScan (void *d_temp_storage, size_t &temp_storage_bytes, InputIteratorT d_in, OutputIteratorT d_out, ScanOpT scan_op, InitValueT init_value, int num_items, cudaStream_t stream=0, bool debug_synchronous=false) |
Computes a device-wide exclusive prefix scan using the specified binary scan_op functor. init_value is applied as the initial value, and is assigned to *d_out. | |
template<typename InputIteratorT , typename OutputIteratorT , typename ScanOpT , typename InitValueT , typename InitValueIterT = InitValueT*> | |
static CUB_RUNTIME_FUNCTION cudaError_t | ExclusiveScan (void *d_temp_storage, size_t &temp_storage_bytes, InputIteratorT d_in, OutputIteratorT d_out, ScanOpT scan_op, FutureValue< InitValueT, InitValueIterT > init_value, int num_items, cudaStream_t stream=0, bool debug_synchronous=false) |
Inclusive scans | |
template<typename InputIteratorT , typename OutputIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | InclusiveSum (void *d_temp_storage, size_t &temp_storage_bytes, InputIteratorT d_in, OutputIteratorT d_out, int num_items, cudaStream_t stream=0, bool debug_synchronous=false) |
Computes a device-wide inclusive prefix sum. | |
template<typename InputIteratorT , typename OutputIteratorT , typename ScanOpT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | InclusiveScan (void *d_temp_storage, size_t &temp_storage_bytes, InputIteratorT d_in, OutputIteratorT d_out, ScanOpT scan_op, int num_items, cudaStream_t stream=0, bool debug_synchronous=false) |
Computes a device-wide inclusive prefix scan using the specified binary scan_op functor. | |
template<typename KeysInputIteratorT , typename ValuesInputIteratorT , typename ValuesOutputIteratorT , typename EqualityOpT = Equality> | |
static CUB_RUNTIME_FUNCTION cudaError_t | ExclusiveSumByKey (void *d_temp_storage, size_t &temp_storage_bytes, KeysInputIteratorT d_keys_in, ValuesInputIteratorT d_values_in, ValuesOutputIteratorT d_values_out, int num_items, EqualityOpT equality_op=EqualityOpT(), cudaStream_t stream=0, bool debug_synchronous=false) |
Computes a device-wide exclusive prefix sum-by-key with key equality defined by equality_op. The value of 0 is applied as the initial value, and is assigned to the beginning of each segment in d_values_out. | |
template<typename KeysInputIteratorT , typename ValuesInputIteratorT , typename ValuesOutputIteratorT , typename ScanOpT , typename InitValueT , typename EqualityOpT = Equality> | |
static CUB_RUNTIME_FUNCTION cudaError_t | ExclusiveScanByKey (void *d_temp_storage, size_t &temp_storage_bytes, KeysInputIteratorT d_keys_in, ValuesInputIteratorT d_values_in, ValuesOutputIteratorT d_values_out, ScanOpT scan_op, InitValueT init_value, int num_items, EqualityOpT equality_op=EqualityOpT(), cudaStream_t stream=0, bool debug_synchronous=false) |
Computes a device-wide exclusive prefix scan-by-key using the specified binary scan_op functor. The key equality is defined by equality_op. init_value is applied as the initial value, and is assigned to the beginning of each segment in d_values_out. | |
template<typename KeysInputIteratorT , typename ValuesInputIteratorT , typename ValuesOutputIteratorT , typename EqualityOpT = Equality> | |
static CUB_RUNTIME_FUNCTION cudaError_t | InclusiveSumByKey (void *d_temp_storage, size_t &temp_storage_bytes, KeysInputIteratorT d_keys_in, ValuesInputIteratorT d_values_in, ValuesOutputIteratorT d_values_out, int num_items, EqualityOpT equality_op=EqualityOpT(), cudaStream_t stream=0, bool debug_synchronous=false) |
Computes a device-wide inclusive prefix sum-by-key with key equality defined by equality_op. | |
template<typename KeysInputIteratorT , typename ValuesInputIteratorT , typename ValuesOutputIteratorT , typename ScanOpT , typename EqualityOpT = Equality> | |
static CUB_RUNTIME_FUNCTION cudaError_t | InclusiveScanByKey (void *d_temp_storage, size_t &temp_storage_bytes, KeysInputIteratorT d_keys_in, ValuesInputIteratorT d_values_in, ValuesOutputIteratorT d_values_out, ScanOpT scan_op, int num_items, EqualityOpT equality_op=EqualityOpT(), cudaStream_t stream=0, bool debug_synchronous=false) |
Computes a device-wide inclusive prefix scan-by-key using the specified binary scan_op functor. The key equality is defined by equality_op. | |
ExclusiveSum() [inline, static]
Computes a device-wide exclusive prefix sum. The value of 0 is applied as the initial value, and is assigned to *d_out.
When d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.
[Figures: performance charts for int32 and int64 items, respectively]
Template Parameters
InputIteratorT | [inferred] Random-access input iterator type for reading scan inputs (may be a simple pointer type) |
OutputIteratorT | [inferred] Random-access output iterator type for writing scan outputs (may be a simple pointer type) |
Parameters
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_in | Random-access iterator to the input sequence of data items |
[out] | d_out | Random-access iterator to the output sequence of data items |
[in] | num_items | Total number of input items (i.e., the length of d_in) |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. May cause significant slowdown. Default is false. |
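To make the "0 is assigned to *d_out" wording concrete, the following is a minimal sequential sketch of what an exclusive prefix sum computes. It runs on the host and does not call CUB; reference_exclusive_sum and example_exclusive_sum are hypothetical helper names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential host-side reference for an exclusive prefix sum.
// out[i] is the sum of in[0] .. in[i-1]; out[0] is the initial value 0.
// Hypothetical helper for illustration, not a CUB function.
template <typename T>
std::vector<T> reference_exclusive_sum(const std::vector<T>& in)
{
    std::vector<T> out(in.size());
    T running = T(0);                  // the initial value of 0
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        out[i] = running;              // sum of all earlier items
        running = running + in[i];     // then fold in the current item
    }
    return out;
}

// Worked example: the first output is 0 and the last input (9) never
// contributes to any output.
inline void example_exclusive_sum()
{
    const std::vector<int> in       = {8, 6, 7, 5, 3, 0, 9};
    const std::vector<int> expected = {0, 8, 14, 21, 26, 29, 29};
    assert(reference_exclusive_sum(in) == expected);
}
```

A device-side result can be compared against such a reference when validating a call, keeping in mind that floating-point sums may differ in rounding because a parallel scan groups the additions differently.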
ExclusiveScan() [inline, static]
Computes a device-wide exclusive prefix scan using the specified binary scan_op functor. init_value is applied as the initial value, and is assigned to *d_out.
When d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.
Template Parameters
InputIteratorT | [inferred] Random-access input iterator type for reading scan inputs (may be a simple pointer type) |
OutputIteratorT | [inferred] Random-access output iterator type for writing scan outputs (may be a simple pointer type) |
ScanOpT | [inferred] Binary scan functor type having member T operator()(const T &a, const T &b) |
InitValueT | [inferred] Type of init_value, the initial value passed to the binary scan functor |
Parameters
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_in | Random-access iterator to the input sequence of data items |
[out] | d_out | Random-access iterator to the output sequence of data items |
[in] | scan_op | Binary scan functor |
[in] | init_value | Initial value to seed the exclusive scan (and is assigned to *d_out) |
[in] | num_items | Total number of input items (i.e., the length of d_in) |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. May cause significant slowdown. Default is false. |
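The roles of scan_op and init_value can be sketched with a sequential host-side reference. This is an illustration of the semantics only, not CUB's implementation; MinOp, reference_exclusive_scan, and example_exclusive_min_scan are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical binary functor (not part of CUB): returns the smaller value.
struct MinOp
{
    template <typename T>
    T operator()(const T& a, const T& b) const { return (b < a) ? b : a; }
};

// Sequential host-side reference for an exclusive scan.
// out[0] is init_value; out[i] combines init_value with in[0] .. in[i-1].
template <typename T, typename ScanOpT, typename InitValueT>
std::vector<T> reference_exclusive_scan(const std::vector<T>& in,
                                        ScanOpT scan_op,
                                        InitValueT init_value)
{
    std::vector<T> out(in.size());
    T running = static_cast<T>(init_value);
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        out[i] = running;                    // result excludes in[i]
        running = scan_op(running, in[i]);   // left operand is the prefix
    }
    return out;
}

// Worked example: an exclusive running minimum seeded with INT_MAX.
inline void example_exclusive_min_scan()
{
    const std::vector<int> in       = {8, 6, 7, 5, 3, 0, 9};
    const std::vector<int> expected = {INT_MAX, 8, 6, 6, 5, 3, 0};
    assert(reference_exclusive_scan(in, MinOp(), INT_MAX) == expected);
}
```

Seeding a minimum scan with the largest representable value keeps the seed from affecting later outputs; a smaller seed would clamp every result to at most that seed.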
ExclusiveScan() [inline, static] (overload taking FutureValue< InitValueT, InitValueIterT > init_value)
Parameters
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_in | Pointer to the input sequence of data items |
[out] | d_out | Pointer to the output sequence of data items |
[in] | scan_op | Binary scan functor |
[in] | init_value | Initial value to seed the exclusive scan (and is assigned to *d_out) |
[in] | num_items | Total number of input items (i.e., the length of d_in) |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. May cause significant slowdown. Default is false. |
InclusiveSum() [inline, static]
Computes a device-wide inclusive prefix sum.
When d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.
Template Parameters
InputIteratorT | [inferred] Random-access input iterator type for reading scan inputs (may be a simple pointer type) |
OutputIteratorT | [inferred] Random-access output iterator type for writing scan outputs (may be a simple pointer type) |
Parameters
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_in | Random-access iterator to the input sequence of data items |
[out] | d_out | Random-access iterator to the output sequence of data items |
[in] | num_items | Total number of input items (i.e., the length of d_in) |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. May cause significant slowdown. Default is false. |
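An inclusive sum differs from an exclusive sum only in that each output includes its own input item. The sequential sketch below illustrates this on the host without calling CUB; reference_inclusive_sum and example_inclusive_sum are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential host-side reference for an inclusive prefix sum.
// out[i] is the sum of in[0] .. in[i], so out[0] equals in[0].
// Hypothetical helper for illustration, not a CUB function.
template <typename T>
std::vector<T> reference_inclusive_sum(const std::vector<T>& in)
{
    std::vector<T> out(in.size());
    T running = T(0);
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        running = running + in[i];   // fold in the current item first
        out[i] = running;            // then write the result
    }
    return out;
}

// Worked example: the last output is the total of all inputs.
inline void example_inclusive_sum()
{
    const std::vector<int> in       = {8, 6, 7, 5, 3, 0, 9};
    const std::vector<int> expected = {8, 14, 21, 26, 29, 29, 38};
    assert(reference_inclusive_sum(in) == expected);
}
```

For the same input, the exclusive sum is this sequence shifted right by one position with 0 placed in front.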
InclusiveScan() [inline, static]
Computes a device-wide inclusive prefix scan using the specified binary scan_op functor.
When d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.
Template Parameters
InputIteratorT | [inferred] Random-access input iterator type for reading scan inputs (may be a simple pointer type) |
OutputIteratorT | [inferred] Random-access output iterator type for writing scan outputs (may be a simple pointer type) |
ScanOpT | [inferred] Binary scan functor type having member T operator()(const T &a, const T &b) |
Parameters
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_in | Random-access iterator to the input sequence of data items |
[out] | d_out | Random-access iterator to the output sequence of data items |
[in] | scan_op | Binary scan functor |
[in] | num_items | Total number of input items (i.e., the length of d_in) |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. May cause significant slowdown. Default is false. |
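Because an inclusive scan takes no initial value, the first output is simply the first input, and scan_op is applied from the second item onward. The following sequential host-side sketch shows this with a product functor; MultiplyOp, reference_inclusive_scan, and example_inclusive_product_scan are hypothetical names, and the code does not call CUB.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical binary functor (not part of CUB): multiplies two values.
struct MultiplyOp
{
    template <typename T>
    T operator()(const T& a, const T& b) const { return a * b; }
};

// Sequential host-side reference for an inclusive scan.
// out[0] = in[0]; out[i] = scan_op(out[i-1], in[i]) for i > 0.
template <typename T, typename ScanOpT>
std::vector<T> reference_inclusive_scan(const std::vector<T>& in,
                                        ScanOpT scan_op)
{
    std::vector<T> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        // No seed is needed: the first item starts the running result.
        out[i] = (i == 0) ? in[0] : scan_op(out[i - 1], in[i]);
    }
    return out;
}

// Worked example: running products of 1..5.
inline void example_inclusive_product_scan()
{
    const std::vector<int> in       = {1, 2, 3, 4, 5};
    const std::vector<int> expected = {1, 2, 6, 24, 120};
    assert(reference_inclusive_scan(in, MultiplyOp()) == expected);
}
```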
ExclusiveSumByKey() [inline, static]
Computes a device-wide exclusive prefix sum-by-key with key equality defined by equality_op. The value of 0 is applied as the initial value, and is assigned to the beginning of each segment in d_values_out.
When d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.
Template Parameters
KeysInputIteratorT | [inferred] Random-access input iterator type for reading scan keys inputs (may be a simple pointer type) |
ValuesInputIteratorT | [inferred] Random-access input iterator type for reading scan values inputs (may be a simple pointer type) |
ValuesOutputIteratorT | [inferred] Random-access output iterator type for writing scan values outputs (may be a simple pointer type) |
EqualityOpT | [inferred] Functor type having member bool operator()(const T &a, const T &b) that defines the equality of keys |
Parameters
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_keys_in | Random-access input iterator to the input sequence of key items |
[in] | d_values_in | Random-access input iterator to the input sequence of value items |
[out] | d_values_out | Random-access output iterator to the output sequence of value items |
[in] | num_items | Total number of input items (i.e., the length of d_keys_in and d_values_in) |
[in] | equality_op | Binary functor that defines the equality of keys. Default is cub::Equality(). |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. May cause significant slowdown. Default is false. |
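The by-key behavior can be illustrated with a sequential host-side sketch, assuming a segment is a run of consecutive keys that compare equal under equality_op. The code does not call CUB; reference_exclusive_sum_by_key and example_exclusive_sum_by_key are hypothetical names, and std::equal_to stands in for the default equality functor.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Sequential host-side reference for an exclusive sum-by-key.
// The running sum restarts at 0 at item 0 and wherever two adjacent keys
// are not equal according to equality_op.
template <typename KeyT, typename ValueT,
          typename EqualityOpT = std::equal_to<KeyT>>
std::vector<ValueT> reference_exclusive_sum_by_key(
    const std::vector<KeyT>& keys,
    const std::vector<ValueT>& values,
    EqualityOpT equality_op = EqualityOpT())
{
    assert(keys.size() == values.size());
    std::vector<ValueT> out(values.size());
    ValueT running = ValueT(0);
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        if (i == 0 || !equality_op(keys[i - 1], keys[i]))
            running = ValueT(0);           // start of a new segment
        out[i] = running;                  // excludes values[i]
        running = running + values[i];
    }
    return out;
}

// Worked example: three segments (keys 0, 1, 2), each starting at 0.
inline void example_exclusive_sum_by_key()
{
    const std::vector<int> keys     = {0, 0, 1, 1, 1, 2, 2};
    const std::vector<int> values   = {8, 6, 7, 5, 3, 0, 9};
    const std::vector<int> expected = {0, 8, 0, 7, 12, 0, 0};
    assert(reference_exclusive_sum_by_key(keys, values) == expected);
}
```

Under this assumption only adjacent keys are compared, so a key value that reappears after a different key starts a fresh segment rather than continuing the earlier one.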
ExclusiveScanByKey() [inline, static]
Computes a device-wide exclusive prefix scan-by-key using the specified binary scan_op functor. The key equality is defined by equality_op. init_value is applied as the initial value, and is assigned to the beginning of each segment in d_values_out.
When d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.
Template Parameters
KeysInputIteratorT | [inferred] Random-access input iterator type for reading scan keys inputs (may be a simple pointer type) |
ValuesInputIteratorT | [inferred] Random-access input iterator type for reading scan values inputs (may be a simple pointer type) |
ValuesOutputIteratorT | [inferred] Random-access output iterator type for writing scan values outputs (may be a simple pointer type) |
ScanOpT | [inferred] Binary scan functor type having member T operator()(const T &a, const T &b) |
InitValueT | [inferred] Type of init_value, the initial value passed to the binary scan functor |
EqualityOpT | [inferred] Functor type having member bool operator()(const T &a, const T &b) that defines the equality of keys |
Parameters
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_keys_in | Random-access input iterator to the input sequence of key items |
[in] | d_values_in | Random-access input iterator to the input sequence of value items |
[out] | d_values_out | Random-access output iterator to the output sequence of value items |
[in] | scan_op | Binary scan functor |
[in] | init_value | Initial value to seed the exclusive scan (and is assigned to the beginning of each segment in d_values_out) |
[in] | num_items | Total number of input items (i.e., the length of d_keys_in and d_values_in) |
[in] | equality_op | Binary functor that defines the equality of keys. Default is cub::Equality(). |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. May cause significant slowdown. Default is false. |
InclusiveSumByKey() [inline, static]
Computes a device-wide inclusive prefix sum-by-key with key equality defined by equality_op.
When d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.
Template Parameters
KeysInputIteratorT | [inferred] Random-access input iterator type for reading scan keys inputs (may be a simple pointer type) |
ValuesInputIteratorT | [inferred] Random-access input iterator type for reading scan values inputs (may be a simple pointer type) |
ValuesOutputIteratorT | [inferred] Random-access output iterator type for writing scan values outputs (may be a simple pointer type) |
EqualityOpT | [inferred] Functor type having member bool operator()(const T &a, const T &b) that defines the equality of keys |
Parameters
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_keys_in | Random-access input iterator to the input sequence of key items |
[in] | d_values_in | Random-access input iterator to the input sequence of value items |
[out] | d_values_out | Random-access output iterator to the output sequence of value items |
[in] | num_items | Total number of input items (i.e., the length of d_keys_in and d_values_in) |
[in] | equality_op | Binary functor that defines the equality of keys. Default is cub::Equality(). |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. May cause significant slowdown. Default is false. |
InclusiveScanByKey() [inline, static]
Computes a device-wide inclusive prefix scan-by-key using the specified binary scan_op functor. The key equality is defined by equality_op.
When d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.
Template Parameters
KeysInputIteratorT | [inferred] Random-access input iterator type for reading scan keys inputs (may be a simple pointer type) |
ValuesInputIteratorT | [inferred] Random-access input iterator type for reading scan values inputs (may be a simple pointer type) |
ValuesOutputIteratorT | [inferred] Random-access output iterator type for writing scan values outputs (may be a simple pointer type) |
ScanOpT | [inferred] Binary scan functor type having member T operator()(const T &a, const T &b) |
EqualityOpT | [inferred] Functor type having member bool operator()(const T &a, const T &b) that defines the equality of keys |
Parameters
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_keys_in | Random-access input iterator to the input sequence of key items |
[in] | d_values_in | Random-access input iterator to the input sequence of value items |
[out] | d_values_out | Random-access output iterator to the output sequence of value items |
[in] | scan_op | Binary scan functor |
[in] | num_items | Total number of input items (i.e., the length of d_keys_in and d_values_in) |
[in] | equality_op | Binary functor that defines the equality of keys. Default is cub::Equality(). |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. May cause significant slowdown. Default is false. |
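How scan_op and a custom equality_op interact can be sketched sequentially on the host, again assuming a segment is a run of consecutive keys that compare equal under equality_op. Nothing here calls CUB; MaxOp, SameTens, reference_inclusive_scan_by_key, and example_inclusive_scan_by_key are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical scan functor (not part of CUB): returns the larger value.
struct MaxOp
{
    template <typename T>
    T operator()(const T& a, const T& b) const { return (a < b) ? b : a; }
};

// Hypothetical key equality: keys match when they share the same tens digit.
struct SameTens
{
    bool operator()(int a, int b) const { return a / 10 == b / 10; }
};

// Sequential host-side reference for an inclusive scan-by-key.
// Each segment restarts from its own first value; no initial value is used.
template <typename KeyT, typename ValueT, typename ScanOpT, typename EqualityOpT>
std::vector<ValueT> reference_inclusive_scan_by_key(
    const std::vector<KeyT>& keys,
    const std::vector<ValueT>& values,
    ScanOpT scan_op,
    EqualityOpT equality_op)
{
    assert(keys.size() == values.size());
    std::vector<ValueT> out(values.size());
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        const bool segment_start =
            (i == 0) || !equality_op(keys[i - 1], keys[i]);
        out[i] = segment_start ? values[i] : scan_op(out[i - 1], values[i]);
    }
    return out;
}

inline void example_inclusive_scan_by_key()
{
    // Running maximum within segments {10, 12}, {25, 27, 29}, {31}.
    const std::vector<int> keys   = {10, 12, 25, 27, 29, 31};
    const std::vector<int> values = {3, 1, 4, 1, 5, 9};
    const std::vector<int> maxima = {3, 3, 4, 4, 5, 9};
    assert(reference_inclusive_scan_by_key(keys, values, MaxOp(), SameTens()) == maxima);

    // With addition and exact key equality this reduces to a sum-by-key.
    const std::vector<int> k    = {0, 0, 1, 1, 1, 2, 2};
    const std::vector<int> v    = {8, 6, 7, 5, 3, 0, 9};
    const std::vector<int> sums = {8, 14, 7, 12, 15, 0, 9};
    assert(reference_inclusive_scan_by_key(k, v, std::plus<int>(), std::equal_to<int>()) == sums);
}
```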