CUB
Class List
Here are the classes, structs, unions and interfaces with brief descriptions:
cub - Optional outer namespace(s)
  CachingDeviceAllocator - A simple caching allocator for device memory allocations (a usage sketch appears after this list)
  If - Type selection (IF ? ThenType : ElseType)
  Equals - Type equality test
  Log2 - Statically determine log2(N), rounded up
  PowerOfTwo - Statically determine if N is a power-of-two
  IsPointer - Pointer vs. iterator test
  IsVolatile - Volatile modifier test
  RemoveQualifiers - Removes const and volatile qualifiers from type Tp
  ArgIndexInputIterator - A random-access input wrapper for pairing dereferenced values with their corresponding indices (forming KeyValuePair tuples)
  CacheModifiedInputIterator - A random-access input wrapper for dereferencing array values using a PTX cache load modifier
  CacheModifiedOutputIterator - A random-access output wrapper for storing array values using a PTX cache store modifier
  ConstantInputIterator - A random-access input generator for dereferencing a sequence of homogeneous values
  CountingInputIterator - A random-access input generator for dereferencing a sequence of incrementing integer values
  TexObjInputIterator - A random-access input wrapper for dereferencing array values through texture cache; uses newer Kepler-style texture objects
  TexRefInputIterator - A random-access input wrapper for dereferencing array values through texture cache; uses older Tesla/Fermi-style texture references
  TransformInputIterator - A random-access input wrapper for transforming dereferenced values (an iterator sketch appears after this list)
  Equality - Default equality functor
  Inequality - Default inequality functor
  InequalityWrapper - Inequality functor (wraps an equality functor)
  Sum - Default sum functor
  Max - Default max functor
  ArgMax - Arg max functor (keeps the value and offset of the first occurrence of the larger item)
  Min - Default min functor
  ArgMin - Arg min functor (keeps the value and offset of the first occurrence of the smallest item)
  CastOp - Default cast functor
  SwizzleScanOp - Binary operator wrapper for switching non-commutative scan arguments
  ReduceBySegmentOp - Reduce-by-segment functor
  ReduceByKeyOp - Reduce-by-key functor (wraps a binary reduction operator to apply to values)
  BlockDiscontinuity - Provides collective methods for flagging discontinuities within an ordered set of items partitioned across a CUDA thread block
    TempStorage - Opaque temporary storage of this nested type is required by BlockDiscontinuity's collective operations for thread communication; it can be allocated directly using the __shared__ keyword, aliased to externally allocated memory (shared or global), or union'd with other storage allocation types to facilitate memory reuse
  BlockExchange - Provides collective methods for rearranging data partitioned across a CUDA thread block
    TempStorage - Opaque temporary storage of this nested type is required by BlockExchange's collective operations for thread communication; it can be allocated directly using the __shared__ keyword, aliased to externally allocated memory (shared or global), or union'd with other storage allocation types to facilitate memory reuse
  BlockHistogram - Provides collective methods for constructing block-wide histograms from data samples partitioned across a CUDA thread block
    TempStorage - Opaque temporary storage of this nested type is required by BlockHistogram's collective operations for thread communication; it can be allocated directly using the __shared__ keyword, aliased to externally allocated memory (shared or global), or union'd with other storage allocation types to facilitate memory reuse
  BlockLoad - Provides collective data movement methods for loading a linear segment of items from memory into a blocked arrangement across a CUDA thread block
    TempStorage - Opaque temporary storage of this nested type is required by BlockLoad's collective operations for thread communication; it can be allocated directly using the __shared__ keyword, aliased to externally allocated memory (shared or global), or union'd with other storage allocation types to facilitate memory reuse
  BlockRadixSort - Provides collective methods for sorting items partitioned across a CUDA thread block using a radix sorting method (a block-sort sketch appears after this list)
    TempStorage - Opaque temporary storage of this nested type is required by BlockRadixSort's collective operations for thread communication; it can be allocated directly using the __shared__ keyword, aliased to externally allocated memory (shared or global), or union'd with other storage allocation types to facilitate memory reuse
  BlockReduce - Provides collective methods for computing a parallel reduction of items partitioned across a CUDA thread block (a TempStorage usage sketch appears after this list)
    TempStorage - Opaque temporary storage of this nested type is required by BlockReduce's collective operations for thread communication; it can be allocated directly using the __shared__ keyword, aliased to externally allocated memory (shared or global), or union'd with other storage allocation types to facilitate memory reuse
  BlockScan - Provides collective methods for computing a parallel prefix sum/scan of items partitioned across a CUDA thread block
    TempStorage - Opaque temporary storage of this nested type is required by BlockScan's collective operations for thread communication; it can be allocated directly using the __shared__ keyword, aliased to externally allocated memory (shared or global), or union'd with other storage allocation types to facilitate memory reuse
  BlockStore - Provides collective data movement methods for writing a blocked arrangement of items partitioned across a CUDA thread block to a linear segment of memory
    TempStorage - Opaque temporary storage of this nested type is required by BlockStore's collective operations for thread communication; it can be allocated directly using the __shared__ keyword, aliased to externally allocated memory (shared or global), or union'd with other storage allocation types to facilitate memory reuse
  DeviceHistogram - Provides device-wide parallel operations for constructing histogram(s) from a sequence of sample data residing within device-accessible memory
  DevicePartition - Provides device-wide, parallel operations for partitioning sequences of data items residing within device-accessible memory
  DeviceRadixSort - Provides device-wide, parallel operations for computing a radix sort across a sequence of data items residing within device-accessible memory
  DeviceReduce - Provides device-wide, parallel operations for computing a reduction across a sequence of data items residing within device-accessible memory (a sketch of the two-phase temporary-storage call convention appears after this list)
  DeviceRunLengthEncode - Provides device-wide, parallel operations for demarcating "runs" of same-valued items within a sequence residing within device-accessible memory
  DeviceScan - Provides device-wide, parallel operations for computing a prefix scan across a sequence of data items residing within device-accessible memory
  DeviceSegmentedRadixSort - Provides device-wide, parallel operations for computing a batched radix sort across multiple, non-overlapping sequences of data items residing within device-accessible memory
  DeviceSegmentedReduce - Provides device-wide, parallel operations for computing a reduction across multiple sequences of data items residing within device-accessible memory
  DeviceSelect - Provides device-wide, parallel operations for compacting selected items from sequences of data items residing within device-accessible memory
  DeviceSpmv - Provides device-wide parallel operations for performing sparse-matrix * dense-vector multiplication (SpMV)
  WarpScan - Provides collective methods for computing a parallel prefix scan of items partitioned across a CUDA thread warp
    TempStorage - Opaque temporary storage of this nested type is required by WarpScan's collective operations for thread communication; it can be allocated directly using the __shared__ keyword, aliased to externally allocated memory (shared or global), or union'd with other storage allocation types to facilitate memory reuse
  WarpReduce - Provides collective methods for computing a parallel reduction of items partitioned across a CUDA thread warp (a usage sketch appears after this list)
    TempStorage - Opaque temporary storage of this nested type is required by WarpReduce's collective operations for thread communication; it can be allocated directly using the __shared__ keyword, aliased to externally allocated memory (shared or global), or union'd with other storage allocation types to facilitate memory reuse
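
The TempStorage pattern shared by the block-level primitives above is easiest to see in code. The following is a minimal sketch using BlockReduce; the kernel name BlockSumKernel, the 128-thread block size, and the pointer names d_in and d_out are illustrative choices, not part of CUB.

    #include <cub/cub.cuh>

    __global__ void BlockSumKernel(const int *d_in, int *d_out)
    {
        // Specialize BlockReduce for 128 threads per block operating on int
        typedef cub::BlockReduce<int, 128> BlockReduce;

        // Allocate the opaque TempStorage in shared memory
        __shared__ typename BlockReduce::TempStorage temp_storage;

        // Each thread contributes one item
        int thread_data = d_in[blockIdx.x * blockDim.x + threadIdx.x];

        // Collective reduction; the block-wide sum is returned to thread 0
        int block_sum = BlockReduce(temp_storage).Sum(thread_data);

        if (threadIdx.x == 0)
            d_out[blockIdx.x] = block_sum;
    }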
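
A similar sketch for block-wide sorting with BlockRadixSort, assuming 128 threads and 4 keys per thread (512 keys per block); it uses LoadDirectBlocked/StoreDirectBlocked for the data movement, though BlockLoad/BlockStore could serve the same purpose. The kernel name and pointer name are illustrative.

    #include <cub/cub.cuh>

    __global__ void BlockSortKernel(int *d_keys)
    {
        enum { BLOCK_THREADS = 128, ITEMS_PER_THREAD = 4 };
        typedef cub::BlockRadixSort<int, BLOCK_THREADS, ITEMS_PER_THREAD> BlockRadixSort;

        __shared__ typename BlockRadixSort::TempStorage temp_storage;

        // Load a blocked arrangement of 4 keys per thread
        int thread_keys[ITEMS_PER_THREAD];
        int block_offset = blockIdx.x * (BLOCK_THREADS * ITEMS_PER_THREAD);
        cub::LoadDirectBlocked(threadIdx.x, d_keys + block_offset, thread_keys);

        // Collectively sort the block's 512 keys
        BlockRadixSort(temp_storage).Sort(thread_keys);

        // Write the sorted keys back out
        cub::StoreDirectBlocked(threadIdx.x, d_keys + block_offset, thread_keys);
    }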
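
At warp scope the same pattern applies, except that a block typically provisions one TempStorage instance per warp. A minimal WarpReduce sketch, assuming a 128-thread block (four 32-thread warps); names are illustrative.

    #include <cub/cub.cuh>

    __global__ void WarpSumKernel(const int *d_in, int *d_warp_sums)
    {
        typedef cub::WarpReduce<int> WarpReduce;

        // One TempStorage per warp: 128 threads / 32 lanes = 4 warps
        __shared__ typename WarpReduce::TempStorage temp_storage[4];

        int warp_id     = threadIdx.x / 32;
        int thread_data = d_in[blockIdx.x * blockDim.x + threadIdx.x];

        // The warp-wide sum is returned to the warp's first lane
        int warp_sum = WarpReduce(temp_storage[warp_id]).Sum(thread_data);

        if (threadIdx.x % 32 == 0)
            d_warp_sums[blockIdx.x * 4 + warp_id] = warp_sum;
    }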
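
The device-wide algorithms (DeviceReduce, DeviceScan, DeviceRadixSort, and the others above) follow a two-phase call convention: the first call with a NULL temporary-storage pointer only reports the required size, and the second call does the work. A minimal host-side sketch with DeviceReduce::Sum; the function and pointer names are illustrative.

    #include <cub/cub.cuh>

    void SumOnDevice(const int *d_in, int *d_out, int num_items)
    {
        void   *d_temp_storage     = NULL;
        size_t  temp_storage_bytes = 0;

        // Phase 1: query the required temporary storage size
        cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes,
                               d_in, d_out, num_items);

        // Allocate it, then phase 2: run the reduction
        cudaMalloc(&d_temp_storage, temp_storage_bytes);
        cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes,
                               d_in, d_out, num_items);

        cudaFree(d_temp_storage);
    }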
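
The iterator wrappers compose on the fly without materializing data in memory. A small sketch combining CountingInputIterator and TransformInputIterator; the Square functor is an illustrative example, not part of CUB.

    #include <cub/cub.cuh>

    struct Square
    {
        __host__ __device__ int operator()(int x) const { return x * x; }
    };

    void IteratorExample()
    {
        // Generates 0, 1, 2, ... on demand
        cub::CountingInputIterator<int> counting(0);

        // Applies Square to each dereferenced value: 0, 1, 4, 9, ...
        cub::TransformInputIterator<int, Square, cub::CountingInputIterator<int> >
            squares(counting, Square());

        int first = squares[0];   // 0
        int tenth = squares[10];  // 100
    }

Iterators like these can also be passed directly as the input of the device-wide algorithms above.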
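
Finally, a sketch of CachingDeviceAllocator, which checks allocations in and out of a cache so that repeated cudaMalloc/cudaFree calls can be avoided; the default-constructed caching configuration and the 1 MB size here are illustrative assumptions.

    #include <cub/cub.cuh>

    void AllocatorExample()
    {
        cub::CachingDeviceAllocator allocator;   // default caching configuration

        void *d_buffer = NULL;
        allocator.DeviceAllocate(&d_buffer, 1024 * 1024);   // checked out from the cache

        // ... launch kernels that use d_buffer ...

        allocator.DeviceFree(d_buffer);          // returned to the cache, not the driver
    }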