|
CUB
|
Classes | |
| class | cub::WarpScan< T, LOGICAL_WARP_THREADS, PTX_ARCH > |
The WarpScan class provides collective methods for computing a parallel prefix scan of items partitioned across a CUDA thread warp.
| class | cub::WarpReduce< T, LOGICAL_WARP_THREADS, PTX_ARCH > |
The WarpReduce class provides collective methods for computing a parallel reduction of items partitioned across a CUDA thread warp.
Functions | |
| template<typename T > | |
| __device__ __forceinline__ T | cub::ShuffleUp (T input, int src_offset, int first_lane, unsigned int member_mask) |
Shuffle-up for any data type. Each warp-lane i obtains the value input contributed by warp-lane (i - src_offset). For thread lanes i < src_offset, the thread's own input is returned to the thread.
| template<typename T > | |
| __device__ __forceinline__ T | cub::ShuffleDown (T input, int src_offset, int last_lane, unsigned int member_mask) |
Shuffle-down for any data type. Each warp-lane i obtains the value input contributed by warp-lane (i + src_offset). For thread lanes i where i + src_offset > last_lane, the thread's own input is returned to the thread.
| template<typename T > | |
| __device__ __forceinline__ T | cub::ShuffleIndex (T input, int src_lane, int logical_warp_threads, unsigned int member_mask) |
Shuffle-broadcast for any data type. Each warp-lane i obtains the value input contributed by warp-lane src_lane. For src_lane < 0 or src_lane >= WARP_THREADS, the thread's own input is returned to the thread.
template<typename T > __device__ __forceinline__ T cub::ShuffleUp (T input, int src_offset, int first_lane, unsigned int member_mask)
Shuffle-up for any data type. Each warp-lane i obtains the value input contributed by warp-lane (i - src_offset). For thread lanes i < src_offset, the thread's own input is returned to the thread.
Example: each thread obtains a double value from the predecessor of its predecessor (a src_offset of 2). Suppose thread_data across the first warp of threads is {1.0, 2.0, 3.0, 4.0, 5.0, ..., 32.0}. The corresponding output peer_data will be {1.0, 2.0, 1.0, 2.0, 3.0, ..., 30.0}.
| [in] | input | The value to broadcast |
| [in] | src_offset | The relative down-offset of the peer to read from |
| [in] | first_lane | Index of first lane in segment (typically 0) |
| [in] | member_mask | 32-bit mask of participating warp lanes |
Definition at line 515 of file util_ptx.cuh.
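The lane arithmetic described for ShuffleUp can be checked on the host with a small reference model. The sketch below is plain C++, not CUB code: ShuffleUpModel is a hypothetical helper introduced only for illustration, and it assumes that a source lane below first_lane counts as out of range, so the reading lane keeps its own input.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Host-side model of the documented ShuffleUp lane semantics.
// Hypothetical helper for illustration only; it is not part of CUB.
// lanes[i] holds the input contributed by warp-lane i.
template <typename T>
std::vector<T> ShuffleUpModel(const std::vector<T>& lanes, int src_offset, int first_lane)
{
    const int n = static_cast<int>(lanes.size());
    std::vector<T> out(lanes.size());
    for (int i = 0; i < n; ++i)
    {
        const int src = i - src_offset;  // lane that lane i reads from
        // Assumption: a source below first_lane (or outside the vector)
        // is out of range, so lane i keeps its own input.
        const bool in_range = (src >= std::max(first_lane, 0)) && (src < n);
        out[i] = in_range ? lanes[src] : lanes[i];
    }
    return out;
}
```

With lanes holding {1.0, 2.0, ..., 32.0}, ShuffleUpModel(lanes, 2, 0) reproduces the peer_data sequence from the example above. On the device, the corresponding call under the signature listed here would be cub::ShuffleUp(thread_data, 2, 0, 0xffffffff).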
template<typename T > __device__ __forceinline__ T cub::ShuffleDown (T input, int src_offset, int last_lane, unsigned int member_mask)
Shuffle-down for any data type. Each warp-lane i obtains the value input contributed by warp-lane (i + src_offset). For thread lanes i where i + src_offset > last_lane, the thread's own input is returned to the thread.
Example: each thread obtains a double value from the successor of its successor (a src_offset of 2). Suppose thread_data across the first warp of threads is {1.0, 2.0, 3.0, 4.0, 5.0, ..., 32.0}. The corresponding output peer_data will be {3.0, 4.0, 5.0, 6.0, 7.0, ..., 32.0}.
| [in] | input | The value to broadcast |
| [in] | src_offset | The relative up-offset of the peer to read from |
| [in] | last_lane | Index of last lane in segment (typically 31) |
| [in] | member_mask | 32-bit mask of participating warp lanes |
Definition at line 573 of file util_ptx.cuh.
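The same kind of host-side check works for ShuffleDown. The sketch below is plain C++, not CUB code: ShuffleDownModel is a hypothetical helper for illustration, and it assumes that a source lane above last_lane counts as out of range, so the reading lane keeps its own input.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Host-side model of the documented ShuffleDown lane semantics.
// Hypothetical helper for illustration only; it is not part of CUB.
// lanes[i] holds the input contributed by warp-lane i.
template <typename T>
std::vector<T> ShuffleDownModel(const std::vector<T>& lanes, int src_offset, int last_lane)
{
    const int n = static_cast<int>(lanes.size());
    std::vector<T> out(lanes.size());
    for (int i = 0; i < n; ++i)
    {
        const int src = i + src_offset;  // lane that lane i reads from
        // Assumption: a source above last_lane (or outside the vector)
        // is out of range, so lane i keeps its own input.
        const bool in_range = (src >= 0) && (src <= std::min(last_lane, n - 1));
        out[i] = in_range ? lanes[src] : lanes[i];
    }
    return out;
}
```

With lanes holding {1.0, 2.0, ..., 32.0}, ShuffleDownModel(lanes, 2, 31) starts with 3.0, 4.0, 5.0 as in the example above, and under the stated assumption the final two lanes keep their own values (31.0 and 32.0). On the device, the corresponding call under the signature listed here would be cub::ShuffleDown(thread_data, 2, 31, 0xffffffff).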
template<typename T > __device__ __forceinline__ T cub::ShuffleIndex (T input, int src_lane, int logical_warp_threads, unsigned int member_mask)
Shuffle-broadcast for any data type. Each warp-lane i obtains the value input contributed by warp-lane src_lane. For src_lane < 0 or src_lane >= WARP_THREADS, the thread's own input is returned to the thread.
Example: each thread obtains a double value from warp-lane 0. Suppose thread_data across the first warp of threads is {1.0, 2.0, 3.0, 4.0, 5.0, ..., 32.0}. The corresponding output peer_data will be {1.0, 1.0, 1.0, 1.0, 1.0, ..., 1.0}.
| [in] | input | The value to broadcast |
| [in] | src_lane | Which warp lane is to do the broadcasting |
| [in] | logical_warp_threads | Number of threads per logical warp |
| [in] | member_mask | 32-bit mask of participating warp lanes |
Definition at line 634 of file util_ptx.cuh.
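The broadcast rule for ShuffleIndex can be modeled on the host in the same way. The sketch below is plain C++, not CUB code: ShuffleIndexModel is a hypothetical helper for illustration. It follows the description given here (an out-of-range src_lane leaves each lane with its own input) and models a single logical warp only; it makes no claim about how the hardware instruction treats such values.

```cpp
#include <cassert>
#include <vector>

// Host-side model of the documented ShuffleIndex broadcast semantics.
// Hypothetical helper for illustration only; it is not part of CUB.
// lanes[i] holds the input contributed by warp-lane i of one logical warp.
template <typename T>
std::vector<T> ShuffleIndexModel(const std::vector<T>& lanes, int src_lane, int logical_warp_threads)
{
    const int n = static_cast<int>(lanes.size());
    // Per the description: src_lane < 0 or src_lane >= the warp width
    // means every lane keeps its own input.
    const bool in_range = (src_lane >= 0) && (src_lane < logical_warp_threads) && (src_lane < n);
    std::vector<T> out(lanes.size());
    for (int i = 0; i < n; ++i)
    {
        out[i] = in_range ? lanes[src_lane] : lanes[i];
    }
    return out;
}
```

With lanes holding {1.0, 2.0, ..., 32.0}, ShuffleIndexModel(lanes, 0, 32) gives 1.0 in every lane, matching the example above. On the device, the corresponding call under the signature listed here would be cub::ShuffleIndex(thread_data, 0, 32, 0xffffffff).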