CUB  
cub::BlockAdjacentDifference< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > Class Template Reference

Detailed description

template< typename T, int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
class cub::BlockAdjacentDifference< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >

BlockAdjacentDifference provides collective methods for computing the differences of adjacent elements partitioned across a CUDA thread block.

Overview
  • BlockAdjacentDifference computes the differences between adjacent elements partitioned across a CUDA thread block. Because the binary operation may be noncommutative, there are two sets of methods. Methods named SubtractLeft subtract the left element i - 1 of the input sequence from the current element i. Methods named SubtractRight subtract the right element i + 1 from the current element i:
int values[4]; // [1, 2, 3, 4]
//...
int subtract_left_result[4];  // <-- [  1,  1,  1, 1 ]
int subtract_right_result[4]; // <-- [ -1, -1, -1, 4 ]
  • For SubtractLeft, if the left element is out of bounds, the output value is set to input[0] without modification.
  • For SubtractRight, if the right element is out of bounds, the output value is set to the current input value without modification.
  • The following example under the examples/block folder illustrates the use of dynamically allocated shared memory with BlockReduce and how to repurpose the same memory region: example_block_reduce_dyn_smem.cu. This example can be easily adapted to the storage required by BlockAdjacentDifference.
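The boundary rules listed above can be checked on the host with a small sequential reference. This is a C++ sketch of the semantics only, following the [1, 2, 3, 4] example; it is not how CUB implements the collective, and the names Minus, reference_subtract_left, and reference_subtract_right are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference only; not part of CUB.
struct Minus
{
    int operator()(int lhs, int rhs) const { return lhs - rhs; }
};

// SubtractLeft semantics: output[0] = input[0];
// output[i] = op(input[i], input[i - 1]) for i >= 1.
template <typename DifferenceOp>
std::vector<int> reference_subtract_left(const std::vector<int>& input, DifferenceOp op)
{
    std::vector<int> output(input);
    for (std::size_t i = 1; i < input.size(); ++i)
    {
        output[i] = op(input[i], input[i - 1]);
    }
    return output;
}

// SubtractRight semantics: output[i] = op(input[i], input[i + 1]);
// the last element has no right neighbor and is copied unchanged.
template <typename DifferenceOp>
std::vector<int> reference_subtract_right(const std::vector<int>& input, DifferenceOp op)
{
    std::vector<int> output(input);
    for (std::size_t i = 0; i + 1 < input.size(); ++i)
    {
        output[i] = op(input[i], input[i + 1]);
    }
    return output;
}
```

Calling both functions on {1, 2, 3, 4} with Minus yields {1, 1, 1, 1} and {-1, -1, -1, 4}, matching the values in the overview.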
Snippet
The code snippet below illustrates how to use BlockAdjacentDifference to compute the left difference between adjacent elements.
#include <cub/cub.cuh>
// or equivalently <cub/block/block_adjacent_difference.cuh>

struct CustomDifference
{
    template <typename DataType>
    __device__ DataType operator()(DataType &lhs, DataType &rhs)
    {
        return lhs - rhs;
    }
};

__global__ void ExampleKernel(...)
{
    // Specialize BlockAdjacentDifference for a 1D block of
    // 128 threads of type int
    using BlockAdjacentDifferenceT = cub::BlockAdjacentDifference<int, 128>;

    // Allocate shared memory for BlockAdjacentDifference
    __shared__ typename BlockAdjacentDifferenceT::TempStorage temp_storage;

    // Obtain a segment of consecutive items that are blocked across threads
    int thread_data[4];
    ...

    // Collectively compute adjacent_difference
    // (input first, output second, as in the SubtractLeft signature)
    int result[4];
    BlockAdjacentDifferenceT(temp_storage).SubtractLeft(
        thread_data,
        result,
        CustomDifference());
}
Suppose the set of input thread_data across the block of threads is { [4,2,1,1], [1,1,1,1], [2,3,3,3], [3,4,1,4], ... }. The corresponding output result in those threads will be { [4,-2,-1,0], [0,0,0,0], [1,1,0,0], [0,1,-3,3], ... }.
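To make the blocked arrangement in this example concrete, the host-side C++ sketch below emulates four threads with four items each and reproduces the per-thread results quoted above. It is an illustration under assumptions, not CUB code: the block is shrunk to four threads, and kThreads, kItemsPerThread, Tile, and emulate_blocked_subtract_left are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kThreads = 4;        // hypothetical, shrunk from 128 for the sketch
constexpr std::size_t kItemsPerThread = 4;

using Tile = std::array<std::array<int, kItemsPerThread>, kThreads>;

// Emulates SubtractLeft over a blocked arrangement: thread t owns the linear
// items [t * kItemsPerThread, (t + 1) * kItemsPerThread).
Tile emulate_blocked_subtract_left(const Tile& input)
{
    Tile output = input;
    for (std::size_t t = 0; t < kThreads; ++t)
    {
        for (std::size_t j = 0; j < kItemsPerThread; ++j)
        {
            if (t == 0 && j == 0)
            {
                continue; // no left neighbor: keep input[0] unchanged
            }
            // The left neighbor of a thread's first item is the last item
            // of the previous thread.
            const int left = (j == 0) ? input[t - 1][kItemsPerThread - 1]
                                      : input[t][j - 1];
            output[t][j] = input[t][j] - left;
        }
    }
    return output;
}
```

In the real collective the cross-thread neighbor is exchanged through the shared temp_storage, which is why the class needs a TempStorage allocation at all.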

Classes

struct  TempStorage
 The operations exposed by BlockAdjacentDifference require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse.
 

Public Methods

Collective constructors
__device__ __forceinline__ BlockAdjacentDifference ()
 Collective constructor using a private static allocation of shared memory as temporary storage.
 
__device__ __forceinline__ BlockAdjacentDifference (TempStorage &temp_storage)
 Collective constructor using the specified memory allocation as temporary storage.
 
Read left operations
template<int ITEMS_PER_THREAD, typename OutputType , typename DifferenceOpT >
__device__ __forceinline__ void SubtractLeft (T(&input)[ITEMS_PER_THREAD], OutputType(&output)[ITEMS_PER_THREAD], DifferenceOpT difference_op)
 Subtracts the left element of each adjacent pair of elements partitioned across a CUDA thread block.
 
template<int ITEMS_PER_THREAD, typename OutputT , typename DifferenceOpT >
__device__ __forceinline__ void SubtractLeft (T(&input)[ITEMS_PER_THREAD], OutputT(&output)[ITEMS_PER_THREAD], DifferenceOpT difference_op, T tile_predecessor_item)
 Subtracts the left element of each adjacent pair of elements partitioned across a CUDA thread block.
 
template<int ITEMS_PER_THREAD, typename OutputType , typename DifferenceOpT >
__device__ __forceinline__ void SubtractLeftPartialTile (T(&input)[ITEMS_PER_THREAD], OutputType(&output)[ITEMS_PER_THREAD], DifferenceOpT difference_op, int valid_items)
 Subtracts the left element of each adjacent pair of elements partitioned across a CUDA thread block.
 
Read right operations
template<int ITEMS_PER_THREAD, typename OutputT , typename DifferenceOpT >
__device__ __forceinline__ void SubtractRight (T(&input)[ITEMS_PER_THREAD], OutputT(&output)[ITEMS_PER_THREAD], DifferenceOpT difference_op)
 Subtracts the right element of each adjacent pair of elements partitioned across a CUDA thread block.
 
template<int ITEMS_PER_THREAD, typename OutputT , typename DifferenceOpT >
__device__ __forceinline__ void SubtractRight (T(&input)[ITEMS_PER_THREAD], OutputT(&output)[ITEMS_PER_THREAD], DifferenceOpT difference_op, T tile_successor_item)
 Subtracts the right element of each adjacent pair of elements partitioned across a CUDA thread block.
 
template<int ITEMS_PER_THREAD, typename OutputT , typename DifferenceOpT >
__device__ __forceinline__ void SubtractRightPartialTile (T(&input)[ITEMS_PER_THREAD], OutputT(&output)[ITEMS_PER_THREAD], DifferenceOpT difference_op, int valid_items)
 Subtracts the right element of each adjacent pair within the valid range of elements partitioned across a CUDA thread block.
 
Head flag operations (deprecated)
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void FlagHeads (FlagT(&output)[ITEMS_PER_THREAD], T(&input)[ITEMS_PER_THREAD], FlagOp flag_op)
 
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void FlagHeads (FlagT(&output)[ITEMS_PER_THREAD], T(&input)[ITEMS_PER_THREAD], FlagOp flag_op, T tile_predecessor_item)
 
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void FlagTails (FlagT(&output)[ITEMS_PER_THREAD], T(&input)[ITEMS_PER_THREAD], FlagOp flag_op)
 
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void FlagTails (FlagT(&output)[ITEMS_PER_THREAD], T(&input)[ITEMS_PER_THREAD], FlagOp flag_op, T tile_successor_item)
 
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void FlagHeadsAndTails (FlagT(&head_flags)[ITEMS_PER_THREAD], FlagT(&tail_flags)[ITEMS_PER_THREAD], T(&input)[ITEMS_PER_THREAD], FlagOp flag_op)
 
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void FlagHeadsAndTails (FlagT(&head_flags)[ITEMS_PER_THREAD], FlagT(&tail_flags)[ITEMS_PER_THREAD], T tile_successor_item, T(&input)[ITEMS_PER_THREAD], FlagOp flag_op)
 
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void FlagHeadsAndTails (FlagT(&head_flags)[ITEMS_PER_THREAD], T tile_predecessor_item, FlagT(&tail_flags)[ITEMS_PER_THREAD], T(&input)[ITEMS_PER_THREAD], FlagOp flag_op)
 
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void FlagHeadsAndTails (FlagT(&head_flags)[ITEMS_PER_THREAD], T tile_predecessor_item, FlagT(&tail_flags)[ITEMS_PER_THREAD], T tile_successor_item, T(&input)[ITEMS_PER_THREAD], FlagOp flag_op)
 

Constructor & Destructor Documentation

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
__device__ __forceinline__ cub::BlockAdjacentDifference< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::BlockAdjacentDifference ( )
inline

Collective constructor using a private static allocation of shared memory as temporary storage.

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
__device__ __forceinline__ cub::BlockAdjacentDifference< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::BlockAdjacentDifference ( TempStorage & temp_storage)
inline

Collective constructor using the specified memory allocation as temporary storage.

Parameters
[in] temp_storage: Reference to memory allocation having layout type TempStorage

Member Function Documentation

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename OutputType , typename DifferenceOpT >
__device__ __forceinline__ void cub::BlockAdjacentDifference< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::SubtractLeft ( T(&)  input[ITEMS_PER_THREAD],
OutputType(&)  output[ITEMS_PER_THREAD],
DifferenceOpT  difference_op 
)
inline

Subtracts the left element of each adjacent pair of elements partitioned across a CUDA thread block.

  • For multi-dimensional blocks, threads are linearly ranked in row-major order.
  • A subsequent __syncthreads() threadblock barrier should be invoked after calling this method if the collective's temporary storage (e.g., temp_storage) is to be reused or repurposed.
Snippet
The code snippet below illustrates how to use BlockAdjacentDifference to compute the left difference between adjacent elements.
#include <cub/cub.cuh>
// or equivalently <cub/block/block_adjacent_difference.cuh>

struct CustomDifference
{
    template <typename DataType>
    __device__ DataType operator()(DataType &lhs, DataType &rhs)
    {
        return lhs - rhs;
    }
};

__global__ void ExampleKernel(...)
{
    // Specialize BlockAdjacentDifference for a 1D block
    // of 128 threads of type int
    using BlockAdjacentDifferenceT = cub::BlockAdjacentDifference<int, 128>;

    // Allocate shared memory for BlockAdjacentDifference
    __shared__ typename BlockAdjacentDifferenceT::TempStorage temp_storage;

    // Obtain a segment of consecutive items that are blocked across threads
    int thread_data[4];
    ...

    // Collectively compute adjacent_difference in place
    BlockAdjacentDifferenceT(temp_storage).SubtractLeft(
        thread_data,
        thread_data,
        CustomDifference());
}
Suppose the set of input thread_data across the block of threads is { [4,2,1,1], [1,1,1,1], [2,3,3,3], [3,4,1,4], ... }. The corresponding output result in those threads will be { [4,-2,-1,0], [0,0,0,0], [1,1,0,0], [0,1,-3,3], ... }.
Parameters
[out] output: Calling thread's adjacent difference result
[in] input: Calling thread's input items (may be aliased to output)
[in] difference_op: Binary difference operator
template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename OutputT , typename DifferenceOpT >
__device__ __forceinline__ void cub::BlockAdjacentDifference< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::SubtractLeft ( T(&)  input[ITEMS_PER_THREAD],
OutputT(&)  output[ITEMS_PER_THREAD],
DifferenceOpT  difference_op,
T  tile_predecessor_item 
)
inline

Subtracts the left element of each adjacent pair of elements partitioned across a CUDA thread block.

  • For multi-dimensional blocks, threads are linearly ranked in row-major order.
  • A subsequent __syncthreads() threadblock barrier should be invoked after calling this method if the collective's temporary storage (e.g., temp_storage) is to be reused or repurposed.
Snippet
The code snippet below illustrates how to use BlockAdjacentDifference to compute the left difference between adjacent elements.
#include <cub/cub.cuh>
// or equivalently <cub/block/block_adjacent_difference.cuh>

struct CustomDifference
{
    template <typename DataType>
    __device__ DataType operator()(DataType &lhs, DataType &rhs)
    {
        return lhs - rhs;
    }
};

__global__ void ExampleKernel(...)
{
    // Specialize BlockAdjacentDifference for a 1D block of
    // 128 threads of type int
    using BlockAdjacentDifferenceT = cub::BlockAdjacentDifference<int, 128>;

    // Allocate shared memory for BlockAdjacentDifference
    __shared__ typename BlockAdjacentDifferenceT::TempStorage temp_storage;

    // Obtain a segment of consecutive items that are blocked across threads
    int thread_data[4];
    ...

    // The last item in the previous tile:
    int tile_predecessor_item = ...;

    // Collectively compute adjacent_difference in place
    BlockAdjacentDifferenceT(temp_storage).SubtractLeft(
        thread_data,
        thread_data,
        CustomDifference(),
        tile_predecessor_item);
}
Suppose the set of input thread_data across the block of threads is { [4,2,1,1], [1,1,1,1], [2,3,3,3], [3,4,1,4], ... }, and that tile_predecessor_item is 3. The corresponding output result in those threads will be { [1,-2,-1,0], [0,0,0,0], [1,1,0,0], [0,1,-3,3], ... }.
Parameters
[out] output: Calling thread's adjacent difference result
[in] input: Calling thread's input items (may be aliased to output)
[in] difference_op: Binary difference operator
[in] tile_predecessor_item: [thread 0 only] Item to subtract from the first tile item (input[0] from thread 0).
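The role of tile_predecessor_item is easiest to see when a long sequence is processed tile by tile. The host-side C++ sketch below is a sequential reference under assumptions, not CUB code: subtract_left_with_predecessor and chained_subtract_left are hypothetical helpers, and plain integer subtraction stands in for the difference operator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One tile with an explicit predecessor: the first output is
// tile[0] - tile_predecessor_item instead of a plain copy of tile[0].
std::vector<int> subtract_left_with_predecessor(const std::vector<int>& tile,
                                                int tile_predecessor_item)
{
    std::vector<int> output(tile.size());
    for (std::size_t i = 0; i < tile.size(); ++i)
    {
        const int left = (i == 0) ? tile_predecessor_item : tile[i - 1];
        output[i] = tile[i] - left;
    }
    return output;
}

// Walks `input` in tiles of `tile_size` items, handing each later tile the
// last item of the tile before it.
std::vector<int> chained_subtract_left(const std::vector<int>& input, std::size_t tile_size)
{
    assert(tile_size > 0);
    std::vector<int> output;
    output.reserve(input.size());
    for (std::size_t begin = 0; begin < input.size(); begin += tile_size)
    {
        const std::size_t end = std::min(begin + tile_size, input.size());
        const std::vector<int> tile(input.begin() + static_cast<std::ptrdiff_t>(begin),
                                    input.begin() + static_cast<std::ptrdiff_t>(end));
        // Read the predecessor from the untouched input; an in-place kernel
        // would have to save it before overwriting the previous tile.
        const int predecessor = (begin == 0) ? 0 : input[begin - 1];
        std::vector<int> diff = subtract_left_with_predecessor(tile, predecessor);
        if (begin == 0)
        {
            diff[0] = tile[0]; // the very first tile has no predecessor
        }
        output.insert(output.end(), diff.begin(), diff.end());
    }
    return output;
}
```

With the 16 items from the example and a predecessor of 3, the first function returns a sequence starting with 1 rather than 4. Chaining tiles of four items gives the same result as differencing the whole sequence at once.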
template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename OutputType , typename DifferenceOpT >
__device__ __forceinline__ void cub::BlockAdjacentDifference< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::SubtractLeftPartialTile ( T(&)  input[ITEMS_PER_THREAD],
OutputType(&)  output[ITEMS_PER_THREAD],
DifferenceOpT  difference_op,
int  valid_items 
)
inline

Subtracts the left element of each adjacent pair of elements partitioned across a CUDA thread block.

  • For multi-dimensional blocks, threads are linearly ranked in row-major order.
  • A subsequent __syncthreads() threadblock barrier should be invoked after calling this method if the collective's temporary storage (e.g., temp_storage) is to be reused or repurposed.
Snippet
The code snippet below illustrates how to use BlockAdjacentDifference to compute the left difference between adjacent elements.
#include <cub/cub.cuh>
// or equivalently <cub/block/block_adjacent_difference.cuh>

struct CustomDifference
{
    template <typename DataType>
    __device__ DataType operator()(DataType &lhs, DataType &rhs)
    {
        return lhs - rhs;
    }
};

__global__ void ExampleKernel(...)
{
    // Specialize BlockAdjacentDifference for a 1D block of
    // 128 threads of type int
    using BlockAdjacentDifferenceT = cub::BlockAdjacentDifference<int, 128>;

    // Allocate shared memory for BlockAdjacentDifference
    __shared__ typename BlockAdjacentDifferenceT::TempStorage temp_storage;

    // Obtain a segment of consecutive items that are blocked across threads
    int thread_data[4];
    ...

    // Collectively compute adjacent_difference in place
    BlockAdjacentDifferenceT(temp_storage).SubtractLeftPartialTile(
        thread_data,
        thread_data,
        CustomDifference(),
        valid_items);
}
Suppose the set of input thread_data across the block of threads is { [4,2,1,1], [1,1,1,1], [2,3,3,3], [3,4,1,4], ... }, and that valid_items is large enough to cover every item shown. The corresponding output result in those threads will be { [4,-2,-1,0], [0,0,0,0], [1,1,0,0], [0,1,-3,3], ... }.
Parameters
[out] output: Calling thread's adjacent difference result
[in] input: Calling thread's input items (may be aliased to output)
[in] difference_op: Binary difference operator
[in] valid_items: Number of valid items in thread block
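The effect of valid_items on a left difference can be sketched on the host. The C++ reference below assumes that items at linear index valid_items and beyond are passed through unchanged, mirroring what the SubtractRightPartialTile example in this reference shows for its tail. The function name and the value valid_items = 9 are illustrative choices, not taken from CUB.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed semantics: only items with linear index < valid_items are
// differenced; item 0 and every item at or beyond valid_items are copied.
std::vector<int> reference_subtract_left_partial(const std::vector<int>& input,
                                                 std::size_t valid_items)
{
    std::vector<int> output(input);
    for (std::size_t i = 1; i < input.size() && i < valid_items; ++i)
    {
        output[i] = input[i] - input[i - 1];
    }
    return output;
}
```

For the 12 items {4,2,1,1, 1,1,1,1, 2,3,3,3} and valid_items = 9, the first nine outputs are differences and the last three stay at 3.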
template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename OutputT , typename DifferenceOpT >
__device__ __forceinline__ void cub::BlockAdjacentDifference< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::SubtractRight ( T(&)  input[ITEMS_PER_THREAD],
OutputT(&)  output[ITEMS_PER_THREAD],
DifferenceOpT  difference_op 
)
inline

Subtracts the right element of each adjacent pair of elements partitioned across a CUDA thread block.

  • For multi-dimensional blocks, threads are linearly ranked in row-major order.
  • A subsequent __syncthreads() threadblock barrier should be invoked after calling this method if the collective's temporary storage (e.g., temp_storage) is to be reused or repurposed.
Snippet
The code snippet below illustrates how to use BlockAdjacentDifference to compute the right difference between adjacent elements.
#include <cub/cub.cuh>
// or equivalently <cub/block/block_adjacent_difference.cuh>

struct CustomDifference
{
    template <typename DataType>
    __device__ DataType operator()(DataType &lhs, DataType &rhs)
    {
        return lhs - rhs;
    }
};

__global__ void ExampleKernel(...)
{
    // Specialize BlockAdjacentDifference for a 1D block of
    // 128 threads of type int
    using BlockAdjacentDifferenceT = cub::BlockAdjacentDifference<int, 128>;

    // Allocate shared memory for BlockAdjacentDifference
    __shared__ typename BlockAdjacentDifferenceT::TempStorage temp_storage;

    // Obtain a segment of consecutive items that are blocked across threads
    int thread_data[4];
    ...

    // Collectively compute adjacent_difference in place
    BlockAdjacentDifferenceT(temp_storage).SubtractRight(
        thread_data,
        thread_data,
        CustomDifference());
}
Suppose the set of input thread_data across the block of threads is { ...3], [4,2,1,1], [1,1,1,1], [2,3,3,3], [3,4,1,4] }. The corresponding output result in those threads will be { ...-1], [2,1,0,0], [0,0,0,-1], [-1,0,0,0], [-1,3,-3,4] }.
Parameters
[out] output: Calling thread's adjacent difference result
[in] input: Calling thread's input items (may be aliased to output)
[in] difference_op: Binary difference operator
template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename OutputT , typename DifferenceOpT >
__device__ __forceinline__ void cub::BlockAdjacentDifference< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::SubtractRight ( T(&)  input[ITEMS_PER_THREAD],
OutputT(&)  output[ITEMS_PER_THREAD],
DifferenceOpT  difference_op,
T  tile_successor_item 
)
inline

Subtracts the right element of each adjacent pair of elements partitioned across a CUDA thread block.

  • For multi-dimensional blocks, threads are linearly ranked in row-major order.
  • A subsequent __syncthreads() threadblock barrier should be invoked after calling this method if the collective's temporary storage (e.g., temp_storage) is to be reused or repurposed.
Snippet
The code snippet below illustrates how to use BlockAdjacentDifference to compute the right difference between adjacent elements.
#include <cub/cub.cuh>
// or equivalently <cub/block/block_adjacent_difference.cuh>

struct CustomDifference
{
    template <typename DataType>
    __device__ DataType operator()(DataType &lhs, DataType &rhs)
    {
        return lhs - rhs;
    }
};

__global__ void ExampleKernel(...)
{
    // Specialize BlockAdjacentDifference for a 1D block of
    // 128 threads of type int
    using BlockAdjacentDifferenceT = cub::BlockAdjacentDifference<int, 128>;

    // Allocate shared memory for BlockAdjacentDifference
    __shared__ typename BlockAdjacentDifferenceT::TempStorage temp_storage;

    // Obtain a segment of consecutive items that are blocked across threads
    int thread_data[4];
    ...

    // The first item in the next tile:
    int tile_successor_item = ...;

    // Collectively compute adjacent_difference in place
    BlockAdjacentDifferenceT(temp_storage).SubtractRight(
        thread_data,
        thread_data,
        CustomDifference(),
        tile_successor_item);
}
Suppose the set of input thread_data across the block of threads is { ...3], [4,2,1,1], [1,1,1,1], [2,3,3,3], [3,4,1,4] }, and that tile_successor_item is 3. The corresponding output result in those threads will be { ...-1], [2,1,0,0], [0,0,0,-1], [-1,0,0,0], [-1,3,-3,1] }.
Parameters
[out] output: Calling thread's adjacent difference result
[in] input: Calling thread's input items (may be aliased to output)
[in] difference_op: Binary difference operator
[in] tile_successor_item: [thread BLOCK_THREADS-1 only] Item to subtract from the last tile item (input[ITEMS_PER_THREAD-1] from thread BLOCK_THREADS-1).
template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename OutputT , typename DifferenceOpT >
__device__ __forceinline__ void cub::BlockAdjacentDifference< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::SubtractRightPartialTile ( T(&)  input[ITEMS_PER_THREAD],
OutputT(&)  output[ITEMS_PER_THREAD],
DifferenceOpT  difference_op,
int  valid_items 
)
inline

Subtracts the right element of each adjacent pair within the valid range of elements partitioned across a CUDA thread block.

  • For multi-dimensional blocks, threads are linearly ranked in row-major order.
  • A subsequent __syncthreads() threadblock barrier should be invoked after calling this method if the collective's temporary storage (e.g., temp_storage) is to be reused or repurposed.
Snippet
The code snippet below illustrates how to use BlockAdjacentDifference to compute the right difference between adjacent elements.
#include <cub/cub.cuh>
// or equivalently <cub/block/block_adjacent_difference.cuh>

struct CustomDifference
{
    template <typename DataType>
    __device__ DataType operator()(DataType &lhs, DataType &rhs)
    {
        return lhs - rhs;
    }
};

__global__ void ExampleKernel(...)
{
    // Specialize BlockAdjacentDifference for a 1D block of
    // 128 threads of type int
    using BlockAdjacentDifferenceT = cub::BlockAdjacentDifference<int, 128>;

    // Allocate shared memory for BlockAdjacentDifference
    __shared__ typename BlockAdjacentDifferenceT::TempStorage temp_storage;

    // Obtain a segment of consecutive items that are blocked across threads
    int thread_data[4];
    ...

    // Collectively compute adjacent_difference in place
    BlockAdjacentDifferenceT(temp_storage).SubtractRightPartialTile(
        thread_data,
        thread_data,
        CustomDifference(),
        valid_items);
}
Suppose the set of input thread_data across the block of threads is { ...3], [4,2,1,1], [1,1,1,1], [2,3,3,3], [3,4,1,4] }, and that valid_items is 507. The corresponding output result in those threads will be { ...-1], [2,1,0,0], [0,0,0,-1], [-1,0,3,3], [3,4,1,4] }.
Parameters
[out] output: Calling thread's adjacent difference result
[in] input: Calling thread's input items (may be aliased to output)
[in] difference_op: Binary difference operator
[in] valid_items: Number of valid items in thread block
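The valid_items = 507 case can be replayed on the host. The C++ sketch below is a sequential reference, not CUB code, for a 512-item tile (128 threads with 4 items each). It assumes that the last valid item and everything after it are copied unchanged. The function names and the filler value 3 used for the unspecified leading items are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed semantics: item i is differenced against item i + 1 only while
// i + 1 is still a valid index; every other item is copied unchanged.
std::vector<int> reference_subtract_right_partial(const std::vector<int>& input,
                                                  std::size_t valid_items)
{
    std::vector<int> output(input);
    for (std::size_t i = 0; i + 1 < valid_items && i + 1 < input.size(); ++i)
    {
        output[i] = input[i] - input[i + 1];
    }
    return output;
}

// Builds a 512-item tile whose last 16 items are the four threads shown in
// the example; all earlier items use the hypothetical filler value 3.
std::vector<int> make_example_tile()
{
    std::vector<int> tile(512, 3);
    const int tail[16] = {4, 2, 1, 1, 1, 1, 1, 1, 2, 3, 3, 3, 3, 4, 1, 4};
    std::copy(tail, tail + 16, tile.begin() + 496);
    return tile;
}
```

With valid_items = 507, the item at linear index 506 is the last valid one and has no valid right neighbor, so it and the five items after it keep their input values.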
template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void cub::BlockAdjacentDifference< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::FlagHeads ( FlagT(&)  output[ITEMS_PER_THREAD],
T(&)  input[ITEMS_PER_THREAD],
FlagOp  flag_op 
)
inline
Deprecated:
[Since 1.14.0] The cub::BlockAdjacentDifference::FlagHeads APIs are deprecated. Use cub::BlockAdjacentDifference::SubtractLeft instead.
Parameters
[out] output: Calling thread's discontinuity result
[in] input: Calling thread's input items
[in] flag_op: Binary boolean flag predicate
template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void cub::BlockAdjacentDifference< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::FlagHeads ( FlagT(&)  output[ITEMS_PER_THREAD],
T(&)  input[ITEMS_PER_THREAD],
FlagOp  flag_op,
T  tile_predecessor_item 
)
inline
Deprecated:
[Since 1.14.0] The cub::BlockAdjacentDifference::FlagHeads APIs are deprecated. Use cub::BlockAdjacentDifference::SubtractLeft instead.
Parameters
[out] output: Calling thread's discontinuity result
[in] input: Calling thread's input items
[in] flag_op: Binary boolean flag predicate
[in] tile_predecessor_item: [thread 0 only] Item with which to compare the first tile item (input[0] from thread 0).
template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void cub::BlockAdjacentDifference< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::FlagTails ( FlagT(&)  output[ITEMS_PER_THREAD],
T(&)  input[ITEMS_PER_THREAD],
FlagOp  flag_op 
)
inline
Deprecated:
[Since 1.14.0] The cub::BlockAdjacentDifference::FlagTails APIs are deprecated. Use cub::BlockAdjacentDifference::SubtractRight instead.
Parameters
[out] output: Calling thread's discontinuity result
[in] input: Calling thread's input items
[in] flag_op: Binary boolean flag predicate
template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void cub::BlockAdjacentDifference< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::FlagTails ( FlagT(&)  output[ITEMS_PER_THREAD],
T(&)  input[ITEMS_PER_THREAD],
FlagOp  flag_op,
T  tile_successor_item 
)
inline
Deprecated:
[Since 1.14.0] The cub::BlockAdjacentDifference::FlagTails APIs are deprecated. Use cub::BlockAdjacentDifference::SubtractRight instead.
Parameters
[out] output: Calling thread's discontinuity result
[in] input: Calling thread's input items
[in] flag_op: Binary boolean flag predicate
[in] tile_successor_item: [thread BLOCK_THREADS-1 only] Item with which to compare the last tile item (input[ITEMS_PER_THREAD-1] from thread BLOCK_THREADS-1).
template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void cub::BlockAdjacentDifference< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::FlagHeadsAndTails ( FlagT(&)  head_flags[ITEMS_PER_THREAD],
FlagT(&)  tail_flags[ITEMS_PER_THREAD],
T(&)  input[ITEMS_PER_THREAD],
FlagOp  flag_op 
)
inline
Deprecated:
[Since 1.14.0] The cub::BlockAdjacentDifference::FlagHeadsAndTails APIs are deprecated. Use cub::BlockAdjacentDifference::SubtractLeft or cub::BlockAdjacentDifference::SubtractRight instead.
Parameters
[out] head_flags: Calling thread's discontinuity head_flags
[out] tail_flags: Calling thread's discontinuity tail_flags
[in] input: Calling thread's input items
[in] flag_op: Binary boolean flag predicate
template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void cub::BlockAdjacentDifference< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::FlagHeadsAndTails ( FlagT(&)  head_flags[ITEMS_PER_THREAD],
FlagT(&)  tail_flags[ITEMS_PER_THREAD],
T  tile_successor_item,
T(&)  input[ITEMS_PER_THREAD],
FlagOp  flag_op 
)
inline
Deprecated:
[Since 1.14.0] The cub::BlockAdjacentDifference::FlagHeadsAndTails APIs are deprecated. Use cub::BlockAdjacentDifference::SubtractLeft or cub::BlockAdjacentDifference::SubtractRight instead.
Parameters
[out] head_flags: Calling thread's discontinuity head_flags
[out] tail_flags: Calling thread's discontinuity tail_flags
[in] tile_successor_item: [thread BLOCK_THREADS-1 only] Item with which to compare the last tile item (input[ITEMS_PER_THREAD-1] from thread BLOCK_THREADS-1).
[in] input: Calling thread's input items
[in] flag_op: Binary boolean flag predicate
template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void cub::BlockAdjacentDifference< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::FlagHeadsAndTails ( FlagT(&)  head_flags[ITEMS_PER_THREAD],
T  tile_predecessor_item,
FlagT(&)  tail_flags[ITEMS_PER_THREAD],
T(&)  input[ITEMS_PER_THREAD],
FlagOp  flag_op 
)
inline
Deprecated:
[Since 1.14.0] The cub::BlockAdjacentDifference::FlagHeadsAndTails APIs are deprecated. Use cub::BlockAdjacentDifference::SubtractLeft or cub::BlockAdjacentDifference::SubtractRight instead.
Parameters
[out] head_flags: Calling thread's discontinuity head_flags
[in] tile_predecessor_item: [thread 0 only] Item with which to compare the first tile item (input[0] from thread 0).
[out] tail_flags: Calling thread's discontinuity tail_flags
[in] input: Calling thread's input items
[in] flag_op: Binary boolean flag predicate
template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void cub::BlockAdjacentDifference< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::FlagHeadsAndTails ( FlagT(&)  head_flags[ITEMS_PER_THREAD],
T  tile_predecessor_item,
FlagT(&)  tail_flags[ITEMS_PER_THREAD],
T  tile_successor_item,
T(&)  input[ITEMS_PER_THREAD],
FlagOp  flag_op 
)
inline
Deprecated:
[Since 1.14.0] The cub::BlockAdjacentDifference::FlagHeadsAndTails APIs are deprecated. Use cub::BlockAdjacentDifference::SubtractLeft or cub::BlockAdjacentDifference::SubtractRight instead.
Parameters
[out] head_flags: Calling thread's discontinuity head_flags
[in] tile_predecessor_item: [thread 0 only] Item with which to compare the first tile item (input[0] from thread 0).
[out] tail_flags: Calling thread's discontinuity tail_flags
[in] tile_successor_item: [thread BLOCK_THREADS-1 only] Item with which to compare the last tile item (input[ITEMS_PER_THREAD-1] from thread BLOCK_THREADS-1).
[in] input: Calling thread's input items
[in] flag_op: Binary boolean flag predicate
