Host & Device
Users of NVBIO need to be familiar with the fact that a GPU-equipped system has both a host, controlled by a CPU, and one or more GPU devices, each with its own distinct memory space. Hence, there can be several types of functions and data-structures:
  • single-threaded functions that can be called by a host thread
  • single-threaded functions that can be called by a device thread
  • single-threaded functions that can be called both on the host and the device
  • parallel functions that can be called by a host thread, and spawn one or more sets of host threads
  • parallel functions that can be called by a host thread, but spawn one or more sets of device threads
  • data-structures that encapsulate host data and are meant to be used on the host (e.g. a resizable host vector, nvbio::vector<host_tag,T>)
  • data-structures that encapsulate device data but are meant to be used on the host (e.g. a resizable device vector, nvbio::vector<device_tag,T>)
  • data-structures that encapsulate device data and are meant to be used on the device
Unified Virtual Memory (coming with the NVIDIA Maxwell generation) will eventually make it possible to use any data-structure anywhere, but for now we have to cope with the distinct memory spaces.
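
To make the distinction concrete, here is a minimal sketch, not taken from the NVBIO documentation: it assumes that nvbio::vector wraps the corresponding thrust vectors, exposing thrust-style iterators and a (size, value) constructor, and that the nvbio/basic/vector.h header path is correct.

#include <nvbio/basic/vector.h>   // assumed header path for nvbio::vector
#include <thrust/copy.h>

using namespace nvbio;

// a single-threaded function that can be called both on the host and the device
__host__ __device__ uint32 square(const uint32 x) { return x * x; }

int main()
{
    // host data, meant to be used on the host
    vector<host_tag,uint32>   h_vec( 100u, 1u );

    // device data, but the container itself is meant to be used on the host:
    // allocation, resizing and copies are driven by the host thread
    vector<device_tag,uint32> d_vec( 100u, 0u );

    // a host-to-device copy, issued by the host thread
    thrust::copy( h_vec.begin(), h_vec.end(), d_vec.begin() );
    return 0;
}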

Plain Views

The fact that some data-structures contain device data but can only be used from the host, coupled with the fact that at the moment CUDA kernels cannot take references as arguments and require their parameters to be passed by value as PODs, leads naturally to the definition of plain views: in NVBIO's parlance, a plain view of an object is essentially a shallow reference to the object's data, encapsulated in a POD data-structure that can be passed as a kernel parameter.
NVBIO defines the generic function plain_view() to obtain the plain view of a given object. Analogously, it defines the meta-function plain_view_subtype<T>::type to obtain the type of the plain view of any given type T (where defined). Moreover, by convention NVBIO's data-structures T define the subtypes T::plain_view_type and T::const_plain_view_type to identify their plain view types.
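For instance, the following fragment sketches the two equivalent ways of naming a plain view type; it assumes the necessary headers are included and, as in the example further below, that the plain view of an nvbio::vector<device_tag,uint32> is an nvbio::vector_view<uint32>.

void example()
{
    typedef nvbio::vector<device_tag,uint32>              vector_type;

    // the plain view type, obtained through the generic meta-function...
    typedef nvbio::plain_view_subtype<vector_type>::type  view_type;
    // ...or through the conventional subtype defined by the container itself
    typedef vector_type::plain_view_type                  view_type2;

    vector_type vec( 100u );

    // obtain a shallow, POD view of the vector's data
    view_type view = nvbio::plain_view( vec );
}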
As an example, consider the following situation: on the host, you have created a large device vector that you want a device kernel to fill. Ideally, you'd want to simply pass a reference to the vector to your kernel, as in:
__global__ void my_kernel(                   // the CUDA kernel
    nvbio::vector<device_tag,uint32>& vec)   // ideally, receive a reference: doesn't work without UVM!
{
    const uint32 tid = threadIdx.x + blockIdx.x * blockDim.x; // compute a linear thread id
    if (tid < vec.size())
        vec[tid] = tid * 10;
}

int main()
{
    nvbio::vector<device_tag,uint32> vec( 1000000 );          // allocate a large device vector

    const uint32 blockdim = 128;
    const uint32 n_blocks = util::divide_ri( vec.size(), blockdim ); // number of blocks, rounded up
    my_kernel<<<n_blocks,blockdim>>>( vec );
}
However, this won't be possible in CUDA until UVM is finally available. With NVBIO, you'd do this instead:
__global__ void my_kernel(                   // the CUDA kernel
    nvbio::vector_view<uint32> vec)          // NVBIO's surrogate of a reference
{
    const uint32 tid = threadIdx.x + blockIdx.x * blockDim.x; // compute a linear thread id
    if (tid < vec.size())
        vec[tid] = tid * 10;
}

int main()
{
    nvbio::vector<device_tag,uint32> vec( 1000000 );          // allocate a large device vector

    const uint32 blockdim = 128;
    const uint32 n_blocks = util::divide_ri( vec.size(), blockdim ); // number of blocks, rounded up
    my_kernel<<<n_blocks,blockdim>>>( nvbio::plain_view( vec ) );    // pass the POD plain view, not the container
}
This basic pattern can be applied to all of NVBIO's data structures that are meant to be set up from the host and accessed from the device.
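
The same pattern extends to user-defined containers: one can follow NVBIO's conventions by defining a POD view type, the plain_view_type / const_plain_view_type subtypes, and an overload of plain_view(). The following is a hypothetical sketch: the StringSet container and its view are illustrative, not part of NVBIO, and it again assumes that the plain view of an nvbio::vector<device_tag,T> is an nvbio::vector_view<T>.

// the POD plain view: shallow references to the container's data,
// safe to pass by value as a kernel parameter
struct StringSetView
{
    nvbio::vector_view<uint8>  symbols;
    nvbio::vector_view<uint32> offsets;
};

// a host-side container bundling two device vectors
struct StringSet
{
    // the conventional plain view subtypes
    typedef StringSetView plain_view_type;
    typedef StringSetView const_plain_view_type;    // a proper const view could be defined as well

    nvbio::vector<device_tag,uint8>  symbols;       // concatenated string symbols
    nvbio::vector<device_tag,uint32> offsets;       // per-string start offsets
};

// overload of plain_view() for the new container, built from its members' plain views
inline StringSet::plain_view_type plain_view(StringSet& set)
{
    StringSet::plain_view_type view = { nvbio::plain_view( set.symbols ),
                                        nvbio::plain_view( set.offsets ) };
    return view;
}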

Next: Hello DNA!