Fermat
Host & Device
The user of Fermat and CUGAR needs to become familiar with the fact that on a GPU-equipped system there are both a host, controlled by a CPU, and one or more GPU devices, each with a distinct memory space. Hence, there can be several types of functions and data-structures (see the sketch after this list):
  • single-threaded functions that can be called by a host thread
  • single-threaded functions that can be called by a device thread
  • single-threaded functions that can be called both on the host and the device
  • parallel functions that can be called by a host thread, and spawn one or more sets of host threads
  • parallel functions that can be called by a host thread, but spawn one or more sets of device threads
  • data-structures that encapsulate host data and are meant to be used on the host (e.g. a resizable host vector, cugar::vector<host_tag,T>)
  • data-structures that encapsulate device data but are meant to be used on the host (e.g. a resizable device vector, cugar::vector<device_tag,T>)
  • data-structures that encapsulate device data and are meant to be used on the device
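To make the distinctions concrete, the following sketch shows how four of the five function categories map onto standard CUDA qualifiers; the code is purely illustrative and not part of CUGAR's API, and the fourth category (parallel functions spawning host threads) would typically rely on a host threading facility such as OpenMP instead:

typedef unsigned int uint32; // CUGAR provides this typedef; spelled out here for self-containment

// 1. a single-threaded function callable by a host thread only (no qualifier needed)
void host_only(uint32* data, uint32 n) { data[0] = n; }

// 2. a single-threaded function callable by a device thread only
__device__ uint32 device_only(uint32 x) { return x * 10; }

// 3. a single-threaded function callable both on the host and the device
__host__ __device__ uint32 host_device(uint32 x) { return x + 1; }

// 5. a parallel function called by a host thread that spawns device threads: a CUDA kernel
__global__ void parallel_on_device(uint32* data, uint32 n)
{
    const uint32 tid = threadIdx.x + blockIdx.x * blockDim.x; // linear thread id
    if (tid < n)
        data[tid] = host_device(data[tid]);                   // category 3 is usable here
}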
Unified Virtual Memory (coming with the NVIDIA Maxwell generation) allows any data-structure to be used anywhere, but given the relatively slow buses connecting CPUs and GPUs, it is often still useful to have complete control over where the data lives.
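For reference, this is roughly what relying on unified (managed) memory looks like in plain CUDA, independently of CUGAR: a single allocation is addressable from both processors, but accesses may trigger migrations across the bus.

#include <cuda_runtime.h>

int main()
{
    int* data;
    cudaMallocManaged(&data, 1024 * sizeof(int)); // one allocation, visible to host and device

    data[0] = 42;            // touched directly on the host
    // ... kernels may read and write data[] here ...
    cudaDeviceSynchronize(); // wait for the device before the host touches the data again

    cudaFree(data);
    return 0;
}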

Plain Views

Some data structures contain device data but can only be manipulated from the host; moreover, dereferencing a host-side reference from a device kernel would mean going through the bus to access slow CPU memory. This makes it advantageous to rethink how data is communicated between the two, and introduces the concept of plain views: in CUGAR's (and Fermat's) parlance, a plain view of an object is essentially a shallow reference to the object's data, encapsulated in a POD data structure that can be passed as a kernel parameter.
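To make the concept concrete, here is a minimal sketch of what a plain view boils down to; the name plain_view_sketch is ours, and CUGAR's actual view types (e.g. cugar::vector_view, used below) are richer:

typedef unsigned int uint32;

// a POD "shallow reference" to an array: trivially copyable, hence safe to
// pass by value as a kernel parameter
template <typename T>
struct plain_view_sketch
{
    __host__ __device__ uint32 size() const          { return m_size; }
    __host__ __device__ T&     operator[] (uint32 i) { return m_data[i]; }

    T*     m_data; // raw pointer to the underlying (device) storage
    uint32 m_size; // number of elements
};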
CUGAR defines the generic function plain_view() to obtain the view of a given object. Analogously, it defines the meta-function plain_view_subtype<T>::type to obtain the type of the plain view of any given type T (where defined). Moreover, by convention CUGAR's data structures define the subtypes T::plain_view_type and T::const_plain_view_type to identify their plain view types.
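As a sketch of how these conventions compose in generic code (the kernel and launcher below are hypothetical, not part of CUGAR), the meta-function lets host code recover the view type of any container it is handed:

// a hypothetical generic kernel, operating on any view with size() and operator[]
template <typename TView>
__global__ void fill_kernel(TView view)
{
    const uint32 tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid < view.size())
        view[tid] = tid;
}

// a hypothetical host-side launcher, working for any container supporting plain_view()
template <typename TVector>
void fill(TVector& vec)
{
    typedef typename cugar::plain_view_subtype<TVector>::type view_type;

    const uint32 blockdim = 128;
    const uint32 n_blocks = cugar::divide_ri( uint32(vec.size()), blockdim );

    fill_kernel<view_type><<<n_blocks,blockdim>>>( cugar::plain_view( vec ) );
}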
As an example, consider the following situation: on the host, you have created a large device vector that you want filled by a device kernel. Ideally, you'd simply pass a reference to the vector to your kernel, as in:
__global__ void my_kernel(                  // the CUDA kernel
    cugar::vector<device_tag,uint32>& vec)  // ideally, receive a reference: doesn't work without UVM!
{
    const uint32 tid = threadIdx.x + blockIdx.x * blockDim.x; // compute a linear thread id
    if (tid < vec.size())
        vec[tid] = tid * 10;
}

int main()
{
    cugar::vector<device_tag,uint32> vec( 1024 * 1024 ); // a large device vector

    const uint32 blockdim = 128;
    const uint32 n_blocks = cugar::divide_ri( uint32(vec.size()), blockdim );

    my_kernel<<<n_blocks,blockdim>>>( vec );
}
However, this won't be possible in CUDA until UVM is finally available. With Fermat, you'd do this instead:
__global__ void my_kernel(                  // the CUDA kernel
    cugar::vector_view<uint32> vec)         // Fermat's surrogate of a reference
{
    const uint32 tid = threadIdx.x + blockIdx.x * blockDim.x; // compute a linear thread id
    if (tid < vec.size())
        vec[tid] = tid * 10;
}

int main()
{
    cugar::vector<device_tag,uint32> vec( 1024 * 1024 ); // a large device vector

    const uint32 blockdim = 128;
    const uint32 n_blocks = cugar::divide_ri( uint32(vec.size()), blockdim );

    my_kernel<<<n_blocks,blockdim>>>( cugar::plain_view( vec ) );
}
This basic pattern can be applied to all of CUGAR's and Fermat's data structures that are meant to be set up from the host and accessed from the device.
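For read-only access the same pattern goes through the const view types; the following sketch assumes, per the convention above, that calling plain_view() on a const container yields its const_plain_view_type:

typedef cugar::vector<device_tag,uint32> device_vector_type;

__global__ void add_ten_kernel(
    device_vector_type::const_plain_view_type in,  // read-only view
    device_vector_type::plain_view_type       out) // mutable view
{
    const uint32 tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid < in.size())
        out[tid] = in[tid] + 10; // read through the const view, write through the mutable one
}

void add_ten(const device_vector_type& in, device_vector_type& out)
{
    const uint32 blockdim = 128;
    const uint32 n_blocks = cugar::divide_ri( uint32(in.size()), blockdim );

    add_ten_kernel<<<n_blocks,blockdim>>>( cugar::plain_view( in ), cugar::plain_view( out ) );
}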
