Fermat
Host & Device
The user of Fermat and CUGAR needs to become familiar with the fact that on a GPU-equipped system there are both a host, controlled by a CPU, and one or more GPU devices, each with a distinct memory space. Hence, there can be several types of functions and data-structures (see the sketch after this list):
  • single-threaded functions that can be called by a host thread
  • single-threaded functions that can be called by a device thread
  • single-threaded functions that can be called both on the host and the device
  • parallel functions that can be called by a host thread, and spawn one or more sets of host threads
  • parallel functions that can be called by a host thread, but spawn one or more sets of device threads
  • data-structures that encapsulate host data and are meant to be used on the host (e.g. a resizable host vector, cugar::vector<host_tag,T>)
  • data-structures that encapsulate device data but are meant to be used on the host (e.g. a resizable device vector, cugar::vector<device_tag,T>)
  • data-structures that encapsulate device data and are meant to be used on the device
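To make the distinctions concrete, the following sketch shows how four of the five function categories map onto standard CUDA qualifiers; the code is purely illustrative and not part of CUGAR's API, and the fourth category (parallel functions spawning host threads) would typically rely on a host threading facility such as OpenMP instead:

typedef unsigned int uint32; // CUGAR provides this typedef; spelled out here for self-containment

// 1. a single-threaded function callable by a host thread only (no qualifier needed)
void host_only(uint32* data, uint32 n) { data[0] = n; }

// 2. a single-threaded function callable by a device thread only
__device__ uint32 device_only(uint32 x) { return x * 10; }

// 3. a single-threaded function callable both on the host and the device
__host__ __device__ uint32 host_device(uint32 x) { return x + 1; }

// 5. a parallel function called by a host thread that spawns device threads: a CUDA kernel
__global__ void parallel_on_device(uint32* data, uint32 n)
{
    const uint32 tid = threadIdx.x + blockIdx.x * blockDim.x; // linear thread id
    if (tid < n)
        data[tid] = host_device(data[tid]);                   // category 3 is usable here
}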
Unified Virtual Memory (coming with the NVIDIA Maxwell generation) allows any data-structure to be used anywhere, but given the relatively slow buses connecting CPUs and GPUs, it is often still useful to have complete control over where the data lives.
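For reference, this is roughly what relying on unified (managed) memory looks like in plain CUDA, independently of CUGAR: a single allocation is addressable from both processors, but accesses may trigger migrations across the bus.

#include <cuda_runtime.h>

int main()
{
    int* data;
    cudaMallocManaged(&data, 1024 * sizeof(int)); // one allocation, visible to host and device

    data[0] = 42;            // touched directly on the host
    // ... kernels may read and write data[] here ...
    cudaDeviceSynchronize(); // wait for the device before the host touches the data again

    cudaFree(data);
    return 0;
}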

Plain Views

Some data structures contain device data but can only be manipulated from the host; moreover, dereferencing a host-side reference from a device kernel would mean going through the bus to access slow CPU memory. This makes it advantageous to rethink how data is communicated between the two, and introduces the concept of plain views: in CUGAR's (and Fermat's) parlance, a plain view of an object is essentially a shallow reference to the object's data, encapsulated in a POD data structure that can be passed as a kernel parameter.
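To make the concept concrete, here is a minimal sketch of what a plain view boils down to; the name plain_view_sketch is ours, and CUGAR's actual view types (e.g. cugar::vector_view, used below) are richer:

typedef unsigned int uint32;

// a POD "shallow reference" to an array: trivially copyable, hence safe to
// pass by value as a kernel parameter
template <typename T>
struct plain_view_sketch
{
    __host__ __device__ uint32 size() const          { return m_size; }
    __host__ __device__ T&     operator[] (uint32 i) { return m_data[i]; }

    T*     m_data; // raw pointer to the underlying (device) storage
    uint32 m_size; // number of elements
};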
CUGAR defines the generic function plain_view() to obtain the view of a given object. Analogously, it defines the meta-function plain_view_subtype<T>::type to obtain the type of the plain view of any given type T (where defined). Moreover, by convention CUGAR's data structures define the subtypes T::plain_view_type and T::const_plain_view_type to identify their plain view types.
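As a sketch of how these conventions compose in generic code (the kernel and launcher below are hypothetical, not part of CUGAR), the meta-function lets host code recover the view type of any container it is handed:

// a hypothetical generic kernel, operating on any view with size() and operator[]
template <typename TView>
__global__ void fill_kernel(TView view)
{
    const uint32 tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid < view.size())
        view[tid] = tid;
}

// a hypothetical host-side launcher, working for any container supporting plain_view()
template <typename TVector>
void fill(TVector& vec)
{
    typedef typename cugar::plain_view_subtype<TVector>::type view_type;

    const uint32 blockdim = 128;
    const uint32 n_blocks = cugar::divide_ri( uint32(vec.size()), blockdim );

    fill_kernel<view_type><<<n_blocks,blockdim>>>( cugar::plain_view( vec ) );
}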
As an example, consider the following situation: on the host, you have created a large device vector that you want filled by a device kernel. Ideally, you'd simply pass a reference to the vector to your kernel, as in:
__global__ void my_kernel(                  // the CUDA kernel
    cugar::vector<device_tag,uint32>& vec)  // ideally, receive a reference: doesn't work without UVM!
{
    const uint32 tid = threadIdx.x + blockIdx.x * blockDim.x; // compute a linear thread id
    if (tid < vec.size())
        vec[tid] = tid * 10;
}

int main()
{
    cugar::vector<device_tag,uint32> vec( 1024 * 1024 ); // a large device vector

    const uint32 blockdim = 128;
    const uint32 n_blocks = cugar::divide_ri( uint32(vec.size()), blockdim );

    my_kernel<<<n_blocks,blockdim>>>( vec );
}
However, this won't be possible in CUDA until UVM is finally available. With Fermat, you'd do this instead:
__global__ void my_kernel(                  // the CUDA kernel
    cugar::vector_view<uint32> vec)         // Fermat's surrogate of a reference
{
    const uint32 tid = threadIdx.x + blockIdx.x * blockDim.x; // compute a linear thread id
    if (tid < vec.size())
        vec[tid] = tid * 10;
}

int main()
{
    cugar::vector<device_tag,uint32> vec( 1024 * 1024 ); // a large device vector

    const uint32 blockdim = 128;
    const uint32 n_blocks = cugar::divide_ri( uint32(vec.size()), blockdim );

    my_kernel<<<n_blocks,blockdim>>>( cugar::plain_view( vec ) );
}
This basic pattern can be applied to all of CUGAR's and Fermat's data structures that are meant to be set up from the host and accessed from the device.
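For read-only access the same pattern goes through the const view types; the following sketch assumes, per the convention above, that calling plain_view() on a const container yields its const_plain_view_type:

typedef cugar::vector<device_tag,uint32> device_vector_type;

__global__ void add_ten_kernel(
    device_vector_type::const_plain_view_type in,  // read-only view
    device_vector_type::plain_view_type       out) // mutable view
{
    const uint32 tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid < in.size())
        out[tid] = in[tid] + 10; // read through the const view, write through the mutable one
}

void add_ten(const device_vector_type& in, device_vector_type& out)
{
    const uint32 blockdim = 128;
    const uint32 n_blocks = cugar::divide_ri( uint32(in.size()), blockdim );

    add_ten_kernel<<<n_blocks,blockdim>>>( cugar::plain_view( in ), cugar::plain_view( out ) );
}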
